July 16, 1996
The design of many clinical trials includes some strategy for early
stopping if an interim analysis reveals large differences between treatment
groups. In addition to saving time and resources, such a design feature
can reduce study participants' exposure to the inferior treatment.
However, when repeated significance testing on accumulating data is done,
some adjustment of the usual hypothesis testing procedure must be made to
maintain an overall significance level (Armitage, McPherson & Rowe, 1969;
McPherson & Armitage, 1971). The methods described by Pocock (1977) and
O'Brien & Fleming (1979), among others, are popular implementations of
group sequential testing for clinical trials. Sometimes interim analyses
are equally spaced in terms of calendar time or the information available
from the data, but this assumption can be relaxed to allow for unplanned or
unequally spaced analyses. Lan & DeMets (1983) introduced type I error
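spending functions, denoted $\alpha^*(t)$, and determined boundaries by
$$P_0\{|Z_1| \ge b_1 \text{ or } |Z_2| \ge b_2 \text{ or } \cdots \text{ or } |Z_k| \ge b_k\} = \alpha^*(t_k), \qquad k = 1, 2, \ldots,$$
where $P_0$ denotes probability under the null hypothesis, $b_1, \ldots, b_k$
are (upper) boundaries for the sequence of interim test statistics and
$t_k$ is either the proportion of elapsed time to maximum duration or
observed information to total information. That is, if the interim
standardized test statistic at the $k$th interim analysis is denoted by
$Z_k$, we continue the trial as long as $|Z_k| < b_k$ (two-sided);
otherwise termination is considered. The spending function satisfies
$\alpha^*(t) = 0$ for $t = 0$ and $\alpha^*(t) = \alpha$ for $t = 1$. That is,
this flexible procedure guarantees a fixed $\alpha$ level when the trial is
complete. Neither the times nor the number of analyses need to be specified
in advance: only $\alpha^*(t)$ must be specified. Issues surrounding the use of calendar time and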
information have been discussed by Lan & DeMets (1989) and Lan, Reboussin
& DeMets (1994). The spending functions built into the program, which are
also called use functions, correspond to those described by Lan & DeMets (1983)
and Kim & DeMets (1987a). These are similar to commonly used group
sequential boundaries proposed by Pocock (1977) and O'Brien & Fleming
(1979). Additional spending functions may be found in Hwang, Shih & de
Cani (1990).
Figure: Sequential outcomes and boundaries for interim standardized
test statistics from a clinical trial.
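The program described here performs computations related to group sequential
boundaries, such as those illustrated in the figure above. The
program begins by prompting the user to specify whether it is being run
interactively or not, and then to specify one of four options. It
continues prompting based on the selected option. The options are:
(1) compute bounds for a given spending function;
(2) compute the drift for a given power and set of bounds;
(3) compute probabilities for given bounds; and
(4) compute a confidence interval.
Option 1 requires a spending function ($\alpha^*(t)$) and uses a searching routine which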
makes an initial choice of boundaries, computes stopping probabilities, and
alters boundaries until the desired alpha level is obtained. The other
options, in contrast, evaluate probabilities associated with a given set of
boundaries. They require as input boundaries and times for the interim
analyses.
The package can be used to design sequential trials, determine boundary
values while the trial is ongoing or compute confidence intervals when the
trial is ended. We present examples of design and analysis using test
statistics comparing means, binomial proportions, survival, or repeated
measures outcomes.
A detailed presentation of the methodology may be found in Lan & DeMets
(1983), DeMets & Lan (1984), and Lan & Zucker (1993). Group sequential
procedures for interim analyses are equivalent to discrete boundary
crossing problems for a Brownian motion process $W(t)$ with drift parameter
$\theta$. We take advantage of this correspondence both in theoretical
developments and in implementation. At each interim analysis, a
standardized test statistic $Z_k$ is computed. These normally distributed
variates have mean $E(Z_k) = \theta\sqrt{t_k}$, where $\theta$ is the ``drift''
parameter, and, for $k \le l$, $\mathrm{Cov}(Z_k, Z_l) = \sqrt{t_k/t_l}$,
where $t_k$ is the information fraction (or information time) at the
$k$th analysis, e.g. $t_k = n_k/N$ if $N$ is the maximum sample
size (per arm). The drift parameter $\theta$ and the standardized
difference $\Delta$ are related by the equation
$$\theta = \Delta\sqrt{N}.$$
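To reiterate in more technical terms, the program uses the boundary-crossing
probabilities of this Brownian motion process to determine one of the
following: the boundaries, given a spending function; the drift parameter,
given the boundaries and a desired power; the exit probabilities, given the
boundaries and the drift; or a confidence interval for the drift, given the
boundaries and the final test statistic.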
It may be useful to note correspondences between the notation used here and in some other references (see Table 1).
Table: Correspondence of notation for commonly used group sequential
parameters.
To clarify notation for the sample size, let $n_k$ be the number of
subjects at the $k$th look in each treatment arm. $N = n_K$ is the maximum
number of subjects per treatment arm and $K$ is the maximum number of looks
or interim analyses. If there are $n$ subjects accumulated between interim
analyses, $n_k = kn$ and $N = nK$. The drift parameter $\theta$ can be expressed in
terms of the noncentrality parameter of Pocock (1977), written here as
$\Delta_P$, as $\theta = \Delta_P\sqrt{K}$.
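The Brownian motion structure can be checked with a small simulation. The sketch below is our own illustration, not part of the program: it assumes a single stream of independent $N(\delta, 1)$ observations (a hypothetical stand-in for standardized treatment differences), takes $\delta = \theta/\sqrt{N}$ so that $\theta$ is the drift at $t = 1$, and compares the simulated means and correlations of $Z_k = S(n_k)/\sqrt{n_k}$ with $\theta\sqrt{t_k}$ and $\sqrt{t_k/t_l}$. The function names are hypothetical.

```python
import math
import random


def simulate_z(theta, looks, reps=4000, seed=2024):
    """Simulate Z_k = S(n_k) / sqrt(n_k) at cumulative sample sizes n_1 < ... < n_K.

    Each hypothetical observation is N(delta, 1) with delta = theta / sqrt(N),
    N = n_K, so that E(Z_k) = theta * sqrt(t_k) with t_k = n_k / N.
    """
    rng = random.Random(seed)
    delta = theta / math.sqrt(looks[-1])
    rows = []
    for _ in range(reps):
        total, used, row = 0.0, 0, []
        for n_k in looks:
            while used < n_k:  # add observations until look k is reached
                total += rng.gauss(delta, 1.0)
                used += 1
            row.append(total / math.sqrt(n_k))
        rows.append(row)
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def corr(xs, ys):
    """Sample (Pearson) correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


looks = [10, 25, 50]  # n_k, so t_k = 0.2, 0.5, 1.0
theta = 3.0
cols = list(zip(*simulate_z(theta, looks)))
for k, n_k in enumerate(looks):
    t_k = n_k / looks[-1]
    print(f"t={t_k:.1f}  mean Z={mean(cols[k]):.3f}  theory={theta * math.sqrt(t_k):.3f}")
print(f"corr(Z1, Z2)={corr(cols[0], cols[1]):.3f}  theory={math.sqrt(10 / 25):.3f}")
```

With 4000 replications the simulated means should agree with $\theta\sqrt{t_k}$ to within a few multiples of $1/\sqrt{4000} \approx 0.016$.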
Although spending functions provide flexibility in data monitoring and
do not require analysis times to be prespecified, the anticipated number
and timing of interim analyses must be specified for design purposes.
This is no more restrictive than for the group sequential
procedures proposed by Pocock (1977) or O'Brien & Fleming (1979).
Even substantial deviations from the initial design do not cause a serious
loss of power (see the examples in Sections 4.1 and 4.2). Thus, for design
only, we shall assume $N = nK$, where $K$ is the anticipated number of
interim analyses and $n$ is the anticipated number of subjects accrued between analyses.
Kim & DeMets (1992) provide a detailed discussion of sample size
determination for group sequential testing. The relationship between
sample size and power depends on two quantities: the drift parameter of the
underlying Brownian motion and the standardized difference between control
and treatment arms. Thus by determining $\theta$ and $\Delta$ for a
particular design problem, the required sample size can be computed as
$N = (\theta/\Delta)^2$. The value of $\theta$ depends on the desired power,
the set of boundaries and analysis times, and the properties of Brownian
motion. Exit or rejection probabilities for Brownian motion given a set of
boundaries can be computed by the program or, for certain designs, found in
the tables provided by Kim & DeMets (1992). The sequential boundaries are
determined by the choice of spending function $\alpha^*(t)$, the number and
timing of interim analyses, the $\alpha$ level and whether the test is one-
or two-sided. The standardized difference $\Delta$, on the other hand, depends
on the type of data to be collected by the study. Several examples are
detailed below for normal, binomial and survival data.
Kim & DeMets (1992) provide tables of drift parameters for spending
functions producing O'Brien-Fleming type and Pocock type boundaries
($\alpha_1^*(t)$ and $\alpha_2^*(t)$, respectively). The
program currently offers five choices for spending functions, but others
can be added (see Appendix).
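The five built-in use functions can be written down directly. The following Python sketch reflects our assumption of their forms: the Lan & DeMets (1983) O'Brien-Fleming type $2 - 2\Phi(z_{a/2}/\sqrt{t})$ and Pocock type $\alpha\log\{1 + (e - 1)t\}$, and the power family $\alpha t^{\rho}$ with $\rho = 1, 1.5, 2$. For a two-sided symmetric test the O'Brien-Fleming type form is applied with $\alpha/2$ on each side. Under that convention the values agree, to rounding, with the cumulative alpha column printed in Section 4.1 (for example 0.00079 at $t = 0.4$ and 0.02442 at $t = 0.8$). The names `USE_FUNCTIONS`, `obrien_fleming_type` and so on are hypothetical, not part of the program.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf and inv_cdf


def obrien_fleming_type(t, alpha, sides=2):
    """O'Brien-Fleming type: 2 - 2*Phi(z_{a/2} / sqrt(t)) per side, a = alpha / sides."""
    a = alpha / sides
    z = _STD.inv_cdf(1.0 - a / 2.0)
    return sides * (2.0 - 2.0 * _STD.cdf(z / math.sqrt(t)))


def pocock_type(t, alpha, sides=2):
    """Pocock type: alpha * log(1 + (e - 1) t); the total is the same for 1 or 2 sides."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def power_family(rho):
    """Return the use function alpha * t**rho."""
    def spend(t, alpha, sides=2):
        return alpha * t ** rho
    return spend


# Hypothetical table keyed by the program's menu numbers (1-5).
USE_FUNCTIONS = {
    1: obrien_fleming_type,
    2: pocock_type,
    3: power_family(1.0),
    4: power_family(1.5),
    5: power_family(2.0),
}

for t in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"{t:.1f}  {USE_FUNCTIONS[1](t, 0.05, sides=2):.5f}")
```

Adding another use function amounts to adding one more callable with the same `(t, alpha, sides)` signature.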
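Kim & DeMets (1992) discuss the following example. Suppose that a
normally distributed response has mean $\mu_C$ in the control group, with
standard deviation $\sigma$. The null hypothesis is $H_0: \mu_C = \mu_E$,
where $\mu_E$ is the mean in the experimental group, expected to be 200.
The test statistic is
$$Z = \frac{\bar X_C - \bar X_E}{\sigma\sqrt{2/N}}.$$
Then the drift parameter is
$$\theta = \frac{(\mu_C - \mu_E)\sqrt{N}}{\sigma\sqrt{2}} = \Delta\sqrt{N}, \qquad \Delta = \frac{\mu_C - \mu_E}{\sigma\sqrt{2}}.$$
So
$$N = \frac{2\sigma^2\theta^2}{(\mu_C - \mu_E)^2}.$$
For the program, we specify two-sided $\alpha = 0.05$
O'Brien-Fleming type ($\alpha_1^*(t)$) boundaries with $K = 5$ looks at 0.2,
0.4, 0.6, 0.8 and 1.0 (see Section 4.1). The output boundary values are
(4.8769, 3.3569, 2.6803, 2.2898, 2.0310). Kim & DeMets (1992) indicate that
$\theta = 3.28$ for 90% power, so $N = 2\sigma^2(3.28)^2/(\mu_C - \mu_E)^2 = 48.44$
patients per arm. The program can verify that $\theta = 3.28$ corresponds to
90% power, and that alternative timings of analyses do not greatly affect
the power (see Section 4.1). The effect of alternative assumptions for
$\Delta$ (for example, a different $\sigma$) on sample size can be determined
without recomputing $\theta$.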
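Because $\theta$ depends only on the boundaries and the power, the sample size for any assumed means and standard deviation follows from $\Delta = (\mu_C - \mu_E)/(\sigma\sqrt{2})$ and $N = (\theta/\Delta)^2$. A minimal sketch, with hypothetical function names and purely illustrative inputs (a difference of half a standard deviation, not the values of the example above):

```python
import math


def standardized_difference_means(mu_c, mu_e, sigma):
    """Delta = (mu_C - mu_E) / (sigma * sqrt(2)) for two arms with common sigma."""
    return (mu_c - mu_e) / (sigma * math.sqrt(2.0))


def n_per_arm(theta, delta):
    """Maximum sample size per arm implied by theta = Delta * sqrt(N)."""
    return (theta / delta) ** 2


# Illustrative inputs only: a difference of half a standard deviation.
delta = standardized_difference_means(mu_c=1.0, mu_e=0.5, sigma=1.0)
print(f"{n_per_arm(3.28, delta):.2f}")  # drift 3.28 (two-sided O'Brien-Fleming type, 90% power) -> 86.07
```

Changing $\sigma$ or the anticipated difference only changes $\Delta$, so the drift never has to be recomputed for such sensitivity checks.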
Suppose that in the previous example the O'Brien-Fleming type boundaries were
replaced with Pocock type boundaries ($\alpha_2^*(t)$). The computations
are identical except for the value of $\theta$. Two-sided 0.05 Pocock
type boundary values are obtained in the same way with use function 2, and
Kim and DeMets (1992) give the drift $\theta$ needed for 90% power using these
boundaries, so that again $N = 2\sigma^2\theta^2/(\mu_C - \mu_E)^2$.
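We duplicate an example from Pocock (1977), who writes $N$ for the number
of groups (our $K$). Taking $N = 5$, the corresponding boundaries are
determined. For a desired power $1 - \beta$, we determine the drift $\theta$
using the program, so that Pocock's noncentrality parameter is
$\Delta_P = \theta/\sqrt{5}$. To compare two sample means, we compute
$\Delta_P = (\mu_A - \mu_B)\sqrt{n}/(\sigma\sqrt{2})$, where $n$ is the number
of subjects per treatment in each group, so that
$n = 2\sigma^2\Delta_P^2/(\mu_A - \mu_B)^2$. For the values in Pocock (1977)
this gives $n = 20$, so $2nN = 2(20)(5) = 200$ subjects.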
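In the binomial case, where we test $H_0: p_C = p_E$, let $\hat p_C$ and
$\hat p_E$ be the observed proportions with $N$ subjects per arm, let
$\hat p$ be the pooled proportion, and let $\bar p = (p_C + p_E)/2$. The statistic
$$Z = \frac{\hat p_C - \hat p_E}{\sqrt{2\hat p(1 - \hat p)/N}}$$
has asymptotically a normal distribution with a mean of 0 and a variance
of 1 (under $H_0$). The standardized difference is
$$\Delta = \frac{p_C - p_E}{\sqrt{2\bar p(1 - \bar p)}}.$$
Kim & DeMets (1992) show $\theta = \Delta\sqrt{N}$, so
$$N = \frac{2\bar p(1 - \bar p)\,\theta^2}{(p_C - p_E)^2}.$$
For example, given $p_C$ and $p_E$ under the alternative hypothesis (and hence
$\bar p$), a one-sided $\alpha = 0.05$ test using five interim analyses and
Pocock type boundaries ($\alpha_2^*(t)$) has boundaries
(2.1762, 2.1437, 2.1132, 2.0895, 2.0709). For $\alpha = 0.05$ and 90% power,
Kim & DeMets (1992) report $\theta = 3.21$ (or see Section 4.2), so
$N = 2\bar p(1 - \bar p)(3.21)^2/(p_C - p_E)^2$.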
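As another binomial example, consider a two-sided $\alpha = 0.05$
test with O'Brien-Fleming type ($\alpha_1^*(t)$) boundaries, and for design
purposes only, assume $K = 5$ equally spaced analyses at 0.2, 0.4, 0.6, 0.8
and 1.0. As above, we take the same control proportion $p_C$, but now let the
treatment proportion under the alternative hypothesis be a 25% reduction,
$p_E = 0.75\,p_C$. The program produces the boundaries
(4.8769, 3.3569, 2.6803, 2.2898, 2.0310). From Kim and DeMets (1992),
$\theta = 3.28$ (see Section 4.1), so $N = 2\bar p(1 - \bar p)(3.28)^2/(p_C - p_E)^2$.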
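The binomial calculation has the same shape as the normal one. The sketch below assumes the pooled form $\Delta = (p_C - p_E)/\sqrt{2\bar p(1 - \bar p)}$ with $\bar p = (p_C + p_E)/2$ and $N = (\theta/\Delta)^2$ per arm. The proportions are hypothetical and chosen only for illustration. The drifts 3.21 and 3.28 are those found in Sections 4.2 and 4.1 for the one-sided Pocock type and two-sided O'Brien-Fleming type designs.

```python
import math


def n_per_arm_binomial(theta, p_c, p_e):
    """N per arm from theta = Delta * sqrt(N), Delta = (p_C - p_E) / sqrt(2 pbar (1 - pbar))."""
    p_bar = (p_c + p_e) / 2.0
    delta = (p_c - p_e) / math.sqrt(2.0 * p_bar * (1.0 - p_bar))
    return (theta / delta) ** 2


# Hypothetical proportions under the alternative, for illustration only.
p_c, p_e = 0.40, 0.30
for theta, design in ((3.21, "one-sided Pocock type"), (3.28, "two-sided O'Brien-Fleming type")):
    print(f"{design}: {math.ceil(n_per_arm_binomial(theta, p_c, p_e))} per arm")
```

Rounding up with `math.ceil` reflects the usual practice of never recruiting fewer patients than the formula requires.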
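Suppose we are interested in comparing the hazard rates of two populations.
Let $\lambda_C(t)$ be the hazard function of the control group and
$\lambda_E(t)$ the hazard function in the treatment group, and let
$\eta = \log\{\lambda_C(t)/\lambda_E(t)\}$, assumed constant over time. Under
the null hypothesis $\lambda_C(t) = \lambda_E(t)$ and $\eta = 0$. The logrank
statistic is
$$L(d) = \sum_{i=1}^{d}\left(\delta_i - \frac{r_{Ci}}{r_{Ci} + r_{Ei}}\right),$$
where $d$ is the number of events, $\delta_i$ is 1 if the event at $t_i$ is in
the control group and 0 if it is in the treatment group, $r_{Ci}$ is the
number of patients in the control group at risk just before $t_i$, and
$r_{Ei}$ is the number of patients in the treatment group at risk just
before $t_i$. The expected value of $L(d)$ is approximately $d\eta/4$, and the
estimated variance is
$$V(d) = \sum_{i=1}^{d}\frac{r_{Ci}\,r_{Ei}}{(r_{Ci} + r_{Ei})^2} \approx \frac{d}{4}.$$
These approximations are reasonable if $r_{Ci} \approx r_{Ei}$ and $\eta$ is
close to 0. If $d_k$ is the number of events at analysis $k$, the statistic
$Z_k = L(d_k)/\sqrt{V(d_k)}$ has approximately a $N(\eta\sqrt{d_k}/2,\,1)$
distribution, so, with $D$ the total number of events at the end of the trial,
$$\theta = \frac{\eta\sqrt{D}}{2}.$$
Then the maximum number of events required is $D = 4\theta^2/\eta^2$, or about
$2\theta^2/\eta^2$ per arm. If we assume $\theta = 3.261$ for three equally
spaced analyses (see Section 4.3), $D = 4(3.261)^2/\eta^2 \approx 42.54/\eta^2$.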
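For survival designs the target is a number of events rather than a number of patients. A minimal sketch, assuming the relation $\theta = \eta\sqrt{D}/2$ with $\eta$ the log hazard ratio (control over treatment) and $D$ the total number of events, and using the drift 3.261 found in Section 4.3. The hazard ratio below is a hypothetical value.

```python
import math


def events_required(theta, eta):
    """Total events D implied by theta = eta * sqrt(D) / 2, i.e. D = 4 theta^2 / eta^2."""
    return 4.0 * theta ** 2 / eta ** 2


theta = 3.261                   # three equally spaced looks, see Section 4.3
eta = math.log(1.0 / 0.75)      # hypothetical: treatment hazard 25% below control
d_total = events_required(theta, eta)
print(f"total events: {math.ceil(d_total)}, about {math.ceil(d_total / 2)} per arm")
```

The per-arm figure assumes events split roughly evenly between arms, which is consistent with $\eta$ close to 0.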
Many clinical trials are designed to measure subjects repeatedly over the course of the trial, and define as the primary outcome the change or slope over time. For such trials, the difference between treatment groups can be tested with the estimated slopes from each group using
$$Z_k = \frac{\hat\beta_E(k) - \hat\beta_C(k)}{\sqrt{V_E(k) + V_C(k)}},$$
where $\hat\beta_E(k)$ and $\hat\beta_C(k)$ are the averages of the slopes
estimated for patients in the treatment and control groups at the
$k$th interim analysis, and $V_E(k)$ and $V_C(k)$ are their variances.
The sequentially computed $Z_k$ have been shown to have the required
Brownian motion structure when the variance parameters are known
(Reboussin, Lan & DeMets, 1992; Wu & Lan, 1992). Lan, Reboussin &
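DeMets (1994) show
$$\theta = \frac{(\beta_E - \beta_C)\sqrt{I}}{\sigma_B\sqrt{2}},$$
where $\beta_E$ and $\beta_C$ are the mean population slopes, $\sigma_B^2$ is
the between patient variance of the slopes, and $I$ is the natural
estimate of total information (per arm) at the end of the trial. For the
comparison of means and binomial proportions, $I = N$, but in this case the
natural estimate of total information is the sum of the natural estimates
of information for each patient:
$$I = \sum_{i=1}^{N}\frac{1}{1 + R/S_i}, \qquad S_i = \sum_j (x_{ij} - \bar x_i)^2,$$
where $x_{ij}$ are the measurement times of patient $i$ and $R$ is the ratio
of within to between patient variance. For design purposes, we may assume an
identical number and timing of measurements for all patients, so that $S_i$
is the same value $S$ for every patient. Then $I = N/(1 + R/S)$ and
$\theta = \Delta\sqrt{N/(1 + R/S)}$ with $\Delta = (\beta_E - \beta_C)/(\sigma_B\sqrt{2})$, so
$$N = \left(1 + \frac{R}{S}\right)\frac{\theta^2}{\Delta^2}.$$
If a sufficient number of observations are taken on each patient, the term
$1/(1 + R/S)$ is nearly one (Lan, Reboussin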
& DeMets, 1994), so that the power computations are similar to the normal
case.
We describe how to run the program using data from the Beta-Blocker Heart Attack Trial or BHAT (Beta-Blocker Heart Attack Trial Research Group, 1982). BHAT, a study sponsored by the National Heart, Lung and Blood Institute, was designed to test whether long-term use of propranolol by patients with a recent heart attack reduced mortality. The following example does not correspond exactly to what was actually done for BHAT, though it is similar. From June 1978 to October 1980, 3837 patients were randomized to either propranolol (1916 patients) or placebo (1921 patients). Follow-up was originally scheduled to end in June 1982. The total information D (the number of deaths by June 1982) was never observed, since the trial was terminated early in October 1981. The value of D was estimated to be 628 when BHAT was designed, but with the data available in September 1982 it was estimated to be around 400 (Lan & DeMets, 1989). At the six Policy and Data Monitoring Board meetings (May 1979, October 1979, March 1980, October 1980, April 1981, and October 1981), the observed numbers of deaths were (56, 77, 126, 177, 247, 318) and the normalized log-rank statistics were (1.68, 2.24, 2.37, 2.30, 2.34, 2.82).
Let $\tau$ denote calendar time measured from the beginning of the trial,
and $T$ denote the maximum duration in calendar time. Let $t$ be the
information fraction or ``information time'', which must often be estimated
by $t^*$, some function either of calendar time or of the number of
observed patients or events. We begin with an example using only calendar
time.
Set $\tau = 0$ in June 1978 and assume the maximum duration is
$T = 48$ months, which corresponds to June 1982. Then the calendar times for
interim analyses correspond to $\tau$ = (11, 16, 21, 28, 34, 40) months after the
start of the trial. We estimate $t$ as a function of calendar time by
$t^* = \tau/T$, so the information times are (0.2292,
0.3333, 0.4375, 0.5833, 0.7083, 0.8333), and adopt the spending function
$\alpha^*(t) = \alpha t$ to construct a data monitoring boundary.
This corresponds to $\alpha_3^*(t)$ in Lan & DeMets (1983) and Kim &
DeMets (1987a). The original BHAT design had a two-sided significance
level of 0.05.
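The arithmetic of this example is easy to reproduce. This Python fragment (ours, not program code) converts the meeting dates to calendar-time fractions $t^* = \tau/48$, evaluates $\alpha^*(t) = 0.05\,t$, and also forms the event-based fractions $d_k/628$ used later in this section:

```python
T_MONTHS = 48                              # June 1978 to June 1982
months = [11, 16, 21, 28, 34, 40]          # board meetings, months from start
deaths = [56, 77, 126, 177, 247, 318]
alpha = 0.05

t_calendar = [round(m / T_MONTHS, 4) for m in months]   # t* = tau / T
spent = [alpha * t for t in t_calendar]                  # alpha*(t) = alpha * t
t_events = [round(d / 628, 4) for d in deaths]           # design value D = 628

print(t_calendar)
print([round(a, 5) for a in spent])
print(t_events)
```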
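When the data were monitored in May 1979, $\tau = 11$, $t_1^* = 0.2292$
and $\alpha^*(t_1^*) = 0.05 \times 0.2292 = 0.01146$. The program produces a
boundary value of $b_1 = 2.5284$: if $Z_1$ is standard normal,
$P(|Z_1| \ge 2.5284) = 0.01146$. In October 1979, $\tau = 16$,
$t_2^* = 0.3333$, and $\alpha^*(t_2^*) = 0.01667$. Ignoring the observed
number of deaths and using only calendar time, the calculation proceeds as
follows. Suppose $Z_1$ and $Z_2$ are standard normal with correlation coefficient
$$\rho = \sqrt{t_1^*/t_2^*} = \sqrt{0.2292/0.3333} \approx 0.829.$$
We wish to find $b_2$ such that
$$P(|Z_1| \ge 2.5284 \text{ or } |Z_2| \ge b_2) = 0.01667.$$
This solution requires some numerical integration which the program
performs. In fact, this equality is satisfied if $b_2 = 2.6098$.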
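The numerical integration for the second look can be sketched in a few lines. Given $Z_1 = z$, $Z_2$ is normal with mean $\rho z$ and variance $1 - \rho^2$, so $P(|Z_1| < b_1, |Z_2| \ge b_2)$ is a one-dimensional integral over $z \in (-b_1, b_1)$. We approximate it with Simpson's rule and solve for $b_2$ by bisection so that it equals $\alpha^*(t_2^*) - \alpha^*(t_1^*)$. This is an illustration under those assumptions, not the program's own routine; the grid size and the name `second_bound` are our choices. The same function also handles the death-based correlation $\sqrt{56/77}$ discussed next.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def second_bound(b1, rho, target, m=401):
    """Solve P(|Z1| < b1, |Z2| >= b2) = target for b2.

    (Z1, Z2) is standard bivariate normal with correlation rho, so given
    Z1 = z, Z2 ~ N(rho * z, 1 - rho**2). The integral over z in (-b1, b1)
    uses Simpson's rule on m (odd) points; b2 is found by bisection.
    """
    s = math.sqrt(1.0 - rho * rho)
    h = 2.0 * b1 / (m - 1)
    nodes = [-b1 + i * h for i in range(m)]
    weights = [h / 3.0 * (1 if i in (0, m - 1) else (4 if i % 2 else 2)) for i in range(m)]

    def crossing(b2):
        return sum(
            w * STD.pdf(z) * (1.0 - STD.cdf((b2 - rho * z) / s) + STD.cdf((-b2 - rho * z) / s))
            for z, w in zip(nodes, weights)
        )

    lo, hi = 0.0, 10.0  # crossing(b2) decreases as b2 grows
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if crossing(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha, t1, t2 = 0.05, 0.2292, 0.3333
b1 = STD.inv_cdf(1.0 - alpha * t1 / 2.0)   # P(|Z1| >= b1) = alpha * t1
target = alpha * (t2 - t1)                  # error spent at look 2
b2_calendar = second_bound(b1, math.sqrt(t1 / t2), target)
b2_deaths = second_bound(b1, math.sqrt(56 / 77), target)
print(round(b1, 4), round(b2_calendar, 4), round(b2_deaths, 4))
```

The results should be close to the program's 2.5284 and 2.6098, and to 2.5905 when the correlation is based on deaths; the higher correlation lets less of the increment be "used up" by paths that already crossed, so the second bound is lower.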
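In this example, after specifying Option 1, the user is prompted for the number of interim analyses, the analysis times, whether a second time or information scale is to be used, the overall significance level, whether the test is one- or two-sided, the use function, and whether the bounds should be truncated (see the corresponding session in Section 4).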
We now repeat the above calculation using the information in the number of deaths. Assuming the total information is the number of expected events, D = 628, the information fractions are (56/628, 77/628, 126/628, 177/628, 247/628, 318/628), or (0.0892, 0.1226, 0.2006, 0.2818, 0.3933, 0.5064). Then at the second interim analysis, the program would ask for the number of analyses (2) and their times, 0.0892 and 0.1226.
Some users may be familiar with the use of both information and calendar
time as described in Lan & DeMets (1989) and Lan, Reboussin & DeMets
(1994). The program includes such an option. We will use the percent of
elapsed calendar time to determine how much type I error probability is to
be spent, but for the correlation of successive test statistics, we will
use the information in the number of deaths. The first boundary is computed
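exactly as above. For the analysis in October 1979, at 16 months,
$\tau = 16$, $t_2^* = 0.3333$, and $\alpha^*(t_2^*) = 0.01667$, also just as
before. To evaluate the correlation between $Z_1$ and $Z_2$, note that even
though the total number of deaths $D$ is unknown, the ratio $d_1/d_2 = 56/77$
is observed. If $Z_1$ and $Z_2$ are standard normal then the correlation
coefficient is $\rho = \sqrt{56/77} \approx 0.853$, and the solution to
$$P(|Z_1| \ge 2.5284 \text{ or } |Z_2| \ge b_2) = 0.01667$$
is $b_2 = 2.5905$. The program asks the same questions as before (see Section 4.6).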
Since the times entered were based on the percent of elapsed calendar time,
it is desirable to use the information available in the number of deaths.
When the question on a second time scale for information is asked, we
answer ``yes'' and enter the information for each analysis, which is the
number of deaths in this example. The resulting boundaries are (2.53, 2.59,
2.63, 2.50, 2.51, 2.47) for the six data monitoring points of BHAT, and
this boundary is crossed at the sixth analysis ($t_6^* = 0.8333$), where
$Z_6 = 2.82 > 2.47$, or in October of 1981. This is
the same as the result given for the example in Lan & DeMets (1989).
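For more than two looks the same idea is applied recursively: carry the sub-density of the process over the continuation region from one look to the next, and at each look choose the bound that spends exactly the new increment of $\alpha^*$. The sketch below works on the Brownian motion scale $B(s) = Z\sqrt{s}$. It uses the calendar fractions for spending and, optionally, the numbers of deaths (rescaled to fractions) for the correlations, in the spirit of Lan & DeMets (1989). Simpson's rule on a fixed grid and bisection are our choices, and all names are hypothetical; the program's own integration scheme may differ in detail.

```python
import math
from statistics import NormalDist

STD = NormalDist()
SQRT_2PI = math.sqrt(2.0 * math.pi)


def simpson(lo, hi, m):
    """Nodes and Simpson weights on [lo, hi] (m odd)."""
    h = (hi - lo) / (m - 1)
    xs = [lo + i * h for i in range(m)]
    ws = [h / 3.0 * (1 if i in (0, m - 1) else (4 if i % 2 else 2)) for i in range(m)]
    return xs, ws


def crossing(prev, ds, b, s):
    """Null probability of leaving (-b, b) on the Z scale at a look with
    fraction s, given the sub-density prev of B at the previous look."""
    xs, ws, fs = prev
    sd, edge = math.sqrt(ds), b * math.sqrt(s)
    return sum(w * f * (1.0 - STD.cdf((edge - x) / sd) + STD.cdf((-edge - x) / sd))
               for x, w, f in zip(xs, ws, fs))


def propagate(prev, ds, b, s, m):
    """Sub-density of B(s) on |B(s)| < b * sqrt(s), i.e. paths with no exit yet."""
    xs, ws, fs = prev
    sd, edge = math.sqrt(ds), b * math.sqrt(s)
    ys, vs = simpson(-edge, edge, m)
    gs = []
    for y in ys:
        acc = sum(w * f * math.exp(-0.5 * ((y - x) / sd) ** 2) for x, w, f in zip(xs, ws, fs))
        gs.append(acc / (sd * SQRT_2PI))
    return ys, vs, gs


def two_sided_bounds(times, spend, info=None, m=201):
    """Symmetric bounds with P0(first exit at look k) = spend(t_k) - spend(t_{k-1}).

    times: fractions used for spending (e.g. calendar time).
    info:  optional second scale (e.g. deaths) used only for the correlations.
    """
    scale = [x / info[-1] for x in info] if info else list(times)
    prev = ([0.0], [1.0], [1.0])  # point mass at B(0) = 0
    s_prev, spent, bounds = 0.0, 0.0, []
    for t, s in zip(times, scale):
        target, ds = spend(t) - spent, s - s_prev
        lo, hi = 0.0, 12.0  # bisection on b; crossing() decreases in b
        for _ in range(50):
            mid = 0.5 * (lo + hi)
            if crossing(prev, ds, mid, s) > target:
                lo = mid
            else:
                hi = mid
        b = 0.5 * (lo + hi)
        bounds.append(b)
        prev = propagate(prev, ds, b, s, m)
        s_prev, spent = s, spend(t)
    return bounds


times = [0.2292, 0.3333, 0.4375, 0.5833, 0.7083, 0.8333]
deaths = [56, 77, 126, 177, 247, 318]
bhat = two_sided_bounds(times, lambda t: 0.05 * t, info=deaths)
print([round(b, 4) for b in bhat])
```

Run on the BHAT data, the result should be close to the boundaries (2.53, 2.59, 2.63, 2.50, 2.51, 2.47) quoted above; omitting `info` gives the calendar-time-only version.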
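Kim & DeMets (1987b) detail the theory for confidence intervals
following early termination using group sequential tests. Suppose that
a trial has been stopped at the $k$th analysis with boundary values
$b_1, \ldots, b_k$ and with final standardized estimate of treatment
difference $Z_k$. The confidence interval is based on computing upper exit
probabilities associated with candidate values of the drift parameter $\theta$.
Continuing with the previous example, the final observed standardized statistic was 2.82, and suppose that a 95 percent confidence interval is desired. The program prompts for the number and times of the analyses, the boundaries, the standardized statistic at the last analysis and the confidence level, and reports (0.1881, 4.9347) as a 95% confidence interval for $\theta$ (see the Option 4 session in Section 4).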
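Using the equation
$$\theta = \frac{\eta\sqrt{D}}{2},$$
we can translate this interval into an interval for the log hazard ratio
$\eta$. The statistic is based on 318 events, so $0.1881 = \eta_L\sqrt{318}/2$,
or $\eta_L = 2(0.1881)/\sqrt{318} = 0.021$ is the lower bound. Repeating this
computation for the upper bound, we obtain (0.021, 0.553) as a 95% confidence
interval for $\eta$.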
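The confidence interval can be sketched in the same recursive framework. We assume the stage-wise ordering of outcomes: an upper exit at an earlier look counts as more extreme than any later outcome, and at the final look larger values of $Z$ are more extreme. The lower and upper limits are then the drifts at which the probability of an outcome at least as extreme as the one observed equals 0.025 and 0.975. The bounds and times are those entered in the Option 4 session below (our reading of that session), and the last lines apply $\eta = 2\theta/\sqrt{318}$. Names, grid size and tolerances are hypothetical choices.

```python
import math
from statistics import NormalDist

STD = NormalDist()
SQRT_2PI = math.sqrt(2.0 * math.pi)


def simpson(lo, hi, m):
    """Nodes and Simpson weights on [lo, hi] (m odd)."""
    h = (hi - lo) / (m - 1)
    xs = [lo + i * h for i in range(m)]
    ws = [h / 3.0 * (1 if i in (0, m - 1) else (4 if i % 2 else 2)) for i in range(m)]
    return xs, ws


def upper_tail(prev, dt, theta, edge):
    """P(B(t) >= edge and no earlier exit), given the sub-density prev at the last look."""
    xs, ws, fs = prev
    sd = math.sqrt(dt)
    return sum(w * f * (1.0 - STD.cdf((edge - x - theta * dt) / sd))
               for x, w, f in zip(xs, ws, fs))


def propagate(prev, dt, theta, edge, m):
    """Sub-density of B(t) on (-edge, edge): paths that have not exited yet."""
    xs, ws, fs = prev
    sd = math.sqrt(dt)
    ys, vs = simpson(-edge, edge, m)
    gs = [sum(w * f * math.exp(-0.5 * ((y - x - theta * dt) / sd) ** 2)
              for x, w, f in zip(xs, ws, fs)) / (sd * SQRT_2PI) for y in ys]
    return ys, vs, gs


def stagewise_prob(theta, times, bounds, z_last, m=101):
    """P_theta(upper exit at looks 1..K-1, or Z_K >= z_last at look K)."""
    prev, t_prev, total = ([0.0], [1.0], [1.0]), 0.0, 0.0  # B(0) = 0
    for k, (t, b) in enumerate(zip(times, bounds)):
        dt = t - t_prev
        if k == len(times) - 1:
            return total + upper_tail(prev, dt, theta, z_last * math.sqrt(t))
        total += upper_tail(prev, dt, theta, b * math.sqrt(t))
        prev, t_prev = propagate(prev, dt, theta, b * math.sqrt(t), m), t
    return total


def solve_theta(p, times, bounds, z_last, lo=-5.0, hi=10.0, tol=1e-4):
    """Bisection on theta; stagewise_prob increases with theta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stagewise_prob(mid, times, bounds, z_last) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


times = [0.2292, 0.3333, 0.4375, 0.5833, 0.7083, 0.8333]
bounds = [2.53, 2.61, 2.57, 2.47, 2.43, 2.38]
theta_lo = solve_theta(0.025, times, bounds, 2.82)
theta_hi = solve_theta(0.975, times, bounds, 2.82)
# theta = eta * sqrt(d) / 2 with d = 318 deaths gives the log hazard ratio eta
eta_lo, eta_hi = (2.0 * th / math.sqrt(318) for th in (theta_lo, theta_hi))
print(f"theta: ({theta_lo:.4f}, {theta_hi:.4f})  eta: ({eta_lo:.3f}, {eta_hi:.3f})")
```

If these assumptions match the program's, the limits should be near (0.1881, 4.9347) for $\theta$ and (0.021, 0.553) for $\eta$.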
This section contains examples of interactive sessions with the program, which were used for the examples considered in Sections 2 and 3.
This program output relates to the first example in Section 2.1. For
this example, we use 5 equally spaced interim analyses (0.2, 0.4, 0.6,
0.8, and 1.0) with two-sided O'Brien-Fleming boundaries and
$\alpha = 0.05$. We first determine the boundaries and then, for these
boundaries, determine the drift parameter $\theta$ needed to calculate a sample size.
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
1
Option 1: You will be prompted for a spending function.
Number of interim analyses?
5
5 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
y
Analysis times: 0.200 0.400 0.600 0.800 1.000
Do you wish to specify a second time/information scale? (e.g.
number of patients or number of events, as in Lan & DeMets 89?) (1=yes, 0=no)
n
Overall significance level? (>0 and <=1)
.05
alpha = 0.050
One(1) or two(2)-sided symmetric?
2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2
1
Use function alpha-star 1
Do you wish to truncate the standardized bounds? (1=yes, 0=no) n
Bounds will not be truncated.
This program generates two-sided symmetric boundaries.
n = 5
alpha = 0.050
use function for the lower boundary = 1
use function for the upper boundary = 1
Time Bounds alpha(i)-alpha(i-1) cum alpha
0.20 -4.8769 4.8769 0.00000 0.00000
0.40 -3.3569 3.3569 0.00079 0.00079
0.60 -2.6803 2.6803 0.00683 0.00762
0.80 -2.2898 2.2898 0.01681 0.02442
1.00 -2.0310 2.0310 0.02558 0.05000
Do you want to see a graph? (1=yes,0=no)
y
[ASCII plot of the upper and lower boundaries against analysis time, 0 to 1]
Done.
Once these initial boundaries are obtained, to compute the required sample size, we must find the drift parameter corresponding to the desired power. In the program, this is option 2. We enter the times and boundary values and select the desired power. Alternatively, drift parameters for some potential analysis scenarios are contained in Kim & DeMets (1992). In our example, a drift parameter of 3.2788 gives a power of 0.90.
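Option 2 amounts to a one-dimensional search: compute the probability of crossing the given bounds as a function of the drift, and adjust the drift until it equals the desired power. A minimal sketch using recursive numerical integration (Simpson's rule on a fixed grid, our choice) and bisection; the function names are hypothetical, and a lower bound of $-8$ stands in for "no lower bound", as in the program's one-sided output.

```python
import math
from statistics import NormalDist

STD = NormalDist()
SQRT_2PI = math.sqrt(2.0 * math.pi)


def simpson(lo, hi, m):
    """Nodes and Simpson weights on [lo, hi] (m odd)."""
    h = (hi - lo) / (m - 1)
    xs = [lo + i * h for i in range(m)]
    ws = [h / 3.0 * (1 if i in (0, m - 1) else (4 if i % 2 else 2)) for i in range(m)]
    return xs, ws


def exit_probability(theta, times, lower, upper, m=101):
    """P_theta(Z_k crosses lower_k or upper_k at some look k)."""
    prev, t_prev, total = ([0.0], [1.0], [1.0]), 0.0, 0.0  # B(0) = 0
    for k, (t, lo_z, hi_z) in enumerate(zip(times, lower, upper)):
        xs, ws, fs = prev
        sd, shift = math.sqrt(t - t_prev), theta * (t - t_prev)
        a, b = lo_z * math.sqrt(t), hi_z * math.sqrt(t)  # bounds for B(t) = Z * sqrt(t)
        total += sum(w * f * (1.0 - STD.cdf((b - x - shift) / sd) + STD.cdf((a - x - shift) / sd))
                     for x, w, f in zip(xs, ws, fs))
        if k == len(times) - 1:
            break
        ys, vs = simpson(a, b, m)
        gs = [sum(w * f * math.exp(-0.5 * ((y - x - shift) / sd) ** 2)
                  for x, w, f in zip(xs, ws, fs)) / (sd * SQRT_2PI) for y in ys]
        prev, t_prev = (ys, vs, gs), t
    return total


def drift_for_power(power, times, lower, upper, lo=0.0, hi=10.0, tol=1e-4):
    """Bisection on theta; exit_probability increases with theta >= 0 here."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exit_probability(mid, times, lower, upper) < power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


times = [0.2, 0.4, 0.6, 0.8, 1.0]
upper = [4.8769, 3.3569, 2.6803, 2.2898, 2.0310]
theta = drift_for_power(0.90, times, [-u for u in upper], upper)
print(f"drift = {theta:.4f}")
```

For the two-sided O'Brien-Fleming type bounds above this should give a drift near the program's 3.2788.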
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
2
Option 2: You will be prompted for bounds and a power level.
Number of interim analyses?
5
5 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
y
Analysis times: 0.200 0.400 0.600 0.800 1.000
Are you using a spending function to determine bounds? (1=yes,0=no)
y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)
.05
alpha = 0.050
One(1) or two(2)-sided symmetric?
2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2
1
Use function alpha-star 1
Do you wish to truncate the standardized bounds? (1=yes, 0=no) n
Bounds will not be truncated.
Time Bounds
0.20 -4.8769 4.8769
0.40 -3.3569 3.3569
0.60 -2.6803 2.6803
0.80 -2.2898 2.2898
1.00 -2.0310 2.0310
Desired power? (>0 and <=1)
.9
Power is 0.900
n = 5, drift = 3.2788
look time lower upper exit probability cum exit pr
1 0.20 -4.8769 4.8769 0.00032 0.00032
2 0.40 -3.3569 3.3569 0.09939 0.09971
3 0.60 -2.6803 2.6803 0.34658 0.44629
4 0.80 -2.2898 2.2898 0.29966 0.74595
5 1.00 -2.0310 2.0310 0.15405 0.90000
Done.
A drift of 3.28 was used in Section 2.1.1 to compute the required sample
size for 90% power, which was 48.44 patients per arm.
Consider another sample size determination based on a different initial analysis plan. This set of analyses will be planned for unequally spaced time points 0.1, 0.4, 0.75, 1.0, but other features of the test are the same. The program determines the corresponding drift parameter.
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
2
Option 2: You will be prompted for bounds and a power level.
Number of interim analyses?
4
4 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
n
Times of interim analyses: (>0 & <=1)
.1 .4 .75 1.0
Analysis times: 0.100 0.400 0.750 1.000
Are you using a spending function to determine bounds? (1=yes,0=no)
y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)
.05
alpha = 0.050
One(1) or two(2)-sided symmetric?
2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2
1
Use function alpha-star 1
Do you wish to truncate the standardized bounds? (1=yes, 0=no) n
Bounds will not be truncated.
Time Bounds
0.10 -6.9914 6.9914
0.40 -3.3569 3.3569
0.75 -2.3449 2.3449
1.00 -2.0125 2.0125
Desired power? (>0 and <=1)
.9
Power is 0.900
n = 4, drift = 3.2696
look time lower upper exit probability cum exit pr
1 0.10 -6.9914 6.9914 0.00000 0.00000
2 0.40 -3.3569 3.3569 0.09871 0.09871
3 0.75 -2.3449 2.3449 0.58876 0.68746
4 1.00 -2.0125 2.0125 0.21254 0.90000
Done.
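The sample size is computed as in Section 2.1.1, now with $\theta = 3.2696$ in place of 3.2788; the required $N$ changes by the factor $(3.2696/3.2788)^2 \approx 0.994$.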
Notice that the different timing of interim analyses has little impact on the sample size needed to achieve 90% power.
In much the same manner as was done to compare two means from a normal population, we can compare two proportions from a binomial population. Recall the example from Section 2.2.1. We use Option 2 to determine the drift parameter for a power of 90% given one-sided 0.05 Pocock boundaries and five equally spaced analyses:
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
2
Option 2: You will be prompted for bounds and a power level.
Number of interim analyses?
5
5 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
y
Analysis times: 0.200 0.400 0.600 0.800 1.000
Are you using a spending function to determine bounds? (1=yes,0=no)
y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)
.05
alpha = 0.050
One(1) or two(2)-sided symmetric?
1
1.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2
2
Use function alpha-star 2
Do you wish to truncate the standardized bounds? (1=yes, 0=no) n
Bounds will not be truncated.
Time Bounds
0.20 -8.0000 2.1762
0.40 -8.0000 2.1437
0.60 -8.0000 2.1132
0.80 -8.0000 2.0895
1.00 -8.0000 2.0709
Desired power? (>0 and <=1)
.9
Power is 0.900
n = 5, drift = 3.2055
look time lower upper exit probability cum exit pr
1 0.20 -8.0000 2.1762 0.22884 0.22884
2 0.40 -8.0000 2.1437 0.25845 0.48729
3 0.60 -8.0000 2.1132 0.19989 0.68718
4 0.80 -8.0000 2.0895 0.13238 0.81956
5 1.00 -8.0000 2.0709 0.08044 0.90000
Done.
Even if the interim analyses actually performed during the study are not equally spaced, the power is not greatly affected. This can be seen in the following example. Recall that our original plan had looks at 0.2, 0.4, 0.6, 0.8 and 1.0 and a target power of 90%. Suppose instead the looks occur at 0.2, 0.5, 0.6, 0.8, and 1.0. Option 3 generates the appropriate boundaries and computes the power for a drift of 3.21. As shown, the power (0.9015) is not seriously affected.
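Option 3 is the forward calculation: given bounds and a drift, accumulate the exit probability look by look. The sketch below (hypothetical names; Simpson's rule on a fixed grid) returns the probability of stopping at each look, so the running sum can be compared with the `cum exit pr` column of the output that follows.

```python
import math
from itertools import accumulate
from statistics import NormalDist

STD = NormalDist()
SQRT_2PI = math.sqrt(2.0 * math.pi)


def simpson(lo, hi, m):
    """Nodes and Simpson weights on [lo, hi] (m odd)."""
    h = (hi - lo) / (m - 1)
    xs = [lo + i * h for i in range(m)]
    ws = [h / 3.0 * (1 if i in (0, m - 1) else (4 if i % 2 else 2)) for i in range(m)]
    return xs, ws


def exit_by_look(theta, times, lower, upper, m=101):
    """Probability of first crossing at each look, under drift theta."""
    prev, t_prev, probs = ([0.0], [1.0], [1.0]), 0.0, []  # B(0) = 0
    for t, lo_z, hi_z in zip(times, lower, upper):
        xs, ws, fs = prev
        sd, shift = math.sqrt(t - t_prev), theta * (t - t_prev)
        a, b = lo_z * math.sqrt(t), hi_z * math.sqrt(t)  # bounds for B(t) = Z * sqrt(t)
        probs.append(sum(w * f * (1.0 - STD.cdf((b - x - shift) / sd) + STD.cdf((a - x - shift) / sd))
                         for x, w, f in zip(xs, ws, fs)))
        ys, vs = simpson(a, b, m)
        gs = [sum(w * f * math.exp(-0.5 * ((y - x - shift) / sd) ** 2)
                  for x, w, f in zip(xs, ws, fs)) / (sd * SQRT_2PI) for y in ys]
        prev, t_prev = (ys, vs, gs), t
    return probs


times = [0.2, 0.5, 0.6, 0.8, 1.0]
upper = [2.1762, 2.0435, 2.1609, 2.0866, 2.0680]
probs = exit_by_look(3.21, times, [-8.0] * len(times), upper)
for t, p, c in zip(times, probs, accumulate(probs)):
    print(f"{t:.2f}  {p:.5f}  {c:.5f}")
```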
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
3
Option 3: You will be prompted for bounds or a spending
function to compute them.
Number of interim analyses?
5
5 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
n
Times of interim analyses: (>0 & <=1)
.2 .5 .6 .8 1.0
Analysis times: 0.200 0.500 0.600 0.800 1.000
Are you using a spending function to determine bounds? (1=yes,0=no)
y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)
.05
alpha = 0.050
One(1) or two(2)-sided symmetric?
1
1.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2
2
Use function alpha-star 2
Do you wish to truncate the standardized bounds? (1=yes, 0=no) n
Bounds will not be truncated.
Time Bounds
0.20 -8.0000 2.1762
0.50 -8.0000 2.0435
0.60 -8.0000 2.1609
0.80 -8.0000 2.0866
1.00 -8.0000 2.0680
Do you wish to use drift parameters? (1=yes, 0=no) y
How many drift parameters do you wish to enter?
1
1 drift parameters.
Enter drift parameters:
3.21
Drift parameters: 3.210
Drift is equal to the standard treatment difference times the square
root of total information per arm.
n = 5, drift = 3.2100
look time lower upper exit probability cum exit pr
1 0.20 -8.0000 2.1762 0.22945 0.22945
2 0.50 -8.0000 2.0435 0.38289 0.61234
3 0.60 -8.0000 2.1609 0.07757 0.68991
4 0.80 -8.0000 2.0866 0.13220 0.82211
5 1.00 -8.0000 2.0680 0.07941 0.90152
Done.
Referring to the previous survival example in Section 2.3, assume that three equally spaced analyses were initially planned for this study, and that the test was to have 90% power. The following output from the program shows that a Brownian motion drift parameter of 3.261 gives the desired power.
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
2
Option 2: You will be prompted for bounds and a power level.
Number of interim analyses?
3
3 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
y
Analysis times: 0.333 0.667 1.000
Are you using a spending function to determine bounds? (1=yes,0=no)
y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)
.05
alpha = 0.050
One(1) or two(2)-sided symmetric?
2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2
1
Use function alpha-star 1
Do you wish to truncate the standardized bounds? (1=yes, 0=no) n
Bounds will not be truncated.
Time Bounds
0.33 -3.7103 3.7103
0.67 -2.5114 2.5114
1.00 -1.9930 1.9930
Desired power? (>0 and <=1)
.90
Power is 0.900
n = 3, drift = 3.2608
look time lower upper exit probability cum exit pr
1 0.33 -3.7103 3.7103 0.03380 0.03380
2 0.67 -2.5114 2.5114 0.52651 0.56031
3 1.00 -1.9930 1.9930 0.33969 0.90000
Done.
This is an interactive session using the BHAT data and calendar time as the
only time scale. The input sequence is described in Section 3.1.
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
1
Option 1: You will be prompted for a spending function.
Number of interim analyses?
2
2 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
n
Times of interim analyses: (>0 & <=1)
.2292 .3333
Analysis times: 0.229 0.333
Do you wish to specify a second time/information scale? (e.g.
number of patients or number of events, as in Lan & DeMets 89?) (1=yes, 0=no)
no
Overall significance level? (>0 and <=1)
.05
alpha = 0.050
One(1) or two(2)-sided symmetric?
2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2
3
Use function alpha-star 3
Do you wish to truncate the standardized bounds? (1=yes, 0=no) n
Bounds will not be truncated.
This program generates two-sided symmetric boundaries.
n = 2
alpha = 0.050
use function for the lower boundary = 3
use function for the upper boundary = 3
Time Bounds alpha(i)-alpha(i-1) cum alpha
0.23 -2.5284 2.5284 0.01146 0.01146
0.33 -2.6098 2.6098 0.00520 0.01667
Do you want to see a graph? (1=yes,0=no)
n
Done.
In this case, the program outputs the number of analyses so far, the type I error specified, the use function chosen, the times, the computed boundaries, and the type I error ``spent'' at each analysis so far.
Some users may want to use the program noninteractively. This can be done by preparing an input file with the appropriate format. Each question is answered on its own line in the input file, and the answer to the first question must be ``no'' or ``0''. Here is an input file which reproduces the above interactive session:
0             # noninteractive
1             # option 1: bounds
2             # number of analyses
0             # equally spaced? (0=no)
.2292 .3333   # times of analyses
0             # second time scale? (0=no)
.05           # alpha
2             # 1 or 2 sided test
3             # use function (1-5)
0             # truncate boundaries (0=no)
0             # show graph? (0=no)
0             # start again? (0=no)
The resulting output is
Is this an interactive session? (1=yes,0=no)
interactive = 0
2 interim analyses.
Analysis times: 0.229 0.333
alpha = 0.050
2.-sided test
Use function alpha-star 3
This program generates two-sided symmetric boundaries.
n = 2
alpha = 0.050
use function for the lower boundary = 3
use function for the upper boundary = 3
Time Bounds alpha(i)-alpha(i-1) cum alpha
0.23 -2.5284 2.5284 0.01146 0.01146
0.33 -2.6098 2.6098 0.00520 0.01667
Do you want to see a graph? (1=yes,0=no)
Done.
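Because each answer occupies its own line, an input file like the one above can also be generated by a short script. This Python sketch (hypothetical names) only builds and prints the text, with the same answers and comments; we assume, as in the file above, that a comment after an answer on the same line is acceptable to the program. How the file is supplied to the program is not shown here.

```python
# Answers for the noninteractive Option 1 run shown above, one per line.
ANSWERS = [
    ("0", "noninteractive"),
    ("1", "option 1: bounds"),
    ("2", "number of analyses"),
    ("0", "equally spaced? (0=no)"),
    (".2292 .3333", "times of analyses"),
    ("0", "second time scale? (0=no)"),
    (".05", "alpha"),
    ("2", "1 or 2 sided test"),
    ("3", "use function (1-5)"),
    ("0", "truncate boundaries (0=no)"),
    ("0", "show graph? (0=no)"),
    ("0", "start again? (0=no)"),
]


def input_file_text(answers):
    """One answer per line, followed by a '#' comment as in the file above."""
    return "".join(f"{value:<13} # {note}\n" for value, note in answers)


print(input_file_text(ANSWERS), end="")
```

Keeping the answers in a list makes it easy to regenerate the file for each new interim analysis by editing only the number of analyses and the times.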
For this session, the numbers of events were entered as information, as described in Section 3.1.
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
1
Option 1: You will be prompted for a spending function.
Number of interim analyses?
6
6 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
n
Times of interim analyses: (>0 & <=1)
.2292 .3333 .4375 .5833 .7083 .8333
Analysis times: 0.229 0.333 0.438 0.583 0.708 0.833
Do you wish to specify a second time/information scale? (e.g.
number of patients or number of events, as in Lan & DeMets 89?) (1=yes, 0=no)
y
Second scale will estimate covariances.
Information:
56 77 126 177 247 318
Information 56.000 77.000 126.000 177.000 247.000 318.000
Overall significance level? (>0 and <=1)
.05
alpha = 0.050
One(1) or two(2)-sided symmetric?
2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2
3
Use function alpha-star 3
Do you wish to truncate the standardized bounds? (1=yes, 0=no) n
Bounds will not be truncated.
This program generates two-sided symmetric boundaries.
n = 6
alpha = 0.050
use function for the lower boundary = 3
use function for the upper boundary = 3
Time Information Bounds alpha(i)-alpha(i-1) cum alpha
0.23 56.00 -2.5284 2.5284 0.01146 0.01146
0.33 77.00 -2.5905 2.5905 0.00520 0.01667
0.44 126.00 -2.6327 2.6327 0.00521 0.02187
0.58 177.00 -2.5036 2.5036 0.00729 0.02916
0.71 247.00 -2.5073 2.5073 0.00625 0.03542
0.83 318.00 -2.4655 2.4655 0.00625 0.04166
Do you want to see a graph? (1=yes,0=no)
n
Done.
In addition to the output described previously, the information is also
reported.
In addition to the information needed to compute probabilities associated with a set of boundaries, computing a confidence interval also requires the last value of the standardized test statistic.
PROGRAM PROMPTS USER INPUT
Is this an interactive session? (1=yes,0=no)
y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.
4
Option 4: You will be prompted for bounds and a confidence level.
Number of interim analyses?
6
6 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)
n
Times of interim analyses: (>0 & <=1)
.2292 .3333 .4375 .5833 .7083 .8333
Analysis times: 0.229 0.333 0.438 0.583 0.708 0.833
Are you using a spending function to determine bounds? (1=yes,0=no)
no
You must enter a set of bounds.
One(1)- or two(2)-sided?
2
2-sided test
Symmetric bounds? (1=yes,0=no)
y
Two sided symmetric bounds.
Enter upper bounds (standardized):
2.53 2.61 2.57 2.47 2.43 2.38
Bounds entered.
Time Bounds
0.23 -2.5300 2.5300
0.33 -2.6100 2.6100
0.44 -2.5700 2.5700
0.58 -2.4700 2.4700
0.71 -2.4300 2.4300
0.83 -2.3800 2.3800
Enter the standardized statistic at the last analysis:
2.82
Last value: 2.8200
Enter confidence level (>0 and <1):
.95
95. percent confidence interval
Starting computation for lower limit . . .
Lower limit computed, starting on upper limit . . .
95. percent confidence interval: ( 0.1881, 4.9347)
Drift is equal to the standard treatment difference times the square
root of total information per arm.
Done.
Translation of the standardized drift parameter back to an estimate of the
difference between treatment groups is described in Section 3.2.
Acknowledgements
The authors wish to acknowledge Kris Erlandson and Bill Ladd for assistance in constructing examples, and Wen Wei for assistance in programming.
References
Armitage, P., McPherson, C. K. & Rowe, B. C. (1969), 'Repeated significance tests on accumulating data', Journal of the Royal Statistical Society, Series A 132, 235-244.
Beta-Blocker Heart Attack Trial Research Group (1982), 'A randomized trial of propranolol in patients with acute myocardial infarction. I. Mortality results', Journal of the American Medical Association 247, 1707-1714.
DeMets, D. L. & Lan, K. K. G. (1984), 'An overview of sequential methods and their applications in clinical trials', Communications in Statistics, Theory and Methods 13, 2315-2338.
Hwang, I. K., Shih, W. J. & deCani, J. S. (1990), 'Group sequential designs using a family of type I error probability spending functions', Statistics in Medicine 9, 1439-1445.
Kim, K. & DeMets, D. L. (1987a), 'Design and analysis of group sequential tests based on the type I error spending rate function', Biometrika 74, 149-154.
Kim, K. & DeMets, D. L. (1987b), 'Confidence intervals following group sequential tests in clinical trials', Biometrics 43, 857-864.
Kim, K. & DeMets, D. L. (1992), 'Sample size determination for group sequential clinical trials with immediate response', Statistics in Medicine 11, 1391-1399.
Lan, K. K. G. & DeMets, D. L. (1983), 'Discrete sequential boundaries for clinical trials', Biometrika 70, 659-663.
Lan, K. K. G. & DeMets, D. L. (1989), 'Group sequential procedures: calendar versus information time', Statistics in Medicine 8, 1191-1198.
Lan, K. K. G. & Zucker, D. M. (1993), 'Sequential monitoring of clinical trials: the role of information and Brownian motion', Statistics in Medicine 12, 753-765.
Lan, K. K. G., Reboussin, D. M. & DeMets, D. L. (1994), 'Information and information fractions for design and sequential monitoring of clinical trials', Communications in Statistics, Part A--Theory and Methods 23, 403-420.
McPherson, C. K. & Armitage, P. (1971), 'Repeated significance tests on accumulating data when the null hypothesis is not true', Journal of the Royal Statistical Society, Series A 134, 15-25.
O'Brien, P. C. & Fleming, T. R. (1979), 'A multiple testing procedure for clinical trials', Biometrics 35, 549-556.
Pocock, S. J. (1977), 'Group sequential methods in the design and analysis of clinical trials', Biometrika 64, 191-199.
Reboussin, D. M., DeMets, D. L., Kim, K. & Lan, K. K. G. (1992), 'Programs for computing group sequential boundaries using the Lan-DeMets method', Technical Report 60, Department of Biostatistics, University of Wisconsin-Madison.
Reboussin, D. M., Lan, K. K. G. & DeMets, D. L. (1992), 'Group sequential testing of longitudinal data', Technical Report 72, Department of Biostatistics, University of Wisconsin-Madison.
Wu, M. C. & Lan, K. K. G. (1992), 'Sequential monitoring for comparison of changes in a response variable in clinical studies', Biometrics 48, 765-779.
Theory related to the computations.
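Consider a Brownian motion process in continuous time, $W(t)$, $0 \le t \le 1$, having unknown drift parameter $\theta$ (so that $W(t)$ is normal with mean $\theta t$ and variance $t$), which may be inspected at times $0 < t_1 < t_2 < \cdots < t_K \le 1$. We wish to test the hypothesis $H_0: \theta = 0$ at each inspection time $t_i$ and proceed only if the test fails to reject; that is, if $W(t_i)$ does not exceed some value, so that the sequential test rejects if $W(t_i) \ge b_i$ for some $i \le K$.

Consider a sequence of boundaries, $b_1, b_2, \ldots, b_K$, applied at times $t_1, t_2, \ldots, t_K$. Because $W(t_i)$ has variance $t_i$, the boundary $b_i$ on the $W$ scale corresponds to the standardized bound $b_i / \sqrt{t_i}$. Let $g$ denote the standard normal density function,

$$ g(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2). $$

The probability distribution for $W$ at analysis $i$ is determined recursively by

$$ f_1(x) = \frac{1}{\sqrt{t_1}} \, g\left( \frac{x - \theta t_1}{\sqrt{t_1}} \right) $$

and

$$ f_i(x) = \int_{-\infty}^{b_{i-1}} f_{i-1}(u) \, \frac{1}{\sqrt{\Delta t_i}} \, g\left( \frac{x - u - \theta \Delta t_i}{\sqrt{\Delta t_i}} \right) du, \qquad i = 2, \ldots, K, $$

where $\Delta t_i$ is the variance of $W(t_i) - W(t_{i-1})$, that is, $\Delta t_i = t_i - t_{i-1}$. Integrating $f_i$ from $-\infty$ to $b_i$ gives the probability that the trial continues past the $i$th analysis.

Computations at the first analysis involve only the standard normal density and distribution function, but for the second and beyond, numerical integration is necessary. By applying Fubini's theorem, we have the continuation probability at analysis $i$

$$ \int_{-\infty}^{b_i} f_i(x) \, dx = \int_{-\infty}^{b_{i-1}} f_{i-1}(u) \, \Phi\left( \frac{b_i - u - \theta \Delta t_i}{\sqrt{\Delta t_i}} \right) du, $$

where $\Phi$ is the standard normal distribution function. Note that only a single numerical integration is now required. This manipulation allows simple, accurate approximations to the normal distribution function to be used for computing $\Phi$. Extension of the above to two-sided tests is straightforward: if $a_i$ denotes the lower bound at analysis $i$, the lower bounds ($a_{i-1}$ or $a_i$, as appropriate) can be substituted for $-\infty$ in the above integrals, so that the $\Phi$ term in the last expression becomes $\Phi((b_i - u - \theta \Delta t_i)/\sqrt{\Delta t_i}) - \Phi((a_i - u - \theta \Delta t_i)/\sqrt{\Delta t_i})$.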
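A quick numerical check of the Fubini step is sketched below in Python for two analyses and a one-sided upper boundary. It computes the probability of not crossing the upper boundary at either look in two ways: as the single integral of $f_1(u)$ times a normal distribution function, and by first tabulating $f_2$ on a grid and then integrating it. This is an illustration under assumptions, not the program's code: the helper names are hypothetical, integrals use a composite trapezoidal rule with spacing 0.05 increment standard deviations, and $-\infty$ is truncated 8 standard deviations below the mean.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_pdf(x):
    """Standard normal density g(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def make_grid(lo, hi, max_step):
    """Equally spaced points from lo to hi, spacing no larger than max_step."""
    n = max(2, math.ceil((hi - lo) / max_step)) + 1
    h = (hi - lo) / (n - 1)
    return [lo + k * h for k in range(n)], h


def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def continuation_two_looks(t1, t2, b1, b2, theta=0.0, step=0.05, cut=8.0):
    """P(W(t1) < b1 and W(t2) < b2), with b1, b2 on the W scale.

    Returns (via_fubini, via_f2); -infinity is replaced by mean - cut * sd.
    """
    sd1 = math.sqrt(t1)
    sd = math.sqrt(t2 - t1)        # SD of the increment W(t2) - W(t1)
    shift = theta * (t2 - t1)      # mean of the increment
    u, hu = make_grid(theta * t1 - cut * sd1, b1, step * sd)
    f1 = [norm_pdf((x - theta * t1) / sd1) / sd1 for x in u]
    # Single integral after Fubini: f_1(u) * Phi((b2 - u - shift) / sd).
    via_fubini = trapezoid(
        [fu * norm_cdf((b2 - ui - shift) / sd) for fu, ui in zip(f1, u)], hu)
    # Direct route: tabulate f_2(x) on a grid, then integrate it up to b2.
    x, hx = make_grid(theta * t2 - cut * math.sqrt(t2), b2, step * sd)
    f2 = [trapezoid([fu * norm_pdf((xj - ui - shift) / sd) / sd
                     for fu, ui in zip(f1, u)], hu) for xj in x]
    via_f2 = trapezoid(f2, hx)
    return via_fubini, via_f2


if __name__ == "__main__":
    # Pocock two-look bound 2.178 on the standardized scale, times 0.5 and 1.
    print(continuation_two_looks(0.5, 1.0, 2.178 * math.sqrt(0.5), 2.178))
```

With Pocock's two-look bound of 2.178 (two-sided $\alpha = 0.05$) at $t = 0.5$ and $t = 1$, both results should be close to 0.975, since about 0.025 is spent on the upper side. The single integral needs one pass over the grid, whereas the direct route needs a full pass for every point of the $f_2$ grid.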
Description of computations.
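For the first analysis, which uses only the cumulative normal distribution, we have

$$ f_1(x) = \frac{1}{\sqrt{t_1}} \, g\left( \frac{x - \theta t_1}{\sqrt{t_1}} \right). $$

The probability calculated for exceeding the first upper boundary is

$$ 1 - \Phi\left( \frac{b_1 - \theta t_1}{\sqrt{t_1}} \right). $$

In the programs, given $f_{i-1}$, separate subroutines are called to compute the exit probability, denoted $Q_i$, and, if there are more analyses to come, to compute $f_i$. For the routine computing $Q_i$, a grid of values of $f_{i-1}(u)$ for $a_{i-1} \le u \le b_{i-1}$, saved from the previous step, is needed. The grid size is standardized, so that it is finer when the increment has a smaller standard deviation. At each grid point $u$, the quantity

$$ f_{i-1}(u) \left[ 1 - \Phi\left( \frac{b_i - u - \theta \Delta t_i}{\sqrt{\Delta t_i}} \right) + \Phi\left( \frac{a_i - u - \theta \Delta t_i}{\sqrt{\Delta t_i}} \right) \right] $$

is computed and stored in an array. This array is then passed to a numerical integration routine along with the limits $a_{i-1}$ and $b_{i-1}$ and the grid size, and $Q_i$ is returned. The other subroutine computes $f_i(x)$ for a grid of values between $a_i$ and $b_i$.

For each grid point, the grid of values of $f_{i-1}$ is needed. Letting $u$ denote a point in the grid from $a_{i-1}$ to $b_{i-1}$ and $x$ denote a point in the grid from $a_i$ to $b_i$, the quantity

$$ f_{i-1}(u) \, \frac{1}{\sqrt{\Delta t_i}} \, g\left( \frac{x - u - \theta \Delta t_i}{\sqrt{\Delta t_i}} \right) $$

is computed and stored in an array. As before, this array is passed to a numerical integration routine, along with $a_{i-1}$, $b_{i-1}$ and the grid size, and $f_i(x)$ is obtained and stored for the next step. Currently,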
the numerical integration routine is a composite trapezoidal rule, which
appears to produce fairly accurate results. Reboussin, DeMets, Kim & Lan
(1992) present testing of the programs for computational accuracy and
simulation results for validity. Their appendices contain listings of the
code.
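The grid-and-trapezoid scheme described above can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not a transcription of the Fortran subroutines: the function and argument names are hypothetical, bounds are supplied on the standardized scale and converted to the $W$ scale by multiplying by $\sqrt{t_i}$, lower bounds must be finite (a value such as -8 approximates a one-sided test), and the grid spacing is taken as 0.05 times the standard deviation of the next increment.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_pdf(x):
    """Standard normal density g(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def make_grid(lo, hi, max_step):
    """Equally spaced points from lo to hi, spacing no larger than max_step."""
    n = max(2, math.ceil((hi - lo) / max_step)) + 1
    h = (hi - lo) / (n - 1)
    return [lo + k * h for k in range(n)], h


def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def exit_probabilities(times, lower_z, upper_z, theta=0.0, step=0.05):
    """Return [(P(exit below at i), P(exit above at i)) for each analysis i].

    times   -- increasing variance scale of W (times or information)
    lower_z -- standardized lower bounds; must be finite
    upper_z -- standardized upper bounds
    theta   -- drift of W; 0 gives probabilities under H0
    """
    a = [z * math.sqrt(t) for z, t in zip(lower_z, times)]  # W-scale bounds
    b = [z * math.sqrt(t) for z, t in zip(upper_z, times)]
    sd1 = math.sqrt(times[0])
    mean1 = theta * times[0]
    result = [(norm_cdf((a[0] - mean1) / sd1),
               1.0 - norm_cdf((b[0] - mean1) / sd1))]
    if len(times) == 1:
        return result
    # f_1 on a grid over [a_1, b_1], spaced relative to the next increment SD.
    u, h = make_grid(a[0], b[0], step * math.sqrt(times[1] - times[0]))
    f = [norm_pdf((x - mean1) / sd1) / sd1 for x in u]
    for i in range(1, len(times)):
        sd = math.sqrt(times[i] - times[i - 1])
        shift = theta * (times[i] - times[i - 1])  # mean of the increment
        # Exit probability: one trapezoidal integral over the f_{i-1} grid.
        below = trapezoid([fu * norm_cdf((a[i] - ui - shift) / sd)
                           for fu, ui in zip(f, u)], h)
        above = trapezoid([fu * (1.0 - norm_cdf((b[i] - ui - shift) / sd))
                           for fu, ui in zip(f, u)], h)
        result.append((below, above))
        if i < len(times) - 1:
            # Sub-density f_i on a new grid over [a_i, b_i] for the next step.
            x, hx = make_grid(a[i], b[i],
                              step * math.sqrt(times[i + 1] - times[i]))
            f = [trapezoid([fu * norm_pdf((xj - ui - shift) / sd) / sd
                            for fu, ui in zip(f, u)], h) for xj in x]
            u, h = x, hx
    return result


if __name__ == "__main__":
    # Pocock's two-look constant for two-sided alpha = 0.05.
    probs = exit_probabilities([0.5, 1.0], [-2.178, -2.178], [2.178, 2.178])
    print(sum(lo + hi for lo, hi in probs))  # should be close to 0.05
```

Because only ratios of the variance scale affect the correlations, the information values from the Option 1 session (56, 77, ..., 318) can be passed directly as times; with the bounds printed in that session, the total exit probability should be close to the reported cumulative alpha of 0.04166.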
Programming for spending functions.
Boundaries and information fractions are related by the type I error spending function. The program contains five choices for these functions in a single subroutine called alphas. The critical source code is:
c     Calculate probabilities according to use function.
c     pnorm is the normal distribution function and znorm its inverse;
c     e must hold exp(1).
      do 50 i=1,nn
         if (iuse .eq. 1) then
            pe(i)=2.d0*
     .      (1.d0-pnorm(znorm(1.d0-(alpha/side)/2.d0)/dsqrt(t(i))))
         else if (iuse .eq. 2) then
            pe(i)=(alpha/side)*dlog(1.d0 + (e-1.d0)*t(i))
         else if (iuse .eq. 3) then
            pe(i)=(alpha/side)*t(i)
         else if (iuse .eq. 4) then
            pe(i)=(alpha/side)*(t(i) ** 1.5d0)
         else if (iuse .eq. 5) then
            pe(i)=(alpha/side)*(t(i) ** 2.0d0)
c     Add other spending function options here: e.g.
c        else if (iuse .eq. 6) then . . .
         else
            write(6,*) ' Warning: invalid use function.'
         end if
 50   continue
Additional spending functions can be added as "silent" options (options not offered in the interactive menu) by editing this section of code. For example, the following branch, inserted before the final else, gives a spending function that does not allow stopping until the trial is half over. Once half the information has accumulated, the type I error is spent uniformly until the end of the trial.
         else if (iuse .eq. 6) then
            if (t(i) .le. 0.5d0) then
               pe(i)=0.0d0
            else
               pe(i)=(alpha/side)*(t(i) * 2.0d0 - 1.d0)
            end if
This could also be added to the input routine with some additional
programming effort.
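For experimenting with spending functions outside the Fortran program, the Python sketch below mirrors the six formulas shown above: the five built-in use functions and the half-over example. It is an illustration, not part of the program. The function name spent is hypothetical, NormalDist from Python's standard library stands in for pnorm and znorm, and, unlike the Fortran, an invalid choice raises an exception instead of printing a warning. Each branch returns the cumulative error spent on one side, so every choice spends exactly $\alpha/\mathrm{side}$ at $t = 1$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # stands in for pnorm (cdf) and znorm (inv_cdf)


def spent(t, alpha, side, iuse):
    """Cumulative type I error spent at information fraction t (0 < t <= 1).

    Mirrors pe(i) in subroutine alphas; side is 1 or 2.
    """
    a = alpha / side
    if iuse == 1:  # O'Brien-Fleming type
        z = STD_NORMAL.inv_cdf(1.0 - a / 2.0)
        return 2.0 * (1.0 - STD_NORMAL.cdf(z / math.sqrt(t)))
    if iuse == 2:  # Pocock type
        return a * math.log(1.0 + (math.e - 1.0) * t)
    if iuse == 3:
        return a * t
    if iuse == 4:
        return a * t ** 1.5
    if iuse == 5:
        return a * t ** 2
    if iuse == 6:  # no stopping before half the information, then linear
        return 0.0 if t <= 0.5 else a * (2.0 * t - 1.0)
    raise ValueError(f"invalid use function: {iuse}")


if __name__ == "__main__":
    for iuse in range(1, 7):
        row = [spent(t, 0.05, 2, iuse) for t in (0.25, 0.5, 0.75, 1.0)]
        print(iuse, " ".join(f"{p:.5f}" for p in row))
```

Printing a small table like this is a convenient way to see how conservative each shape is early in the trial; the O'Brien-Fleming type spends almost nothing at $t = 0.25$, while the Pocock type spends the most.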