Simulate binary data from a platform trial with a shared control arm and a given number of experimental treatment arms entering at given time points

This function simulates data from a platform trial with a given number of experimental treatment arms entering at given time points and a shared control arm. The primary endpoint is a binary endpoint. The user specifies the timing of adding arms in terms of patients recruited to the trial so far and the sample size per experimental treatment arm.

Usage

datasim_bin(
  num_arms,
  n_arm,
  d,
  period_blocks = 2,
  p0,
  OR,
  lambda,
  trend,
  N_peak,
  n_wave,
  trend_mean = 0,
  trend_var = 0.5,
  full = FALSE,
  check = TRUE
)

Arguments

num_arms: Integer. Number of experimental treatment arms in the trial.
n_arm: Integer. Sample size per experimental treatment arm (assumed equal).
d: Integer vector with timings of adding new arms in terms of number of patients recruited to the trial so far. The first entry must be 0, so that the trial starts with at least one experimental treatment arm, and the entries must be non-decreasing. The vector length equals num_arms.
period_blocks: Integer. Number to define the size of the blocks for the block randomization. The block size in each period equals period_blocks times the number of active arms in the period (see Details). Default=2.
p0: Double. Response probability in the control arm.
OR: Double vector with treatment effects in terms of odds ratios for each experimental treatment arm compared to control. The elements of the vector (odds ratios) are ordered by the entry time of the experimental treatment arms (e.g., the first entry in the vector corresponds to the odds ratio of the first experimental treatment arm). The vector length equals num_arms.
lambda: Vector containing numerical entries or the string "random", indicating the strength of the time trend in each arm ordered by the entry time of the arms (e.g., the first entry in the vector corresponds to the time trend in the control arm, the second entry to the time trend in the first experimental treatment arm). The vector length equals num_arms+1, as a time trend in the control arm is also allowed. In case of a random time trend, its strength is generated from a normal distribution.
trend: String indicating the time trend pattern ("linear", "linear_2", "stepwise", "stepwise_2", "inv_u" or "seasonal"). See Details for more information.
N_peak: Integer. Timepoint at which the inverted-u time trend switches direction in terms of overall sample size (i.e. after how many recruited participants the trend direction switches).
n_wave: Integer. Number of cycles (waves) the seasonal trend should have.
trend_mean: Numeric. Mean of the normal distribution N(trend_mean, trend_var) from which the strength of the time trend is generated in case of random time trends. Default=0.
trend_var: Numeric. Variance of the normal distribution N(trend_mean, trend_var) from which the strength of the time trend is generated in case of random time trends. Default=0.5.
full: Logical. Indicates whether the output should be in the form of a data frame with only the variables needed for the analysis (FALSE) or in the form of a list containing more information (TRUE). Default=FALSE.
check: Logical. Indicates whether the input parameters should be checked by the function. Default=TRUE, unless the function is called by a simulation function, where the default is FALSE.
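
The constraints on the vector arguments stated above can be sketched as a small validation helper. This is only an illustration of the documented rules; `check_inputs` is a hypothetical name and is not claimed to be what the function does internally when check=TRUE.

```r
# Hypothetical helper (not part of the package): verify the documented
# constraints on d, OR and lambda for a given num_arms.
check_inputs <- function(num_arms, d, OR, lambda) {
  stopifnot(
    length(d) == num_arms,           # one entry time per experimental arm
    d[1] == 0,                       # trial starts with at least one arm
    !is.unsorted(d),                 # entry times are non-decreasing
    length(OR) == num_arms,          # one odds ratio per experimental arm
    length(lambda) == num_arms + 1   # control arm plus experimental arms
  )
  invisible(TRUE)
}

check_inputs(num_arms = 3, d = c(0, 100, 250),
             OR = rep(1.8, 3), lambda = rep(0.15, 4))
```

A call with, for instance, `d = c(10, 100, 250)` would stop with an error, because the first entry is not 0.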

Value

Data frame: simulated trial data (if full=FALSE, i.e. default) with the following columns:

j - patient recruitment index
response - binary response for patient j
treatment - index of the treatment patient j was allocated to
period - index of the period patient j was recruited in

or List (if full=TRUE) containing the following elements:

Data - simulated trial data, including an additional column p with the probability used for simulating the response for patient j
n_total - total sample size in the trial
n_arm - sample size per arm (assumed equal)
num_arms - number of experimental treatment arms in the trial
d - timings of adding new arms
SS_matrix - matrix with the sample sizes per arm and per period
period_blocks - number to multiply the number of active arms with, in order to get the block size per period
p0 - response probability in the control arm
OR - odds ratios for each experimental treatment arm
lambda - strength of time trend in each arm
time_dep_effect - time dependent treatment effects for each experimental treatment arm (for computing the bias)
trend - time trend pattern

Details

Design assumptions:

The simulated platform trial consists of a given number of experimental treatment arms (specified by the argument num_arms) and one control arm that is shared across the whole platform.
Participants are indexed by entry order, assuming that at each time unit exactly one participant is recruited and the time of recruitment and observation of the response are equal.
All participants are assumed to be eligible for all arms in the trial, i.e. the same inclusion and exclusion criteria apply to all experimental and control arms.
Equal sample sizes (given by parameter n_arm) in all experimental treatment arms are assumed.
The duration of the trial is divided into so-called periods, defined as time intervals bounded by distinct time points of any treatment arm entering or leaving the platform. Hence, multiple treatment arms entering or leaving at the same time point imply the start of only one additional period.
An allocation ratio of 1:1:...:1 is used in each period, and block randomization is used to assign patients to the active arms. Block size in each period = period_blocks * (number of active arms in the period).
If the period sample size is not a multiple of the block size, arms for the remaining participants are chosen by sampling without replacement from a vector containing the indices of active arms replicated ceiling(remaining sample size/number of active arms) times.
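
The block randomization described in the last two points can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: `allocate_period` is a hypothetical helper, and it assumes the period sample size is known in advance.

```r
# Hypothetical helper (not part of the package): allocate the patients of one
# period to the active arms using permuted blocks.
allocate_period <- function(n_period, active_arms, period_blocks = 2) {
  n_active <- length(active_arms)
  block_size <- period_blocks * n_active
  n_full_blocks <- n_period %/% block_size
  n_rest <- n_period %% block_size

  # Full blocks: each block is a random permutation in which every active
  # arm appears exactly period_blocks times.
  full <- unlist(lapply(seq_len(n_full_blocks), function(b) {
    pool <- rep(active_arms, times = period_blocks)
    pool[sample.int(length(pool))]
  }))

  # Remainder: sample without replacement from the arm indices replicated
  # ceiling(n_rest / n_active) times.
  rest <- if (n_rest > 0) {
    pool <- rep(active_arms, times = ceiling(n_rest / n_active))
    pool[sample.int(length(pool), size = n_rest)]
  } else {
    active_arms[0]
  }

  c(full, rest)
}

set.seed(1)
alloc <- allocate_period(n_period = 20, active_arms = c(0, 1, 2))
table(alloc)
```

With 3 active arms and period_blocks = 2, the block size is 6, so 20 patients give three full blocks (18 patients) and 2 remaining patients drawn from one copy of the arm indices.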

Data generation:

The binary response $y_j$ for patient $j$ is generated according to:

$$g(E(y_j)) = \eta_0 + \sum_{k=1}^K \theta_k \cdot I(k_j=k) + f(j)$$

where $g(\cdot)$ is the logit link function, $\eta_0$ (the logit of parameter p0) is the log odds in the control arm, and $\theta_k$ (the log of the $k$-th element of parameter OR) is the log odds ratio of treatment $k$ compared to control. $K$ is the total number of experimental treatment arms in the trial (parameter num_arms) and $k_j$ is the index of the treatment arm patient $j$ is allocated to.
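
As a rough illustration of this model, the response probabilities can be computed in base R with `qlogis` (logit) and `plogis` (inverse logit). The allocation vector and the zero time trend below are made-up values chosen for the sketch, not output of the function.

```r
# Illustrative parameter values
p0 <- 0.7
OR <- c(1.8, 1.8, 1.8)

eta0  <- qlogis(p0)   # log odds in the control arm, logit(p0)
theta <- log(OR)      # log odds ratios of the experimental arms

# Made-up allocation for five patients (0 = control, 1..3 = experimental arms)
treatment <- c(0, 1, 2, 0, 3)
f_j <- rep(0, length(treatment))   # assume no time trend in this sketch

# Linear predictor: eta0 + theta_k for experimental arms (0 for control) + f(j)
theta_j <- c(0, theta)[treatment + 1]
eta <- eta0 + theta_j + f_j

p <- plogis(eta)                                   # response probabilities
response <- rbinom(length(p), size = 1, prob = p)  # simulated binary responses
```

Control patients keep the probability p0, while patients in an experimental arm have their odds multiplied by the corresponding odds ratio.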

The function $f(j)$ denotes the time trend, whose strength is indicated by $\lambda_{k_j}$ (parameter lambda) and which can have the following patterns (parameter trend):

"linear" - trend starts at the beginning of the trial and the log odds increases or decreases linearly with a slope of $\lambda$, according to the function $f(j) = \lambda \cdot \frac{j-1}{N-1}$, where $N$ is the total sample size in the trial
"linear_2" - trend starts after the first period (i.e. there is no time trend in the first period) and the log odds increases or decreases linearly with a slope of $\lambda$, according to the function $f(j) = \lambda \cdot \frac{j-1}{N-1}$, where $N$ is the total sample size in the trial
"stepwise" - the log odds is constant in each period and increases or decreases by $\lambda$ each time any treatment arm enters or leaves the trial (i.e. in each period), according to the function $f(j) = \lambda_{k_j} \cdot (c_j - 1)$, where $c_j$ is an index of the period patient $j$ was enrolled in
"stepwise_2" - the log odds is constant in each period and increases or decreases by $\lambda$ each time a new treatment arm is added to the trial, according to the function $f(j) = \lambda_{k_j} \cdot (w_j - 1)$, where $w_j$ is the number of treatment arms that had already entered the ongoing trial when patient $j$ was enrolled
"inv_u" - the log odds increases up to the point $N_p$ (parameter N_peak) and decreases afterwards, linearly with a slope of $\lambda$, according to the function $f(j) = \lambda \cdot \frac{j-1}{N-1} (I(j \leq N_p) - I(j > N_p))$, where $N_p$ indicates the point at which the trend turns from positive to negative in terms of the sample size (note that for negative $\lambda$, the log odds decreases first and increases afterwards)
"seasonal" - the log odds increases and decreases periodically with a magnitude of $\lambda$, according to the function $f(j) = \lambda \cdot \sin \big( \psi \cdot 2\pi \cdot \frac{j-1}{N-1} \big)$, where $\psi$ indicates how many cycles the time trend should have (parameter n_wave)
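
The patterns whose formulas are fully spelled out above can be written as a single R function. This is a sketch, not the package's implementation: `time_trend` and its argument names `c_j` and `w_j` are hypothetical, and "linear_2" is left out because the list does not state how its trend is shifted after the first period.

```r
# Hypothetical helper (not part of the package): evaluate f(j) for one patient.
# j: recruitment index, N: total sample size, lambda: trend strength for the
# patient's arm, c_j: period index, w_j: number of arms that have entered.
time_trend <- function(j, N, lambda, trend,
                       c_j = NULL, w_j = NULL, N_peak = NULL, n_wave = NULL) {
  s <- (j - 1) / (N - 1)   # scaled recruitment time in [0, 1]
  switch(trend,
    linear     = lambda * s,
    stepwise   = lambda * (c_j - 1),
    stepwise_2 = lambda * (w_j - 1),
    inv_u      = lambda * s * ((j <= N_peak) - (j > N_peak)),
    seasonal   = lambda * sin(n_wave * 2 * pi * s),
    stop("pattern not covered by this sketch: ", trend)
  )
}

time_trend(j = 101, N = 201, lambda = 0.15, trend = "linear")            # 0.075
time_trend(j = 150, N = 201, lambda = 0.15, trend = "stepwise", c_j = 3) # 0.3
```

Only the branch matching `trend` is evaluated, so arguments that a pattern does not use (for example `N_peak` for "linear") can be left at NULL.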

Trials with no time trend can be simulated too, by setting all elements of the vector lambda to zero and choosing an arbitrary pattern.

Author

Pavla Krotka, Marta Bofill Roig

Examples


head(datasim_bin(num_arms = 3, n_arm = 100, d = c(0, 100, 250),
                 p0 = 0.7, OR = rep(1.8, 3), lambda = rep(0.15, 4),
                 trend = "stepwise"))
#>   j response treatment period
#> 1 1        1         1      1
#> 2 2        1         0      1
#> 3 3        0         1      1
#> 4 4        1         0      1
#> 5 5        0         1      1
#> 6 6        1         0      1