Study designs for estimating causal effects are numerous. Depending on the design, it is often necessary to address several sources of bias, such as baseline and time-varying confounding, informative censoring, selection bias, and others. Designs like the treatment decision design [1], new user design [2], and prevalent new user design [3] each address these biases in different ways and require seemingly different analytic approaches to yield unbiased estimates from their resulting data.

Recently, the ‘clone-censor-weight’ approach [4–6] has become a popular way to estimate the effects of sustained or dynamic treatment regimens. However, this approach, and the way of thinking it entails (which involves conceptualizing a ‘target trial’ and adapting it to the observational setting [7]), is more general, and nearly all studies can be thought of in this way. Here, we show that a standard study of a point treatment can be thought of as a clone-censor-weight design, and we show how confounding and informative censoring can be addressed with a single nuisance model.

The Setup

Consider a study of the effect of a binary baseline treatment, \(A\), on a time-to-event outcome, \(T\). Patients may be censored before experiencing the event, and the time of censoring is \(C\). A patient’s observed follow-up time is \(\tilde{T}=\min(T,C)\). In addition, a set of baseline covariates sufficient to control for confounding and informative censoring is collected, denoted \(W\). Finally, we define \(\Delta=I(C>\tilde{T})\), an indicator that a patient was not censored at their observed follow-up time (and therefore had the event). A subject’s observed data therefore consist of \(\{A, \tilde{T}, W, \Delta\}\).
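As a small illustration of these definitions, the following sketch builds the observed follow-up time and the event indicator for four hypothetical subjects. The times are made up for illustration, and the names `event_time`, `cens_time`, `Ttilde`, and `Delta` are assumptions chosen for this example only.

```r
# Hypothetical event and censoring times for four subjects (illustrative values only)
event_time <- c(2, 5, 1, 4)
cens_time  <- c(3, 1, 6, 4.5)

# Observed follow-up time: whichever of the event or censoring comes first
Ttilde <- pmin(event_time, cens_time)

# Indicator that the event, not censoring, ended follow-up
Delta <- as.integer(cens_time > Ttilde)

print(data.frame(event_time, cens_time, Ttilde, Delta))
```

Only the second subject is censored here: their censoring time (1) precedes their event time (5), so the event is never observed.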

One estimator for the counterfactual cumulative incidence of the outcome under treatment level \(A=a\) is [8]:

\[ \hat{Pr}(T(a)<t)=\frac{1}{n}\sum_{i=1}^n{\frac{\Delta_iI(\tilde{T}_i<t)I(A_i=a)}{\hat{Pr}(\Delta=1|W_i,A_i,T_i)\hat{Pr}(A=a|W_i)}}, \]

where \(T(a)\) is the time of the event had, possibly counter to fact, a subject received treatment level \(A=a\), \(n\) is the number of subjects, and each of the probabilities in the denominator is modeled appropriately, e.g., with a Cox proportional hazards model for the censoring model and logistic regression for the treatment model.
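To make the estimator concrete, here is a minimal hand-rolled sketch in base R plus the `survival` package. It is not the implementation of any particular package. It assumes simulated data with a single covariate `W`, a logistic treatment model, and a Cox model for censoring that includes `A` and `W`. All object names (`d`, `ps`, `p_uncens`, `ipw_risk`) are hypothetical.

```r
library(survival)

set.seed(1)
expit <- function(p) 1 / (1 + exp(-p))

# Simulated data (assumed for illustration): W affects treatment, events, and censoring
n <- 10000
W <- runif(n)
A <- rbinom(n, 1, expit(W))
event_time <- rexp(n, rate = ifelse(A == 1, 1, 0.5) + 2 * W)
cens_time  <- rexp(n, rate = 0.5 + 0.55 * A + 0.5 * W)
d <- data.frame(W, A,
                Ttilde = pmin(event_time, cens_time),
                Delta  = as.integer(event_time < cens_time))

# Treatment model: logistic regression for Pr(A = 1 | W)
ps <- fitted(glm(A ~ W, family = binomial, data = d))

# Censoring model: Cox model in which censoring (Delta == 0) plays the role of the event
cens_fit <- coxph(Surv(Ttilde, 1 - Delta) ~ A + W, data = d)

# Probability of remaining uncensored through each subject's own follow-up time,
# obtained as exp(-cumulative censoring hazard at Ttilde)
p_uncens <- exp(-predict(cens_fit, type = "expected"))

# Weighted estimate of Pr(T(a) < t), term by term as in the formula
ipw_risk <- function(a, t) {
  p_trt <- ifelse(a == 1, ps, 1 - ps)
  mean(d$Delta * (d$Ttilde < t) * (d$A == a) / (p_uncens * p_trt))
}

risk1 <- ipw_risk(1, 0.5)
risk0 <- ipw_risk(0, 0.5)
risk1 - risk0  # estimated risk difference at t = 0.5
```

Each observed event is up-weighted by the inverse of the probability that the subject received treatment level \(a\) and remained uncensored long enough for the event to be seen. Censored subjects and subjects at the other treatment level contribute zero.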

Data Generation

Here, we generate a simple dataset for demonstration.

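library(tibble)

# Inverse logit: maps a real number to a probability in (0, 1)
expit <- function(p) {
  1 / (1 + exp(-p))
}

n <- 10000

dat <- tibble(
  id = 1:n,
  W = runif(n),                           # baseline confounder
  A = rbinom(n, 1, expit(W)),             # treatment probability depends on W
  T0 = rexp(n, rate = 0.5 + 2*W),         # potential event time under A = 0
  T1 = rexp(n, rate = 1 + 2*W),           # potential event time under A = 1
  T = A*T1 + (1-A)*T0,                    # event time under the treatment received
  C = rexp(n, rate = .5 + .55*A + .5*W)   # censoring time depends on A and W
)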

Note that our true causal risk difference is 10.55%.
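One way to check a true risk difference in a simulation like this is to compare the two potential outcomes directly, since both are known for every simulated subject. The sketch below assumes a time horizon of \(t = 0.5\), which the text does not state, and uses hypothetical names (`t_eval`, `rd_sample`, `rd_exact`, `risk_exact`). The sample-based value depends on the random seed, so it need not reproduce a quoted figure exactly.

```r
set.seed(1)
n <- 10000
t_eval <- 0.5  # assumed time horizon; not stated in the text

# Potential event times under each treatment level, as in the data-generating process
W  <- runif(n)
T0 <- rexp(n, rate = 0.5 + 2 * W)
T1 <- rexp(n, rate = 1 + 2 * W)

# Sample version: difference in the proportion with an event before t_eval
rd_sample <- mean(T1 < t_eval) - mean(T0 < t_eval)

# Exact version: average 1 - exp(-(base_rate + 2W) * t) over W ~ Uniform(0, 1),
# using E[exp(-2 * W * t)] = (1 - exp(-2 * t)) / (2 * t)
risk_exact <- function(base_rate, t) {
  1 - exp(-base_rate * t) * (1 - exp(-2 * t)) / (2 * t)
}
rd_exact <- risk_exact(1, t_eval) - risk_exact(0.5, t_eval)

c(sample = rd_sample, exact = rd_exact)
```

The two versions should agree up to simulation error, which is roughly one percentage point at this sample size.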

Typical Study Design and Analysis

Using the causalRisk package, we can easily implement the estimator described above to get the unadjusted and adjusted cumulative incidence curves:

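library(causalRisk)

# Unadjusted models: no covariates (the censoring term is assumed here to mirror mod_adj)
mod_unadj <- specify_models(identify_treatment(A),
                            identify_censoring(C))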

mod_adj <- specify_models(identify_treatment(A, ~W),
                          identify_censoring(C, ~W))

fit_unadj <- estimate_ipwrisk(dat, mod_unadj, times = seq(0, 0.5, by = 0.01), labels = "Unadjusted, Standard")

fit_adj <- estimate_ipwrisk(dat, mod_adj, times = seq(0, 0.5, by = 0.01), labels = "Adjusted, Standard")

make_table1(fit_adj, = T)
plot(fit_unadj, fit_adj)