Introduction
Study designs for estimating causal effects are numerous. Depending on the design, it is often necessary to address several sources of bias, such as baseline and time-varying confounding, informative censoring, and selection bias, among others. Designs like the treatment decision design [1], new user design [2], and prevalent new user design [3] each address these biases in different ways and require seemingly different analytic approaches to yield unbiased estimates from their resulting data.
Recently, the ‘clone-censor-weight’ approach [4–6] has become a popular way to estimate the effects of sustained or dynamic treatment regimens. However, this approach, and the way of thinking it entails (which involves conceptualizing a ‘target trial’ and adapting it to the observational setting [7]), is more general, and nearly all studies can be thought of in this way. Here, we show that a standard study of a point treatment can be thought of as a clone-censor-weight design, and we show how confounding and informative censoring can be addressed with a single nuisance model.
The Setup
Consider a study of the effect of a binary baseline treatment, \(A\), on a time-to-event outcome, \(T\). Patients may be censored prior to experiencing the event, and the time of censoring is \(C\). A patient’s observed follow-up time is \(\tilde{T}=\min(T,C)\). In addition, a set of baseline covariates sufficient to control for confounding and informative censoring, denoted \(W\), is collected. Finally, we define \(\Delta=I(C>\tilde{T})\), which is an indicator that a patient was not censored at their observed follow-up time (and therefore had the event). A subject’s observed data therefore consist of \(\{A, \tilde{T}, W, \Delta\}\).
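As a minimal sketch of how the observed data arise from the latent event and censoring times, consider four hypothetical patients. The vectors `event_time` and `censor_time` and their values are made up for illustration and are not part of the study data.

```r
# Toy latent times (hypothetical values, for illustration only)
event_time  <- c(2.0, 5.0, 1.5, 4.0)   # T
censor_time <- c(3.0, 1.0, 2.5, 4.5)   # C

# Observed follow-up time: whichever of the two comes first
t_obs <- pmin(event_time, censor_time)

# Delta = 1 when censoring falls after the end of follow-up,
# i.e. the event itself was observed
delta <- as.integer(censor_time > t_obs)

obs <- data.frame(t_obs, delta)
print(obs)
```

Only the second patient is censored (censoring at time 1.0 precedes the event at 5.0), so that patient has `delta` equal to 0 and an observed time equal to the censoring time.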
One estimator for the counterfactual cumulative incidence of the outcome under treatment level \(A=a\) is [8]:
\[ \hat{\Pr}(T(a)<t)=\frac{1}{n}\sum_{i=1}^n{\frac{\Delta_i I(\tilde{T}_i<t)I(A_i=a)}{\hat{\Pr}(\Delta=1 \mid W_i,A_i,\tilde{T}_i)\,\hat{\Pr}(A=a \mid W_i)}}, \]
where \(T(a)\) is the time at which the event would have occurred had a subject, possibly counter to fact, received treatment level \(A=a\); \(n\) is the number of subjects in the study; and each of the probabilities in the denominator is modeled appropriately, e.g., with a Cox proportional hazards model for censoring and logistic regression for treatment.
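To make the estimator concrete, here is a hand-rolled sketch in base R plus the survival package. It is an illustration under stated assumptions, not the implementation used by any particular package: the simulated data, the data frame `d`, and the helper `ipw_risk()` are all hypothetical, and the censoring times are generated under proportional hazards so that a Cox model is a reasonable choice.

```r
library(survival)  # recommended package shipped with R

set.seed(1)
n <- 5000
# Hypothetical data-generating process, for illustration only
W <- runif(n)
A <- rbinom(n, 1, plogis(W))
event_time  <- rexp(n, rate = (0.5 + 0.5 * A) * exp(W))
censor_time <- rexp(n, rate = 0.5 * exp(0.5 * A + 0.5 * W))
d <- data.frame(W, A,
                t_obs = pmin(event_time, censor_time),
                delta = as.integer(event_time <= censor_time))

# Treatment model: logistic regression for Pr(A = 1 | W)
ps_fit  <- glm(A ~ W, family = binomial(), data = d)
p_treat <- predict(ps_fit, type = "response")

# Censoring model: Cox model in which censoring (delta == 0) is the "event"
cens_fit <- coxph(Surv(t_obs, 1 - delta) ~ A + W, data = d)
# Estimated probability of remaining uncensored through each subject's own
# follow-up time: exp(-cumulative censoring hazard)
p_uncens <- exp(-predict(cens_fit, type = "expected"))

# Inverse-probability-weighted risk under treatment level a by time t
ipw_risk <- function(a, t) {
  p_a <- ifelse(a == 1, p_treat, 1 - p_treat)
  mean(d$delta * (d$t_obs < t) * (d$A == a) / (p_uncens * p_a))
}

risk1 <- ipw_risk(1, 0.5)
risk0 <- ipw_risk(0, 0.5)
print(c(risk1 = risk1, risk0 = risk0, rd = risk1 - risk0))
```

Each subject who has the event before time \(t\) under treatment level \(a\) is weighted by the inverse of the product of two estimated probabilities: that of remaining uncensored through their follow-up and that of receiving the treatment they actually received.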
Data Generation
Here, we generate a simple dataset for demonstration.
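library(tibble)

# Inverse logit: maps a linear predictor to a probability
expit <- function(p){
  exp(p)/(1+exp(p))
}

n <- 10000
dat <- tibble(
  id = 1:n,
  W = runif(n),                          # baseline covariate
  A = rbinom(n, 1, expit(W)),            # treatment assignment depends on W
  T0 = rexp(n, rate = 0.5 + 2*W),        # potential event time under A = 0
  T1 = rexp(n, rate = 1 + 2*W),          # potential event time under A = 1
  T = A*T1 + (1-A)*T0,                   # event time under the treatment received
  C = rexp(n, rate = .5 + .55*A + .5*W)  # censoring time depends on A and W
)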
Note that our true causal risk difference is 11.93%.
Typical Study Design and Analysis
Using the causalRisk package, we can easily implement the estimator described above to get the unadjusted and adjusted cumulative incidence curves:
library(causalRisk)

# Unadjusted: no covariates in the treatment or censoring models
mod_unadj <- specify_models(identify_treatment(A),
                            identify_outcome(T),
                            identify_censoring(C))

# Adjusted: treatment and censoring models both condition on W
mod_adj <- specify_models(identify_treatment(A, ~W),
                          identify_outcome(T),
                          identify_censoring(C, ~W))

fit_unadj <- estimate_ipwrisk(dat, mod_unadj, times = seq(0, 0.5, by = 0.01), labels = "Unadjusted, Standard")
fit_adj <- estimate_ipwrisk(dat, mod_adj, times = seq(0, 0.5, by = 0.01), labels = "Adjusted, Standard")

make_table1(fit_adj, side.by.side = TRUE)
plot(fit_unadj, fit_adj)