
Introduction

Study designs for estimating causal effects are numerous. Depending on the design, it is often necessary to address several sources of bias, such as baseline and time-varying confounding, informative censoring, and selection bias, among others. Designs like the treatment decision design [1], the new user design [2], and the prevalent new user design [3] each address these biases in different ways and require seemingly different analytic approaches to yield unbiased estimates from the resulting data.

Recently, the ‘clone-censor-weight’ approach [4–6] has become a popular way to estimate the effects of sustained or dynamic treatment regimens. However, this approach, and the way of thinking it entails (conceptualizing a ‘target trial’ and adapting it to the observational setting [7]), is more general: nearly any study can be framed this way. Here, we show that a standard study of a point treatment can be cast as a clone-censor-weight design, and that confounding and informative censoring can be addressed with a single nuisance model.

The Setup

Consider a study of a binary baseline treatment, \(A\), on a time-to-event outcome, \(T\). Patients may be censored prior to experiencing the event, and the time of censoring is \(C\). A patient’s observed follow-up time is \(\tilde{T}=\min(T,C)\). In addition, a set of baseline covariates sufficient to control for confounding and informative censoring is collected, denoted \(W\). Finally, we define \(\Delta=I(C>\tilde{T})\), an indicator that a patient was not censored at their observed follow-up time (and therefore had the event). A subject’s observed data therefore consist of \(\{A, \tilde{T}, W, \Delta\}\).

One estimator for the counterfactual cumulative incidence of the outcome under treatment level \(A=a\) is [8]:

\[ \hat{Pr}(T(a)<t)=\frac{1}{n}\sum_{i=1}^n{\frac{\Delta_iI(\tilde{T}_i<t)I(A_i=a)}{\hat{Pr}(\Delta=1|W_i,A_i,T_i)\hat{Pr}(A=a|W_i)}}, \]

where \(T(a)\) is the time of the event had, possibly counter to fact, a subject received treatment level \(A=a\); \(n\) is the total population size; and each probability in the denominator is modeled appropriately, e.g., with a Cox proportional hazards model for the censoring model and logistic regression for the treatment model.
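Before turning to a packaged implementation, it may help to see the estimator written out directly. Below is a minimal hand-rolled sketch (ipw_risk is a hypothetical helper, not from any package), assuming a data frame with observed-data columns time (\(\tilde{T}\)), delta (\(\Delta\)), A, and W; we construct these columns after generating data below:

library(survival)

ipw_risk <- function(df, a, t) {
  # Treatment model: Pr(A = a | W) via logistic regression
  ps <- predict(glm(A ~ W, family = binomial, data = df), type = "response")
  pr_a <- ifelse(df$A == 1, ps, 1 - ps)

  # Censoring model: Cox PH for the hazard of censoring (event = censored),
  # giving each subject's probability of remaining uncensored through
  # their observed follow-up time
  cens_fit <- coxph(Surv(time, 1 - delta) ~ W + A, data = df)
  bh <- basehaz(cens_fit, centered = FALSE)
  H0 <- stepfun(bh$time, c(0, bh$hazard))  # baseline cumulative hazard
  lp <- as.vector(as.matrix(df[, c("W", "A")]) %*% coef(cens_fit))
  pr_unc <- exp(-H0(df$time) * exp(lp))    # Pr(C > time_i | W_i, A_i)

  # The weighted mean in the formula above: estimate of Pr(T(a) < t)
  mean(df$delta * (df$time < t) * (df$A == a) / (pr_unc * pr_a))
}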

Data Generation

Here, we generate a simple dataset for demonstration; both potential event times are retained so that the truth is known.

library(tibble)

# expit: inverse logit, used to generate treatment probabilities
expit <- function(p) {
  exp(p) / (1 + exp(p))
}

set.seed(123)  # arbitrary seed so the draws are reproducible
n <- 10000

dat <- tibble(
  id = 1:n,
  W  = runif(n),                     # baseline confounder
  A  = rbinom(n, 1, expit(W)),       # treatment depends on W
  T0 = rexp(n, rate = 0.5 + 2*W),    # potential event time under A = 0
  T1 = rexp(n, rate = 1 + 2*W),      # potential event time under A = 1
  T  = A*T1 + (1 - A)*T0,            # factual event time (consistency)
  C  = rexp(n, rate = 0.5 + 0.55*A + 0.5*W)  # censoring depends on A and W
)
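
To connect this to the notation in The Setup, we also construct the observed-data quantities \(\tilde{T}\) and \(\Delta\) (needed only by the hand-rolled sketch above; base R):

dat$time  <- pmin(dat$T, dat$C)          # observed follow-up time, min(T, C)
dat$delta <- as.numeric(dat$T <= dat$C)  # event indicator (1 = event observed)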

Note that the true causal risk difference in this simulation is 10.55%.
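
Because both potential event times are retained, the true risk difference at any horizon can be checked by Monte Carlo (a sketch; the grid matches the one used for estimation below):

grid <- seq(0, 0.5, by = 0.01)
true_rd <- sapply(grid, function(t) mean(dat$T1 < t) - mean(dat$T0 < t))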

Typical Study Design and Analysis

Using the causalRisk package, we can easily implement the estimator described above to get the unadjusted and adjusted cumulative incidence curves:

library(causalRisk)

# Unadjusted: no covariates in the treatment or censoring models
mod_unadj <- specify_models(identify_treatment(A),
                            identify_outcome(T),
                            identify_censoring(C))

# Adjusted: condition on W in both the treatment and censoring models
mod_adj <- specify_models(identify_treatment(A, ~W),
                          identify_outcome(T),
                          identify_censoring(C, ~W))

fit_unadj <- estimate_ipwrisk(dat, mod_unadj,
                              times = seq(0, 0.5, by = 0.01),
                              labels = "Unadjusted, Standard")

fit_adj <- estimate_ipwrisk(dat, mod_adj,
                            times = seq(0, 0.5, by = 0.01),
                            labels = "Adjusted, Standard")

make_table1(fit_adj, side.by.side = TRUE)
plot(fit_unadj, fit_adj)
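
As a rough check, the hand-rolled ipw_risk sketch from above can be applied to the same data, using the time and delta columns constructed earlier. The two implementations differ in their details, so exact agreement with the causalRisk curves isn’t expected, but the adjusted risk difference at the end of the grid should be similar:

ipw_risk(dat, a = 1, t = 0.5) - ipw_risk(dat, a = 0, t = 0.5)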