This vignette uses the current package implementation in
R/01-validation.R through
R/05-user-interface.R. The archived code in
R/old/ is not part of this workflow.
The current user-facing path is:
prepare_gbkmr_data().gbkmr_run().For T time points, p mixture components,
Ldim time-varying covariates, and B baseline
covariates, the current API expects:
| Object | Shape | Ordering |
|---|---|---|
Y |
n vector |
final outcome |
Z |
n x (p * T) matrix |
M1_t0, M2_t0, ..., M1_t1, ... |
X |
n x (Ldim * T + B) matrix |
all time-varying covariate columns first, then baseline columns |
prepare_gbkmr_data() writes mixture columns as
logM1_0, logM2_0, …, and time-varying
covariates as <name>_0, <name>_1,
…. The first baseline column is named sex; additional
baseline columns are named baseline_2,
baseline_3, ….
library(causalBKMR)
set.seed(123)
n <- 300
T_points <- 3
p <- 2
Ldim <- 1
B <- 1
sex <- rbinom(n, 1, 0.5)
waist0 <- rnorm(n, 85, 12)
Z <- matrix(rlnorm(n * p * T_points, meanlog = 1, sdlog = 0.5),
nrow = n, ncol = p * T_points)
waist_t0 <- waist0
waist_t1 <- waist0 + 0.4 * log(Z[, 1]) + rnorm(n, 0, 2)
waist_t2 <- waist_t1 + 0.3 * log(Z[, 3]) + rnorm(n, 0, 2)
X <- cbind(waist_t0, waist_t1, waist_t2, sex)
Y <- 50 +
0.5 * sex +
0.15 * waist_t2 +
0.8 * log(Z[, 5]) -
0.4 * log(Z[, 6]) +
rnorm(n, 0, 4)
prepared <- prepare_gbkmr_data(
Y = Y,
Z = Z,
X = X,
time_points = T_points,
mixture_components = p,
td_covariates = Ldim,
baseline_covariates = B,
td_covariate_names = "waist",
log_transform_mixtures = TRUE
)
names(prepared)
head(prepared)Expected columns include:
sex
waist_0
logM1_0 logM2_0 logM1_1 logM2_1 logM1_2 logM2_2
waist_1 waist_2
Y
id
detection <- detect_variable_patterns(prepared, T = T_points)
print(detection)This check matters because gbkmr_run() uses the detected
structure to build models for time-varying confounders and outcomes. If
p, Ldim, or the covariate names are wrong, fix
the matrices before fitting.
For real analyses use many more iterations. The small values below are only a smoke-test template.
fit <- gbkmr_run(
data = prepared,
outcome = "Y",
outcome_type = "continuous",
time_points = T_points,
engine = "auto",
n = nrow(prepared),
iter = 15000,
K = 1000,
n_knots = 50
)engine = "auto" selects standard BKMR for
n <= 2000, binary outcomes, or binary time-varying
confounders. For larger all-Gaussian analyses it uses
fastbkmr only if the fbkmr package is
installed.
print(fit)
summary(fit)
fit$causal_effect
fit$counterfactual_means
fit$detection_info
fit$diagnosticsThe default estimand is the average causal effect of shifting every mixture component at every time point from its 25th percentile to its 75th percentile:
ACE = E[Y(a*)] - E[Y(a)]
Only mixture columns are intervened on. Time-varying confounders are sampled forward through fitted time-varying confounder models, and baseline covariates enter the BKMR models linearly.