This vignette uses the current package implementation in R/01-validation.R through R/05-user-interface.R. The archived code in R/old/ is not part of this workflow.

The current user-facing path is:

  1. Put the outcome, mixture history, and covariate history into matrices.
  2. Convert those matrices with prepare_gbkmr_data().
  3. Check the detected variable structure.
  4. Run gbkmr_run().

Input Contract

For T time points, p mixture components, Ldim time-varying covariates, and B baseline covariates, the current API expects:

Object Shape Ordering
Y n vector final outcome
Z n x (p * T) matrix M1_t0, M2_t0, ..., M1_t1, ...
X n x (Ldim * T + B) matrix all time-varying covariate columns first, then baseline columns

prepare_gbkmr_data() writes mixture columns as logM1_0, logM2_0, …, and time-varying covariates as <name>_0, <name>_1, …. The first baseline column is named sex; additional baseline columns are named baseline_2, baseline_3, ….

Simulate Data

library(causalBKMR)

set.seed(123)
n <- 300
T_points <- 3
p <- 2
Ldim <- 1
B <- 1

sex <- rbinom(n, 1, 0.5)
waist0 <- rnorm(n, 85, 12)

Z <- matrix(rlnorm(n * p * T_points, meanlog = 1, sdlog = 0.5),
            nrow = n, ncol = p * T_points)

waist_t0 <- waist0
waist_t1 <- waist0 + 0.4 * log(Z[, 1]) + rnorm(n, 0, 2)
waist_t2 <- waist_t1 + 0.3 * log(Z[, 3]) + rnorm(n, 0, 2)
X <- cbind(waist_t0, waist_t1, waist_t2, sex)

Y <- 50 +
  0.5 * sex +
  0.15 * waist_t2 +
  0.8 * log(Z[, 5]) -
  0.4 * log(Z[, 6]) +
  rnorm(n, 0, 4)

Prepare Data

prepared <- prepare_gbkmr_data(
  Y = Y,
  Z = Z,
  X = X,
  time_points = T_points,
  mixture_components = p,
  td_covariates = Ldim,
  baseline_covariates = B,
  td_covariate_names = "waist",
  log_transform_mixtures = TRUE
)

names(prepared)
head(prepared)

Expected columns include:

sex
waist_0
logM1_0 logM2_0 logM1_1 logM2_1 logM1_2 logM2_2
waist_1 waist_2
Y
id

Check Detection

detection <- detect_variable_patterns(prepared, T = T_points)
print(detection)

This check matters because gbkmr_run() uses the detected structure to build models for time-varying confounders and outcomes. If p, Ldim, or the covariate names are wrong, fix the matrices before fitting.

Run Analysis

For real analyses use many more iterations. The small values below are only a smoke-test template.

fit <- gbkmr_run(
  data = prepared,
  outcome = "Y",
  outcome_type = "continuous",
  time_points = T_points,
  engine = "auto",
  n = nrow(prepared),
  iter = 15000,
  K = 1000,
  n_knots = 50
)

engine = "auto" selects standard BKMR for n <= 2000, binary outcomes, or binary time-varying confounders. For larger all-Gaussian analyses it uses fastbkmr only if the fbkmr package is installed.

Read Results

print(fit)
summary(fit)

fit$causal_effect
fit$counterfactual_means
fit$detection_info
fit$diagnostics

The default estimand is the average causal effect of shifting every mixture component at every time point from its 25th percentile to its 75th percentile:

ACE = E[Y(a*)] - E[Y(a)]

Only mixture columns are intervened on. Time-varying confounders are sampled forward through fitted time-varying confounder models, and baseline covariates enter the BKMR models linearly.