This vignette describes the methodology implemented by the current causalBKMR package. The deployed API lives in the active R/ files:

Archived code under R/old/ is not used in the vignettes or the current recommended workflow.

Problem Setup

We observe n subjects over T time points. At each time point, each subject has:

  • p mixture exposure components.
  • Ldim time-varying covariates.
  • optional baseline covariates.
  • an outcome Y measured after the exposure and covariate history.

The user supplies three objects:

Object Shape Meaning
Y length n Continuous or binary outcome
Z n x (p * T) Mixture exposures ordered by time
X n x (Ldim * T + B) Time-varying covariates first, baseline covariates last

prepare_gbkmr_data() converts those objects into a wide data frame with columns such as logM1_0, logM2_0, waist_0, logM1_1, and Y.

prepared <- prepare_gbkmr_data(
  Y = Y,
  Z = Z,
  X = X,
  time_points = 3,
  mixture_components = 2,
  td_covariates = 1,
  baseline_covariates = 1,
  td_covariate_names = "waist"
)

detect_variable_patterns(prepared, T = 3)

Causal Estimand

The target estimand is the average causal effect comparing two full mixture trajectories:

ACE=E[Y(a*)]E[Y(a)]. ACE = E[Y(a^*)] - E[Y(a)].

By default, gbkmr_run() sets a to the 25th percentile and a* to the 75th percentile for every mixture component at every time point:

fit <- gbkmr_run(
  data = prepared,
  time_points = 3,
  a_probs = c(0.25, 0.75)
)

Users can supply explicit trajectories with a_vals and astar_vals. Those vectors must be named with the mixture column names in the prepared data.

exposure_names <- grep("^logM\\d+_\\d+$", names(prepared), value = TRUE)

a_vals <- setNames(rep(0.5, length(exposure_names)), exposure_names)
astar_vals <- setNames(rep(1.5, length(exposure_names)), exposure_names)

fit_custom <- gbkmr_run(
  data = prepared,
  time_points = 3,
  a_vals = a_vals,
  astar_vals = astar_vals
)

Model Sequence

The current gbkmr_run() implementation follows this sequence:

  1. Detect mixture, time-varying, baseline, and outcome columns.
  2. Fit mediator models for time-varying covariates after baseline.
  3. Fit the outcome model.
  4. Simulate counterfactual covariate histories under a and a*.
  5. Predict counterfactual outcomes.
  6. Return posterior summaries of Y(a*) - Y(a).

For continuous outcomes, the result is on the outcome scale. For binary outcomes, the result is a risk difference.

BKMR Engines

gbkmr_run() can use two engines:

Engine Function Use case
bkmr bkmr::kmbayes() Default engine; supports continuous and binary outcomes
fastbkmr fbkmr::skmbayes() Optional large-sample engine for continuous outcomes

With engine = "auto", the package uses fastbkmr only for large continuous-outcome analyses when fbkmr is installed. Otherwise it uses standard bkmr.

Version 1 support note:

Outcome Time-dependent confounder L engine = "bkmr" engine = "fastbkmr"
continuous all continuous Supported: Gaussian models throughout Supported: Gaussian models throughout
binary all continuous Supported: probit outcome model Not supported; hard error
continuous any binary Runs with warning: binary L is fit and sampled as Gaussian Runs with warning: binary L is fit and sampled as Gaussian
binary any binary Runs with warning: probit outcome plus Gaussian L models Not supported; hard error

In this table, binary means a 0/1 outcome or a 0/1 time-dependent confounder. Mixture exposures can remain continuous. The current public fbkmr::skmbayes() path is Gaussian-only; generalized/logistic fastBKMR work is not part of the installed fbkmr package used here.

Assumptions

The causal interpretation depends on standard longitudinal g-formula conditions:

  • consistency of the observed outcome under the observed exposure history;
  • sequential exchangeability after conditioning on measured history;
  • positivity for the intervention trajectories;
  • correct enough model specification for the fitted BKMR mediator and outcome models.

The flexible kernel helps with nonlinear and interactive exposure response, but it does not remove the need for measured confounding control and realistic intervention values.

Practical Limits

  • Baseline covariates enter as linear terms rather than kernel modifiers.
  • Time-varying covariates are modeled as Gaussian mediator responses in the current implementation.
  • Binary outcomes are supported only by the standard bkmr engine.
  • Explicit a_vals and astar_vals should stay within observed support unless extrapolation is scientifically justified.