Methodology and Current Implementation

This vignette describes the methodology implemented by the current causalBKMR package. The deployed API lives in the active R/ files:

prepare_gbkmr_data() converts user matrices into the wide g-BKMR analysis data.
detect_variable_patterns() audits the resulting columns.
gbkmr_run() runs the standard analysis.
run_gbkmr_panel() is a lower-level exported function for advanced users who already understand the package’s internal wide-data conventions.

Archived code under R/old/ is not used in the vignettes or the current recommended workflow.

Problem Setup

We observe n subjects over T time points. At each time point, each subject has:

p mixture exposure components.
Ldim time-varying covariates.
optional baseline covariates.
an outcome Y measured after the exposure and covariate history.

The user supplies three objects:

Object	Shape	Meaning
`Y`	length `n`	Continuous or binary outcome
`Z`	`n x (p * T)`	Mixture exposures ordered by time
`X`	`n x (Ldim * T + B)`	Time-varying covariates first, baseline covariates last

prepare_gbkmr_data() converts those objects into a wide data frame with columns such as logM1_0, logM2_0, waist_0, logM1_1, and Y.

prepared <- prepare_gbkmr_data(
  Y = Y,
  Z = Z,
  X = X,
  time_points = 3,
  mixture_components = 2,
  td_covariates = 1,
  baseline_covariates = 1,
  td_covariate_names = "waist"
)

detect_variable_patterns(prepared, T = 3)

Causal Estimand

The target estimand is the average causal effect comparing two full mixture trajectories:

$ACE = E[Y(a^*)] - E[Y(a)].$

By default, gbkmr_run() sets a to the 25th percentile and a* to the 75th percentile for every mixture component at every time point:

fit <- gbkmr_run(
  data = prepared,
  time_points = 3,
  a_probs = c(0.25, 0.75)
)

Users can supply explicit trajectories with a_vals and astar_vals. Those vectors must be named with the mixture column names in the prepared data.

exposure_names <- grep("^logM\\d+_\\d+$", names(prepared), value = TRUE)

a_vals <- setNames(rep(0.5, length(exposure_names)), exposure_names)
astar_vals <- setNames(rep(1.5, length(exposure_names)), exposure_names)

fit_custom <- gbkmr_run(
  data = prepared,
  time_points = 3,
  a_vals = a_vals,
  astar_vals = astar_vals
)

Model Sequence

The current gbkmr_run() implementation follows this sequence:

Detect mixture, time-varying, baseline, and outcome columns.
Fit mediator models for time-varying covariates after baseline.
Fit the outcome model.
Simulate counterfactual covariate histories under a and a*.
Predict counterfactual outcomes.
Return posterior summaries of Y(a*) - Y(a).

For continuous outcomes, the result is on the outcome scale. For binary outcomes, the result is a risk difference.

BKMR Engines

gbkmr_run() can use two engines:

Engine	Function	Use case
`bkmr`	`bkmr::kmbayes()`	Default engine; supports continuous and binary outcomes
`fastbkmr`	`fbkmr::skmbayes()`	Optional large-sample engine for continuous outcomes

With engine = "auto", the package uses fastbkmr only for large continuous-outcome analyses when fbkmr is installed. Otherwise it uses standard bkmr.

Version 1 support note:

Outcome	Time-dependent confounder `L`	`engine = "bkmr"`	`engine = "fastbkmr"`
continuous	all continuous	Supported: Gaussian models throughout	Supported: Gaussian models throughout
binary	all continuous	Supported: probit outcome model	Not supported; hard error
continuous	any binary	Runs with warning: binary `L` is fit and sampled as Gaussian	Runs with warning: binary `L` is fit and sampled as Gaussian
binary	any binary	Runs with warning: probit outcome plus Gaussian `L` models	Not supported; hard error

In this table, binary means a 0/1 outcome or a 0/1 time-dependent confounder. Mixture exposures can remain continuous. The current public fbkmr::skmbayes() path is Gaussian-only; generalized/logistic fastBKMR work is not part of the installed fbkmr package used here.

Assumptions

The causal interpretation depends on standard longitudinal g-formula conditions:

consistency of the observed outcome under the observed exposure history;
sequential exchangeability after conditioning on measured history;
positivity for the intervention trajectories;
correct enough model specification for the fitted BKMR mediator and outcome models.

The flexible kernel helps with nonlinear and interactive exposure response, but it does not remove the need for measured confounding control and realistic intervention values.

Practical Limits

Baseline covariates enter as linear terms rather than kernel modifiers.
Time-varying covariates are modeled as Gaussian mediator responses in the current implementation.
Binary outcomes are supported only by the standard bkmr engine.
Explicit a_vals and astar_vals should stay within observed support unless extrapolation is scientifically justified.