vignettes/gbkmr_method_overview.Rmd
gbkmr_method_overview.RmdThis vignette describes the methodology implemented by the current
causalBKMR package. The deployed API lives in the active
R/ files:
prepare_gbkmr_data() converts user matrices into the
wide g-BKMR analysis data.detect_variable_patterns() audits the resulting
columns.gbkmr_run() runs the standard analysis.run_gbkmr_panel() is a lower-level exported function
for advanced users who already understand the package’s internal
wide-data conventions.Archived code under R/old/ is not used in the vignettes
or the current recommended workflow.
We observe n subjects over T time points.
At each time point, each subject has:
p mixture exposure components.Ldim time-varying covariates.Y measured after the exposure and covariate
history.The user supplies three objects:
| Object | Shape | Meaning |
|---|---|---|
Y |
length n
|
Continuous or binary outcome |
Z |
n x (p * T) |
Mixture exposures ordered by time |
X |
n x (Ldim * T + B) |
Time-varying covariates first, baseline covariates last |
prepare_gbkmr_data() converts those objects into a wide
data frame with columns such as logM1_0,
logM2_0, waist_0, logM1_1, and
Y.
prepared <- prepare_gbkmr_data(
Y = Y,
Z = Z,
X = X,
time_points = 3,
mixture_components = 2,
td_covariates = 1,
baseline_covariates = 1,
td_covariate_names = "waist"
)
detect_variable_patterns(prepared, T = 3)The target estimand is the average causal effect comparing two full mixture trajectories:
By default, gbkmr_run() sets a to the 25th
percentile and a* to the 75th percentile for every mixture
component at every time point:
Users can supply explicit trajectories with a_vals and
astar_vals. Those vectors must be named with the mixture
column names in the prepared data.
exposure_names <- grep("^logM\\d+_\\d+$", names(prepared), value = TRUE)
a_vals <- setNames(rep(0.5, length(exposure_names)), exposure_names)
astar_vals <- setNames(rep(1.5, length(exposure_names)), exposure_names)
fit_custom <- gbkmr_run(
data = prepared,
time_points = 3,
a_vals = a_vals,
astar_vals = astar_vals
)The current gbkmr_run() implementation follows this
sequence:
a and
a*.Y(a*) - Y(a).For continuous outcomes, the result is on the outcome scale. For binary outcomes, the result is a risk difference.
gbkmr_run() can use two engines:
| Engine | Function | Use case |
|---|---|---|
bkmr |
bkmr::kmbayes() |
Default engine; supports continuous and binary outcomes |
fastbkmr |
fbkmr::skmbayes() |
Optional large-sample engine for continuous outcomes |
With engine = "auto", the package uses
fastbkmr only for large continuous-outcome analyses when
fbkmr is installed. Otherwise it uses standard
bkmr.
Version 1 support note:
| Outcome | Time-dependent confounder L
|
engine = "bkmr" |
engine = "fastbkmr" |
|---|---|---|---|
| continuous | all continuous | Supported: Gaussian models throughout | Supported: Gaussian models throughout |
| binary | all continuous | Supported: probit outcome model | Not supported; hard error |
| continuous | any binary | Runs with warning: binary L is fit and sampled as
Gaussian |
Runs with warning: binary L is fit and sampled as
Gaussian |
| binary | any binary | Runs with warning: probit outcome plus Gaussian L
models |
Not supported; hard error |
In this table, binary means a 0/1 outcome or a 0/1 time-dependent
confounder. Mixture exposures can remain continuous. The current public
fbkmr::skmbayes() path is Gaussian-only;
generalized/logistic fastBKMR work is not part of the installed
fbkmr package used here.
The causal interpretation depends on standard longitudinal g-formula conditions:
The flexible kernel helps with nonlinear and interactive exposure response, but it does not remove the need for measured confounding control and realistic intervention values.
bkmr
engine.a_vals and astar_vals should stay
within observed support unless extrapolation is scientifically
justified.