This vignette covers diagnostics for the current
causalBKMR workflow. The relevant user-facing functions are
prepare_gbkmr_data(),
detect_variable_patterns(), and gbkmr_run().
Archived code under R/old/ is not part of this
workflow.
Before running MCMC, inspect the columns that
gbkmr_run() will use.
patterns <- detect_variable_patterns(prepared, T = 3)
patterns
mixture_cols <- grep("^logM\\d+_\\d+$", names(prepared), value = TRUE)
mixture_cols

Confirm that:

- patterns$p is the expected number of mixture components per time point.
- mixture_cols contains the expected logM*_t columns.
- patterns$Ldim is the expected number of time-varying covariates per time point.
- patterns$td_covariate_names contains the expected covariate names.
- patterns$td_vars_by_time maps the follow-up covariate columns correctly.
- Y is present and points to the intended outcome column.

If the detected columns are wrong, return to the Y,
Z, and X matrices passed into
prepare_gbkmr_data(). The most common issue is an incorrect
ordering of X: all time-varying covariates must come before
baseline covariates.
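
The mixture-column check above can also be scripted. The following is a minimal base R sketch, assuming the columns follow the naming scheme logM<component>_<time> implied by the regular expression above. The helper expected_mixture_cols() and the toy data frame are hypothetical and are not part of causalBKMR.

```r
# Hypothetical helper (not part of causalBKMR): build the mixture column
# names expected for p components and n_times time points, assuming the
# naming scheme logM<component>_<time>.
expected_mixture_cols <- function(p, n_times) {
  grid <- expand.grid(component = seq_len(p), time = seq_len(n_times))
  paste0("logM", grid$component, "_", grid$time)
}

# Toy stand-in for the prepared data: 2 components at 3 time points,
# with one mixture column deliberately left out.
toy <- data.frame(
  logM1_1 = 0, logM2_1 = 0,
  logM1_2 = 0, logM2_2 = 0,
  logM1_3 = 0,
  Y = 0
)

# Same pattern as used for mixture_cols above.
found <- grep("^logM\\d+_\\d+$", names(toy), value = TRUE)

# Columns that were expected but not found.
missing_cols <- setdiff(expected_mixture_cols(p = 2, n_times = 3), found)
missing_cols
```

An empty result would indicate that every expected mixture column is present. A non-empty result names the columns to trace back to the Z matrix.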
gbkmr_run(verbose = TRUE) prints an audit before fitting
models.
fit <- gbkmr_run(
  data = prepared,
  outcome = "Y",
  outcome_type = "continuous",
  time_points = 3,
  engine = "auto",
  iter = 15000,
  K = 1000,
  verbose = TRUE
)

Read this audit before starting a long run. Check the outcome type, number of time points, number of mixture components, selected engine, sample size, and intervention contrast.
The result object stores convergence diagnostics when available.
fit$diagnostics

If warnings appear, rerun with more iterations and a later posterior selection window:
fit_long <- gbkmr_run(
  data = prepared,
  outcome = "Y",
  outcome_type = "continuous",
  time_points = 3,
  iter = 60000,
  sel = seq(floor(60000 * 0.8), 60000, by = 25),
  K = 1000
)

The sel argument must contain MCMC iteration indices no
larger than iter. Use later iterations when early burn-in
appears unstable.
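
To avoid a mismatch between iter and sel, the selection window can be derived from iter in one place. This is a minimal sketch in base R. The helper make_sel() and its arguments burn_frac and thin are hypothetical names for illustration, not part of causalBKMR.

```r
# Hypothetical helper (not part of causalBKMR): keep every `thin`-th
# iteration after discarding the first `burn_frac` share of the chain.
make_sel <- function(iter, burn_frac = 0.8, thin = 25) {
  start <- floor(iter * burn_frac)
  sel <- seq(start, iter, by = thin)
  # Guard against indices that fall outside the chain.
  stopifnot(all(sel >= 1), all(sel <= iter))
  sel
}

sel <- make_sel(iter = 60000)
range(sel)   # first and last retained iteration
length(sel)  # number of retained draws
```

The resulting vector could then be passed as sel alongside the same iter value, so the two cannot drift apart when the run length changes.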
gbkmr_run() returns lower-level model objects in
raw_results.
raw <- fit$raw_results
names(raw)
raw$fit_mediators
raw$fit_y
raw$meta

For the standard BKMR engine, raw$fit_y is a
bkmrfit object. For fastBKMR, raw$fit_y is a
list of subset fits.
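
Because raw$fit_y has a different shape under each engine, diagnostic code may be easier to write if both cases are handled as a list. The sketch below assumes that a standard fit carries the class "bkmrfit", as described above. The helper as_fit_list() is hypothetical, and the mock objects only stand in for real fits.

```r
# Hypothetical helper (not part of causalBKMR or bkmr): return the outcome
# fit(s) as a list, whether fit_y is a single bkmrfit object or already a
# list of subset fits.
as_fit_list <- function(fit_y) {
  if (inherits(fit_y, "bkmrfit")) list(fit_y) else fit_y
}

# Mock objects standing in for real fits, for illustration only.
single_fit <- structure(list(), class = "bkmrfit")
subset_fits <- list(single_fit, single_fit, single_fit)

length(as_fit_list(single_fit))   # one fit, wrapped in a list
length(as_fit_list(subset_fits))  # list of subset fits, unchanged
```

With this wrapper, the same loop over fits can serve both the standard BKMR and fastBKMR cases.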
Use bkmr::TracePlot() for standard BKMR fits.
bkmr::TracePlot(fit = fit$raw_results$fit_y, par = "beta")
bkmr::TracePlot(fit = fit$raw_results$fit_y, par = "sigsq.eps")
bkmr::TracePlot(fit = fit$raw_results$fit_y, par = "r")

For fastBKMR, inspect subset fits:
fit_list <- fit$raw_results$fit_y
bkmr::TracePlot(fit = fit_list[[1]], par = "beta")
for (i in seq_along(fit_list)) {
  bkmr::TracePlot(fit = fit_list[[i]], par = "beta")
}

Long trends, abrupt jumps, or chains that remain stuck for long
periods are signs that the run needs more iterations or a different
sel window.
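
A rough numerical companion to the visual check is to compare the two halves of a chain. The sketch below is an illustration in base R only, not a diagnostic provided by causalBKMR or bkmr. The helper half_shift() is hypothetical, and the chains are toy vectors rather than real posterior draws.

```r
# Hypothetical helper: standardized difference between the means of the
# first and second halves of a chain. Values near 0 suggest no drift.
half_shift <- function(draws) {
  n <- length(draws)
  first <- draws[seq_len(n %/% 2)]
  second <- draws[(n %/% 2 + 1):n]
  abs(mean(second) - mean(first)) / sd(draws)
}

# Toy chains, for illustration only.
stable_chain <- rep(c(-1, 1), times = 50)      # oscillates around 0
drifting_chain <- seq(0, 1, length.out = 100)  # steady upward trend

half_shift(stable_chain)    # no shift between halves
half_shift(drifting_chain)  # large shift, consistent with a long trend
```

A large value on a real chain does not replace the trace plot, but it can help flag which parameters or subset fits to look at first.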
The g-computation stage stores posterior draws for the low and high counterfactual means.
raw <- fit$raw_results
ace_draws <- raw$Yastar - raw$Ya
plot(ace_draws, type = "l",
     xlab = "Posterior draw",
     ylab = "ACE draw")
hist(ace_draws, breaks = 30,
     main = "Posterior ACE",
     xlab = "Y(a*) - Y(a)")
fit$causal_effect

For binary outcomes, these draws are on the risk-difference scale.
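
The draws can also be summarized numerically. The following is a minimal sketch using simulated vectors in place of raw$Ya and raw$Yastar. The numbers are made up for illustration, and the summary shown here is an assumption about what is useful to report. It may not match what fit$causal_effect contains.

```r
set.seed(1)

# Simulated stand-ins for posterior draws of the two counterfactual means.
Ya <- rnorm(1000, mean = 2.0, sd = 0.3)
Yastar <- rnorm(1000, mean = 2.5, sd = 0.3)
ace_draws <- Yastar - Ya

# Posterior mean and equal-tailed 95% credible interval.
ace_summary <- c(
  mean = mean(ace_draws),
  quantile(ace_draws, probs = c(0.025, 0.975))
)
ace_summary

# Share of draws with a positive effect.
mean(ace_draws > 0)
```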
| Symptom | Likely cause | Fix |
|---|---|---|
| Mixture columns are missing | Z has the wrong number of columns | Use mixture_components * time_points columns |
| Baseline covariates detected incorrectly | X columns are in the wrong order | Put all time-varying columns before baseline columns |
| Binary outcome uses continuous model | outcome_type left at the default | Set outcome_type = "binary" |
| fastBKMR requested for a binary outcome | Unsupported engine/outcome combination | Use engine = "bkmr" |
| Warnings about low effective sample size | Too few stable posterior draws | Increase iter and move sel later |
| Counterfactual values look implausible | Contrast outside support | Use a_probs, or choose a_vals/astar_vals within observed ranges |
| Parallel fastBKMR fails | Worker startup failed | Set n_cores = 1 or run on a compute node |
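
For the "contrast outside support" row, a quick check is to compare the proposed exposure values with the observed range of each mixture column. The sketch below uses base R only. The helper within_support() and the toy data are hypothetical, and the quantile-based contrast is an assumption about the kind of contrast a_probs is meant to produce.

```r
# Hypothetical helper: TRUE where a proposed exposure value lies inside the
# observed range of the corresponding column.
within_support <- function(values, observed) {
  rng <- vapply(observed, range, numeric(2))
  values >= rng[1, ] & values <= rng[2, ]
}

# Toy observed exposures at one time point, for illustration only.
observed <- data.frame(
  logM1_1 = c(0.1, 0.4, 0.9, 1.3),
  logM2_1 = c(-0.5, 0.0, 0.2, 0.8)
)

# Quantile-based contrast: 25th vs 75th percentile of each exposure.
a_low <- vapply(observed, quantile, numeric(1), probs = 0.25, names = FALSE)
a_high <- vapply(observed, quantile, numeric(1), probs = 0.75, names = FALSE)

# Quantile-based values stay inside the observed range by construction.
within_support(a_high, observed)

# A hand-picked value can fall outside it (here, the first exposure).
within_support(c(logM1_1 = 2.0, logM2_1 = 0.5), observed)
```

Values flagged FALSE are candidates for replacement with values closer to the observed data before the g-computation step is rerun.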