Large Samples with fastBKMR

This vignette describes how the current gbkmr_run() interface uses the standard BKMR engine and the optional fastbkmr engine. The data preparation and intervention arguments are identical across engines.

Archived functions under R/old/ are not used.

Engine Choice

Version 1 separates the standard BKMR path from the fastBKMR path:

Outcome	Time-dependent confounder `L`	`engine = "bkmr"`	`engine = "fastbkmr"`
continuous	all continuous	Supported: Gaussian models throughout	Supported: Gaussian models throughout
binary	all continuous	Supported: probit outcome model	Not supported; hard error
continuous	any binary	Runs with warning: binary `L` is fit and sampled as Gaussian	Runs with warning: binary `L` is fit and sampled as Gaussian
binary	any binary	Runs with warning: probit outcome plus Gaussian `L` models	Not supported; hard error

Here, binary means a 0/1 outcome or a 0/1 time-dependent confounder. Mixture exposures can still be continuous. fbkmr::skmbayes() in the current public fbkmr package is a Gaussian fast path; a generalized or logistic fastBKMR is not yet available in that package.
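The support table can be restated as a small lookup. This is a minimal sketch for checking a planned analysis before calling gbkmr_run(); engine_support() is a hypothetical helper written for this vignette, not a function exported by causalBKMR, and it only mirrors the rows of the table above.

```r
# Hypothetical helper (not part of causalBKMR): restates the support table.
engine_support <- function(outcome_type, any_binary_L, engine) {
  outcome_type <- match.arg(outcome_type, c("continuous", "binary"))
  engine <- match.arg(engine, c("bkmr", "fastbkmr"))
  stopifnot(is.logical(any_binary_L), length(any_binary_L) == 1L,
            !is.na(any_binary_L))

  # fastbkmr is Gaussian-only, so a binary outcome is a hard error.
  if (engine == "fastbkmr" && outcome_type == "binary") {
    return("error")
  }
  # A binary L is fit and sampled as Gaussian, which produces a warning.
  if (any_binary_L) {
    return("warning")
  }
  "supported"
}

engine_support("continuous", any_binary_L = FALSE, engine = "fastbkmr")
engine_support("binary", any_binary_L = TRUE, engine = "bkmr")
engine_support("binary", any_binary_L = FALSE, engine = "fastbkmr")
```

The three calls correspond to the first row, the last row under engine = "bkmr", and the second row under engine = "fastbkmr".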

With engine = "auto", gbkmr_run() chooses fastbkmr only when the outcome is continuous, the sample is large, and fbkmr is installed. In every other case it uses standard bkmr.

Install the Optional Engine

fbkmr is not a required dependency of causalBKMR. Install it only if you plan to run large continuous-outcome analyses.

install.packages("remotes")
remotes::install_github("junwei-lu/fbkmr")

Parallel fastBKMR runs also use doSNOW and doParallel when available.

install.packages(c("doSNOW", "doParallel"))
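Before launching a long run, it can help to confirm which optional packages are actually present. The sketch below uses base R's requireNamespace(), which loads a namespace without attaching it; the variable names are illustrative only.

```r
# Report which optional packages are installed, without attaching them.
optional_pkgs <- c("fbkmr", "doSNOW", "doParallel")
installed <- vapply(optional_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

if (!installed[["fbkmr"]]) {
  message("fbkmr is not installed; engine = \"auto\" will use standard bkmr.")
}
```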

Automatic Selection

Use the same prepared data object created by prepare_gbkmr_data():

library(causalBKMR)

fit_auto <- gbkmr_run(
  data = prepared_large,
  outcome = "Y",
  outcome_type = "continuous",
  time_points = 3,
  engine = "auto",
  n = nrow(prepared_large),
  iter = 15000,
  K = 1000
)

For nrow(prepared_large) > 2000, this continuous-outcome example uses fastbkmr if fbkmr is installed and the time-dependent confounders are modeled as Gaussian. The n argument is optional when data is supplied, but setting it explicitly makes the intended sample size clear in reports and scripts.
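The selection rule described above can be sketched as a plain function. This is an illustration of the stated conditions, not the package's internal code: choose_engine_sketch() and its arguments are hypothetical, and the cutoff of 2000 rows is assumed from the example in this section rather than read from the implementation.

```r
# Hypothetical sketch of the "auto" rule; not causalBKMR's actual code.
# large_n = 2000 is an assumption taken from the example above.
choose_engine_sketch <- function(n, outcome_type,
                                 fbkmr_installed = requireNamespace("fbkmr", quietly = TRUE),
                                 large_n = 2000) {
  is_large <- n > large_n
  if (outcome_type == "continuous" && is_large && fbkmr_installed) {
    "fastbkmr"
  } else {
    "bkmr"
  }
}

# Passing fbkmr_installed explicitly keeps the examples reproducible.
choose_engine_sketch(5000, "continuous", fbkmr_installed = TRUE)   # "fastbkmr"
choose_engine_sketch(5000, "binary", fbkmr_installed = TRUE)       # "bkmr"
choose_engine_sketch(1500, "continuous", fbkmr_installed = TRUE)   # "bkmr"
choose_engine_sketch(5000, "continuous", fbkmr_installed = FALSE)  # "bkmr"
```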

Force fastBKMR

You can also request fastbkmr directly:

fit_fast <- gbkmr_run(
  data = prepared_large,
  outcome = "Y",
  outcome_type = "continuous",
  time_points = 3,
  engine = "fastbkmr",
  n_subset = 6,
  n_cores = 6,
  iter = 15000,
  K = 1000
)

n_subset controls how many subsets are fit. n_cores controls how many parallel workers are requested. If parallel execution is unavailable, the implementation falls back to a sequential run.
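To build intuition for n_subset and n_cores, the sketch below splits row indices into equal-sized random subsets and picks a worker count. It is only an illustration under stated assumptions: split_rows_sketch() is a hypothetical helper, fbkmr may partition the data differently, and capping workers at the number of subsets is a choice made here, not documented behavior.

```r
# Hypothetical illustration; fbkmr may partition the data differently.
split_rows_sketch <- function(n, n_subset, seed = 1) {
  stopifnot(n_subset >= 1, n_subset <= n)
  set.seed(seed)
  shuffled <- sample.int(n)               # random row order
  group <- rep_len(seq_len(n_subset), n)  # round-robin subset labels
  unname(split(shuffled, group))
}

subsets <- split_rows_sketch(n = 6000, n_subset = 6)
lengths(subsets)  # 1000 rows in each of the 6 subsets

# Request no more workers than there are subsets or detected cores.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L
n_cores <- min(6L, length(subsets), available)
n_cores
```

With more workers than subsets, the extra workers would have nothing to do in this sketch, which is why the examples in this vignette set n_cores equal to n_subset.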

Compare Engines on a Subset

For a new analysis, it is useful to compare the engines on a smaller subset before launching a large run.

set.seed(1)
idx <- sample(seq_len(nrow(prepared_large)), 1500)
prepared_subset <- prepared_large[idx, ]

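# Standard BKMR on the subset; outcome arguments match the earlier calls
fit_bkmr <- gbkmr_run(
  data = prepared_subset,
  outcome = "Y",
  outcome_type = "continuous",
  time_points = 3,
  engine = "bkmr",
  iter = 15000,
  K = 1000,
  n_knots = 50
)

# fastBKMR on the same subset, with identical outcome and sampler settings
fit_fast_subset <- gbkmr_run(
  data = prepared_subset,
  outcome = "Y",
  outcome_type = "continuous",
  time_points = 3,
  engine = "fastbkmr",
  n_subset = 5,
  n_cores = 5,
  iter = 15000,
  K = 1000
)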

fit_bkmr$causal_effect
fit_fast_subset$causal_effect

Small differences are expected because the engines use different posterior approximations. Large differences are a signal to inspect diagnostics, intervention support, and model settings.
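One way to make "small" and "large" concrete is to compare posterior means and credible intervals. The sketch below assumes each effect can be extracted as a numeric vector of posterior draws; the structure of causal_effect is not specified here, so compare_effects_sketch() is a hypothetical helper and the draws are simulated stand-ins, not output from either engine.

```r
# Hypothetical helper; assumes each effect is available as numeric draws.
compare_effects_sketch <- function(draws_a, draws_b, probs = c(0.025, 0.975)) {
  ci_a <- quantile(draws_a, probs, names = FALSE)
  ci_b <- quantile(draws_b, probs, names = FALSE)
  list(
    mean_diff = mean(draws_a) - mean(draws_b),
    # TRUE when the two credible intervals share at least one point
    intervals_overlap = ci_a[1] <= ci_b[2] && ci_b[1] <= ci_a[2]
  )
}

# Simulated stand-ins for illustration only, not results from either engine.
set.seed(1)
draws_bkmr <- rnorm(1000, mean = 0.50, sd = 0.10)
draws_fast <- rnorm(1000, mean = 0.52, sd = 0.10)
compare_effects_sketch(draws_bkmr, draws_fast)
```

Non-overlapping intervals, or a mean difference that is large relative to the posterior spread, would be a reason to look at the diagnostics and settings mentioned above.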

Limitations

Binary outcomes require engine = "bkmr".
n_knots applies to standard BKMR only.
fastbkmr is optional; engine = "auto" falls back to bkmr when fbkmr is not installed.
The raw fit object stored in fit$raw_results differs by engine: standard BKMR stores one fit per model, while fastBKMR stores one fit per subset.