Prepare user matrices for g-BKMR analysis
Functions to convert user data into the wide-format data structure required for g-BKMR analysis. Handles variable naming, transformations, and metadata.
Converts user-provided matrices (Y, Z, X) into the wide-format data structure required for g-BKMR analysis. Supports both continuous and binary time-dependent covariates and, when validate_input = TRUE, validates the input dimensions.
prepare_gbkmr_data(
  Y,
  Z,
  X,
  time_points,
  mixture_components,
  td_covariates,
  baseline_covariates = 1,
  td_covariate_names = NULL,
  log_transform_mixtures = TRUE,
  validate_input = TRUE
)
Y: Numeric vector. Outcome variable (length n).
Z: Numeric matrix. Mixture exposure matrix (n x (Adim x T)).
X: Numeric matrix. Covariate matrix (n x (Ldim x T + baseline_covs)).
time_points: Integer. Number of time points (T).
mixture_components: Integer. Number of mixture components per time point (Adim).
td_covariates: Integer. Number of time-dependent covariates per time point (Ldim).
baseline_covariates: Integer. Number of baseline covariates (default: 1).
td_covariate_names: Character vector. Names for time-dependent covariates (optional).
log_transform_mixtures: Logical. Whether to log-transform mixture exposures (default: TRUE).
validate_input: Logical. Whether to validate input dimensions (default: TRUE).
A data frame in g-BKMR format with proper variable naming and metadata.
The function expects matrices organized as follows:
Z matrix: Mixtures in chronological order (Mix1_T0, Mix2_T0, ..., MixAdim_T0, Mix1_T1, ...)
X matrix: Time-dependent covariates in chronological order, followed by the baseline covariates (TD_Cov1_T0, TD_Cov2_T0, ..., TD_CovLdim_T0, TD_Cov1_T1, ..., TD_CovLdim_T(T-1), Baseline1, Baseline2, ...)
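As a rough illustration of this layout, the base-R sketch below builds column labels for Z and X from the dimension arguments. The helpers `z_layout`, `x_layout`, and `col_index` and the label strings are hypothetical and not part of the package. The sketch assumes that time-dependent covariates are indexed from time 0, which is what the stated Ldim x T column count implies.

```r
# Hypothetical helpers (not part of the package): label columns in the
# documented order, i.e. time-major with components nested within time.
z_layout <- function(time_points, mixture_components) {
  times <- seq_len(time_points) - 1L            # 0, 1, ..., T-1
  comps <- seq_len(mixture_components)
  unlist(lapply(times, function(t) paste0("Mix", comps, "_T", t)))
}

x_layout <- function(time_points, td_covariates, baseline_covariates) {
  times <- seq_len(time_points) - 1L
  covs <- seq_len(td_covariates)
  td <- unlist(lapply(times, function(t) paste0("TD_Cov", covs, "_T", t)))
  # Baseline covariates occupy the trailing columns of X
  c(td, paste0("Baseline", seq_len(baseline_covariates)))
}

# Column position in Z of component j (1-based) at time t (0-based)
col_index <- function(j, t, mixture_components) t * mixture_components + j

z_cols <- z_layout(time_points = 3, mixture_components = 2)
x_cols <- x_layout(time_points = 3, td_covariates = 2, baseline_covariates = 2)
```

Under these assumptions, `z_cols[col_index(2, 1, 2)]` picks out the second mixture component at time 1, which can be used to confirm that a user matrix is ordered as expected before it is passed in.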
The output data frame has the following structure:
sex: First baseline covariate (required by g-BKMR format)
baseline_2, baseline_3, ...: Additional baseline covariates
td_covariate1_0, td_covariate2_0, ...: Baseline time-dependent covariates
logM1_0, logM2_0, ...: Mixture exposures at time 0
logM1_1, logM2_1, ...: Mixture exposures at time 1
td_covariate1_1, td_covariate2_1, ...: Time-dependent covariates at time 1
Y: Outcome variable
id: Subject identifier
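To make the naming scheme above concrete, the following sketch generates the default column names one might expect for given dimensions. `expected_names` is a hypothetical helper, not a package function. It assumes `log_transform_mixtures = TRUE` (hence the `logM` prefix), no custom `td_covariate_names`, and a column order that simply mirrors the order of the list above; the order actually produced by `prepare_gbkmr_data()` may differ.

```r
# Hypothetical helper (not part of the package): default output names,
# in the order in which the documentation lists them.
expected_names <- function(time_points, mixture_components,
                           td_covariates, baseline_covariates = 1) {
  base <- "sex"                                  # first baseline covariate
  if (baseline_covariates > 1) {
    base <- c(base, paste0("baseline_", 2:baseline_covariates))
  }
  td <- function(t) paste0("td_covariate", seq_len(td_covariates), "_", t)
  mix <- function(t) paste0("logM", seq_len(mixture_components), "_", t)
  # Time 0: covariates first, then exposures
  out <- c(base, td(0), mix(0))
  # Later times: exposures first, then covariates (assumed from the list)
  for (t in seq_len(time_points - 1)) {
    out <- c(out, mix(t), td(t))
  }
  c(out, "Y", "id")
}

nm <- expected_names(time_points = 2, mixture_components = 2, td_covariates = 2)
```

Comparing `setdiff(nm, names(prepared_data))` against an actual result is one way to spot missing or differently named columns without relying on the assumed order.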
if (FALSE) { # \dontrun{
# Generate test data
n <- 200
Y <- rnorm(n)
Z <- matrix(rlnorm(n * 6), nrow = n, ncol = 6) # 2 metals x 3 time points
X <- matrix(rnorm(n * 8), nrow = n, ncol = 8) # 2 TD covs x 3 time points + 2 baseline
# Prepare data
prepared_data <- prepare_gbkmr_data(
  Y = Y, Z = Z, X = X,
  time_points = 3,
  mixture_components = 2,
  td_covariates = 2,
  baseline_covariates = 2,
  td_covariate_names = c("bmi", "bp")
)
# Check the structure
str(prepared_data)
head(prepared_data)
} # }
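The dimension requirements stated for Y, Z, and X can also be checked by hand before calling the function. The sketch below uses only base R. `check_dims` is a hypothetical helper and is not claimed to reproduce what `validate_input = TRUE` does internally; it only encodes the shapes given in the argument descriptions.

```r
# Hypothetical helper (not part of the package): verify the documented shapes.
check_dims <- function(Y, Z, X, time_points, mixture_components,
                       td_covariates, baseline_covariates = 1) {
  n <- length(Y)
  stopifnot(
    is.numeric(Y), is.matrix(Z), is.matrix(X),
    nrow(Z) == n, nrow(X) == n,
    ncol(Z) == mixture_components * time_points,              # Adim x T
    ncol(X) == td_covariates * time_points + baseline_covariates
  )
  invisible(TRUE)
}

# Same shapes as the example data: 2 mixtures and 2 TD covariates over
# 3 time points, plus 2 baseline covariates
n <- 200
Y <- rnorm(n)
Z <- matrix(rlnorm(n * 6), nrow = n, ncol = 6)
X <- matrix(rnorm(n * 8), nrow = n, ncol = 8)
ok <- check_dims(Y, Z, X, time_points = 3, mixture_components = 2,
                 td_covariates = 2, baseline_covariates = 2)
```

A mismatch, such as passing a Z with five columns, stops with an error naming the failed condition, which tends to be easier to diagnose than a failure further downstream.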