Prepare user matrices for g-BKMR analysis

Functions to convert user data into the wide-format data structure required for g-BKMR analysis. Handles variable naming, transformations, and metadata.

Converts user-provided matrices (Y, Z, X) into the wide-format data structure required for g-BKMR analysis. Supports both continuous and binary time-dependent covariates and, when validate_input = TRUE, validates the input dimensions.

prepare_gbkmr_data(
  Y,
  Z,
  X,
  time_points,
  mixture_components,
  td_covariates,
  baseline_covariates = 1,
  td_covariate_names = NULL,
  log_transform_mixtures = TRUE,
  validate_input = TRUE
)

Arguments

Y

Numeric vector. Outcome variable (length n).

Z

Numeric matrix. Mixture exposure matrix (n x (Adim x T)).

X

Numeric matrix. Covariate matrix (n x (Ldim x T + baseline_covs)).

time_points

Integer. Number of time points (T).

mixture_components

Integer. Number of mixture components per time point (Adim).

td_covariates

Integer. Number of time-dependent covariates per time point (Ldim).

baseline_covariates

Integer. Number of baseline covariates (default: 1).

td_covariate_names

Character vector. Names for time-dependent covariates (optional).

log_transform_mixtures

Logical. Whether to log-transform mixture exposures (default: TRUE).

validate_input

Logical. Whether to validate input dimensions (default: TRUE).
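
The shapes listed above imply a simple consistency check between the matrices and the declared counts. The following is a minimal sketch of such a check, not the package's own validation code; `check_gbkmr_dims` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper (not part of the package): checks the documented shapes
#   Z: n x (Adim x T),  X: n x (Ldim x T + baseline_covs)
check_gbkmr_dims <- function(Y, Z, X, time_points, mixture_components,
                             td_covariates, baseline_covariates = 1) {
  n <- length(Y)
  expected_z <- mixture_components * time_points
  expected_x <- td_covariates * time_points + baseline_covariates
  stopifnot(
    is.numeric(Y),
    is.matrix(Z), is.matrix(X),
    nrow(Z) == n, nrow(X) == n,
    ncol(Z) == expected_z,
    ncol(X) == expected_x
  )
  invisible(TRUE)
}

# 2 mixture components x 3 time points = 6 columns in Z;
# 2 TD covariates x 3 time points + 2 baseline covariates = 8 columns in X
ok <- check_gbkmr_dims(
  Y = rnorm(5),
  Z = matrix(1, nrow = 5, ncol = 6),
  X = matrix(0, nrow = 5, ncol = 8),
  time_points = 3, mixture_components = 2,
  td_covariates = 2, baseline_covariates = 2
)
```

Running a check like this before calling prepare_gbkmr_data() can surface a miscounted column early, independently of whatever the function's own validation reports.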

Value

A data frame in g-BKMR format with proper variable naming and metadata.

Details

The function expects matrices organized as follows:

  • Z matrix: Mixtures in chronological order (Mix1_T0, Mix2_T0, ..., MixAdim_T0, Mix1_T1, ...)

  • X matrix: Time-dependent covariates in chronological order, followed by baseline covariates (TD_Cov1_T0, TD_Cov2_T0, ..., TD_CovLdim_T0, ..., TD_CovLdim_T(T-1), Baseline1, Baseline2, ...)
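
To make the expected column order concrete, the sketch below generates the layout labels for a small case. It assumes zero-based time indices (T0, ..., T(T-1)) for both matrices, which matches the documented column counts. The labels are for illustration only; nothing here states that the function requires these column names on input.

```r
time_points <- 3
mixture_components <- 2
td_covariates <- 2
baseline_covariates <- 2

times <- seq_len(time_points) - 1  # 0, 1, 2

# outer() fills column-major, so the component index cycles fastest
# within each time point, matching the chronological layout.
z_layout <- as.vector(outer(
  seq_len(mixture_components), times,
  function(j, tp) paste0("Mix", j, "_T", tp)
))

x_layout <- c(
  as.vector(outer(
    seq_len(td_covariates), times,
    function(k, tp) paste0("TD_Cov", k, "_T", tp)
  )),
  paste0("Baseline", seq_len(baseline_covariates))
)

print(z_layout)
print(x_layout)
```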

The output data frame has the following structure:

  • sex: First baseline covariate (required by g-BKMR format)

  • baseline_2, baseline_3, ...: Additional baseline covariates

  • td_covariate1_0, td_covariate2_0, ...: Time-dependent covariates at time 0

  • logM1_0, logM2_0, ...: Mixture exposures at time 0

  • logM1_1, logM2_1, ...: Mixture exposures at time 1

  • td_covariate1_1, td_covariate2_1, ...: Time-dependent covariates at time 1

  • Y: Outcome variable

  • id: Subject identifier
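
As an illustration of the mixture naming scheme above, the sketch below extracts one time point's block from Z, log-transforms it, and labels the columns logM1_t, logM2_t, .... It is an assumption-laden sketch rather than the package's implementation: in particular, the natural logarithm and the column-slicing arithmetic are assumptions based on the layout described in this section.

```r
set.seed(1)
n <- 5
time_points <- 3
mixture_components <- 2
Z <- matrix(rlnorm(n * mixture_components * time_points), nrow = n)

tp <- 1  # zero-based time index to extract

# Columns for time tp: with 2 components, time 1 occupies columns 3 and 4
cols <- tp * mixture_components + seq_len(mixture_components)

# Natural log is an assumption; the documentation only says "log-transform"
block <- log(Z[, cols, drop = FALSE])
colnames(block) <- paste0("logM", seq_len(mixture_components), "_", tp)

print(colnames(block))
```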

Examples

if (FALSE) { # \dontrun{
# Generate test data
n <- 200
Y <- rnorm(n)
Z <- matrix(rlnorm(n * 6), nrow = n, ncol = 6)  # 2 metals x 3 time points
X <- matrix(rnorm(n * 8), nrow = n, ncol = 8)   # 2 TD covs x 3 time points + 2 baseline

# Prepare data
prepared_data <- prepare_gbkmr_data(
  Y = Y, Z = Z, X = X,
  time_points = 3,
  mixture_components = 2,
  td_covariates = 2,
  baseline_covariates = 2,
  td_covariate_names = c("bmi", "bp")
)

# Check the structure
str(prepared_data)
head(prepared_data)
} # }