Construct Scaled Design Matrices for Big Survival Models

Prepares a large-scale feature matrix for stochastic gradient descent by applying optional normalisation, stratified sampling, and batching rules.
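The stratified batching described above can be pictured with a short base R sketch. It assumes, as the warning in the example suggests, that each mini-batch holds strata.size × batch.size observations, and it forms strata by randomly partitioning the rows. make_strata_batches() is a hypothetical helper for illustration, not the sampler bigscale uses internally.

```r
# Hypothetical helper (illustration only): split row indices 1..n into
# mini-batches of `batch.size` strata, each stratum holding `strata.size`
# rows drawn at random without replacement.
make_strata_batches <- function(n, strata.size = 20, batch.size = 1) {
  per.batch <- strata.size * batch.size
  num.batches <- n %/% per.batch            # incomplete remainder is dropped
  if (num.batches < 1) stop("strata.size * batch.size exceeds n")
  idx <- sample.int(n, num.batches * per.batch)  # random permutation subset
  lapply(seq_len(num.batches), function(b) {
    rows <- idx[((b - 1) * per.batch + 1):(b * per.batch)]
    # one column per stratum, strata.size rows in each column
    matrix(rows, nrow = strata.size, ncol = batch.size)
  })
}

set.seed(1)
batches <- make_strata_batches(n = 100, strata.size = 20, batch.size = 2)
length(batches)    # number of mini-batches
dim(batches[[1]])  # strata.size rows by batch.size strata
```

With n = 100, strata.size = 20 and batch.size = 2, each mini-batch holds 40 rows, so two complete batches fit and the last 20 rows are left out.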

Usage

bigscale(
  formula = survival::Surv(time = time, status = status) ~ .,
  data,
  norm.method = "standardize",
  strata.size = 20,
  batch.size = 1,
  features.mean = NULL,
  features.sd = NULL,
  parallel.flag = FALSE,
  num.cores = NULL,
  bigmemory.flag = FALSE,
  num.rows.chunk = 1e+06,
  col.names = NULL,
  type = "short"
)

Arguments

formula: Formula whose left-hand side is a survival::Surv() outcome and whose right-hand side selects the predictors to include in the scaled design matrix; ~ . uses all remaining columns of data.
data: Input data source containing the variables referenced in formula.
norm.method: Normalisation strategy (for example centring or standardising columns) applied to the feature matrix.
strata.size: Number of observations to retain from each stratum when constructing stratified batches.
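batch.size: Number of strata combined into each mini-batch, so that a mini-batch contains strata.size × batch.size observations (the product checked in the warning shown under Examples).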
features.mean: Optional vector of column means that can be reused to normalise multiple data sets in a consistent manner.
features.sd: Optional vector of column standard deviations that pairs with features.mean during scaling.
parallel.flag: Logical flag signalling whether the scaling work should be parallelised across cores.
num.cores: Number of processor cores allocated when parallel.flag is TRUE.
bigmemory.flag: Logical flag specifying whether intermediate results should be stored in bigmemory-backed matrices.
num.rows.chunk: Chunk size used when streaming data from on-disk objects into memory.
col.names: Optional character vector assigning column names to the generated design matrix.
type: Character string selecting the type of preprocessing target being prepared; defaults to "short".
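The features.mean and features.sd arguments exist so that the same scaling can be applied to several data sets. A minimal base R sketch of that idea, assuming norm.method = "standardize" means subtracting each column's mean and dividing by its standard deviation; standardize_cols() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (illustration only): standardise columns as
# (x - mean) / sd, estimating the statistics only when none are supplied.
standardize_cols <- function(x, features.mean = NULL, features.sd = NULL) {
  x <- as.matrix(x)
  if (is.null(features.mean)) features.mean <- colMeans(x)
  if (is.null(features.sd))   features.sd   <- apply(x, 2, stats::sd)
  scaled <- sweep(sweep(x, 2, features.mean, "-"), 2, features.sd, "/")
  list(x = scaled, features.mean = features.mean, features.sd = features.sd)
}

train <- data.frame(age = c(50, 60, 70), size = c(1, 2, 3))
test  <- data.frame(age = c(55, 65),     size = c(1.5, 2.5))

fit <- standardize_cols(train)                                     # estimate on training data
new <- standardize_cols(test, fit$features.mean, fit$features.sd)  # reuse the same statistics
new$x
```

Estimating the statistics once and passing them back in keeps a second data set on exactly the same scale as the first, rather than re-centring it on its own means.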

Value

A scaled design matrix of class scaler, together with metadata describing the transformation that was applied:

time.indices: indices of the time variable
cens.indices: indices of the censoring (status) variable
features.indices: indices of the features
time.sd: standard deviation of the time variable
time.mean: mean of the time variable
features.sd: standard deviations of the features
features.mean: means of the features
nr: number of rows
nc: number of columns
col.names: column names
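Because the metadata records the statistics used for scaling, the transformation can be undone. A minimal base R sketch, assuming norm.method = "standardize" computes (x - features.mean) / features.sd column by column. The meta list below is a hand-built stand-in mirroring three of the documented fields, not an actual bigscale result, and unscale_features() is a hypothetical helper.

```r
# Hand-built stand-in for three documented metadata fields (not real output)
meta <- list(
  features.indices = 2:3,
  features.mean    = c(60, 2),
  features.sd      = c(10, 1)
)

# Hypothetical helper: invert (x - mean) / sd on the feature columns
unscale_features <- function(x.scaled, meta) {
  cols <- x.scaled[, meta$features.indices, drop = FALSE]
  cols <- sweep(cols, 2, meta$features.sd, "*")  # undo the division by sd
  sweep(cols, 2, meta$features.mean, "+")        # undo the centring
}

# Column 1 plays the role of the time variable; columns 2-3 are features
x.scaled <- cbind(time = c(1, 2), age = c(-0.5, 0.5), size = c(-0.5, 0.5))
unscale_features(x.scaled, meta)
```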

Examples

data(micro.censure, package = "bigPLScox")
surv_data <- stats::na.omit(
  micro.censure[, c("survyear", "DC", "sexe", "Agediag")]
)
scaled <- bigscale(
  survival::Surv(survyear, DC) ~ .,
  data = surv_data,
  norm.method = "standardize",
  batch.size = 16
)
#> Warning: Strata size times batch size is greater than number of observations.
#>  This package resizes them to strata size = 20 and batch size = 4
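The warning above fires because strata.size × batch.size exceeds the number of rows in surv_data. A minimal pre-check in plain R arithmetic picks the largest batch.size that fits before calling bigscale. max_batch_size() is a hypothetical helper, and this sketch does not describe how bigscale itself resizes its arguments.

```r
# Hypothetical helper: largest batch.size with strata.size * batch.size <= n.
# A result of 0 means even one stratum of strata.size rows does not fit.
max_batch_size <- function(n, strata.size = 20) {
  as.integer(n %/% strata.size)
}

n <- 90            # illustrative row count; in practice use nrow(surv_data)
strata.size <- 20
batch.size <- 16
if (strata.size * batch.size > n) {
  batch.size <- max_batch_size(n, strata.size)
}
batch.size         # 4 when n = 90
```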
