Fit Survival Models with Stochastic Gradient Descent
Source: R/bigSurvSGD.na.omit.R
bigSurvSGD.na.omit.Rd
Performs stochastic gradient descent optimisation for large-scale survival models after removing observations with missing values.
Usage
bigSurvSGD.na.omit(
formula = survival::Surv(time = time, status = status) ~ .,
data,
norm.method = "standardize",
features.mean = NULL,
features.sd = NULL,
opt.method = "AMSGrad",
beta.init = NULL,
beta.type = "averaged",
lr.const = 0.12,
lr.tau = 0.5,
strata.size = 20,
batch.size = 1,
num.epoch = 100,
b1 = 0.9,
b2 = 0.99,
eps = 1e-08,
inference.method = "plugin",
num.boot = 1000,
num.epoch.boot = 100,
boot.method = "SGD",
lr.const.boot = 0.12,
lr.tau.boot = 0.5,
num.sample.strata = 1000,
sig.level = 0.05,
beta0 = 0,
alpha = NULL,
lambda = NULL,
nlambda = 100,
num.strata.lambda = 10,
lambda.scale = 1,
parallel.flag = FALSE,
num.cores = NULL,
bigmemory.flag = FALSE,
num.rows.chunk = 1e+06,
col.names = NULL,
type = "float"
)
Arguments
- formula
Model formula describing the survival outcome and the set of predictors to include in the optimisation.
- data
Input data set or connection to a big-memory backed design matrix that contains the variables referenced in formula.
- norm.method
Normalization strategy applied to the feature matrix before optimisation, for example centring or standardising columns.
- features.mean
Optional pre-computed column means used when normalising the features so that repeated fits can reuse shared statistics.
- features.sd
Optional pre-computed column standard deviations used in concert with features.mean for scaling the predictors.
- opt.method
Gradient-based optimisation routine to employ, such as vanilla SGD or adaptive methods like AMSGrad or Adam.
- beta.init
Vector of starting values for the regression coefficients supplied when warm-starting the optimisation.
- beta.type
Indicator controlling how beta.init is interpreted, for example whether the coefficients correspond to the original or normalised scale.
- lr.const
Base learning-rate constant used by the stochastic gradient descent routine.
- lr.tau
Learning-rate decay horizon or damping factor that moderates the step size schedule.
- strata.size
Number of observations drawn per stratum when building mini-batches for the optimisation loop.
- batch.size
Total number of observations assembled into each stochastic gradient batch.
- num.epoch
Number of passes over the training data used during the optimisation.
- b1
First exponential moving-average rate used by adaptive methods such as Adam to smooth gradients.
- b2
Second exponential moving-average rate used by adaptive methods to smooth squared gradients.
- eps
Numerical stabilisation constant added to denominators when updating the adaptive moments.
- inference.method
Inference approach requested after fitting, for example naive asymptotics or bootstrap resampling.
- num.boot
Number of bootstrap replicates to draw when inference.method relies on resampling.
- num.epoch.boot
Number of optimisation epochs to run within each bootstrap replicate.
- boot.method
Type of bootstrap scheme to apply, such as ordinary or stratified resampling.
- lr.const.boot
Learning-rate constant used during bootstrap refits.
- lr.tau.boot
Learning-rate decay factor applied during bootstrap refits.
- num.sample.strata
Number of strata sampled without replacement during each bootstrap iteration when stratified resampling is selected.
- sig.level
Significance level used when constructing confidence intervals or hypothesis tests.
- beta0
Optional vector of coefficients under the null hypothesis when performing hypothesis tests.
- alpha
Elastic-net mixing parameter controlling the relative weight of \(\ell_1\) and \(\ell_2\) regularisation penalties.
- lambda
Sequence of regularisation strengths supplied explicitly for penalised estimation.
- nlambda
Number of automatically generated lambda values when a grid is produced internally.
- num.strata.lambda
Number of strata used when tuning lambda via cross-validation or other search procedures.
- lambda.scale
Scale on which the lambda grid is generated, for example logarithmic or linear spacing.
- parallel.flag
Logical flag enabling parallel computation of gradients or bootstrap replicates.
- num.cores
Number of processing cores to use when parallel execution is enabled.
- bigmemory.flag
Logical flag indicating whether intermediate matrices should be stored using bigmemory backed objects.
- num.rows.chunk
Row chunk size to use when streaming data from an on-disk matrix representation.
- col.names
Optional character vector of column names associated with the feature matrix.
- type
Storage type for the big-memory backed data, for example "float" or "double"; the default "float" reduces memory use at some cost in precision.
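The normalisation arguments work together: features.mean and features.sd let repeated fits, or the scoring of new data, reuse the statistics computed on a first data set instead of recomputing them. A minimal base-R sketch of the "standardize" step these arguments describe (illustrative only; the matrices here are hypothetical):

```r
# Hypothetical training matrix: 5 observations, 4 features
set.seed(1)
X_train <- matrix(rnorm(20), nrow = 5, ncol = 4)

# Statistics that play the role of features.mean and features.sd
features.mean <- colMeans(X_train)
features.sd <- apply(X_train, 2, sd)

# First fit: standardise the training features with their own statistics
X_scaled <- scale(X_train, center = features.mean, scale = features.sd)

# Later fit or new data: reuse the stored statistics so both data sets
# are expressed on the same scale
X_new <- matrix(rnorm(8), nrow = 2, ncol = 4)
X_new_scaled <- scale(X_new, center = features.mean, scale = features.sd)
```

After standardisation, each training column has mean 0 and standard deviation 1, while the new data is mapped onto that same scale rather than its own.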
Value
A fitted model object storing the learned coefficients, optimisation metadata, and any requested inference summaries:
- coef
Log-hazard ratios. If no inference is used, a vector of estimated coefficients; if inference is used, a matrix including estimates and confidence intervals. In case of penalisation, a matrix with columns corresponding to the lambda values.
- coef.exp
Exponentiated version of coef (hazard ratios).
- lambda
The lambda value(s) used for penalisation.
- alpha
The alpha value used for penalisation.
- features.mean
Means of the features, if given or calculated.
- features.sd
Standard deviations of the features, if given or calculated.
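The coef and coef.exp components differ only by exponentiation: a hazard ratio is recovered from a log-hazard coefficient as exp(coef). A short base-R illustration with hypothetical coefficient values:

```r
# Hypothetical log-hazard coefficients, as stored in coef
log_hr <- c(sexe = 0.25, Agediag = -0.10)

# coef.exp is the elementwise exponential: the hazard ratios.
# A positive coefficient gives a hazard ratio above 1 (increased hazard);
# a negative coefficient gives a ratio below 1 (decreased hazard).
hr <- exp(log_hr)
```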
See also
bigSurvSGD,
bigscale for constructing normalised design matrices, and
partialbigSurvSGDv0 for partial fitting pipelines.
Examples
# \donttest{
data(micro.censure, package = "bigPLScox")
surv_data <- stats::na.omit(micro.censure[, c("survyear", "DC", "sexe", "Agediag")])
# Increase num.epoch and num.boot for real use
fit <- bigSurvSGD.na.omit(
survival::Surv(survyear, DC) ~ .,
data = surv_data,
norm.method = "standardize",
opt.method = "adam",
batch.size = 16,
num.epoch = 2
)
#> Warning: Strata size times batch size is greater than number of observations.
#> This package resizes them to strata size = 20 and batch size = 4
# }
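A sketch of requesting resampling-based inference through the same interface, assuming the inference.method, num.boot, and num.epoch.boot arguments documented above behave as described (not run; the replicate counts here are far too small for real use and surv_data is the data set prepared in the example above):

```r
# \donttest{
# Hypothetical bootstrap-inference fit; increase num.boot and
# num.epoch.boot substantially for real use.
fit_boot <- bigSurvSGD.na.omit(
survival::Surv(survyear, DC) ~ .,
data = surv_data,
opt.method = "AMSGrad",
inference.method = "bootstrap",
num.boot = 10,
num.epoch.boot = 2
)
# With inference, coef is a matrix including estimates and
# confidence-interval bounds for each predictor
fit_boot$coef
# }
```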