
Fits a partial least squares (PLS) Cox proportional hazards regression model, estimating the coefficients on the latent components with a gradient-descent optimizer implemented in C++. The function operates directly on a bigmemory::big.matrix object to avoid materialising large design matrices in memory.
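For data that do not fit comfortably in RAM, the design matrix can be file-backed. A minimal sketch (file names, dimensions, and simulated data are illustrative assumptions, not part of the interface):

library(bigmemory)
# Create a file-backed big.matrix so the predictors live on disk
Xfb <- as.big.matrix(
  matrix(rnorm(1000 * 50), 1000, 50),
  backingfile    = "X.bin",
  descriptorfile = "X.desc",
  backingpath    = tempdir()
)
fit <- big_pls_cox_gd(Xfb, time = rexp(1000, 0.1),
                      status = rbinom(1000, 1, 0.7), ncomp = 2)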

Usage

big_pls_cox_gd(
  X,
  time,
  status,
  ncomp = NULL,
  max_iter = 500L,
  tol = 1e-06,
  learning_rate = 0.01,
  keepX = NULL
)

Arguments

X

A bigmemory::big.matrix containing the design matrix (rows are observations).

time

A numeric vector of follow-up times with length equal to the number of rows of X.

status

A numeric or integer vector of the same length as time containing the event indicators (1 for an event, 0 for censoring).

ncomp

An integer giving the number of latent PLS components to extract (columns of the score matrix). Defaults to min(5, ncol(X)).

max_iter

Maximum number of gradient-descent iterations (default 500).

tol

Convergence tolerance on the Euclidean distance between successive coefficient vectors.

learning_rate

Step size used for the gradient-descent updates (default 0.01).

keepX

Optional integer vector describing the number of predictors to retain per component (naive sparsity). A value of zero keeps all predictors.
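For example, a sparse fit retaining five predictors in each of three components could be requested as follows (the values are illustrative only):

fit_sparse <- big_pls_cox_gd(X, time, status, ncomp = 3, keepX = c(5, 5, 5))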

Value

A list with components:

  • coefficients: Estimated Cox regression coefficients on the latent scores.

  • loglik: Final partial log-likelihood value.

  • iterations: Number of gradient-descent iterations performed.

  • converged: Logical flag indicating whether convergence was achieved.

  • scores: Matrix of latent score vectors (one column per component).

  • loadings: Matrix of loading vectors associated with each component.

  • weights: Matrix of PLS weight vectors.

  • center: Column means used to centre the predictors.

  • scale: Column scales (standard deviations) used to standardise the predictors.
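The centring, scaling, and projection quantities can be reused to score new observations. A minimal sketch, assuming new predictors are standardised with center and scale and projected onto the weight vectors before applying the coefficients (newX and the projection rule are assumptions for illustration, not a documented predict method):

p <- nrow(fit$weights)                        # number of predictors
newX <- matrix(rnorm(5 * p), 5, p)            # five hypothetical new observations
newX_std <- sweep(sweep(newX, 2, fit$center, "-"), 2, fit$scale, "/")
new_scores <- newX_std %*% fit$weights        # assumed projection onto the PLS weights
risk <- drop(new_scores %*% fit$coefficients) # linear predictor on the latent scores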

References

Maumy, M., Bertrand, F. (2023). PLS models and their extension for big data. Joint Statistical Meetings (JSM 2023), Toronto, ON, Canada.

Maumy, M., Bertrand, F. (2023). bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data. BioC2023 — The Bioconductor Annual Conference, Dana-Farber Cancer Institute, Boston, MA, USA. Poster. doi:10.7490/f1000research.1119546.1

Bastien, P., Bertrand, F., Meyer, N., & Maumy-Bertrand, M. (2015). Deviance residuals-based sparse PLS and sparse kernel PLS for censored data. Bioinformatics, 31(3), 397–404. doi:10.1093/bioinformatics/btu660

Bertrand, F., Bastien, P., Meyer, N., & Maumy-Bertrand, M. (2014). PLS models for censored data. In Proceedings of UseR! 2014 (p. 152).

Examples

# \donttest{
library(bigmemory)
set.seed(1)
n <- 50
p <- 10
X <- bigmemory::as.big.matrix(matrix(rnorm(n * p), n, p))
time <- rexp(n, rate = 0.1)
status <- rbinom(n, 1, 0.7)
fit <- big_pls_cox_gd(X, time, status, ncomp = 3, max_iter = 200)
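# Inspect the fit; component names follow the Value section above
fit$converged
fit$loglik
head(fit$coefficients)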
# }