Partial Least Squares Components for Cox Models with Big Matrices

Compute Partial Least Squares (PLS) components tailored for Cox proportional hazards models when predictors are stored as a big.matrix from the bigmemory package.

Usage

big_pls_cox(
  X,
  time,
  status,
  ncomp = 2L,
  control = survival::coxph.control(),
  keepX = NULL
)

Arguments

X: A numeric matrix or a bigmemory::big.matrix object containing the predictors.
time: Numeric vector of survival times.
status: Integer (0/1) vector of event indicators.
ncomp: Number of latent components to compute.
control: Optional list passed to survival::coxph.control.
keepX: Optional integer vector specifying the number of variables to retain (naive sparsity) in each component. A value of zero keeps all predictors. If a single integer is supplied it is recycled across components.

Value

A list with the computed scores, loadings, weights, scaling information and the fitted Cox model returned by survival::coxph.fit.

Details

The function standardises each predictor column, iteratively builds latent scores using martingale residuals from Cox fits, and deflates the predictors without materialising the full design matrix in memory. Both in-memory and file-backed bigmemory matrices are supported.

References

Maumy, M., Bertrand, F. (2023). PLS models and their extension for big data. Joint Statistical Meetings (JSM 2023), Toronto, ON, Canada.

Maumy, M., Bertrand, F. (2023). bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data. BioC2023 — The Bioconductor Annual Conference, Dana-Farber Cancer Institute, Boston, MA, USA. Poster. https://doi.org/10.7490/f1000research.1119546.1

Bastien, P., Bertrand, F., Meyer, N., & Maumy-Bertrand, M. (2015). Deviance residuals-based sparse PLS and sparse kernel PLS for censored data. Bioinformatics, 31(3), 397–404. doi:10.1093/bioinformatics/btu660

Bertrand, F., Bastien, P., Meyer, N., & Maumy-Bertrand, M. (2014). PLS models for censored data. In Proceedings of UseR! 2014 (p. 152).

Examples

if (requireNamespace("survival", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(100), nrow = 20)
  time <- rexp(20)
  status <- rbinom(20, 1, 0.5)
  fit <- big_pls_cox(X, time, status, ncomp = 2)
  str(fit)
}
#> List of 9
#>  $ scores  : num [1:20, 1:2] 0.275 0.244 -1.069 1.455 -0.324 ...
#>  $ loadings: num [1:5, 1:2] 0.792 0.1 0.273 0.253 -0.777 ...
#>  $ weights : num [1:5, 1:2] 0.681 0.273 0.248 0.117 -0.622 ...
#>  $ center  : num [1:5] 0.19052 -0.00647 0.1388 0.10174 0.11985
#>  $ scale   : num [1:5] 0.913 0.871 0.81 1.05 0.911
#>  $ cox_fit :List of 10
#>   ..$ coefficients     : num [1:2] 0.867 0.231
#>   ..$ var              : num [1:2, 1:2] 0.1691 0.0584 0.0584 0.2252
#>   ..$ loglik           : num [1:2] -18.5 -15.8
#>   ..$ score            : num 4.82
#>   ..$ iter             : int 4
#>   ..$ linear.predictors: num [1:20] 0.0329 0.0502 -0.7536 1.4699 -0.5345 ...
#>   ..$ residuals        : Named num [1:20] -0.26 -0.125 -0.504 0.837 0.921 ...
#>   .. ..- attr(*, "names")= chr [1:20] "1" "2" "3" "4" ...
#>   ..$ means            : num [1:2] 1.67e-17 -3.33e-17
#>   ..$ method           : chr "efron"
#>   ..$ class            : chr "coxph"
#>  $ keepX   : int [1:2] 0 0
#>  $ time    : num [1:20] 0.7632 1.5727 1.8356 0.0372 0.1256 ...
#>  $ status  : num [1:20] 0 1 0 1 1 1 0 0 0 1 ...
#>  - attr(*, "class")= chr "big_pls_cox"