Skip to contents

Compute Partial Least Squares (PLS) components tailored for Cox proportional hazards models when predictors are stored as a big.matrix from the bigmemory package.

Usage

big_pls_cox(
  X,
  time,
  status,
  ncomp = 2L,
  control = survival::coxph.control(),
  keepX = NULL
)

Arguments

X

A numeric matrix or a bigmemory::big.matrix object containing the predictors.

time

Numeric vector of survival times.

status

Integer (0/1) vector of event indicators.

ncomp

Number of latent components to compute.

control

Optional list passed to survival::coxph.control.

keepX

Optional integer vector specifying the number of variables to retain (naive sparsity) in each component. A value of zero keeps all predictors. If a single integer is supplied it is recycled across components.

Value

A list with the computed scores, loadings, weights, scaling information and the fitted Cox model returned by survival::coxph.fit.

Details

The function standardises each predictor column, iteratively builds latent scores using martingale residuals from Cox fits, and deflates the predictors without materialising the full design matrix in memory. Both in-memory and file-backed bigmemory matrices are supported.

References

Maumy, M., Bertrand, F. (2023). PLS models and their extension for big data. Joint Statistical Meetings (JSM 2023), Toronto, ON, Canada.

Maumy, M., Bertrand, F. (2023). bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data. BioC2023 — The Bioconductor Annual Conference, Dana-Farber Cancer Institute, Boston, MA, USA. Poster. https://doi.org/10.7490/f1000research.1119546.1

Bastien, P., Bertrand, F., Meyer, N., & Maumy-Bertrand, M. (2015). Deviance residuals-based sparse PLS and sparse kernel PLS for censored data. Bioinformatics, 31(3), 397–404. doi:10.1093/bioinformatics/btu660

Bertrand, F., Bastien, P., Meyer, N., & Maumy-Bertrand, M. (2014). PLS models for censored data. In Proceedings of UseR! 2014 (p. 152).

Examples

if (requireNamespace("survival", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(100), nrow = 20)
  time <- rexp(20)
  status <- rbinom(20, 1, 0.5)
  fit <- big_pls_cox(X, time, status, ncomp = 2)
  str(fit)
}
#> List of 9
#>  $ scores  : num [1:20, 1:2] 0.315 0.28 -1.226 1.668 -0.372 ...
#>  $ loadings: num [1:5, 1:2] 0.6905 0.0875 0.2379 0.2205 -0.6772 ...
#>  $ weights : num [1:5, 1:2] 0.681 0.273 0.248 0.117 -0.622 ...
#>  $ center  : num [1:5] 0.19052 -0.00647 0.1388 0.10174 0.11985
#>  $ scale   : num [1:5] 0.913 0.871 0.81 1.05 0.911
#>  $ cox_fit :List of 10
#>   ..$ coefficients     : num [1:2] 0.756 0.247
#>   ..$ var              : num [1:2, 1:2] 0.1286 0.0544 0.0544 0.2571
#>   ..$ loglik           : num [1:2] -18.5 -15.8
#>   ..$ score            : num 4.82
#>   ..$ iter             : int 4
#>   ..$ linear.predictors: num [1:20] 0.0329 0.0502 -0.7536 1.4699 -0.5345 ...
#>   ..$ residuals        : Named num [1:20] -0.26 -0.125 -0.504 0.837 0.921 ...
#>   .. ..- attr(*, "names")= chr [1:20] "1" "2" "3" "4" ...
#>   ..$ means            : num [1:2] 7.77e-17 -3.33e-17
#>   ..$ method           : chr "efron"
#>   ..$ class            : chr "coxph"
#>  $ keepX   : int [1:2] 0 0
#>  $ time    : num [1:20] 0.7632 1.5727 1.8356 0.0372 0.1256 ...
#>  $ status  : num [1:20] 0 1 0 1 1 1 0 0 0 1 ...
#>  - attr(*, "class")= chr "big_pls_cox"