Gradient based PLS Cox for big matrices

Fit a PLS Cox model where the PLS components are computed once and the Cox regression in the latent space is optimised by gradient based methods.

This function is intended as a faster, approximate alternative to big_pls_cox_fast() when many fits are required or when the design is stored as a bigmemory::big.matrix.

Usage

big_pls_cox_gd(
  X,
  time,
  status,
  ncomp = 2L,
  max_iter = 2000L,
  tol = 1e-08,
  learning_rate = 0.05,
  keepX = NULL,
  coxfit = TRUE,
  method = c("gd", "bb", "nesterov", "bfgs"),
  diag = TRUE
)

Arguments

X

A bigmemory::big.matrix containing the design matrix (rows are observations).

time

A numeric vector of follow-up times with length equal to the number of rows of X.

status

A numeric or integer vector of the same length as time containing the event indicators (1 for an event, 0 for censoring).

ncomp

An integer giving the number of components (columns) to use from X. Defaults to min(5, ncol(X)).

max_iter

Maximum number of gradient iterations.

tol

Convergence tolerance for the optimisation in the Cox space. Both the change in log-likelihood and the Euclidean change in the coefficient vector are monitored.

learning_rate

Initial learning rate for first order methods. This is used by "gd" and as a starting scale for "bb" and "nesterov". It is ignored by "bfgs".

keepX

Optional integer vector describing the number of predictors to retain per component (naive sparsity). A value of zero keeps all predictors.

coxfit

Optional Boolean to fit a Cox model on the extracted components.

method

Optimisation scheme used in the latent space. One of

"gd": Simple fixed step gradient descent. This is the most transparent method and is useful for debugging and didactic purposes.
"bb": Barzilai Borwein step size. Uses a diagonal quasi-second-order update of the step size based on the last two gradients. Often converges faster than "gd" at similar computational cost.
"nesterov": Nesterov type accelerated gradient with a fixed momentum schedule. Can yield smoother convergence in early iterations, at the price of slightly more bookkeeping.
"bfgs": Quasi Newton update in the latent space, with a limited memory BFGS type approximation of the Hessian. This gives the most accurate coefficients in a small number of iterations but requires more linear algebra per step.

The default is "bb", which provides a good balance between speed and robustness in most examples.

diag

Logical. If TRUE, store iteration level diagnostics.

Value

An object of class "big_pls_cox_gd" that contains:

coefficients: Estimated Cox regression coefficients on the latent scores.
loglik: Final partial log-likelihood value.
iterations: Number of gradient-descent iterations performed.
converged: Logical flag indicating whether convergence was achieved.
scores: Matrix of latent score vectors (one column per component).
loadings: Matrix of loading vectors associated with each component.
weights: Matrix of PLS weight vectors.
center: Column means used to centre the predictors.
scale: Column scales (standard deviations) used to standardise the predictors.
keepX: Vector describing the number of predictors retained per component.
time: Numeric vector of follow-up times.
status: Numeric or integer vector containing the event indicators.
loglik_trace: Trace of the loglik.
step_trace: Trace of the steps
gradnorm_trace: Trace of the gradnorm.
cox_fit: Final Cox model fitted on the components.

Details

The function first computes PLS components using the same procedure as big_pls_cox_fast(), then holds these components fixed and optimises the Cox partial log-likelihood in the reduced space.

The coefficients stored in fit$coefficients are the result of the chosen optimisation method and are approximate. The field fit$cox_fit contains the Cox model refitted by survival::coxph() on the final PLS scores. Prediction functions use the coefficients from cox_fit so that linear predictors are directly interpretable as Cox risk scores.

The optimisation tolerances control the compromise between speed and accuracy. If you need very close agreement with the exact PLS Cox solution, use big_pls_cox_fast(). If you only need accurate risk rankings and want to fit many models, the gradient based method with "bb" or "bfgs" is usually sufficient.

References

Maumy, M., Bertrand, F. (2023). PLS models and their extension for big data. Joint Statistical Meetings (JSM 2023), Toronto, ON, Canada.

Maumy, M., Bertrand, F. (2023). bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data. BioC2023 — The Bioconductor Annual Conference, Dana-Farber Cancer Institute, Boston, MA, USA. Poster. https://doi.org/10.7490/f1000research.1119546.1

Bastien, P., Bertrand, F., Meyer, N., & Maumy-Bertrand, M. (2015). Deviance residuals-based sparse PLS and sparse kernel PLS for censored data. Bioinformatics, 31(3), 397–404. doi:10.1093/bioinformatics/btu660

Bertrand, F., Bastien, P., Meyer, N., & Maumy-Bertrand, M. (2014). PLS models for censored data. In Proceedings of UseR! 2014 (p. 152).

Examples

library(bigmemory)
set.seed(1)
n <- 50
p <- 10
X <- bigmemory::as.big.matrix(matrix(rnorm(n * p), n, p))
time <- rexp(n, rate = 0.1)
status <- rbinom(n, 1, 0.7)
fit <- big_pls_cox_gd(X, time, status, ncomp = 3, max_iter = 200)
str(fit)
#> List of 16
#>  $ coefficients  : num [1:3] 0.593 0.165 0.314
#>  $ loglik        : num -104
#>  $ iterations    : int 200
#>  $ converged     : logi FALSE
#>  $ scores        : num [1:50, 1:3] 0.0513 1.555 -1.1934 0.7405 -1.9847 ...
#>  $ loadings      : num [1:10, 1:3] 0.3689 -0.0474 0.5673 0.1476 0.4647 ...
#>  $ weights       : num [1:10, 1:3] 0.5334 -0.1685 0.4853 0.0962 0.3655 ...
#>  $ center        : num [1:10] 0.1004 0.1173 -0.1525 0.0769 -0.0313 ...
#>  $ scale         : num [1:10] 0.831 0.969 0.9 1.009 1.095 ...
#>  $ keepX         : int [1:3] 0 0 0
#>  $ time          : num [1:50] 0.6162 3.6972 12.2628 0.271 0.0901 ...
#>  $ status        : num [1:50] 1 1 0 1 0 1 1 1 1 1 ...
#>  $ loglik_trace  : num [1:200, 1] -106 -106 -105 -104 -104 ...
#>  $ step_trace    : num [1:200, 1] 0.05 0.05 0.025 0.025 0.025 0.05 0.025 0.025 0.025 0.05 ...
#>  $ gradnorm_trace: num [1:200, 1] 10.87 11.592 3.16 0.662 0.166 ...
#>  $ cox_fit       :List of 19
#>   ..$ coefficients     : Named num [1:3] 0.376 -0.323 -0.199
#>   .. ..- attr(*, "names")= chr [1:3] "comp1" "comp2" "comp3"
#>   ..$ var              : num [1:3, 1:3] 0.03725 -0.00601 0.00358 -0.00601 0.0223 ...
#>   ..$ loglik           : num [1:2] -110 -106
#>   ..$ score            : num 7.83
#>   ..$ iter             : int 4
#>   ..$ linear.predictors: num [1:50] -0.399 0.338 -1.056 1.072 -0.37 ...
#>   ..$ residuals        : Named num [1:50] 0.938 0.4757 -0.2709 0.899 -0.0117 ...
#>   .. ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
#>   ..$ means            : Named num [1:3] 4.44e-18 1.02e-16 8.88e-18
#>   .. ..- attr(*, "names")= chr [1:3] "comp1" "comp2" "comp3"
#>   ..$ method           : chr "efron"
#>   ..$ n                : int 50
#>   ..$ nevent           : num 37
#>   ..$ terms            :Classes 'terms', 'formula'  language survival::Surv(time, status) ~ comp1 + comp2 + comp3
#>   .. .. ..- attr(*, "variables")= language list(survival::Surv(time, status), comp1, comp2, comp3)
#>   .. .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
#>   .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. ..$ : chr [1:4] "survival::Surv(time, status)" "comp1" "comp2" "comp3"
#>   .. .. .. .. ..$ : chr [1:3] "comp1" "comp2" "comp3"
#>   .. .. ..- attr(*, "term.labels")= chr [1:3] "comp1" "comp2" "comp3"
#>   .. .. ..- attr(*, "specials")=Dotted pair list of 5
#>   .. .. .. ..$ strata : NULL
#>   .. .. .. ..$ tt     : NULL
#>   .. .. .. ..$ frailty: NULL
#>   .. .. .. ..$ ridge  : NULL
#>   .. .. .. ..$ pspline: NULL
#>   .. .. ..- attr(*, "order")= int [1:3] 1 1 1
#>   .. .. ..- attr(*, "intercept")= num 1
#>   .. .. ..- attr(*, "response")= int 1
#>   .. .. ..- attr(*, ".Environment")=<environment: 0x136a52170> 
#>   .. .. ..- attr(*, "predvars")= language list(survival::Surv(time, status), comp1, comp2, comp3)
#>   .. .. ..- attr(*, "dataClasses")= Named chr [1:4] "nmatrix.2" "numeric" "numeric" "numeric"
#>   .. .. .. ..- attr(*, "names")= chr [1:4] "survival::Surv(time, status)" "comp1" "comp2" "comp3"
#>   ..$ assign           :List of 3
#>   .. ..$ comp1: int 1
#>   .. ..$ comp2: int 2
#>   .. ..$ comp3: int 3
#>   ..$ wald.test        : num 7.52
#>   ..$ concordance      : Named num [1:7] 589 313 0 0 0 ...
#>   .. ..- attr(*, "names")= chr [1:7] "concordant" "discordant" "tied.x" "tied.y" ...
#>   ..$ y                : 'Surv' num [1:50, 1:2]  0.6162   3.6972  12.2628+  0.2710   0.0901+  1.5697   6.7861   3.0889   0.5886   2.0551  ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:50] "1" "2" "3" "4" ...
#>   .. .. ..$ : chr [1:2] "time" "status"
#>   .. ..- attr(*, "type")= chr "right"
#>   ..$ timefix          : logi TRUE
#>   ..$ formula          :Class 'formula'  language survival::Surv(time, status) ~ comp1 + comp2 + comp3
#>   .. .. ..- attr(*, ".Environment")=<environment: 0x136a52170> 
#>   ..$ call             : language survival::coxph(formula = survival::Surv(time, status) ~ ., data = scores_df,      ties = "efron", x = FALSE)
#>   ..- attr(*, "class")= chr "coxph"
#>  - attr(*, "class")= chr "big_pls_cox_gd"
head(fit$scores)
#>             [,1]       [,2]       [,3]
#> [1,]  0.05129539  0.7550799  0.8793004
#> [2,]  1.55497806  0.3715325  0.6353775
#> [3,] -1.19340964  0.4907382  2.2626350
#> [4,]  0.74046728 -2.1053220 -0.5759915
#> [5,] -1.98468237 -0.8043244 -0.5814998
#> [6,]  0.42019957  1.5593684  0.5762548