Fitting a Cox-Model on group sparse PLSR components

This function fits a Cox model on PLSR components. The components are computed from a PLSR model that uses Xplan as explanatory variables.

It uses the package sgPLS to perform the sparse group PLSR fit.

Usage

coxsgpls(Xplan, ...)

# S3 method for class 'formula'
coxsgpls(
  Xplan,
  time,
  time2,
  event,
  type,
  origin,
  typeres = "deviance",
  collapse,
  weighted,
  scaleX = TRUE,
  scaleY = TRUE,
  ncomp = min(7, ncol(Xplan)),
  modepls = "regression",
  ind.block.x,
  keepX,
  alpha.x,
  upper.lambda = 10^5,
  plot = FALSE,
  allres = FALSE,
  dataXplan = NULL,
  subset,
  weights,
  model_frame = FALSE,
  model_matrix = FALSE,
  contrasts.arg = NULL,
  ...
)

# Default S3 method
coxsgpls(
  Xplan,
  time,
  time2,
  event,
  type,
  origin,
  typeres = "deviance",
  collapse,
  weighted,
  scaleX = TRUE,
  scaleY = TRUE,
  ncomp = min(7, ncol(Xplan)),
  modepls = "regression",
  ind.block.x,
  keepX,
  alpha.x,
  upper.lambda = 10^5,
  plot = FALSE,
  allres = FALSE,
  ...
)

Arguments

Xplan: a formula or a matrix with the eXplanatory variables (training) dataset
...: Arguments to be passed on to survival::coxph.
time: for right censored data, this is the follow up time. For interval data, the first argument is the starting time for the interval.
time2: ending time of the interval for interval censored or counting process data only. Intervals are assumed to be open on the left and closed on the right, (start, end]. When only time and one further argument are supplied, as in the examples below, that argument is treated as the status indicator.
event: The status indicator, normally 0=alive, 1=dead. Other choices are TRUE/FALSE (TRUE = death) or 1/2 (2=death). For interval censored data, the status indicator is 0=right censored, 1=event at time, 2=left censored, 3=interval censored. For counting process data, event indicates whether an event occurred at the end of the interval. Although unusual, the event indicator can be omitted, in which case all subjects are assumed to have an event.
type: character string specifying the type of censoring. Possible values are "right", "left", "counting", "interval", or "interval2". The default is "right" or "counting" depending on whether the time2 argument is absent or present, respectively.
origin: for counting process data, the hazard function origin. This option was intended to be used in conjunction with a model containing time dependent strata in order to align the subjects properly when they cross over from one stratum to another, but it has rarely proven useful.
typeres: character string indicating the type of residual desired. Possible values are "martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas", and "scaledsch". Only enough of the string to determine a unique match is required.
collapse: vector indicating which rows to collapse (sum) over. In time-dependent models more than one row of data can pertain to a single individual. If there were 4 individuals represented by 3, 1, 2 and 4 rows of data respectively, then collapse=c(1,1,1,2,3,3,4,4,4,4) could be used to obtain per subject rather than per observation residuals.
weighted: if TRUE and the model was fit with case weights, then the weighted residuals are returned.
scaleX: Should the Xplan columns be standardized?
scaleY: Should the time values be standardized?
ncomp: The number of components to include in the model. If this is not supplied, min(7, maximal number) components are used.
modepls: character string. What type of algorithm to use, (partially) matching one of "regression", "canonical". See gPLS for details.
ind.block.x: a vector of integers describing the grouping of the X-variables. ind.block.x <- c(3,10,15) means that X is structured into 4 groups: X1 to X3; X4 to X10, X11 to X15 and X16 to Xp where p is the number of variables in the X matrix.
keepX: numeric vector of length ncomp, the number of variables to keep in X-loadings. By default all variables are kept in the model.
alpha.x: The mixing parameter (value between 0 and 1) related to the sparsity within group for the X dataset.
upper.lambda: By default upper.lambda=10^5. A large value specifying the upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables.
plot: Should the survival function be plotted?
allres: FALSE to return only the Cox model and TRUE for additional results. See details. Defaults to FALSE.
dataXplan: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in dataXplan, the variables are taken from environment(Xplan), typically the environment from which coxsgpls is called.
subset: an optional vector specifying a subset of observations to be used in the fitting process.
weights: an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.
model_frame: If TRUE, the model frame is returned.
model_matrix: If TRUE, the model matrix is returned.
contrasts.arg: a list, whose entries are values (numeric matrices, functions or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors.
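
The positional conventions behind ind.block.x and collapse can be checked in base R. The sketch below is illustrative only: block_labels is a hypothetical helper that is not part of the package, and p = 20 is an assumed number of columns.

```r
# Hypothetical helper (not part of the package): turn the cut points in
# ind.block.x into one group label per column of an X matrix with p columns.
block_labels <- function(ind.block.x, p) {
  stopifnot(all(diff(ind.block.x) > 0), max(ind.block.x) < p)
  sizes <- diff(c(0, ind.block.x, p))   # number of columns in each group
  rep(seq_along(sizes), times = sizes)
}

p <- 20                                  # assumed number of X columns
groups <- block_labels(c(3, 10, 15), p)
split(seq_len(p), groups)                # column indices of the 4 groups

# collapse: one subject label per row, for 4 subjects contributing
# 3, 1, 2 and 4 rows of data respectively.
rows_per_subject <- c(3, 1, 2, 4)
collapse_id <- rep(seq_along(rows_per_subject), times = rows_per_subject)
collapse_id
```

With these cut points the last group always runs from column 16 to column p, so its size depends on the data rather than on ind.block.x.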

Value

If allres=FALSE:

cox_sgpls: Final Cox-model.

If allres=TRUE:

tt_sgpls: PLSR components.
cox_sgpls: Final Cox-model.
sgpls_mod: The PLSR model.

Details

If allres=FALSE, only the final Cox model is returned. If allres=TRUE, a list is returned with the PLS components, the final Cox model and the group PLSR model. allres=TRUE is useful for evaluating model prediction accuracy on a test sample.
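
As a sketch of what allres=TRUE gives access to, the call below refits the model from the Examples section and inspects the returned list. It assumes that coxsgpls and the micro.censure datasets are provided by the plsRcox package, and that the list elements carry the names given in the Value section. fit_all and the other object names are chosen here for illustration.

```r
library(plsRcox)  # assumption: the package that provides coxsgpls

data(micro.censure)
data(Xmicro.censure_compl_imp)

# Same training data as in the Examples section (first 80 rows).
X_train <- apply(as.matrix(Xmicro.censure_compl_imp), 2, as.numeric)[1:80, ]
Y_train <- micro.censure$survyear[1:80]
C_train <- micro.censure$DC[1:80]

# allres = TRUE returns a list instead of the Cox model alone.
fit_all <- coxsgpls(X_train, Y_train, C_train, ncomp = 6,
                    ind.block.x = c(3, 10, 15), alpha.x = rep(0.95, 6),
                    allres = TRUE)

names(fit_all)          # element names as listed in the Value section
fit_all$cox_sgpls       # the final Cox model
head(fit_all$tt_sgpls)  # the PLSR components used as Cox covariates
```

The sgpls_mod element holds the fitted group PLSR model, which is the piece needed to derive components for a test sample.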

References

A group and Sparse Group Partial Least Square approach applied in Genomics context, Benoit Liquet, Pierre Lafaye de Micheaux, Boris Hejblum and Rodolphe Thiebaut (2016), Bioinformatics.

Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data, Philippe Bastien, Frederic Bertrand, Nicolas Meyer and Myriam Maumy-Bertrand (2015), Bioinformatics, 31(3):397-404, doi:10.1093/bioinformatics/btu660.

Author

Frédéric Bertrand
frederic.bertrand@lecnam.net
https://fbertran.github.io/homepage/

Examples


library(plsRcox)
data(micro.censure)
data(Xmicro.censure_compl_imp)

X_train_micro <- apply((as.matrix(Xmicro.censure_compl_imp)),FUN="as.numeric",MARGIN=2)[1:80,]
X_train_micro_df <- data.frame(X_train_micro)
Y_train_micro <- micro.censure$survyear[1:80]
C_train_micro <- micro.censure$DC[1:80]

(coxsgpls_fit=coxsgpls(X_train_micro,Y_train_micro,C_train_micro,
ncomp=6,ind.block.x=c(3,10,15), alpha.x = rep(0.95, 6)))
#> Call:
#> coxph(formula = YCsurv ~ ., data = tt_sgpls)
#> 
#>          coef exp(coef) se(coef)      z      p
#> dim.1 -0.7429    0.4757   0.2647 -2.807 0.0050
#> dim.2 -0.4003    0.6701   0.2622 -1.527 0.1268
#> dim.3 -0.6329    0.5310   0.2930 -2.160 0.0308
#> dim.4 -0.5733    0.5637   0.2591 -2.213 0.0269
#> dim.5  0.1578    1.1709   0.2375  0.664 0.5064
#> dim.6 -0.2209    0.8018   0.3337 -0.662 0.5079
#> 
#> Likelihood ratio test=21.77  on 6 df, p=0.001331
#> n= 80, number of events= 17 
(coxsgpls_fit=coxsgpls(~X_train_micro,Y_train_micro,C_train_micro,
ncomp=6,ind.block.x=c(3,10,15), alpha.x = rep(0.95, 6)))
#> Call:
#> coxph(formula = YCsurv ~ ., data = tt_sgpls)
#> 
#>          coef exp(coef) se(coef)      z      p
#> dim.1 -0.7429    0.4757   0.2647 -2.807 0.0050
#> dim.2 -0.4003    0.6701   0.2622 -1.527 0.1268
#> dim.3 -0.6329    0.5310   0.2930 -2.160 0.0308
#> dim.4 -0.5733    0.5637   0.2591 -2.213 0.0269
#> dim.5  0.1578    1.1709   0.2375  0.664 0.5064
#> dim.6 -0.2209    0.8018   0.3337 -0.662 0.5079
#> 
#> Likelihood ratio test=21.77  on 6 df, p=0.001331
#> n= 80, number of events= 17 
(coxsgpls_fit=coxsgpls(~.,Y_train_micro,C_train_micro,ncomp=6,
dataXplan=X_train_micro_df,ind.block.x=c(3,10,15), alpha.x = rep(0.95, 6)))
#> Call:
#> coxph(formula = YCsurv ~ ., data = tt_sgpls)
#> 
#>          coef exp(coef) se(coef)      z      p
#> dim.1 -0.7429    0.4757   0.2647 -2.807 0.0050
#> dim.2 -0.4003    0.6701   0.2622 -1.527 0.1268
#> dim.3 -0.6329    0.5310   0.2930 -2.160 0.0308
#> dim.4 -0.5733    0.5637   0.2591 -2.213 0.0269
#> dim.5  0.1578    1.1709   0.2375  0.664 0.5064
#> dim.6 -0.2209    0.8018   0.3337 -0.662 0.5079
#> 
#> Likelihood ratio test=21.77  on 6 df, p=0.001331
#> n= 80, number of events= 17 

rm(X_train_micro,X_train_micro_df,Y_train_micro,C_train_micro,coxsgpls_fit)