Fitting a Direct Kernel group sparse PLS model on the (Deviance) Residuals
Source: R/coxDKsgplsDR.R
This function computes the Cox model based on PLSR components computed with Xplan as the explanatory variables.
It uses the package sgplsDR to perform the group PLSR fit.
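As a hedged illustration of the response used in the PLS step, the sketch below computes deviance residuals from a Cox model fitted with no covariates, which is how the approach of Bastien et al. (2015) is usually described. The simulated data and the names null_fit and dev_res are assumptions made for this example; they are not taken from the package internals.

```r
library(survival)

set.seed(1)                       # synthetic data, for illustration only
n      <- 50
time   <- rexp(n, rate = 0.1)     # hypothetical follow-up times
status <- rbinom(n, 1, 0.7)       # 1 = event, 0 = right censored

# Cox model with no covariates: only the baseline hazard is estimated
null_fit <- coxph(Surv(time, status) ~ 1)

# Deviance residuals, one value per observation
dev_res <- residuals(null_fit, type = "deviance")
head(dev_res)
```

The typeres argument presumably selects the residual type in the same way as the type argument of residuals() here.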
Usage
coxDKsgplsDR(Xplan, ...)
# S3 method for class 'formula'
coxDKsgplsDR(
Xplan,
time,
time2,
event,
type,
origin,
typeres = "deviance",
collapse,
weighted,
scaleX = TRUE,
scaleY = TRUE,
ncomp = min(7, ncol(Xplan)),
modepls = "regression",
ind.block.x,
keepX,
alpha.x,
upper.lambda = 10^5,
plot = FALSE,
allres = FALSE,
dataXplan = NULL,
subset,
weights,
model_frame = FALSE,
model_matrix = FALSE,
contrasts.arg = NULL,
kernel = "rbfdot",
hyperkernel,
verbose = FALSE,
...
)
# Default S3 method
coxDKsgplsDR(
Xplan,
time,
time2,
event,
type,
origin,
typeres = "deviance",
collapse,
weighted,
scaleX = TRUE,
scaleY = TRUE,
ncomp = min(7, ncol(Xplan)),
modepls = "regression",
ind.block.x,
keepX,
alpha.x,
upper.lambda = 10^5,
plot = FALSE,
allres = FALSE,
kernel = "rbfdot",
hyperkernel,
verbose = FALSE,
...
)
Arguments
- Xplan
a formula or a matrix with the eXplanatory variables (training) dataset
- ...
Arguments to be passed on to survival::coxph.
- time
for right censored data, this is the follow up time. For interval data, the first argument is the starting time for the interval.
- time2
ending time of the interval for interval censored or counting process data only. Intervals are assumed to be open on the left and closed on the right, (start, end]. For counting process data, event indicates whether an event occurred at the end of the interval.
- event
The status indicator, normally 0=alive, 1=dead. Other choices are TRUE/FALSE (TRUE = death) or 1/2 (2=death). For interval censored data, the status indicator is 0=right censored, 1=event at time, 2=left censored, 3=interval censored. Although unusual, the event indicator can be omitted, in which case all subjects are assumed to have an event.
- type
character string specifying the type of censoring. Possible values are "right", "left", "counting", "interval", or "interval2". The default is "right" or "counting" depending on whether the time2 argument is absent or present, respectively.
- origin
for counting process data, the hazard function origin. This option was intended to be used in conjunction with a model containing time dependent strata in order to align the subjects properly when they cross over from one strata to another, but it has rarely proven useful.
- typeres
character string indicating the type of residual desired. Possible values are "martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas", and "scaledsch". Only enough of the string to determine a unique match is required.
- collapse
vector indicating which rows to collapse (sum) over. In time-dependent models more than one row of data can pertain to a single individual. If there were 4 individuals represented by 3, 1, 2 and 4 rows of data respectively, then collapse=c(1,1,1,2,3,3,4,4,4,4) could be used to obtain per subject rather than per observation residuals.
- weighted
if TRUE and the model was fit with case weights, then the weighted residuals are returned.
- scaleX
Should the Xplan columns be standardized?
- scaleY
Should the time values be standardized?
- ncomp
The number of components to include in the model. If this is not supplied, min(7, maximal number) components are used.
- modepls
character string. What type of algorithm to use, (partially) matching one of "regression", "canonical". See gPLS for details.
- ind.block.x
a vector of integers describing the grouping of the X-variables. ind.block.x <- c(3,10,15) means that X is structured into 4 groups: X1 to X3; X4 to X10, X11 to X15 and X16 to Xp where p is the number of variables in the X matrix.
- keepX
numeric vector of length ncomp, the number of variables to keep in X-loadings. By default all variables are kept in the model.
- alpha.x
The mixing parameter (value between 0 and 1) related to the sparsity within group for the X dataset.
- upper.lambda
By default upper.lambda=10^5. A large value specifying the upper bound of the interval of lambda values for searching the value of the tuning parameter (lambda) corresponding to a non-zero group of variables.
- plot
Should the survival function be plotted?
- allres
FALSE to return only the Cox model and TRUE for additional results. See Details. Defaults to FALSE.
- dataXplan
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in dataXplan, the variables are taken from environment(Xplan), typically the environment from which coxDKsgplsDR is called.
- subset
an optional vector specifying a subset of observations to be used in the fitting process.
- weights
an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.
- model_frame
If TRUE, the model frame is returned.
- model_matrix
If TRUE, the model matrix is returned.
- contrasts.arg
a list, whose entries are values (numeric matrices, functions or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors.
- kernel
the kernel function used in training and predicting. This parameter can be set to any function, of class kernel, which computes the inner product in feature space between two vector arguments (see kernels). The kernlab package provides the most popular kernel functions which can be used by setting the kernel parameter to the following strings:
  - "rbfdot": Radial Basis kernel "Gaussian"
  - "polydot": Polynomial kernel
  - "vanilladot": Linear kernel
  - "tanhdot": Hyperbolic tangent kernel
  - "laplacedot": Laplacian kernel
  - "besseldot": Bessel kernel
  - "anovadot": ANOVA RBF kernel
  - "splinedot": Spline kernel
- hyperkernel
the list of hyper-parameters (kernel parameters). This is a list which contains the parameters to be used with the kernel function. Valid parameters for existing kernels are:
  - sigma, inverse kernel width for the Radial Basis kernel function "rbfdot" and the Laplacian kernel "laplacedot".
  - degree, scale, offset for the Polynomial kernel "polydot".
  - scale, offset for the Hyperbolic tangent kernel function "tanhdot".
  - sigma, order, degree for the Bessel kernel "besseldot".
  - sigma, degree for the ANOVA kernel "anovadot".
In the case of a Radial Basis kernel function (Gaussian) or Laplacian kernel, if hyperkernel is missing, the heuristics in sigest are used to calculate a good sigma value from the data.
- verbose
Should some details be displayed?
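To make the kernel and hyperkernel arguments more concrete, here is a minimal base R sketch of a Gaussian kernel matrix, assuming the kernlab parameterization k(x, y) = exp(-sigma * ||x - y||^2) for "rbfdot". The helper rbf_kernel_matrix, the toy matrix X and the value sigma = 0.1 are hypothetical and only serve the illustration; the function itself presumably relies on kernlab for this step.

```r
# Hypothetical helper: Gaussian (RBF) kernel matrix between the rows of X
rbf_kernel_matrix <- function(X, sigma) {
  sq_dist <- as.matrix(dist(X))^2   # squared Euclidean distances between rows
  exp(-sigma * sq_dist)             # k(x, y) = exp(-sigma * ||x - y||^2)
}

set.seed(42)
X <- matrix(rnorm(5 * 3), nrow = 5)  # 5 observations, 3 variables (toy data)
K <- rbf_kernel_matrix(X, sigma = 0.1)

dim(K)   # one row and one column per observation
```

A larger sigma makes the kernel more local: entries of K for distant observations shrink towards 0, while the diagonal stays at 1.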
Value
If allres=FALSE:
- cox_DKsgplsDR
Final Cox-model.
If allres=TRUE:
- tt_DKsgplsDR
PLSR components.
- cox_DKsgplsDR
Final Cox-model.
- DKsgplsDR_mod
The PLSR model.
Details
If allres=FALSE, only the final Cox-model is returned. If allres=TRUE,
a list is returned with the PLS components, the final Cox-model and the
group PLSR model. allres=TRUE is useful for evaluating model prediction
accuracy on a test sample.
References
A group and sparse group partial least square approach applied in genomics context, Benoit Liquet, Pierre Lafaye de Micheaux, Boris Hejblum and Rodolphe Thiebaut (2016), Bioinformatics.
Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data, Philippe Bastien, Frederic Bertrand, Nicolas Meyer and Myriam Maumy-Bertrand (2015), Bioinformatics, 31(3):397-404, doi:10.1093/bioinformatics/btu660.
Author
Frédéric Bertrand
frederic.bertrand@lecnam.net
https://fbertran.github.io/homepage/
Examples
library(plsRcox)
data(micro.censure)
data(Xmicro.censure_compl_imp)
X_train_micro <- apply((as.matrix(Xmicro.censure_compl_imp)),
FUN="as.numeric",MARGIN=2)[1:80,]
X_train_micro_df <- data.frame(X_train_micro)
Y_train_micro <- micro.censure$survyear[1:80]
C_train_micro <- micro.censure$DC[1:80]
(coxDKsgplsDR_fit=coxDKsgplsDR(X_train_micro,Y_train_micro,C_train_micro,
ncomp=6,ind.block.x=c(3,10,15), alpha.x = rep(0.95, 6)))
#> Call:
#> coxph(formula = YCsurv ~ ., data = tt_DKsgplsDR)
#>
#>            coef exp(coef)  se(coef)     z        p
#> dim.1 3.304e+00 2.722e+01 1.201e+00 2.750 0.005952
#> dim.2 9.157e+00 9.485e+03 2.713e+00 3.375 0.000738
#> dim.3 7.359e+00 1.571e+03 2.569e+00 2.865 0.004171
#> dim.4 1.400e+01 1.205e+06 4.486e+00 3.121 0.001802
#> dim.5 3.508e+00 3.340e+01 2.117e+00 1.657 0.097454
#> dim.6 7.349e+00 1.555e+03 3.073e+00 2.391 0.016792
#>
#> Likelihood ratio test=65.26 on 6 df, p=3.816e-12
#> n= 80, number of events= 17
(coxDKsgplsDR_fit=coxDKsgplsDR(~X_train_micro,Y_train_micro,C_train_micro,
ncomp=6,ind.block.x=c(3,10,15), alpha.x = rep(0.95, 6)))
#> Call:
#> coxph(formula = YCsurv ~ ., data = tt_DKsgplsDR)
#>
#>            coef exp(coef)  se(coef)     z        p
#> dim.1 3.370e+00 2.907e+01 1.251e+00 2.693 0.007078
#> dim.2 9.715e+00 1.657e+04 2.910e+00 3.339 0.000841
#> dim.3 8.049e+00 3.132e+03 2.951e+00 2.728 0.006378
#> dim.4 1.375e+01 9.407e+05 4.486e+00 3.066 0.002167
#> dim.5 4.089e+00 5.966e+01 2.141e+00 1.910 0.056141
#> dim.6 6.545e+00 6.958e+02 2.833e+00 2.311 0.020854
#>
#> Likelihood ratio test=66.51 on 6 df, p=2.118e-12
#> n= 80, number of events= 17
(coxDKsgplsDR_fit=coxDKsgplsDR(~.,Y_train_micro,C_train_micro,ncomp=6,
dataXplan=X_train_micro_df,ind.block.x=c(3,10,15), alpha.x = rep(0.95, 6)))
#> Call:
#> coxph(formula = YCsurv ~ ., data = tt_DKsgplsDR)
#>
#>            coef exp(coef)  se(coef)     z        p
#> dim.1 3.388e+00 2.960e+01 1.230e+00 2.754 0.005885
#> dim.2 9.482e+00 1.313e+04 2.797e+00 3.391 0.000697
#> dim.3 7.733e+00 2.283e+03 2.676e+00 2.890 0.003851
#> dim.4 1.437e+01 1.744e+06 4.589e+00 3.132 0.001738
#> dim.5 3.407e+00 3.018e+01 2.077e+00 1.641 0.100869
#> dim.6 7.246e+00 1.402e+03 2.954e+00 2.453 0.014157
#>
#> Likelihood ratio test=65.86 on 6 df, p=2.879e-12
#> n= 80, number of events= 17
rm(X_train_micro,X_train_micro_df,Y_train_micro,C_train_micro,coxDKsgplsDR_fit)
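The calls above all use ind.block.x=c(3,10,15). The following base R sketch shows how such a vector of group boundaries translates into a group label for each column, following the convention stated in the Arguments section. The number of columns p = 20 and the names group_id and groups are assumptions chosen for the illustration; this is not the package's internal code.

```r
ind.block.x <- c(3, 10, 15)   # last column index of each group except the final one
p <- 20                       # hypothetical number of columns in X

# Intervals (0, 3], (3, 10], (10, 15], (15, p] give one group label per column
group_id <- cut(seq_len(p), breaks = c(0, ind.block.x, p), labels = FALSE)

# Column indices belonging to each group
groups <- split(seq_len(p), group_id)
lengths(groups)
```

With p = 20 this yields four groups of 3, 7, 5 and 5 columns, that is X1 to X3, X4 to X10, X11 to X15 and X16 to X20.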