Compute the coefficient vectors after variable selection, for every fitting criterion of a given model. May be used for a step-by-step use of SelectBoost.
Usage
lasso_msgps_all(X, Y, penalty = "enet")
enet_msgps_all(X, Y, penalty = "enet", alpha = 0.5)
alasso_msgps_all(X, Y, penalty = "alasso")
alasso_enet_msgps_all(X, Y, penalty = "alasso", alpha = 0.5)
lasso_cv_glmnet_all_5f(X, Y)
spls_spls_all(X, Y, K.seq = c(1:5), eta.seq = (1:9)/10, fold.val = 5)
varbvs_linear_all(X, Y, include.threshold.list = (1:19)/20)
lasso_cv_glmnet_bin_all(X, Y)
lasso_glmnet_bin_all(X, Y)
splsda_spls_all(X, Y, K.seq = c(1:10), eta.seq = (1:9)/10)
sgpls_spls_all(X, Y, K.seq = c(1:10), eta.seq = (1:9)/10)
varbvs_binomial_all(X, Y, include.threshold.list = (1:19)/20)
Arguments
- X
- A numeric matrix. The predictors matrix. 
- Y
- A numeric vector or a binary factor. The continuous response for the linear-model functions or the 0/1 classification response for the logistic ones. 
- penalty
- A character value to select the penalty term in msgps (Model Selection Criteria via Generalized Path Seeking). Defaults to "enet". "genet" is the generalized elastic net and "alasso" is the adaptive lasso, which is a weighted version of the lasso. 
- alpha
- A numeric value to set the value of \(\alpha\) for the "enet" and "genet" penalties in msgps (Model Selection Criteria via Generalized Path Seeking). 
- K.seq
- A numeric vector. Number of components to test. 
- eta.seq
- A numeric vector. Eta sequence to test. 
- fold.val
- A numeric value. Number of folds to use. 
- include.threshold.list
- A numeric vector. Vector of thresholds to use. 
- K
- A numeric value. Number of folds to use. 
Details
lasso_msgps_all returns the matrix of coefficients
for an optimal linear model estimated by the LASSO estimator and selected
by model selection criteria including Mallows' Cp, bias-corrected AIC (AICc),
generalized cross validation (GCV) and BIC.
The msgps function of the msgps package implements
Model Selection Criteria via Generalized Path Seeking to compute the degrees
of freedom of the LASSO.
enet_msgps_all returns the matrix of coefficients
for an optimal linear model estimated by the ELASTIC NET estimator and selected
by model selection criteria including Mallows' Cp, bias-corrected AIC (AICc),
generalized cross validation (GCV) and BIC.
The msgps function of the msgps package implements
Model Selection Criteria via Generalized Path Seeking to compute the degrees
of freedom of the ELASTIC NET.
alasso_msgps_all returns the matrix of coefficients
for an optimal linear model estimated by the adaptive LASSO estimator and selected
by model selection criteria including Mallows' Cp, bias-corrected AIC (AICc),
generalized cross validation (GCV) and BIC.
The msgps function of the msgps package implements
Model Selection Criteria via Generalized Path Seeking to compute the degrees
of freedom of the adaptive LASSO.
alasso_enet_msgps_all returns the matrix of coefficients
for an optimal linear model estimated by the adaptive ELASTIC NET estimator and selected
by model selection criteria including Mallows' Cp, bias-corrected AIC (AICc),
generalized cross validation (GCV) and BIC.
The msgps function of the msgps package implements
Model Selection Criteria via Generalized Path Seeking to compute the degrees
of freedom of the adaptive ELASTIC NET.
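As an illustration of what these four wrappers have in common, here is a minimal sketch (an assumption about the workflow, not the wrappers' actual code) that fits a single msgps path on toy data and reads off the coefficients selected by each criterion; the four functions essentially differ in the penalty and alpha values passed to msgps.

set.seed(314)
X <- matrix(rnorm(100 * 6), 100, 6)
Y <- drop(X %*% c(3, 1.5, 0, 0, 2, 0) + rnorm(100, sd = 3))
# plain lasso (alpha = 0 default); penalty = "alasso" or alpha > 0 gives the other variants
fit <- msgps::msgps(X, Y, penalty = "enet")
# coefficients at the tuning values chosen by Cp, AICc, GCV and BIC
coef(fit)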
lasso_cv_glmnet_all_5f returns the matrix of coefficients
for a linear model estimated by the LASSO using the lambda.min and lambda.1se
values (the largest lambda within one standard error of the minimum) computed by
5-fold cross-validation. It uses the glmnet and cv.glmnet functions of the glmnet package.
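A minimal sketch of the underlying cross-validation step, assuming a plain gaussian lasso fit (the wrapper may differ in details such as standardisation):

set.seed(314)
X <- matrix(rnorm(100 * 6), 100, 6)
Y <- drop(X %*% c(3, 1.5, 0, 0, 2, 0) + rnorm(100, sd = 3))
cvfit <- glmnet::cv.glmnet(X, Y, nfolds = 5)                        # lasso path + 5-fold CV
cbind(lambda.min = as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1],
      lambda.1se = as.matrix(coef(cvfit, s = "lambda.1se"))[-1, 1]) # drop the intercept
# lasso_cv_glmnet_bin_all follows the same pattern with family = "binomial".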
spls_spls_all returns the matrix of the raw (coef.spls) and bootstrap-corrected
(correct.spls) coefficients
for a linear model estimated by SPLS (sparse partial least squares) with 5-fold cross-validation.
It uses the spls, cv.spls, ci.spls, coef.spls and
correct.spls functions of the spls package.
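A rough sketch of that pipeline on the same kind of toy data (grids shortened; the wrapper's exact arguments may differ):

set.seed(314)
X <- matrix(rnorm(100 * 6), 100, 6)
Y <- drop(X %*% c(3, 1.5, 0, 0, 2, 0) + rnorm(100, sd = 3))
cv  <- spls::cv.spls(X, Y, K = 1:5, eta = (1:9)/10, fold = 5)  # pick K and eta by cross-validation
fit <- spls::spls(X, Y, K = cv$K.opt, eta = cv$eta.opt)
coef(fit)                                                      # raw SPLS coefficients
cis <- spls::ci.spls(fit)                                      # bootstrap confidence intervals
spls::correct.spls(cis)                                        # bootstrap-corrected coefficients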
varbvs_linear_all returns the matrix of the coefficients
for a linear model estimated by varbvs (variational approximation for Bayesian
variable selection in linear regression, family = gaussian), at each of the requested threshold values.
It uses the varbvs, coef and variable.names functions of the varbvs package.
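A minimal sketch of the varbvs step, assuming the X/Z/y interface of varbvs and its coef method; the per-threshold columns of the wrapper's output come from re-applying the requested inclusion thresholds, which is not reproduced here:

set.seed(314)
X <- matrix(rnorm(100 * 6), 100, 6)
Y <- drop(X %*% c(3, 1.5, 0, 0, 2, 0) + rnorm(100, sd = 3))
fit <- varbvs::varbvs(X, Z = NULL, y = Y, family = "gaussian", verbose = FALSE)
coef(fit)   # posterior estimates of the regression coefficients
# varbvs_binomial_all is the analogous wrapper with family = "binomial".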
lasso_cv_glmnet_bin_all returns the matrix of coefficients
for a logistic model estimated by the LASSO using the lambda.min and lambda.1se
values (the largest lambda within one standard error of the minimum) computed by
5-fold cross-validation. It uses the glmnet and cv.glmnet functions of the glmnet package.
lasso_glmnet_bin_all returns the matrix of coefficients
for a logistic model estimated by the LASSO using the AICc_glmnetB and BIC_glmnetB
information criteria. It uses the glmnet function of the glmnet package and the
AICc_glmnetB and BIC_glmnetB functions of the SelectBoost package that were
adapted from the AICc_glmnetB and BIC_glmnetB functions of the rLogistic
(https://github.com/echi/rLogistic) package.
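The idea can be sketched with glmnet alone by scoring the lasso path with a BIC-type criterion; this is only an illustration of the principle, not the AICc_glmnetB/BIC_glmnetB code, which uses its own corrected criteria:

set.seed(314)
X <- matrix(rnorm(100 * 6), 100, 6)
prob <- binomial()$linkinv(drop(X %*% c(3, 1.5, 0, 0, 2, 0)))
Ybin <- rbinom(100, 1, prob)
fit <- glmnet::glmnet(X, Ybin, family = "binomial")           # full lasso path
bic <- deviance(fit) + log(nrow(X)) * fit$df                  # BIC-like score along the path
as.matrix(coef(fit, s = fit$lambda[which.min(bic)]))[-1, 1]   # coefficients at the selected lambda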
splsda_spls_all returns the matrix of the raw (coef.splsda) coefficients
for a binary classification model estimated by SPLSDA (sparse partial least squares discriminant analysis) with
5-fold cross-validation. It uses the splsda, cv.splsda and coef.splsda functions
of the spls package.
sgpls_spls_all returns the matrix of the raw (coef.sgpls) coefficients
for a logistic regression model estimated by SGPLS (sparse generalized partial least squares) with
5-fold cross-validation. It uses the sgpls, cv.sgpls and coef.sgpls functions
of the spls package.
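A hedged sketch of the classification analogue, using the splsda/cv.splsda functions named above; the exact arguments and the optimal-parameter search in the wrappers may differ:

set.seed(314)
X <- matrix(rnorm(100 * 6), 100, 6)
Ybin <- ifelse(drop(X %*% c(3, 1.5, 0, 0, 2, 0) + rnorm(100, sd = 3)) >= 0, 1, 0)
cv  <- spls::cv.splsda(X, Ybin, K = 1:3, eta = (1:9)/10, fold = 5)  # assumed to return K.opt and eta.opt, as cv.spls does
fit <- spls::splsda(X, Ybin, K = cv$K.opt, eta = cv$eta.opt)
coef(fit)                                                           # raw coefficients, as extracted by the wrapper
# sgpls_spls_all proceeds the same way with cv.sgpls and sgpls.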
varbvs_binomial_all returns the matrix of the coefficients
for a logistic model estimated by varbvs (variational approximation for Bayesian
variable selection in logistic regression, family = binomial), at each of the requested threshold values.
It uses the varbvs, coef and variable.names functions of the varbvs package.
References
SelectBoost: a general algorithm to enhance the performance of variable selection methods in correlated datasets, Frédéric Bertrand, Ismaïl Aouadi, Nicolas Jung, Raphael Carapito, Laurent Vallat, Seiamak Bahram, Myriam Maumy-Bertrand, Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa855
See also
glmnet, cv.glmnet, msgps, AICc_BIC_glmnetB, spls, cv.spls, correct.spls, splsda, cv.splsda, sgpls, cv.sgpls, varbvs
Other Variable selection functions:
var_select
Author
Frederic Bertrand, frederic.bertrand@lecnam.net
Examples
set.seed(314)
xran <- matrix(rnorm(100*6),100,6)
beta0 <- c(3,1.5,0,0,2,0)
epsilon <- rnorm(100,sd=3)
yran <- c(xran %*% beta0 + epsilon)
ybin <- ifelse(yran>=0,1,0)
set.seed(314)
lasso_msgps_all(xran,yran)
#>          Cp     AICc      GCV      BIC
#> V1 2.888389 2.888389 2.888389 2.822453
#> V2 1.240134 1.240134 1.240134 1.158685
#> V3 0.294536 0.294536 0.294536 0.197723
#> V4 0.000000 0.000000 0.000000 0.000000
#> V5 1.488073 1.488073 1.488073 1.396131
#> V6 0.000000 0.000000 0.000000 0.000000
set.seed(314)
enet_msgps_all(xran,yran)
#>          Cp     AICc      GCV      BIC
#> V1 2.941541 2.937504 2.941541 2.814043
#> V2 1.442444 1.440943 1.442444 1.393650
#> V3 0.516834 0.515345 0.516834 0.472523
#> V4 0.000000 0.000000 0.000000 0.000000
#> V5 1.629282 1.626506 1.629282 1.540810
#> V6 0.225360 0.224630 0.225360 0.200158
set.seed(314)
alasso_msgps_all(xran,yran)
#>          Cp     AICc      GCV      BIC
#> V1 3.007477 3.006131 3.007813 2.968790
#> V2 1.309948 1.305444 1.311450 1.172197
#> V3 0.278524 0.269215 0.281131 0.000000
#> V4 0.000000 0.000000 0.000000 0.000000
#> V5 1.602220 1.598403 1.603261 1.487032
#> V6 0.000000 0.000000 0.000000 0.000000
set.seed(314)
alasso_enet_msgps_all(xran,yran)
#>          Cp     AICc      GCV      BIC
#> V1 3.007477 3.006131 3.007813 2.968790
#> V2 1.309948 1.305444 1.311450 1.172197
#> V3 0.278524 0.269215 0.281131 0.000000
#> V4 0.000000 0.000000 0.000000 0.000000
#> V5 1.602220 1.598403 1.603261 1.487032
#> V6 0.000000 0.000000 0.000000 0.000000
set.seed(314)
lasso_cv_glmnet_all_5f(xran,yran)
#>      lambda.min lambda.1se
#> [1,]  3.0265634  2.5914804
#> [2,]  1.4632565  0.8834199
#> [3,]  0.5351174  0.0000000
#> [4,]  0.0000000  0.0000000
#> [5,]  1.6827459  1.0760789
#> [6,]  0.2299210  0.0000000
set.seed(314)
spls_spls_all(xran,yran)
#> eta = 0.1 
#> eta = 0.2 
#> eta = 0.3 
#> eta = 0.4 
#> eta = 0.5 
#> eta = 0.6 
#> eta = 0.7 
#> eta = 0.8 
#> eta = 0.9 
#> 
#> Optimal parameters: eta = 0.9, K = 5
#> 10 % completed...
#> 20 % completed...
#> 30 % completed...
#> 40 % completed...
#> 50 % completed...
#> 60 % completed...
#> 70 % completed...
#> 80 % completed...
#> 90 % completed...
#> 100 % completed...
#>      raw_coefs_K.opt_5_eta.opt_0.9
#> [1,]                     3.0415349
#> [2,]                     1.3270025
#> [3,]                     0.4983083
#> [4,]                     0.0000000
#> [5,]                     1.6470041
#> [6,]                     0.2256989
#>      bootstrap_corrected_coefs_K.opt_5_eta.opt_0.9
#> [1,]                                      3.041535
#> [2,]                                      1.327003
#> [3,]                                      0.000000
#> [4,]                                      0.000000
#> [5,]                                      1.647004
#> [6,]                                      0.000000
set.seed(314)
varbvs_linear_all(xran,yran)
#>    coef_varbvs_0.05 coef_varbvs_0.1 coef_varbvs_0.15 coef_varbvs_0.2
#> X1         3.016325        3.016325         3.016325        3.016325
#> X2         1.282165        1.282165         1.282165        1.282165
#> X3         0.000000        0.000000         0.000000        0.000000
#> X4         0.000000        0.000000         0.000000        0.000000
#> X5         1.645791        1.645791         1.645791        1.645791
#> X6         0.000000        0.000000         0.000000        0.000000
#>    coef_varbvs_0.25 coef_varbvs_0.3 coef_varbvs_0.35 coef_varbvs_0.4
#> X1         3.016325        3.016325         3.016325        3.016325
#> X2         1.282165        1.282165         1.282165        1.282165
#> X3         0.000000        0.000000         0.000000        0.000000
#> X4         0.000000        0.000000         0.000000        0.000000
#> X5         1.645791        1.645791         1.645791        1.645791
#> X6         0.000000        0.000000         0.000000        0.000000
#>    coef_varbvs_0.45 coef_varbvs_0.5 coef_varbvs_0.55 coef_varbvs_0.6
#> X1         3.016325        3.016325         3.016325        3.016325
#> X2         1.282165        1.282165         1.282165        1.282165
#> X3         0.000000        0.000000         0.000000        0.000000
#> X4         0.000000        0.000000         0.000000        0.000000
#> X5         1.645791        1.645791         1.645791        1.645791
#> X6         0.000000        0.000000         0.000000        0.000000
#>    coef_varbvs_0.65 coef_varbvs_0.7 coef_varbvs_0.75 coef_varbvs_0.8
#> X1         3.016325        3.016325         3.016325        3.016325
#> X2         1.282165        1.282165         1.282165        1.282165
#> X3         0.000000        0.000000         0.000000        0.000000
#> X4         0.000000        0.000000         0.000000        0.000000
#> X5         1.645791        1.645791         1.645791        1.645791
#> X6         0.000000        0.000000         0.000000        0.000000
#>    coef_varbvs_0.85 coef_varbvs_0.9 coef_varbvs_0.95
#> X1         3.016325        3.016325         3.016325
#> X2         1.282165        1.282165         0.000000
#> X3         0.000000        0.000000         0.000000
#> X4         0.000000        0.000000         0.000000
#> X5         1.645791        1.645791         1.645791
#> X6         0.000000        0.000000         0.000000
set.seed(314)
lasso_cv_glmnet_bin_all(xran,ybin)
#>      lambda.min lambda.1se
#> [1,]  1.0500913  0.9854475
#> [2,]  0.3353787  0.2834710
#> [3,]  0.1756045  0.1187688
#> [4,]  0.0000000  0.0000000
#> [5,]  0.5707957  0.5097384
#> [6,]  0.0000000  0.0000000
set.seed(314)
lasso_glmnet_bin_all(xran,ybin)
#>          AICc      BIC
#> [1,] 1.345775 1.345775
#> [2,] 0.000000 0.000000
#> [3,] 0.000000 0.000000
#> [4,] 0.000000 0.000000
#> [5,] 0.000000 0.000000
#> [6,] 0.000000 0.000000
set.seed(314)
# \donttest{
splsda_spls_all(xran,ybin, K.seq=1:3)
#> 
#> Optimal parameters: eta = 0.9, K = 3
#>    raw_coefs_K.opt_3_eta.opt_0.9
#> x1                     0.9729799
#> x2                     0.4145599
#> x3                     0.3216531
#> x4                     0.0000000
#> x5                     0.6557821
#> x6                     0.0000000
# }
set.seed(314)
# \donttest{
sgpls_spls_all(xran,ybin, K.seq=1:3)
#> 
#> Optimal parameters: eta = 0.4, K = 3
#>    raw_coefs_K.opt_3_eta.opt_0.4
#> x1                    0.52287257
#> x2                    0.23647477
#> x3                    0.16705245
#> x4                   -0.05893075
#> x5                    0.33188549
#> x6                    0.03978646
# }
set.seed(314)
varbvs_binomial_all(xran,ybin)
#>    coef_varbvs_0.05 coef_varbvs_0.1 coef_varbvs_0.15 coef_varbvs_0.2
#> X1       1.35696226       1.3569623        1.3569623       1.3569623
#> X2       0.02417951       0.0000000        0.0000000       0.0000000
#> X3       0.00000000       0.0000000        0.0000000       0.0000000
#> X4       0.00000000       0.0000000        0.0000000       0.0000000
#> X5       0.57262410       0.5726241        0.5726241       0.5726241
#> X6       0.00000000       0.0000000        0.0000000       0.0000000
#>    coef_varbvs_0.25 coef_varbvs_0.3 coef_varbvs_0.35 coef_varbvs_0.4
#> X1        1.3569623       1.3569623        1.3569623       1.3569623
#> X2        0.0000000       0.0000000        0.0000000       0.0000000
#> X3        0.0000000       0.0000000        0.0000000       0.0000000
#> X4        0.0000000       0.0000000        0.0000000       0.0000000
#> X5        0.5726241       0.5726241        0.5726241       0.5726241
#> X6        0.0000000       0.0000000        0.0000000       0.0000000
#>    coef_varbvs_0.45 coef_varbvs_0.5 coef_varbvs_0.55 coef_varbvs_0.6
#> X1        1.3569623       1.3569623        1.3569623       1.3569623
#> X2        0.0000000       0.0000000        0.0000000       0.0000000
#> X3        0.0000000       0.0000000        0.0000000       0.0000000
#> X4        0.0000000       0.0000000        0.0000000       0.0000000
#> X5        0.5726241       0.5726241        0.5726241       0.5726241
#> X6        0.0000000       0.0000000        0.0000000       0.0000000
#>    coef_varbvs_0.65 coef_varbvs_0.7 coef_varbvs_0.75 coef_varbvs_0.8
#> X1        1.3569623       1.3569623        1.3569623        1.356962
#> X2        0.0000000       0.0000000        0.0000000        0.000000
#> X3        0.0000000       0.0000000        0.0000000        0.000000
#> X4        0.0000000       0.0000000        0.0000000        0.000000
#> X5        0.5726241       0.5726241        0.5726241        0.000000
#> X6        0.0000000       0.0000000        0.0000000        0.000000
#>    coef_varbvs_0.85 coef_varbvs_0.9 coef_varbvs_0.95
#> X1         1.356962        1.356962         1.356962
#> X2         0.000000        0.000000         0.000000
#> X3         0.000000        0.000000         0.000000
#> X4         0.000000        0.000000         0.000000
#> X5         0.000000        0.000000         0.000000
#> X6         0.000000        0.000000         0.000000
