
Partial least squares regression beta models with kfold cross validation
Source: R/PLS_beta_kfoldcv.R
PLS_beta_kfoldcv.Rd
This function implements kfold cross validation on complete or incomplete datasets for partial least squares beta regression models.
Usage
PLS_beta_kfoldcv(
  dataY,
  dataX,
  nt = 2,
  limQ2set = 0.0975,
  modele = "pls",
  family = NULL,
  K = nrow(dataX),
  NK = 1,
  grouplist = NULL,
  random = FALSE,
  scaleX = TRUE,
  scaleY = NULL,
  keepcoeffs = FALSE,
  keepfolds = FALSE,
  keepdataY = TRUE,
  keepMclassed = FALSE,
  tol_Xi = 10^(-12),
  weights,
  method,
  link = NULL,
  link.phi = NULL,
  type = "ML",
  verbose = TRUE
)
Arguments
- dataY
response (training) dataset
- dataX
predictor(s) (training) dataset
- nt
number of components to be extracted
- limQ2set
limit value for the Q2
- modele
name of the PLS glm or PLS beta model to be fitted ("pls", "pls-glm-Gamma", "pls-glm-gaussian", "pls-glm-inverse.gaussian", "pls-glm-logistic", "pls-glm-poisson", "pls-glm-polr", "pls-beta"). Use modele="pls-glm-family" to enable the family option.
- family
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.) To use the family option, please set modele="pls-glm-family". User-defined families can also be supplied. See details.
- K
number of groups
- NK
number of times the group division is made
- grouplist
to specify the members of the K groups
- random
should the K groups be made randomly
- scaleX
scale the predictor(s): must be set to TRUE for modele="pls" and should be for PLS glms.
- scaleY
scale the response: Yes/No. Ignored, since scaling is not always possible for glm responses.
- keepcoeffs
shall the coefficients for each model be returned
- keepfolds
shall the groups' composition be returned
- keepdataY
shall the observed value of the response for each one of the predicted value be returned
- keepMclassed
shall the number of misclassified observations be returned (unavailable)
- tol_Xi
minimal value for Norm2(Xi) and \(\mathrm{det}(pp' \times pp)\) if there is any missing value in dataX. It defaults to \(10^{-12}\).
- weights
an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.
- method
logistic, probit, complementary log-log or cauchit (corresponding to a Cauchy latent variable).
- link
character specification of the link function in the mean model (mu). Currently, "logit", "probit", "cloglog", "cauchit", "log", "loglog" are supported. Alternatively, an object of class "link-glm" can be supplied.
- link.phi
character specification of the link function in the precision model (phi). Currently, "identity", "log", "sqrt" are supported. The default is "log" unless formula is of type y ~ x, where the default is "identity" (for backward compatibility). Alternatively, an object of class "link-glm" can be supplied.
- type
character specification of the type of estimator. Currently, maximum likelihood ("ML"), ML with bias correction ("BC"), and ML with bias reduction ("BR") are supported.
- verbose
should info messages be displayed?
Value
- results_kfolds
list of length NK. Each element of the list sums up the results for a group division:
  - list of K matrices of size about nrow(dataX)/K * nt with the predicted values for a growing number of components
  - ...
  - list of K matrices of size about nrow(dataX)/K * nt with the predicted values for a growing number of components
- folds
list of length NK. Each element of the list sums up the information for a group division:
  - list of K vectors of length about nrow(dataX) with the numbers of the rows of dataX that were used as a training set
  - ...
  - list of K vectors of length about nrow(dataX) with the numbers of the rows of dataX that were used as a training set
- dataY_kfolds
list of length NK. Each element of the list sums up the results for a group division:
  - list of K matrices of size about nrow(dataX)/K * 1 with the observed values of the response
  - ...
  - list of K matrices of size about nrow(dataX)/K * 1 with the observed values of the response
- call
the call of the function
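The nesting described above (one element per group division, each holding K matrices) can be navigated as in the following minimal sketch. It assumes a hand-built mock object with made-up numbers that only mimics the documented shape; mock_cv and press_by_nt are hypothetical names, not part of the package, and the sketch does not reproduce the package's own extractors such as kfolds2Press.

```r
# Mock object mimicking the documented shape (an assumption, not real output):
# NK = 1 group division, K = 2 groups, nt = 2 components.
mock_cv <- list(
  results_kfolds = list(list(
    matrix(c(0.10, 0.20, 0.12, 0.19), nrow = 2),  # group 1: 2 rows x 2 components
    matrix(c(0.30, 0.40, 0.33, 0.41), nrow = 2)   # group 2: 2 rows x 2 components
  )),
  dataY_kfolds = list(list(
    matrix(c(0.11, 0.18), ncol = 1),              # observed responses, group 1
    matrix(c(0.35, 0.40), ncol = 1)               # observed responses, group 2
  ))
)

# Hypothetical helper: squared prediction error summed over all groups,
# giving one value per number of components.
press_by_nt <- function(cv, division = 1) {
  preds <- cv$results_kfolds[[division]]
  obs   <- cv$dataY_kfolds[[division]]
  # Each column of p is compared with the same observed vector y.
  per_group <- Map(function(p, y) colSums((p - as.vector(y))^2), preds, obs)
  Reduce(`+`, per_group)
}

press <- press_by_nt(mock_cv)
print(press)
```

The first index selects the group division (1 to NK) and the second selects the group (1 to K), so results_kfolds[[1]][[2]] would be the prediction matrix for the second group of the first division.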
Details
Each group is predicted in turn from the K-1 other groups. Leave-one-out cross
validation is thus obtained for K == nrow(dataX).
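The splitting scheme can be sketched in base R as follows. This is an illustration under assumptions, not the package's internal code: make_folds is a hypothetical helper, and the way rows are dealt into groups here may differ from what the function actually does.

```r
# Hypothetical helper (not part of the package): deal n row indices
# into K groups of almost equal size, optionally after shuffling.
make_folds <- function(n, K, random = FALSE) {
  idx <- if (random) sample(n) else seq_len(n)
  split(idx, rep(seq_len(K), length.out = n))
}

n <- 10
folds <- make_folds(n, K = 5)

for (k in seq_along(folds)) {
  test_rows  <- folds[[k]]                       # the group to predict
  train_rows <- setdiff(seq_len(n), test_rows)   # the K-1 other groups
  # A model would be fitted on train_rows and used to predict test_rows here.
}

# With K equal to the number of rows, each group holds a single row:
# this is leave-one-out cross validation.
loo <- make_folds(n, K = n)
```

Repeating the split NK times with random = TRUE would give NK different group divisions of the same data.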
There are seven different predefined models with predefined link functions available:
- "pls"
ordinary pls models
- "pls-glm-Gamma"
glm Gamma with inverse link pls models
- "pls-glm-gaussian"
glm gaussian with identity link pls models
- "pls-glm-inverse.gaussian"
glm inverse gaussian with 1/mu^2 link pls models
- "pls-glm-logistic"
glm binomial with logit link pls models
- "pls-glm-poisson"
glm poisson with log link pls models
- "pls-glm-polr"
glm polr with logit link pls models
Using the family= option and setting
modele="pls-glm-family" allows changing the family and link function
the same way as for the glm function. As a consequence,
user-specified families can also be used.
- The gaussian family accepts the links (as names) identity, log and inverse.
- The binomial family accepts the links logit, probit, cauchit (corresponding to logistic, normal and Cauchy CDFs respectively), log and cloglog (complementary log-log).
- The Gamma family accepts the links inverse, identity and log.
- The poisson family accepts the links log, identity, and sqrt.
- The inverse.gaussian family accepts the links 1/mu^2, inverse, identity and log.
- The quasi family accepts the links logit, probit, cloglog, identity, inverse, log, 1/mu^2 and sqrt.
- The function power can be used to create a power link function.
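The family objects referred to in this list are the ones from base R's stats package, so they can be built and inspected on their own before being passed through the family argument. The sketch below only constructs a few of them; the final call is left commented out because it needs the package and real data, and its argument values (including K = 5) are illustrative assumptions.

```r
# Family objects built with non-default links (base R, stats package).
fam_probit <- binomial(link = "probit")
fam_gamma  <- Gamma(link = "log")

# power() returns a link object; power(0.5) is a square-root link.
sqrt_link <- power(0.5)

fam_probit$family        # name of the error distribution
fam_probit$link          # name of the link function
sqrt_link$linkfun(4)     # the link applied to mu = 4

# Hypothetical call, shown for shape only (dataY and dataX are placeholders):
# PLS_beta_kfoldcv(dataY, dataX, nt = 2, modele = "pls-glm-family",
#                  family = Gamma(link = "log"), K = 5)
```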
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
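The second interpretation of the weights can be checked numerically. The sketch below uses lm() on simulated Gaussian data as a stand-in, which is an assumption made for simplicity; it is not a PLS beta fit and all variable names are illustrative. Fitting on the group means with weights w_i gives the same coefficients as fitting on the w_i unit-weight observations.

```r
set.seed(1)

x_levels <- c(1, 2, 3, 4)
w <- c(2L, 5L, 3L, 4L)                  # number of raw observations per level

# Expanded data: w[i] unit-weight observations at each level of x.
x_full <- rep(x_levels, times = w)
y_full <- 1 + 0.5 * x_full + rnorm(length(x_full), sd = 0.1)

# Collapsed data: one mean response per level of x.
y_mean <- as.vector(tapply(y_full, x_full, mean))

fit_full <- lm(y_full ~ x_full)                   # unit weights, all observations
fit_mean <- lm(y_mean ~ x_levels, weights = w)    # means, weighted by group size

# The two sets of coefficients agree up to numerical precision.
all.equal(unname(coef(fit_full)), unname(coef(fit_mean)))
```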
References
Frédéric Bertrand, Nicolas Meyer, Michèle Beau-Faller, Karim El Bayed, Izzie-Jacques Namer, Myriam Maumy-Bertrand (2013). Régression Bêta PLS. Journal de la Société Française de Statistique, 154(3):143-159. https://ojs-test.apps.ocp.math.cnrs.fr/index.php/J-SFdS/article/view/215
See also
kfolds2coeff,
kfolds2Pressind, kfolds2Press,
kfolds2Mclassedind,
kfolds2Mclassed and
kfolds2CVinfos_beta to extract and transform results
from kfold cross validation.
Author
Frédéric Bertrand
frederic.bertrand@lecnam.net
https://fbertran.github.io/homepage/
Examples
if (FALSE) { # \dontrun{
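# Gasoline yield data shipped with the betareg package
data("GasolineYield", package = "betareg")
yGasolineYield <- GasolineYield$yield
XGasolineYield <- GasolineYield[, 2:5]
# K defaults to nrow(dataX), so this is leave-one-out cross validation
bbb <- PLS_beta_kfoldcv(yGasolineYield, XGasolineYield, nt = 3, modele = "pls-beta")
# Extract and transform the cross validation results
kfolds2CVinfos_beta(bbb)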
} # }