
Partial least squares beta regression models with k-fold cross-validation
Source:R/PLS_beta_kfoldcv.R
PLS_beta_kfoldcv.Rd
This function implements k-fold cross-validation on complete or incomplete datasets for partial least squares beta regression models.
Usage
PLS_beta_kfoldcv(
dataY,
dataX,
nt = 2,
limQ2set = 0.0975,
modele = "pls",
family = NULL,
K = nrow(dataX),
NK = 1,
grouplist = NULL,
random = FALSE,
scaleX = TRUE,
scaleY = NULL,
keepcoeffs = FALSE,
keepfolds = FALSE,
keepdataY = TRUE,
keepMclassed = FALSE,
tol_Xi = 10^(-12),
weights,
method,
link = NULL,
link.phi = NULL,
type = "ML",
verbose = TRUE
)
Arguments
- dataY
response (training) dataset
- dataX
predictor(s) (training) dataset
- nt
number of components to be extracted
- limQ2set
limit value for the Q2
- modele
name of the PLS glm or PLS beta model to be fitted ("pls", "pls-glm-Gamma", "pls-glm-gaussian", "pls-glm-inverse.gaussian", "pls-glm-logistic", "pls-glm-poisson", "pls-glm-polr", "pls-beta"). Use modele="pls-glm-family" to enable the family option.
- family
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.) To use the family option, please set modele="pls-glm-family". User-defined families can also be used. See Details.
- K
number of groups
- NK
number of times the group division is made
- grouplist
to specify the members of the K groups
- random
should the K groups be made randomly
- scaleX
scale the predictor(s): must be set to TRUE for modele="pls" and should be for PLS GLMs.
- scaleY
scale the response: Yes/No. Ignored since not always possible for glm responses.
- keepcoeffs
shall the coefficients for each model be returned
- keepfolds
shall the groups' composition be returned
- keepdataY
shall the observed value of the response for each one of the predicted value be returned
- keepMclassed
shall the number of misclassified observations be returned (unavailable)
- tol_Xi
minimal value for Norm2(Xi) and \(\mathrm{det}(pp' \times pp)\) if there is any missing value in dataX. It defaults to \(10^{-12}\).
- weights
an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.
- method
logistic, probit, complementary log-log or cauchit (corresponding to a Cauchy latent variable).
- link
character specification of the link function in the mean model (mu). Currently, "logit", "probit", "cloglog", "cauchit", "log", "loglog" are supported. Alternatively, an object of class "link-glm" can be supplied.
- link.phi
character specification of the link function in the precision model (phi). Currently, "identity", "log", "sqrt" are supported. The default is "log" unless formula is of type y~x, where the default is "identity" (for backward compatibility). Alternatively, an object of class "link-glm" can be supplied.
- type
character specification of the type of estimator. Currently, maximum likelihood ("ML"), ML with bias correction ("BC"), and ML with bias reduction ("BR") are supported.
- verbose
should info messages be displayed?
Value
- results_kfolds
list of NK. Each element of the list sums up the results for a group division:
  - list of K matrices of size about nrow(dataX)/K * nt with the predicted values for a growing number of components
  - ...
  - list of K matrices of size about nrow(dataX)/K * nt with the predicted values for a growing number of components
- folds
list of NK. Each element of the list sums up the information for a group division:
  - list of K vectors of length about nrow(dataX) with the numbers of the rows of dataX that were used as a training set
  - ...
  - list of K vectors of length about nrow(dataX) with the numbers of the rows of dataX that were used as a training set
- dataY_kfolds
list of NK. Each element of the list sums up the results for a group division:
  - list of K matrices of size about nrow(dataX)/K * 1 with the observed values of the response
  - ...
  - list of K matrices of size about nrow(dataX)/K * 1 with the observed values of the response
- call
the call of the function
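The nesting described above (NK group divisions, each holding K matrices) can be illustrated with a mock object. This is only a sketch built from the description in this section: the object `mock`, the helper `make_division` and the random values are hypothetical placeholders, not output of PLS_beta_kfoldcv, and the extractors listed under See also should be preferred for real results.

```r
# Mock object mimicking the documented shape; all values are random placeholders.
set.seed(1)
n  <- 12  # assumed number of rows of dataX
K  <- 3   # number of groups
NK <- 2   # number of group divisions
nt <- 2   # number of components

# Hypothetical helper: one group division = list of K matrices with n/K rows
make_division <- function(ncol) {
  replicate(K, matrix(runif(n / K * ncol), ncol = ncol), simplify = FALSE)
}

mock <- list(
  results_kfolds = replicate(NK, make_division(nt), simplify = FALSE),
  dataY_kfolds   = replicate(NK, make_division(1), simplify = FALSE)
)

# Predictions for group 1 of division 1: one column per number of components
pred_11 <- mock$results_kfolds[[1]][[1]]
dim(pred_11)

# Sum of squared prediction errors with nt components over the K groups
# of the first group division
sq_err_div1 <- sum(mapply(
  function(pred, obs) sum((obs[, 1] - pred[, nt])^2),
  mock$results_kfolds[[1]],
  mock$dataY_kfolds[[1]]
))
```

Column j of each prediction matrix corresponds to a model with j components, which is why the sketch indexes column nt to compare against the observed values.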
Details
Predicts one group with the K-1 other groups. Leave-one-out cross-validation is thus obtained for K==nrow(dataX).
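The hold-out scheme just described can be sketched in base R. The helper `make_folds` is hypothetical and only illustrates the idea; the way the package actually builds its groups may differ.

```r
# Hypothetical helper (not part of the package): split n row indices into K groups.
make_folds <- function(n, K, random = FALSE) {
  idx <- if (random) sample(n) else seq_len(n)
  # Assign the indices to K groups of near-equal size
  unname(split(idx, rep(seq_len(K), length.out = n)))
}

n <- 10
folds <- make_folds(n, K = 5)

# Each group is predicted in turn from a model fitted on the K-1 other groups
train_sizes <- vapply(folds, function(test_rows) {
  train_rows <- setdiff(seq_len(n), test_rows)  # rows of the K-1 other groups
  length(train_rows)
}, integer(1))

# With K equal to the number of rows, every group holds a single observation,
# which is leave-one-out cross-validation
loo <- make_folds(n, K = n)
```

Setting random = TRUE in the sketch shuffles the row indices before grouping, which mirrors the purpose of the random argument.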
There are seven different predefined models with predefined link functions available:
- "pls"
ordinary pls models
- "pls-glm-Gamma"
glm Gamma with inverse link pls models
- "pls-glm-gaussian"
glm gaussian with identity link pls models
- "pls-glm-inverse.gaussian"
glm inverse gaussian with square inverse link pls models
- "pls-glm-logistic"
glm binomial with logit link pls models
- "pls-glm-poisson"
glm poisson with log link pls models
- "pls-glm-polr"
glm polr with logit link pls models
Using the "family=" option and setting modele="pls-glm-family" allows changing the family and link function the same way as for the glm function. As a consequence, user-specified families can also be used.
- The gaussian family
accepts the links (as names) identity, log and inverse.
- The binomial family
accepts the links logit, probit, cauchit (corresponding to logistic, normal and Cauchy CDFs respectively), log and cloglog (complementary log-log).
- The Gamma family
accepts the links inverse, identity and log.
- The poisson family
accepts the links log, identity, and sqrt.
- The inverse.gaussian family
accepts the links 1/mu^2, inverse, identity and log.
- The quasi family
accepts the links logit, probit, cloglog, identity, inverse, log, 1/mu^2 and sqrt.
- The function power
can be used to create a power link function.
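The family and link combinations listed above are ordinary family objects from base R's stats package. The short sketch below only builds such objects and inspects them; the variable names are illustrative, and passing one of these objects through the family argument together with modele="pls-glm-family" follows the description above rather than a tested call.

```r
# Family objects as they could be supplied through the family argument
fam_gamma <- Gamma(link = "log")
fam_binom <- binomial(link = "probit")

# A power link with exponent 1/2, used here inside a quasi family
sqrt_link <- power(0.5)
fam_quasi <- quasi(link = sqrt_link, variance = "mu")

# Each family object records the family name and the chosen link
fam_gamma$family
fam_gamma$link
fam_binom$link

# The link function maps the mean to the linear predictor scale,
# and linkinv maps it back
eta <- sqrt_link$linkfun(4)
mu  <- sqrt_link$linkinv(eta)
```

A family given as a character string, such as "Gamma", uses that family's default link, so the call form is needed whenever a different link is wanted.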
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
References
Frédéric Bertrand, Nicolas Meyer, Michèle Beau-Faller, Karim El Bayed, Izzie-Jacques Namer, Myriam Maumy-Bertrand (2013). Régression Bêta PLS. Journal de la Société Française de Statistique, 154(3):143-159. https://ojs-test.apps.ocp.math.cnrs.fr/index.php/J-SFdS/article/view/215
See also
kfolds2coeff, kfolds2Pressind, kfolds2Press, kfolds2Mclassedind, kfolds2Mclassed and kfolds2CVinfos_beta to extract and transform results from k-fold cross-validation.
Author
Frédéric Bertrand
frederic.bertrand@lecnam.net
https://fbertran.github.io/homepage/
Examples
if (FALSE) { # \dontrun{
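# Load the gasoline yield data shipped with the betareg package
data("GasolineYield", package = "betareg")
yGasolineYield <- GasolineYield$yield
XGasolineYield <- GasolineYield[, 2:5]
# Cross-validate a PLS beta model with 3 components;
# the default K = nrow(dataX) gives leave-one-out cross-validation
bbb <- PLS_beta_kfoldcv(yGasolineYield, XGasolineYield, nt = 3, modele = "pls-beta")
# Summarize the cross-validation results
kfolds2CVinfos_beta(bbb)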
} # }