
Model selection for Principal Components regression based on cross-validation
Source: R/pcr.cv.R
This function computes the optimal model parameter, the number of principal components, using cross-validation. Model selection is carried out twice: once based on mean squared error and once based on correlation to the response.
Arguments
- X
matrix of predictor observations.
- y
vector of response observations. The length of y is the same as the number of rows of X.
- k
number of cross-validation splits. Default is 10.
- m
maximal number of principal components. Default is m=min(ncol(X),nrow(X)-1).
- groups
an optional vector with the same length as y. It encodes a partitioning of the data into distinct subgroups. If groups is provided, k=10 is ignored and cross-validation is instead performed based on the partitioning. Default is NULL.
- scale
Should the predictor variables be scaled to unit variance? Default is TRUE.
- eps
precision. Eigenvalues of the correlation matrix of X that are smaller than eps are set to 0. The default value is eps=10^{-6}.
- plot.it
Logical. If TRUE, the function plots the cross-validation error as a function of the number of components. Default is FALSE.
- compute.jackknife
Logical. If TRUE, the regression coefficients on each of the cross-validation splits are stored. Default is TRUE.
- method.cor
How should the correlation to the response be computed? Default is "pearson".
- supervised
Should the principal components be sorted by decreasing squared correlation to the response? Default is FALSE.
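The interaction between k and groups can be illustrated with a small base R sketch. The helper make_folds below is hypothetical and is not part of the package. It only shows one plausible way to form the cross-validation splits: one fold per distinct label when groups is given, and a random partition into k folds otherwise. The function's actual splitting code may differ.

```r
# Hypothetical helper (not part of the package): build cross-validation folds.
make_folds <- function(n, k = 10, groups = NULL) {
  if (!is.null(groups)) {
    # groups supplied: k is ignored, one fold per distinct group label
    stopifnot(length(groups) == n)
    split(seq_len(n), groups)
  } else {
    # no groups: shuffle the indices and deal them into k folds
    split(sample(seq_len(n)), rep(seq_len(k), length.out = n))
  }
}

set.seed(1)
folds_k <- make_folds(20, k = 4)                  # 4 random folds of 5 indices
grp <- rep(c("a", "b", "c"), length.out = 20)     # illustrative group labels
folds_g <- make_folds(20, k = 4, groups = grp)    # 3 folds, k is ignored
```

Each element of the returned list holds the row indices that would be held out in one split.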
Value
- cv.error.matrix
matrix of cross-validated errors based on mean squared error. A row corresponds to one cross-validation split.
- cv.error
vector of cross-validated errors based on mean squared error
- m.opt
optimal number of components based on mean squared error
- intercept
intercept of the optimal model, based on mean squared error
- coefficients
vector of regression coefficients of the optimal model, based on mean squared error
- cor.error.matrix
matrix of cross-validated errors based on correlation. A row corresponds to one cross-validation split.
- cor.error
vector of cross-validated errors based on correlation
- m.opt.cor
optimal number of components based on correlation
- intercept.cor
intercept of the optimal model, based on correlation
- coefficients.cor
vector of regression coefficients of the optimal model, based on correlation
- coefficients.jackknife
Array of the regression coefficients on each of the cross-validation splits, if compute.jackknife=TRUE. In this case, the dimension is ncol(X) x (m+1) x k.
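A short usage sketch may help to connect the arguments with the returned components. It assumes that pcr.cv is available from the plsdof package. The package name is an assumption not stated on this page, and the simulated data are purely illustrative. The call is guarded so that nothing runs if the package is not installed.

```r
# Assumption: pcr.cv is provided by the plsdof package.
if (requireNamespace("plsdof", quietly = TRUE)) {
  set.seed(42)
  n <- 100
  p <- 5
  X <- matrix(rnorm(n * p), ncol = p)   # simulated predictors
  y <- X[, 1] + rnorm(n)                # simulated response

  fit <- plsdof::pcr.cv(X, y, k = 5, m = 3)

  print(fit$m.opt)                        # components chosen by mean squared error
  print(fit$m.opt.cor)                    # components chosen by correlation
  print(fit$cv.error)                     # cross-validated mean squared errors
  print(dim(fit$coefficients.jackknife))  # documented as ncol(X) x (m+1) x k
}
```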
Details
The function computes the principal components on the scaled predictors.
Based on the regression coefficients coefficients.jackknife computed
on the cross-validation splits, we can estimate their mean and their
variance using the jackknife. We remark that under a fixed design and the
assumption of normally distributed y-values, we can also derive the
true distribution of the regression coefficients.
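
The jackknife estimate mentioned above can be sketched in base R. The array B below is a simulated stand-in with the same layout as coefficients.jackknife (ncol(X) x (m+1) x k). The helper jackknife_summary is hypothetical, and the scaling factor (k-1)/k is the standard delete-a-group jackknife formula, which may differ from the scaling the package uses internally. Which position along the second dimension corresponds to which number of components is also an assumption that should be checked against the package.

```r
set.seed(1)
p <- 4   # number of predictors
m <- 3   # maximal number of components
k <- 10  # number of cross-validation splits

# Simulated stand-in for coefficients.jackknife: p x (m+1) x k
B <- array(rnorm(p * (m + 1) * k), dim = c(p, m + 1, k))

# Hypothetical helper: jackknife mean and covariance for one model.
jackknife_summary <- function(B, model_index) {
  k <- dim(B)[3]
  # p x k matrix with one column of coefficients per split
  b <- matrix(B[, model_index, ], nrow = dim(B)[1])
  b_mean <- rowMeans(b)
  centered <- b - b_mean                       # subtracts b_mean from every column
  b_cov <- (k - 1) / k * tcrossprod(centered)  # p x p jackknife covariance
  list(mean = b_mean, cov = b_cov)
}

res <- jackknife_summary(B, model_index = 2)
res$mean            # jackknife mean of each coefficient
sqrt(diag(res$cov)) # jackknife standard errors
```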