Kernel and Streaming PLS Methods in bigPLSR
Frédéric Bertrand
Cedric, Cnam, Paris
frederic.bertrand@lecnam.net
2025-11-18
Source: vignettes/kpls_review.Rmd
Notation
Let $X \in \mathbb{R}^{n\times p}$ and $Y \in \mathbb{R}^{n\times q}$. We assume column-centered data unless stated otherwise. PLS extracts latent scores $T = XW$ with loadings $P$ and weights $W$ so that the covariance between $X$ and $Y$ along each score direction is maximized, with orthogonality constraints across components.
For kernel methods, let $\phi$ be an implicit feature map and define the Gram matrix $K \in \mathbb{R}^{n\times n}$ where $K_{ij} = k(x_i, x_j) = \langle \phi(x_i), \phi(x_j)\rangle$. The centering operator $H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ yields a centered Gram $K_c = HKH$.
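As a concrete illustration, the following minimal R sketch builds an RBF Gram matrix and applies the double centering above (the helper names rbf_gram() and center_gram() are illustrative, not part of the package API):

# Minimal sketch: RBF Gram matrix and double centering (illustrative helpers)
rbf_gram <- function(X, gamma = 0.5) {
  D2 <- as.matrix(dist(X))^2          # squared Euclidean distances
  exp(-gamma * D2)                    # K_ij = exp(-gamma * ||x_i - x_j||^2)
}
center_gram <- function(K) {
  n <- nrow(K)
  H <- diag(n) - matrix(1 / n, n, n)  # centering operator H = I - (1/n) 1 1'
  H %*% K %*% H                       # K_c = H K H
}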
Pseudo-code for bigPLSR algorithms
The package implements several complementary extraction schemes. The following pseudo-code summarises the core loops.
SIMPLS (dense/bigmem)
- Compute centered cross-products $C = X^\top X$ and $S = X^\top Y$.
- Initialise an empty orthonormal basis $V$.
- For each component $a = 1, \dots, A$:
  - Deflate $S$ in the subspace spanned by $V$.
  - Extract $w_a$ as the dominant eigenvector of $S S^\top$ (the leading left singular vector of $S$).
  - Compute $w_a^\top C w_a$ and normalise $w_a$ under the $C$-metric.
  - Obtain loadings $p_a = C w_a$ and regression weights $q_a = S^\top w_a$.
  - Expand $V \leftarrow [V,\, v_a]$, with $v_a$ the part of $p_a$ orthogonal to the previous basis, normalised.
- Form $W = [w_1, \dots, w_A]$, $T = XW$, and compute regression coefficients $B = W Q^\top$.
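The following R sketch mirrors this recursion from the cross-products only; it is an illustration under the notation above, not the cpp_simpls_from_cross() implementation:

# SIMPLS sketch from cross-products C = X'X and S = X'Y (X, Y centered)
simpls_from_cross_sketch <- function(C, S, ncomp) {
  p <- nrow(S); q <- ncol(S)
  R <- matrix(0, p, ncomp); P <- matrix(0, p, ncomp)
  Q <- matrix(0, q, ncomp); V <- matrix(0, p, ncomp)
  for (a in seq_len(ncomp)) {
    r <- svd(S, nu = 1, nv = 0)$u                 # dominant left singular vector of S
    r <- r / sqrt(drop(crossprod(r, C %*% r)))    # normalise under the C-metric
    p_a <- drop(C %*% r)                          # loadings p_a = C r_a
    q_a <- drop(crossprod(S, r))                  # regression weights q_a = S' r_a
    v <- p_a
    if (a > 1) {                                  # orthonormalise against previous basis
      Vp <- V[, seq_len(a - 1), drop = FALSE]
      v <- v - Vp %*% crossprod(Vp, p_a)
    }
    v <- drop(v) / sqrt(sum(v^2))
    S <- S - v %*% crossprod(v, S)                # deflate S in span(V)
    R[, a] <- r; P[, a] <- p_a; Q[, a] <- q_a; V[, a] <- v
  }
  list(weights = R, loadings = P, Yloadings = Q, coef = R %*% t(Q))
}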
NIPALS (dense/streamed)
- Initialise $u$ from a column of $Y$ (or from $X$).
- Iterate until convergence:
  - $w = X^\top u$, normalise $w$.
  - $t = X w$.
  - $q = Y^\top t / (t^\top t)$.
  - $u = Y q / (q^\top q)$ (for multi-response).
- Deflate $X \leftarrow X - t p^\top$ (with $p = X^\top t / (t^\top t)$), $Y \leftarrow Y - t q^\top$, and repeat for the next component.
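A minimal R sketch of one NIPALS component on centered X and Y (illustrative only, not the streamed C++ kernel):

# One NIPALS component; returns weights, scores, loadings and deflated data
nipals_component_sketch <- function(X, Y, tol = 1e-10, maxit = 500) {
  u <- Y[, 1]                                              # initialise u from Y
  for (it in seq_len(maxit)) {
    w <- drop(crossprod(X, u)); w <- w / sqrt(sum(w^2))    # w = X'u, normalised
    t_a <- drop(X %*% w)                                   # scores t = X w
    q <- drop(crossprod(Y, t_a)) / sum(t_a^2)              # q = Y't / t't
    u_new <- drop(Y %*% q) / sum(q^2)                      # u = Y q / q'q
    if (sum((u_new - u)^2) < tol) { u <- u_new; break }
    u <- u_new
  }
  p <- drop(crossprod(X, t_a)) / sum(t_a^2)                # p = X't / t't
  list(w = w, t = t_a, p = p, q = q,
       X_deflated = X - tcrossprod(t_a, p),                # X <- X - t p'
       Y_deflated = Y - tcrossprod(t_a, q))                # Y <- Y - t q'
}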
Kernel PLS / RKHS (dense & streamed)
- Form (or stream) the centered Gram matrix $K_c = HKH$.
- At each iteration extract a dual weight $\alpha$ maximising covariance with $Y$.
- Obtain the score $t = K_c \alpha$, regress $Y$ on $t$ to get $q$, and deflate in the $K_c$ metric.
- Accumulate $T$, $Q$, and the orthonormal basis used for subsequent deflation steps.
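The loop below sketches this dual extraction on a dense centered Gram; it is an illustration of the scheme above, not the cpp_kpls_rkhs_* code paths:

# Dual KPLS score extraction with Gram deflation (Kc and Y assumed centered)
kpls_scores_sketch <- function(Kc, Y, ncomp, tol = 1e-10, maxit = 200) {
  n <- nrow(Kc)
  Tmat <- matrix(0, n, ncomp)
  for (a in seq_len(ncomp)) {
    u <- Y[, 1]
    for (it in seq_len(maxit)) {
      t_a <- drop(Kc %*% u); t_a <- t_a / sqrt(sum(t_a^2)) # score t = Kc alpha
      c_a <- drop(crossprod(Y, t_a))                       # Y-side weights
      u_new <- drop(Y %*% c_a); u_new <- u_new / sqrt(sum(u_new^2))
      if (sum((u_new - u)^2) < tol) { u <- u_new; break }
      u <- u_new
    }
    Tmat[, a] <- t_a
    D <- diag(n) - tcrossprod(t_a)                         # I - t t'
    Kc <- D %*% Kc %*% D                                   # deflate Gram
    Y  <- Y - tcrossprod(t_a) %*% Y                        # deflate responses
  }
  Tmat
}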
Double RKHS (algorithm = "rkhs_xy")
- Build (or approximate) Gram matrices $K_x$ and $K_y$ for $X$ and $Y$.
- Extract dual directions $\alpha$ and $\beta$ so that the score pair $(t, u) = (K_x\alpha,\, K_y\beta)$ maximises covariance under both kernels.
- Use ridge-regularised projections to obtain regression weights.
- Store kernel centering statistics for prediction.
KLPLS / Kernel PLS (Dayal & MacGregor)
We operate in the dual. Consider the centered Gram $K_c$ and the responses $Y$. At step $a$, we extract a dual direction $\alpha_a$ so that the score $t_a = K_c\alpha_a$ maximizes covariance with $Y$, subject to orthogonality in the RKHS metric:
$$ \max_{\alpha}\ \big\| Y^\top K_c\,\alpha \big\|^2 \quad \text{s.t.} \quad t_a^\top t_b = 0 \ \text{for } b < a. $$
A SIMPLS-style recursion in the dual:
- Compute the cross-covariance operator $S = K_c Y$.
- Extract a direction in $\mathbb{R}^n$ via the dominant eigenvector of $S S^\top$ or by power iterations.
- Set $t_a = K_c\alpha_a$, normalize $t_a$.
- Regress $Y$ on $t_a$: $q_a = Y^\top t_a$.
- Deflate $S$ and orthogonalize subsequent directions in the $K_c$-metric.
Prediction uses the dual coefficients; for a new $x_*$, form the centered cross-kernel row $k_{*,c}$ and compute $\hat y_* = k_{*,c}^\top A + \mu_Y$ (see the prediction section below). When $Y$ is multivariate, apply the steps component-wise with a shared $K_c$.
In bigPLSR
- Dense path: algorithm="rkhs" builds $K_c$ (or an approximation) and runs dual SIMPLS deflation.
- Big-matrix path: block-streamed Gram computations avoid materializing $K$.
Streaming Gram blocks (column- and row-chunked)
We avoid forming $K = XX^\top$ explicitly by accumulating blocks. Write $K_{bc} = X_b X_c^\top$ for blocks taken by rows (row-chunked/$XX^\top$), or accumulate with column chunks via $K = \sum_j X_{(j)} X_{(j)}^\top$, where the $X_{(j)}$ are column submatrices (useful for tall-skinny $X$).
Row-chunked ($XX^\top$): 1. For row blocks $b, c$: compute $K_{bc} = X_b X_c^\top$. 2. Accumulate the required products on the fly when needed in matrix-vector products, without storing the full $K$.
Column-chunked: 1. Partition the feature dimension into blocks $J_1, \dots, J_B$. 2. For each block $j$: $K \leftarrow K + X_{(j)} X_{(j)}^\top$. 3. Use the partial accumulations to update accumulators and to refresh deflation quantities; see the sketch below.
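A minimal R sketch of the two chunking schemes for a linear Gram (hypothetical helpers, not the package's streamed C++ kernels):

# Column-chunked accumulation: K = sum_j X_(j) X_(j)'
gram_column_chunked <- function(X, block_size = 256L) {
  n <- nrow(X); p <- ncol(X)
  K <- matrix(0, n, n)
  for (s in seq(1L, p, by = block_size)) {
    j <- s:min(s + block_size - 1L, p)                # current column block
    K <- K + tcrossprod(X[, j, drop = FALSE])         # K += X_(j) X_(j)'
  }
  K
}

# Row-chunked product K v = X (X' v), reading X in row blocks, never forming K
gram_matvec_row_chunked <- function(X, v, block_size = 1024L) {
  n <- nrow(X)
  xtv <- numeric(ncol(X))
  for (s in seq(1L, n, by = block_size)) {            # pass 1: accumulate X' v
    i <- s:min(s + block_size - 1L, n)
    xtv <- xtv + drop(crossprod(X[i, , drop = FALSE], v[i]))
  }
  out <- numeric(n)
  for (s in seq(1L, n, by = block_size)) {            # pass 2: X (X' v) by blocks
    i <- s:min(s + block_size - 1L, n)
    out[i] <- drop(X[i, , drop = FALSE] %*% xtv)
  }
  out
}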
Memory
- Row-chunked: only the current row blocks of $X$ and the corresponding $K_{bc}$ block are held; the full $K$ is never stored.
- Column-chunked: the $n \times n$ accumulator is kept, but only one column block of $X$ is read at a time.
Pick the chunking scheme based on data layout and cache friendliness.
Kernel approximations: Nyström and Random Fourier Features
Nyström (rank $m$)
Sample a subset $S$ of size $m$, form $K_{nm} = K(X, X_S)$ and $K_{mm} = K(X_S, X_S)$. Define the sketch $Z = K_{nm} K_{mm}^{-1/2}$, so that $K \approx Z Z^\top$. Center $Z Z^\top$ by subtracting row/column means (equivalently, column-centre $Z$). Run linear PLS on $(Z, Y)$.
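A minimal R sketch of the Nyström sketch construction for an RBF kernel (the landmark sampling and helper names are illustrative, not the approx = "nystrom" internals):

# Nystrom features Z with Z Z' ~ K, ready for linear PLS
nystrom_features_sketch <- function(X, m = 100L, gamma = 0.5) {
  rbf_cross <- function(A, B)                               # K(A, B) for the RBF kernel
    exp(-gamma * (outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)))
  idx <- sample(nrow(X), m)                                 # landmark subset S
  Knm <- rbf_cross(X, X[idx, , drop = FALSE])               # K(X, X_S), n x m
  Kmm <- rbf_cross(X[idx, , drop = FALSE], X[idx, , drop = FALSE])
  e <- eigen(Kmm, symmetric = TRUE)
  pos <- e$values > 1e-10                                   # drop numerically null directions
  Z <- Knm %*% e$vectors[, pos, drop = FALSE] %*%
       diag(1 / sqrt(e$values[pos]), nrow = sum(pos))       # Z = K_nm K_mm^{-1/2}
  scale(Z, center = TRUE, scale = FALSE)                    # column-centre before linear PLS
}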
RFF (RBF kernels)
Draw $\omega_j \sim \mathcal{N}(0,\, 2\gamma I_p)$ and $b_j \sim \mathcal{U}(0, 2\pi)$ for $j = 1, \dots, D$. Define features $z(x) = \sqrt{2/D}\,\big[\cos(\omega_j^\top x + b_j)\big]_{j=1}^{D}$, so that $z(x)^\top z(x') \approx k(x, x') = \exp(-\gamma\|x - x'\|^2)$. Run linear PLS on $(Z, Y)$.
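A corresponding R sketch of the random Fourier feature map (illustrative, not the approx internals):

# Random Fourier features for k(x, x') = exp(-gamma * ||x - x'||^2)
rff_features_sketch <- function(X, D = 256L, gamma = 0.5) {
  p <- ncol(X)
  W <- matrix(rnorm(p * D, sd = sqrt(2 * gamma)), p, D)  # omega_j ~ N(0, 2 gamma I)
  b <- runif(D, 0, 2 * pi)                               # b_j ~ U(0, 2 pi)
  Z <- sqrt(2 / D) * cos(sweep(X %*% W, 2, b, "+"))      # z(x)' z(x') ~ k(x, x')
  scale(Z, center = TRUE, scale = FALSE)                 # column-centre before linear PLS
}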
Kernel Logistic PLS (binary classification)
We first compute KPLS scores $T$ from $K_c$ vs the labels $y \in \{0, 1\}^n$, then run logistic regression in latent space via IRLS:
Minimize the negative log-likelihood $\ell(\beta) = -\sum_i \big[y_i\log\pi_i + (1 - y_i)\log(1 - \pi_i)\big]$ with $\pi_i = \sigma(t_i^\top\beta)$. IRLS step at iteration $k$:
$$ \beta^{(k+1)} = \big(T^\top W T\big)^{-1} T^\top W z, $$
where $W = \operatorname{diag}\big(\pi_i(1 - \pi_i)\big)$ and $z = T\beta^{(k)} + W^{-1}(y - \pi)$ is the working response. Optionally alternate: recompute KPLS scores with current residuals and re-run a few IRLS steps. Class weights can be injected by scaling rows of $W$.
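A minimal R sketch of the IRLS step on latent scores (illustrative; the intercept handling and tolerances are assumptions, not the klogitpls internals):

# IRLS for logistic regression on KPLS scores Tmat and labels y in {0, 1}
irls_logit_sketch <- function(Tmat, y, maxit = 25, tol = 1e-8) {
  Tb <- cbind(1, Tmat)                           # intercept + latent scores
  beta <- numeric(ncol(Tb))
  for (it in seq_len(maxit)) {
    eta <- drop(Tb %*% beta)
    pi_hat <- 1 / (1 + exp(-eta))
    W <- pi_hat * (1 - pi_hat)                   # IRLS weights diag(W)
    z <- eta + (y - pi_hat) / pmax(W, 1e-10)     # working response
    beta_new <- drop(solve(crossprod(Tb, W * Tb), crossprod(Tb, W * z)))
    converged <- sum((beta_new - beta)^2) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}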
In bigPLSR, algorithm="klogitpls" computes the KPLS scores $T$ (dense or streamed) and then fits IRLS in the latent space.
Sparse Kernel PLS (sketch)
Promote sparsity in the dual or primal weights. In dual form, constrain $\alpha$ by an $\ell_1$ (or group) penalty:
$$ \max_{\alpha}\ \big\| Y^\top K_c\,\alpha \big\|^2 - \lambda\,\|\alpha\|_1 . $$
A practical approach uses proximal gradient or coordinate descent on a smooth surrogate of the covariance, with periodic orthogonalization of the resulting score vectors in the $K_c$ metric. Early stopping is driven by explained covariance. (The current release provides the scaffolding API.)
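One possible building block, sketched under these assumptions (not the shipped scaffold), is a soft-thresholded power step on the dual weight:

soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

# One sparse power-iteration step on the dual weight alpha (Kc, Y centered)
sparse_dual_step_sketch <- function(Kc, Y, alpha, lambda) {
  g <- drop(Kc %*% Y %*% crossprod(Y, Kc %*% alpha))  # gradient of the covariance surrogate
  a <- soft_threshold(g, lambda)                      # l1 shrinkage promotes sparsity
  a / sqrt(sum(a^2) + 1e-12)                          # renormalise
}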
PLS in RKHS for X and Y (double RKHS)
Let $K_x$ and $K_y$ be centered Grams for $X$ and $Y$ (with a small ridge $\epsilon I$ for stability). The cross-covariance operator is $K_x K_y$. Dual SIMPLS extracts latent directions via the dominant eigenspace of $K_x K_y$, with orthogonalization under the $K_x$ inner product.
Prediction returns dual coefficients for the $X$-side and for the $Y$-side kernels.
In bigPLSR, algorithm="rkhs_xy" wires this in dense mode; a streamed variant can be built by block Gram accumulations on $K_x$ and $K_y$.
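A minimal R sketch of the leading score pair via alternating power iterations on $K_x K_y$ (illustrative, not cpp_rkhs_xy_dense()):

# First latent pair (t, u) for double-RKHS PLS; Kx, Ky are centered Grams
rkhs_xy_first_pair_sketch <- function(Kx, Ky, ridge = 1e-6, maxit = 200, tol = 1e-10) {
  n <- nrow(Kx)
  Kx <- Kx + ridge * diag(n); Ky <- Ky + ridge * diag(n)   # small ridge for stability
  t_a <- rnorm(n)
  for (it in seq_len(maxit)) {
    u_a <- drop(Ky %*% t_a); u_a <- u_a / sqrt(sum(u_a^2)) # Y-side score
    t_new <- drop(Kx %*% u_a); t_new <- t_new / sqrt(sum(t_new^2))
    if (sum((t_new - t_a)^2) < tol) { t_a <- t_new; break }
    t_a <- t_new
  }
  list(t = t_a, u = u_a)
}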
Kalman-Filter PLS (KF-PLS; streaming)
KF-PLS maintains a state that tracks latent parameters over incoming mini-batches. Let the state be the weight vector $w$ of the current component, with state transition $w_{k+1} = w_k + \eta_k$ (random walk) and a "measurement" $z_b$ formed from the current block cross-covariances ($X_b^\top Y_b$). The Kalman update blends the predicted state with the block measurement:
$$ G_k = P_{k|k-1}\big(P_{k|k-1} + R\big)^{-1}, \qquad w_k = w_{k|k-1} + G_k\big(z_k - w_{k|k-1}\big), \qquad P_k = (I - G_k)\,P_{k|k-1}. $$
After convergence (or a patience stop), form $w_a$ from the state, normalize it, and proceed to the next component with a deflation compatible with the SIMPLS/NIPALS choice.
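The block update below is a minimal sketch assuming a random-walk state and an identity measurement model; the actual KF-PLS parameterisation (noise covariances, measurement map) is set inside the streamed kernel:

# One Kalman update of the weight state from a block measurement z_b
kf_update_sketch <- function(w, P, z, Q = 1e-4, R = 1e-2) {
  p <- length(w)
  P <- P + Q * diag(p)                    # predict: random-walk transition
  G <- P %*% solve(P + R * diag(p))       # Kalman gain (identity measurement map)
  w <- w + drop(G %*% (z - w))            # correct state with the block measurement
  P <- (diag(p) - G) %*% P
  list(w = w, P = P)
}
# per block b: z_b <- drop(crossprod(X_b, y_b)); state <- kf_update_sketch(state$w, state$P, z_b)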
In bigPLSR, algorithm="kf_pls" reuses the existing chunked $T$-streaming kernel and updates the state per block.
API quick start
library(bigPLSR)
# Dense RKHS PLS with Nyström of rank 500 (rbf kernel)
fit_rkhs <- pls_fit(X, Y, ncomp = 5,
backend = "arma",
algorithm = "rkhs",
kernel = "rbf", gamma = 0.5,
approx = "nystrom", approx_rank = 500,
scores = "r")
# Bigmemory, kernel logistic PLS (streamed scores + IRLS)
fit_klog <- pls_fit(bmX, bmy, ncomp = 4,
backend = "bigmem",
algorithm = "klogitpls",
kernel = "rbf", gamma = 1.0,
chunk_size = 16384L,
scores = "r")
# Sparse KPLS (dense scaffold)
fit_sk <- pls_fit(X, Y, ncomp = 5,
backend = "arma",
algorithm = "sparse_kpls")
Prediction in RKHS PLS
Let $X \in \mathbb{R}^{n\times p}$ be the training inputs and $Y \in \mathbb{R}^{n\times q}$ the responses. With kernel $k$ and training Gram $K$, the centered Gram is $K_c = HKH$. KPLS on $(K_c, Y)$ yields dual coefficients $A \in \mathbb{R}^{n\times q}$.
For new inputs $X_*$, the cross-kernel $K_* \in \mathbb{R}^{n_*\times n}$ has $(K_*)_{i,j} = k(x^*_i, x_j)$. The centered cross-Gram is $$ K_{*,c} = K_* - \mathbf{1}_{n_*}\bar{k}^\top - \bar{k}_*\mathbf{1}_n^\top + \bar{\bar{k}}, \qquad \bar{k} = \frac{1}{n}K\mathbf{1}_n, \quad \bar{k}_* = \frac{1}{n}K_*\mathbf{1}_n, $$ with $\bar{\bar{k}}$ the grand mean of $K$. Predictions follow $$ \widehat{Y}_* = K_{*,c}\,A + \mathbf{1}_{n_*}\,\mu_Y^\top, $$ where $\mu_Y$ is the vector of training response means.
In bigPLSR, these are stored as: dual_coef ($A$), k_colmeans ($\bar{k}$), k_mean ($\bar{\bar{k}}$), y_means ($\mu_Y$), and X_ref (dense training inputs). The RKHS branch of predict.big_plsr() uses the same formula.
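The sketch below reproduces this prediction path in plain R for an RBF kernel, using the stored quantities named above (dual coefficients A, kernel means, response means); it is an illustration, not the predict.big_plsr() code:

# Centred cross-Gram prediction: Y_hat = K_{*,c} A + 1 mu_Y'
predict_kpls_sketch <- function(X_new, X_train, A, y_means, gamma = 0.5) {
  rbf_cross <- function(U, V)
    exp(-gamma * (outer(rowSums(U^2), rowSums(V^2), "+") - 2 * tcrossprod(U, V)))
  K  <- rbf_cross(X_train, X_train)                 # training Gram
  Ks <- rbf_cross(X_new, X_train)                   # cross-kernel K_*, n_* x n
  k_colmeans <- colMeans(K)                         # kbar
  k_mean <- mean(K)                                 # grand mean kbarbar
  Ks_c <- sweep(Ks, 2, k_colmeans, "-") - rowMeans(Ks) + k_mean   # centred K_{*,c}
  sweep(Ks_c %*% A, 2, y_means, "+")                # add back training response means
}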
Dependency overview (wrappers → C++ entry points)
pls_fit(algorithm="simpls", backend="arma")
└─ .Call("_bigPLSR_cpp_simpls_from_cross")
pls_fit(algorithm="simpls", backend="bigmem")
├─ .Call("_bigPLSR_cpp_bigmem_cross")
├─ .Call("_bigPLSR_cpp_simpls_from_cross")
└─ .Call("_bigPLSR_cpp_stream_scores_given_W")
pls_fit(algorithm="nipals", backend="arma")
└─ cpp_dense_plsr_nipals()
pls_fit(algorithm="nipals", backend="bigmem")
└─ big_plsr_stream_fit_nipals()
pls_fit(algorithm="kernelpls"/"widekernelpls")
└─ .kernel_pls_core() (R)
pls_fit(algorithm="rkhs", backend="arma")
└─ .Call("_bigPLSR_cpp_kpls_rkhs_dense")
pls_fit(algorithm="rkhs", backend="bigmem")
└─ .Call("_bigPLSR_cpp_kpls_rkhs_bigmem")
pls_fit(algorithm="klogitpls", backend="arma")
└─ .Call("_bigPLSR_cpp_klogit_pls_dense")
pls_fit(algorithm="klogitpls", backend="bigmem")
└─ .Call("_bigPLSR_cpp_klogit_pls_bigmem")
pls_fit(algorithm="sparse_kpls")
└─ .Call("_bigPLSR_cpp_sparse_kpls_dense")
pls_fit(algorithm="rkhs_xy")
└─ .Call("_bigPLSR_cpp_rkhs_xy_dense")
pls_fit(algorithm="kf_pls")
└─ .Call("_bigPLSR_cpp_kf_pls_stream")
References
- Dayal, B., & MacGregor, J.F. (1997). Improved PLS algorithms. Journal of Chemometrics, 11(1), 73–85, doi:10.1002/(SICI)1099-128X(199701)11:1%3C73::AID-CEM446%3E3.0.CO;2-2.
- Rosipal, R., & Trejo, L.J. (2001). Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research, 2, 97–123, doi:10.1162/153244302760200687. http://www.jmlr.org/papers/v2/rosipal01a.html
- Tenenhaus et al., Kernel Logistic PLS.
- Sparse Kernel Partial Least Squares Regression. In LNCS Proceedings.
- Kernel PLS Regression II (double RKHS). IEEE Transactions on Neural Networks and Learning Systems, doi:10.1109/TNNLS.2019.2932014.
- KF-PLS (2024). Chemometrics and Intelligent Laboratory Systems, doi:10.1016/j.chemolab.2024.104024.