
Notation

Let $X \in \mathbb{R}^{n\times p}$ and $Y \in \mathbb{R}^{n\times m}$. We assume column-centered data unless stated otherwise. PLS extracts latent scores $T = [t_1, \dots, t_A]$ with loadings and weights so that the covariance between $X$ and $Y$ along $t_a$ is maximized, with orthogonality constraints across components.

For kernel methods, let $\phi(\cdot)$ be an implicit feature map and define the Gram matrix $K_X = \Phi_X \Phi_X^\top$ where $(K_X)_{ij} = k(x_i, x_j)$. The centering operator $H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ yields a centered Gram $\tilde K_X = H K_X H$.

Pseudo-code for bigPLSR algorithms

The package implements several complementary extraction schemes. The following pseudo-code summarises the core loops.

SIMPLS (dense/bigmem)

  1. Compute centered cross-products $C_{xx} = X^\top X$, $C_{xy} = X^\top Y$.
  2. Initialise the orthonormal basis $V = [\,]$.
  3. For each component $a = 1, \dots, A$:
    • Deflate $C_{xy}$ in the subspace spanned by $V$.
    • Extract $q_a$ as the dominant eigenvector of $C_{xy}^\top C_{xy}$.
    • Compute $w_a = C_{xy} q_a$ and normalise it under the $C_{xx}$-metric.
    • Obtain loadings $p_a = C_{xx} w_a$ and regression weights $c_a = C_{xy}^\top w_a$.
    • Orthonormalise $p_a$ against $V$ and append the result: $V \leftarrow [V, v_a]$.
  4. Form $W = [w_a]$, $P = [p_a]$, $Q = [c_a]$ and compute the regression coefficients $B = W (P^\top W)^{-1} Q^\top$.
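
As a concrete illustration of this loop, here is a minimal base-R sketch working only from the cross-products (a didactic version under the notation above, not the package's C++ routine reached via .Call("_bigPLSR_cpp_simpls_from_cross")):

# Didactic SIMPLS from cross-products Cxx = t(X) %*% X and Cxy = t(X) %*% Y
simpls_from_cross_r <- function(Cxx, Cxy, ncomp) {
  p <- nrow(Cxy); m <- ncol(Cxy)
  W <- matrix(0, p, ncomp); P <- matrix(0, p, ncomp); Q <- matrix(0, m, ncomp)
  V <- matrix(0, p, 0)                        # orthonormal basis of loadings
  S <- Cxy                                    # deflated cross-covariance
  for (a in seq_len(ncomp)) {
    qa <- eigen(crossprod(S), symmetric = TRUE)$vectors[, 1]  # dominant eigenvector of S'S
    wa <- S %*% qa
    wa <- wa / sqrt(drop(crossprod(wa, Cxx %*% wa)))          # normalise in the Cxx-metric
    pa <- Cxx %*% wa                                          # loadings
    ca <- crossprod(S, wa)                                    # regression weights
    va <- pa - V %*% crossprod(V, pa)                         # orthogonalise pa against V
    va <- va / sqrt(drop(crossprod(va)))
    V <- cbind(V, va)
    S <- S - va %*% crossprod(va, S)                          # deflate Cxy
    W[, a] <- wa; P[, a] <- pa; Q[, a] <- ca
  }
  B <- W %*% solve(crossprod(P, W), t(Q))                     # B = W (P'W)^{-1} Q'
  list(coefficients = B, weights = W, loadings = P, yloadings = Q)
}

On column-centered Xc, Yc one would call, e.g., simpls_from_cross_r(crossprod(Xc), crossprod(Xc, Yc), ncomp = 3).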

NIPALS (dense/streamed)

  1. Initialise the score $t_a$ from a column of $Y$ (or of $X$).
  2. Iterate until convergence:
    • $w_a = X^\top t_a / (t_a^\top t_a)$, then normalise $w_a$.
    • $t_a = X w_a$.
    • $c_a = Y^\top t_a / (t_a^\top t_a)$.
    • $u_a = Y c_a$ (for multi-response, $u_a$ replaces $t_a$ in the next weight update).
  3. Compute loadings $p_a = X^\top t_a / (t_a^\top t_a)$, deflate $X \leftarrow X - t_a p_a^\top$, $Y \leftarrow Y - t_a c_a^\top$, and repeat for the next component.
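
For reference, one multi-response NIPALS component can be sketched in a few lines of base R (a didactic version on centered inputs; the package's dense and streamed loops are implemented in C++):

# One NIPALS component on column-centered X, Y (didactic sketch)
nipals_component <- function(X, Y, tol = 1e-10, maxit = 500) {
  u_a <- Y[, 1]                                      # initialise from a Y column
  t_old <- rep(0, nrow(X))
  for (iter in seq_len(maxit)) {
    w_a <- crossprod(X, u_a) / drop(crossprod(u_a))  # X-weights
    w_a <- w_a / sqrt(drop(crossprod(w_a)))          # normalise
    t_a <- drop(X %*% w_a)                           # X-scores
    c_a <- crossprod(Y, t_a) / drop(crossprod(t_a))  # Y-weights
    u_a <- drop(Y %*% c_a) / drop(crossprod(c_a))    # Y-scores
    if (sqrt(sum((t_a - t_old)^2)) < tol) break
    t_old <- t_a
  }
  p_a <- crossprod(X, t_a) / drop(crossprod(t_a))    # X-loadings
  list(t = t_a, w = drop(w_a), p = drop(p_a), q = drop(c_a),
       X = X - tcrossprod(t_a, drop(p_a)),           # deflated X
       Y = Y - tcrossprod(t_a, drop(c_a)))           # deflated Y
}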

Kernel PLS / RKHS (dense & streamed)

  1. Form (or stream) the centered Gram matrix $\tilde K_X$.
  2. At each iteration extract a dual weight $\alpha_a$ maximising covariance with $Y$.
  3. Obtain the score $t_a = \tilde K_X \alpha_a$, regress $Y$ on $t_a$ to get $q_a$ and deflate in the $\tilde K_X$ metric.
  4. Accumulate $\alpha_a$, $q_a$ and the orthonormal basis used in subsequent deflation steps.

Double RKHS (algorithm = "rkhs_xy")

  1. Build (or approximate) Gram matrices for $X$ and $Y$.
  2. Extract dual directions $\alpha_a$ and $\beta_a$ so that the score pair $(t_a, u_a)$ maximises covariance under both kernels.
  3. Use ridge-regularised projections to obtain regression weights.
  4. Store kernel centering statistics for prediction.

Kalman-filter PLS (algorithm = "kf_pls")

  1. Maintain exponentially weighted means $\mu_x, \mu_y$.
  2. Update the cross-products $C_{xx}, C_{xy}$ with forgetting factor $\lambda$ and optional process noise.
  3. Periodically call SIMPLS on the smoothed moments to recover regression coefficients consistent with the streamed state.

Common kernels:

$$
\begin{aligned}
\text{Linear:}\quad & k(x,z) = x^\top z \\
\text{RBF:}\quad & k(x,z) = \exp(-\gamma \|x-z\|^2) \\
\text{Polynomial:}\quad & k(x,z) = (\gamma\, x^\top z + c_0)^{d} \\
\text{Sigmoid:}\quad & k(x,z) = \tanh(\gamma\, x^\top z + c_0).
\end{aligned}
$$
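
These kernels can be evaluated block-wise on row-wise data matrices; a small base-R helper (illustrative, with hypothetical argument names, not the package's internal kernel code) is:

# Kernel matrix between the rows of X and the rows of Z
kernel_matrix <- function(X, Z, kernel = c("linear", "rbf", "poly", "sigmoid"),
                          gamma = 1, c0 = 0, degree = 3) {
  kernel <- match.arg(kernel)
  G <- tcrossprod(X, Z)                                      # X %*% t(Z): the linear kernel
  switch(kernel,
    linear  = G,
    rbf     = {
      d2 <- outer(rowSums(X^2), rowSums(Z^2), "+") - 2 * G   # squared Euclidean distances
      exp(-gamma * d2)
    },
    poly    = (gamma * G + c0)^degree,
    sigmoid = tanh(gamma * G + c0))
}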

Centering the Gram matrix

Given $K\in\mathbb{R}^{n\times n}$, the centered version is:

$$ \tilde K = H K H, \quad H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top. $$
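
The same centering can be done without building $H$, using column, row, and grand means (a minimal sketch of the formula above):

# Center a Gram matrix: H K H without forming H explicitly
center_gram <- function(K) {
  n <- nrow(K)
  K - matrix(colMeans(K), n, n, byrow = TRUE) -   # subtract column means
      matrix(rowMeans(K), n, n) +                 # subtract row means
      mean(K)                                     # add back the grand mean
}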


KLPLS / Kernel PLS (Dayal & MacGregor)

We operate in the dual. Consider $K_X$ and $K_{XY} = K_X Y$. At step $a$, we extract a dual direction $\alpha_a$ so that the score $t_a = \tilde K_X \alpha_a$ maximizes covariance with $Y$, subject to orthogonality in the RKHS metric:

$$ \max_{\alpha} \ \mathrm{cov}(t, Y) \quad \text{s.t.}\quad t=\tilde K_X \alpha,\ \ t^\top t = 1,\ \ t^\top t_b = 0,\ b<a. $$

A SIMPLS-style recursion in the dual:

  1. Compute the cross-covariance operator $C = \tilde K_X Y$.
  2. Extract a dual direction $\alpha_a \in \mathbb{R}^n$ as the dominant eigenvector of $C C^\top$, or by power iterations.
  3. Set $t_a = \tilde K_X \alpha_a$ and normalize $t_a$.
  4. Regress $Y$ on $t_a$: $q_a = (t_a^\top t_a)^{-1} t_a^\top Y$.
  5. Deflate $Y \leftarrow Y - t_a q_a^\top$ and orthogonalize subsequent directions in the $\tilde K_X$-metric.

Prediction uses the dual coefficients; for a new $x_\star$, form $k_\star = [k(x_\star, x_i)]_{i=1}^n$ and $t_\star = \tilde k_\star^\top \alpha$. When $Y$ is multivariate, apply the steps component-wise with a shared $t_a$.
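
A compact dense sketch of this dual recursion, given a pre-centered Gram and centered responses, is shown below (didactic only; it orthogonalizes scores explicitly and uses a full eigendecomposition, whereas the package's algorithm = "rkhs" path runs in C++):

# Dual KPLS on a centered Gram Kc (n x n) and centered responses Yc (n x m)
kpls_dual <- function(Kc, Yc, ncomp) {
  n <- nrow(Kc); m <- ncol(Yc)
  Tm <- matrix(0, n, ncomp); A <- matrix(0, n, ncomp); Q <- matrix(0, m, ncomp)
  Yd <- Yc
  for (a in seq_len(ncomp)) {
    C <- Kc %*% Yd                                                # cross-covariance operator
    alpha <- eigen(tcrossprod(C), symmetric = TRUE)$vectors[, 1]  # dominant direction of C C'
    t_a <- Kc %*% alpha
    if (a > 1) {
      T_prev <- Tm[, seq_len(a - 1), drop = FALSE]
      t_a <- t_a - T_prev %*% crossprod(T_prev, t_a)              # orthogonalise against earlier scores
    }
    t_a <- t_a / sqrt(drop(crossprod(t_a)))                       # normalise the score
    q_a <- crossprod(Yd, t_a)                                     # regress Y on t_a (t_a't_a = 1)
    Yd  <- Yd - tcrossprod(drop(t_a), drop(q_a))                  # deflate Y
    Tm[, a] <- t_a; A[, a] <- alpha; Q[, a] <- q_a
  }
  list(scores = Tm, dual_weights = A, yloadings = Q)
}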

In bigPLSR
- Dense path: algorithm="rkhs" builds $\tilde K_X$ (or an approximation) and runs the dual SIMPLS deflation.
- Big-matrix path: block-streamed Gram computations avoid materializing the $n\times n$ matrix.


Streaming Gram blocks (column- and row-chunked)

We avoid forming $\tilde K_X$ explicitly by accumulating blocks of $K_X = X X^\top$. Row-chunked (XXᵗ): row blocks $X_B$ give the corresponding rows of the Gram, $K_{B\cdot} = X_B X^\top$. Column-chunked: $K_X = \sum_{J} X_{[:,J]} X_{[:,J]}^\top$, summing over feature blocks $J$ (useful for tall-skinny $X$).

Row-chunked (XXᵗ): 1. For row blocks $B \subset \{1,\ldots,n\}$: compute $G_B = X_B X^\top$. 2. Use $G_B$ to form the corresponding rows of matrix-vector products $K v$ on the fly, applying the centering $H(\cdot)H$ without ever storing the full $K$.

Column-chunked: 1. Partition the feature dimension $p$ into blocks $J$. 2. For each block $J$: $G_J = X_{[:,J]} X_{[:,J]}^\top$. 3. Use $G_J$ to update the $K v$ accumulators and to refresh the deflation quantities ($t, q$).

Memory
- Row-chunked: $O(n \cdot \text{chunk\_rows})$.
- Column-chunked: $O(n \cdot \text{chunk\_cols})$.
Pick based on layout and cache friendliness.
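
As an illustration of the column-chunked scheme for the linear-kernel case $K_X = X X^\top$, the centered matrix-vector product $(H K_X H)v$ can be accumulated block by block without ever forming $K_X$ (a didactic sketch; the package streams from a big.matrix in C++):

# Centered Gram-vector product (H K H) v via column chunks, with K = X X'
centered_Kv <- function(X, v, chunk_cols = 256L) {
  p <- ncol(X)
  v_c <- v - mean(v)                               # right-centering: H v
  Kv <- numeric(nrow(X))
  for (start in seq(1L, p, by = chunk_cols)) {
    J  <- start:min(start + chunk_cols - 1L, p)
    XJ <- X[, J, drop = FALSE]
    Kv <- Kv + drop(XJ %*% crossprod(XJ, v_c))     # add (X_J X_J') (H v)
  }
  Kv - mean(Kv)                                    # left-centering: H (K H v)
}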


Kernel approximations: Nyström and Random Fourier Features

Nyström (rank rr)
Sample a subset $S$ of size $r$, form $K_{SS}$ and $K_{NS}$. Define the sketch $Z = K_{NS} K_{SS}^{-1/2}$, so that $K \approx Z Z^\top$. Center $Z$ by subtracting its column means (equivalent to applying $H$ on the left). Run linear PLS on $Z$.
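
A minimal base-R sketch of this step (the kernel_fun argument is a hypothetical helper such as the kernel_matrix() sketch above; in bigPLSR this route is reached through approx = "nystrom"):

# Nystrom features: K ~ Z Z' with Z = K_NS K_SS^{-1/2}
nystrom_features <- function(X, r, kernel_fun) {
  n   <- nrow(X)
  S   <- sample.int(n, r)                              # landmark subset
  Kns <- kernel_fun(X, X[S, , drop = FALSE])           # n x r cross-kernel
  Kss <- Kns[S, , drop = FALSE]                        # r x r landmark kernel
  e   <- eigen(Kss, symmetric = TRUE)
  pos <- e$values > 1e-10
  Kss_inv_sqrt <- e$vectors[, pos, drop = FALSE] %*%
                  diag(1 / sqrt(e$values[pos]), sum(pos)) %*%
                  t(e$vectors[, pos, drop = FALSE])    # K_SS^{-1/2}
  Z <- Kns %*% Kss_inv_sqrt                            # sketch
  scale(Z, center = TRUE, scale = FALSE)               # center columns, then run linear PLS on Z
}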

RFF (RBF kernels)
Draw $\{\omega_\ell\}_{\ell=1}^r \sim \mathcal{N}(0,\,2\gamma I)$ and $b_\ell\sim \mathcal{U}[0,2\pi]$. Define features $\varphi_\ell(x)=\sqrt{\tfrac{2}{r}}\cos(\omega_\ell^\top x + b_\ell)$, so that $k(x,z)\approx \varphi(x)^\top \varphi(z)$. Run linear PLS on $\varphi(X)$.
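
A corresponding RFF sketch for the RBF kernel (didactic; the resulting feature matrix is then passed to a linear PLS fit):

# Random Fourier features approximating exp(-gamma * ||x - z||^2)
rff_features <- function(X, r, gamma) {
  p     <- ncol(X)
  Omega <- matrix(rnorm(p * r, sd = sqrt(2 * gamma)), p, r)    # omega_l ~ N(0, 2*gamma*I)
  b     <- runif(r, 0, 2 * pi)                                 # b_l ~ U[0, 2*pi]
  sqrt(2 / r) * cos(X %*% Omega + matrix(b, nrow(X), r, byrow = TRUE))
}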


Kernel Logistic PLS (binary classification)

We first compute KPLS scores $T\in\mathbb{R}^{n\times A}$ from $X$ and the labels $y\in\{0,1\}$, then run logistic regression in the latent space via IRLS:

Minimize $\ell(\beta, \beta_0) = -\sum_i \{ y_i\eta_i - \log(1+\exp(\eta_i))\}$ with $\eta = \beta_0\mathbf{1} + T \beta$. The IRLS step at iteration $k$ is:

$$ W = \mathrm{diag}\!\left(p^{(k)}(1-p^{(k)})\right),\quad z = \eta^{(k)} + W^{-1}(y - p^{(k)}),\quad \begin{bmatrix}\beta_0\\ \beta\end{bmatrix} = (X_\eta^\top W X_\eta + \lambda I)^{-1} X_\eta^\top W z, $$

where $X_\eta = [\mathbf{1}, T]$ and $p^{(k)} = \sigma(\eta^{(k)})$. Optionally alternate: recompute the KPLS scores from the current residuals and re-run a few IRLS steps. Class weights $w_c$ can be injected by scaling the corresponding diagonal entries of $W$.
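
A ridge-stabilised IRLS in the latent space can be sketched as follows (didactic; the helper name and defaults are illustrative, not the package's API):

# IRLS on X_eta = [1, T] for binary y in {0, 1}
irls_latent <- function(T_scores, y, lambda = 1e-6, maxit = 25, tol = 1e-8) {
  Xeta <- cbind(1, T_scores)                      # X_eta = [1, T]
  beta <- numeric(ncol(Xeta))
  for (k in seq_len(maxit)) {
    eta <- drop(Xeta %*% beta)
    p_k <- 1 / (1 + exp(-eta))                    # sigma(eta)
    W   <- p_k * (1 - p_k)
    z   <- eta + (y - p_k) / pmax(W, 1e-10)       # working response
    A   <- crossprod(Xeta, W * Xeta) + lambda * diag(ncol(Xeta))
    beta_new <- drop(solve(A, crossprod(Xeta, W * z)))
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  list(intercept = beta[1], coef = beta[-1])
}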

In bigPLSR
algorithm="klogitpls" computes TT (dense or streamed) then fits IRLS in latent space.


Sparse Kernel PLS (sketch)

Promote sparsity in the dual or primal weights. In dual form, constrain $\alpha_a$ by an $\ell_1$ (or group) penalty:

$$ \max_{\alpha}\ \mathrm{cov}(\tilde K \alpha, Y) - \lambda\|\alpha\|_1 \quad\text{s.t.}\quad (\tilde K\alpha)^\top (\tilde K\alpha) = 1,\ \ t_a^\top t_b=0\ \ (b<a). $$

A practical approach uses proximal gradient or coordinate descent on a smooth surrogate of the covariance, with periodic orthogonalization of the resulting score vectors in the $\tilde K$ metric. Stop early based on explained covariance. (The current release provides the scaffolding API.)


PLS in RKHS for X and Y (double RKHS)

Let $K_X$ and $K_Y$ be centered Grams for $X$ and $Y$ (with small ridges $\lambda_X,\lambda_Y$ for stability). The cross-covariance operator is $A = K_X (K_Y + \lambda_Y I) K_X$. Dual SIMPLS extracts latent directions via the dominant eigenspace of $A$, with orthogonalization under the $K_X$ inner product.

Prediction returns dual coefficients $\alpha$ for $X$ and $\beta$ for $Y$.

In bigPLSR
algorithm="rkhs_xy" wires this in dense mode; a streamed variant can be built by block Gram accumulations on KXK_X and KYK_Y.


Kalman-Filter PLS (KF-PLS; streaming)

KF-PLS maintains a state that tracks the latent parameters over incoming mini-batches. Let the state be $s = \{w, p, q, b\}$ for the current component, with state transition $s_{k+1} = s_k + \epsilon_k$ (random walk) and a "measurement" formed from the current block cross-covariances ($\widehat{X^\top t}$, $\widehat{Y^\top t}$). The Kalman update:

$$
\begin{aligned}
&\text{Predict: } \hat s_{k|k-1} = \hat s_{k-1},\ \ P_{k|k-1}=P_{k-1}+Q \\
&\text{Innovation: } \nu_k = z_k - H_k \hat s_{k|k-1},\ \ S_k = H_k P_{k|k-1} H_k^\top + R \\
&\text{Gain: } K_k = P_{k|k-1} H_k^\top S_k^{-1} \\
&\text{Update: } \hat s_k = \hat s_{k|k-1} + K_k \nu_k,\ \ P_k = (I - K_k H_k) P_{k|k-1}.
\end{aligned}
$$

After convergence (or a patience stop), form $t = X w$, normalize, and proceed to the next component with deflation compatible with the SIMPLS/NIPALS choice.
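
One generic update step matching these equations can be written directly (a sketch; the mapping of $z_k$ and $H_k$ to the block cross-covariances is internal to the package's C++ state):

# One Kalman update for a random-walk state
kf_update <- function(s, P, z, H, Q, R) {
  s_pred <- s                                     # predict (random walk)
  P_pred <- P + Q
  nu <- z - H %*% s_pred                          # innovation
  S  <- H %*% P_pred %*% t(H) + R
  K  <- P_pred %*% t(H) %*% solve(S)              # Kalman gain
  s_new <- s_pred + K %*% nu                      # state update
  P_new <- (diag(nrow(P)) - K %*% H) %*% P_pred   # covariance update
  list(s = s_new, P = P_new)
}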

In bigPLSR
algorithm="kf_pls" reuses the existing chunked T streaming kernel and updates the state per block.


API quick start

library(bigPLSR)

# Dense RKHS PLS with Nyström of rank 500 (rbf kernel)
fit_rkhs <- pls_fit(X, Y, ncomp = 5,
                    backend   = "arma",
                    algorithm = "rkhs",
                    kernel = "rbf", gamma = 0.5,
                    approx = "nystrom", approx_rank = 500,
                    scores = "r")

# Bigmemory, kernel logistic PLS (streamed scores + IRLS)
fit_klog <- pls_fit(bmX, bmy, ncomp = 4,
                    backend   = "bigmem",
                    algorithm = "klogitpls",
                    kernel = "rbf", gamma = 1.0,
                    chunk_size = 16384L,
                    scores = "r")

# Sparse KPLS (dense scaffold)
fit_sk <- pls_fit(X, Y, ncomp = 5,
                  backend = "arma",
                  algorithm = "sparse_kpls")

Prediction in RKHS PLS

Let $X\in\mathbb{R}^{n\times p}$ be training inputs and $Y\in\mathbb{R}^{n\times m}$ the responses. With kernel $k(\cdot,\cdot)$ and training Gram $K$, the centered Gram is

$$ K_c = K - \mathbf{1}_n \bar{k}^\top - \bar{k}\,\mathbf{1}_n^\top + \bar{\bar{k}}\,\mathbf{1}_n\mathbf{1}_n^\top, \quad \bar{k} = \frac{1}{n}\sum_{i=1}^n K_{i\cdot},\quad \bar{\bar{k}} = \frac{1}{n^2}\sum_{i,j}K_{ij}. $$

KPLS on $K_c$ yields dual coefficients $A\in\mathbb{R}^{n\times m}$.

For new inputs $X_\star$, the cross-kernel $K_\star \in \mathbb{R}^{n_\star\times n}$ has $(K_\star)_{i,j} = k(x^\star_i, x_j)$. The centered cross-Gram is

$$ K_{\star,c} = K_\star - \mathbf{1}_{n_\star}\bar{k}^\top - \bar{k}_\star\mathbf{1}_n^\top + \bar{\bar{k}}\,\mathbf{1}_{n_\star}\mathbf{1}_n^\top, \quad \bar{k}_\star = \frac{1}{n}K_\star\mathbf{1}_n. $$

Predictions follow

$$ \widehat{Y}_\star = K_{\star,c}\,A + \mathbf{1}_{n_\star}\,\mu_Y^\top, $$

where $\mu_Y$ is the vector of training response means.

In bigPLSR, these are stored as: dual_coef ($A$), k_colmeans ($\bar{k}$), k_mean ($\bar{\bar{k}}$), y_means ($\mu_Y$), and X_ref (dense training inputs). The RKHS branch of predict.big_plsr() uses the same formula.
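
A base-R sketch of this prediction formula from the stored quantities (didactic; predict.big_plsr() is the supported interface):

# Predict from a raw cross-kernel K_star (n_star x n) and stored training statistics
predict_rkhs <- function(K_star, dual_coef, k_colmeans, k_mean, y_means) {
  n_star <- nrow(K_star)
  k_star_bar <- rowMeans(K_star)                                # \bar{k}_star = K_star 1_n / n
  K_star_c <- K_star -
    matrix(k_colmeans, n_star, ncol(K_star), byrow = TRUE) -    # 1 \bar{k}'
    matrix(k_star_bar, n_star, ncol(K_star)) +                  # \bar{k}_star 1'
    k_mean                                                      # + \bar{\bar{k}}
  K_star_c %*% dual_coef +
    matrix(y_means, n_star, length(y_means), byrow = TRUE)      # add back the Y means
}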


Dependency overview (wrappers → C++ entry points)

pls_fit(algorithm="simpls", backend="arma")
  └─ .Call("_bigPLSR_cpp_simpls_from_cross")

pls_fit(algorithm="simpls", backend="bigmem")
  ├─ .Call("_bigPLSR_cpp_bigmem_cross")
  ├─ .Call("_bigPLSR_cpp_simpls_from_cross")
  └─ .Call("_bigPLSR_cpp_stream_scores_given_W")

pls_fit(algorithm="nipals", backend="arma")
  └─ cpp_dense_plsr_nipals()

pls_fit(algorithm="nipals", backend="bigmem")
  └─ big_plsr_stream_fit_nipals()

pls_fit(algorithm="kernelpls"/"widekernelpls")
  └─ .kernel_pls_core()  (R)

pls_fit(algorithm="rkhs", backend="arma")
  └─ .Call("_bigPLSR_cpp_kpls_rkhs_dense")

pls_fit(algorithm="rkhs", backend="bigmem")
  └─ .Call("_bigPLSR_cpp_kpls_rkhs_bigmem")

pls_fit(algorithm="klogitpls", backend="arma")
  └─ .Call("_bigPLSR_cpp_klogit_pls_dense")

pls_fit(algorithm="klogitpls", backend="bigmem")
  └─ .Call("_bigPLSR_cpp_klogit_pls_bigmem")

pls_fit(algorithm="sparse_kpls")
  └─ .Call("_bigPLSR_cpp_sparse_kpls_dense")

pls_fit(algorithm="rkhs_xy")
  └─ .Call("_bigPLSR_cpp_rkhs_xy_dense")

pls_fit(algorithm="kf_pls")
  └─ .Call("_bigPLSR_cpp_kf_pls_stream")

References