Idea

We maintain exponentially-weighted cross-products

$$
\mathbf{C}_{xx} \leftarrow \lambda\,\mathbf{C}_{xx} + \mathbf{X}_b^\top\mathbf{X}_b + q\,\mathbf{I},\qquad
\mathbf{C}_{xy} \leftarrow \lambda\,\mathbf{C}_{xy} + \mathbf{X}_b^\top\mathbf{Y}_b,
$$

over mini-batches $b$ of rows, where $0 < \lambda \le 1$ is a forgetting factor and $q \ge 0$ is a small process-noise ridge. At any time we extract latent components via SIMPLS on $(\mathbf{C}_{xx}, \mathbf{C}_{xy})$. This is stable, fast, and matches a Kalman-style tracking of slowly varying covariance structure.
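The update itself is plain matrix algebra. A minimal NumPy sketch, independent of the package's internals (the function name and defaults here are illustrative only):

```python
import numpy as np

def update_cross_products(Cxx, Cxy, Xb, Yb, lam=0.995, q=1e-6):
    """One exponentially-weighted update from a mini-batch (Xb, Yb).

    lam is the forgetting factor (0 < lam <= 1); q is the small
    process-noise ridge added to the diagonal of Cxx.
    """
    Cxx = lam * Cxx + Xb.T @ Xb + q * np.eye(Cxx.shape[0])
    Cxy = lam * Cxy + Xb.T @ Yb
    return Cxx, Cxy
```

Each batch costs $O(np^2)$ for a batch of $n$ rows and $p$ predictors; no raw data is retained between batches.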

API

fit <- pls_fit(X, Y, ncomp = 3,
               backend   = "arma",    # or "bigmem"
               algorithm = "kf_pls",
               scores    = "r",
               tol       = 1e-8)

# tuning:
# options(bigPLSR.kf.lambda = 0.995,
#         bigPLSR.kf.q_proc = 1e-6)

With the bigmem backend, cross-products are streamed in row chunks; the scores $\mathbf{T}$ are produced via the package's chunked score kernel.
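Because extraction needs only $(\mathbf{C}_{xx}, \mathbf{C}_{xy})$, the SIMPLS step can be written without touching the raw data. A NumPy sketch of that idea (illustrative only, not the package's actual kernel):

```python
import numpy as np

def simpls_from_cross_products(Cxx, Cxy, ncomp):
    """SIMPLS x-weights R and x-loadings P from Cxx = X'X and Cxy = X'Y.

    Scores then follow as T = X @ R, with T'T = I by construction.
    """
    p_dim = Cxx.shape[0]
    S = Cxy.copy()                     # deflated covariance X'Y
    R = np.zeros((p_dim, ncomp))
    P = np.zeros((p_dim, ncomp))
    V = np.zeros((p_dim, ncomp))       # orthonormal basis of the loadings
    for a in range(ncomp):
        # dominant left singular vector of the deflated covariance
        u, _, _ = np.linalg.svd(S, full_matrices=False)
        r = u[:, 0]
        r = r / np.sqrt(r @ Cxx @ r)   # scale so the implied score has t't = 1
        p = Cxx @ r                    # x-loading p = X't
        v = p.copy()
        if a > 0:                      # Gram-Schmidt against earlier loadings
            v -= V[:, :a] @ (V[:, :a].T @ p)
        v /= np.linalg.norm(v)
        S -= np.outer(v, v @ S)        # deflate S in the loading subspace
        R[:, a], P[:, a], V[:, a] = r, p, v
    return R, P
```

The deflation of $\mathbf{S}$ against the orthonormalized loadings is what makes the resulting scores mutually orthogonal, exactly as in batch SIMPLS.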

Notes

  • $\lambda \to 1$ and $q \to 0$ recover batch SIMPLS.
  • Smaller $\lambda$ emphasizes recent batches (useful under concept drift).
  • $q$ stabilizes an ill-conditioned $\mathbf{C}_{xx}$ on very high-dimensional data.
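The first note can be checked directly: with $\lambda = 1$ and $q = 0$ the streamed cross-product is exactly the batch cross-product, so SIMPLS on it coincides with batch SIMPLS. A minimal NumPy check (illustrative, independent of the package):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))

# stream in 10-row mini-batches with lambda = 1, q = 0
Cxx = np.zeros((4, 4))
for i in range(0, 100, 10):
    Xb = X[i:i + 10]
    Cxx = 1.0 * Cxx + Xb.T @ Xb    # the update with lam = 1, q = 0

# identical to the batch cross-product
assert np.allclose(Cxx, X.T @ X)
```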