Idea

We maintain exponentially-weighted cross-products

$$
\mathbf{C}_{xx} \leftarrow \lambda\,\mathbf{C}_{xx} + \mathbf{X}_b^\top\mathbf{X}_b + q\,\mathbf{I},\qquad
\mathbf{C}_{xy} \leftarrow \lambda\,\mathbf{C}_{xy} + \mathbf{X}_b^\top\mathbf{Y}_b,
$$

over mini-batches $b$ of rows, where $0 < \lambda \le 1$ is a forgetting factor and $q \ge 0$ is a small process-noise ridge. At any time we extract latent components via SIMPLS on $(\mathbf{C}_{xx}, \mathbf{C}_{xy})$. This is stable, fast, and matches a Kalman-style tracking of slowly varying covariance structure.
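The update itself is plain matrix algebra. A minimal NumPy sketch, independent of the package's internals (the function name and defaults here are illustrative only):

```python
import numpy as np

def update_cross_products(Cxx, Cxy, Xb, Yb, lam=0.995, q=1e-6):
    """One exponentially-weighted update from a mini-batch (Xb, Yb).

    lam is the forgetting factor (0 < lam <= 1); q is the small
    process-noise ridge added to the diagonal of Cxx.
    """
    Cxx = lam * Cxx + Xb.T @ Xb + q * np.eye(Cxx.shape[0])
    Cxy = lam * Cxy + Xb.T @ Yb
    return Cxx, Cxy
```

Each batch costs $O(np^2)$ for a batch of $n$ rows and $p$ predictors; no raw data is retained between batches.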

API

fit <- pls_fit(X, Y, ncomp = 3,
               backend   = "arma",    # or "bigmem"
               algorithm = "kf_pls",
               scores    = "r",
               tol       = 1e-8)

# tuning:
# options(bigPLSR.kf.lambda = 0.995,
#         bigPLSR.kf.q_proc = 1e-6)

With the bigmem backend, cross-products are streamed in row chunks; the scores $\mathbf{T}$ are produced via the package's chunked score kernel.
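Because extraction needs only $(\mathbf{C}_{xx}, \mathbf{C}_{xy})$, the SIMPLS step can be written without touching the raw data. A NumPy sketch of that idea (illustrative only, not the package's actual kernel):

```python
import numpy as np

def simpls_from_cross_products(Cxx, Cxy, ncomp):
    """SIMPLS x-weights R and x-loadings P from Cxx = X'X and Cxy = X'Y.

    Scores then follow as T = X @ R, with T'T = I by construction.
    """
    p_dim = Cxx.shape[0]
    S = Cxy.copy()                     # deflated covariance X'Y
    R = np.zeros((p_dim, ncomp))
    P = np.zeros((p_dim, ncomp))
    V = np.zeros((p_dim, ncomp))       # orthonormal basis of the loadings
    for a in range(ncomp):
        # dominant left singular vector of the deflated covariance
        u, _, _ = np.linalg.svd(S, full_matrices=False)
        r = u[:, 0]
        r = r / np.sqrt(r @ Cxx @ r)   # scale so the implied score has t't = 1
        p = Cxx @ r                    # x-loading p = X't
        v = p.copy()
        if a > 0:                      # Gram-Schmidt against earlier loadings
            v -= V[:, :a] @ (V[:, :a].T @ p)
        v /= np.linalg.norm(v)
        S -= np.outer(v, v @ S)        # deflate S in the loading subspace
        R[:, a], P[:, a], V[:, a] = r, p, v
    return R, P
```

The deflation of $\mathbf{S}$ against the orthonormalized loadings is what makes the resulting scores mutually orthogonal, exactly as in batch SIMPLS.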

Notes

  • $\lambda \to 1$ and $q \to 0$ recover batch SIMPLS.
  • Smaller $\lambda$ emphasizes recent batches (useful under concept drift).
  • $q$ stabilizes an ill-conditioned $\mathbf{C}_{xx}$ on very high-dimensional data.
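The first note can be checked directly: with $\lambda = 1$ and $q = 0$ the streamed cross-product is exactly the batch cross-product, so SIMPLS on it coincides with batch SIMPLS. A minimal NumPy check (illustrative, independent of the package):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))

# stream in 10-row mini-batches with lambda = 1, q = 0
Cxx = np.zeros((4, 4))
for i in range(0, 100, 10):
    Xb = X[i:i + 10]
    Cxx = 1.0 * Cxx + Xb.T @ Xb    # the update with lam = 1, q = 0

# identical to the batch cross-product
assert np.allclose(Cxx, X.T @ X)
```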