Automatic Algorithm Selection in bigPLSR
Frédéric Bertrand
Cedric, Cnam, Parisfrederic.bertrand@lecnam.net
2025-11-18
Source:vignettes/bigPLSR-auto-selection.Rmd
bigPLSR-auto-selection.RmdWe choose between XtX (SIMPLS), XX^T (wide-kernel), and NIPALS using and a RAM budget.
# In pls_fit(), after arg parsing:
if (identical(algo_in, "auto")) {
algo_in <- .choose_algorithm_auto(backend, X, y, ncomp)
}
.mem_bytes <- function() {
gb <- getOption("bigPLSR.mem_budget_gb", 8)
as.numeric(gb) * (1024^3)
}
.dims_of <- function(X) {
if (inherits(X, "big.matrix")) c(nrow(X), ncol(X)) else c(NROW(X), NCOL(X))
}
.choose_algorithm_auto <- function(backend, X, y, ncomp) {
is_big_local <- inherits(X, "big.matrix") || inherits(X, "big.matrix.descriptor")
dims <- .dims_of(X); n <- as.integer(dims[1]); p <- as.integer(dims[2])
B <- .mem_bytes()
bytes <- 8
need_XtX <- bytes * as.double(p) * as.double(p) # bytes for p x p
need_XXt <- bytes * as.double(n) * as.double(n) # bytes for n x n
can_XtX <- need_XtX <= M
can_XXt <- need_XXt <= M
shape_XtX <- (p <= 4L * n)
shape_XXt <- (n <= 4L * p)
if (can_XtX && shape_XtX) {
algo_in <- "simpls"
} else if (can_XXt && shape_XXt) {
algo_in <- "widekernelpls"
} else {
algo_in <- "nipals"
}
}Users can override the memory budget:
options(bigPLSR.memory_budget_bytes = 8L * 1024^3) # 8 GiBWhen does each win?
- XtX (SIMPLS): moderate (fits in RAM). Fast BLAS-3; excellent for .
- XXᵗ (wide-kernel): moderate (fits ). Great when (wide problems).
- NIPALS / streaming: both and exceed budget; supports file-backed scores and large-scale chunked BLAS.
Sanity check
set.seed(1)
n <- 1e5; p <- 200
X <- matrix(rnorm(n*p), n, p)
y <- X[,1]*0.5 + rnorm(n)
bmX <- bigmemory::as.big.matrix(X)
bmy <- bigmemory::as.big.matrix(matrix(y, n, 1))
options(bigPLSR.memory_budget_bytes = 2L*1024^3)
fit <- pls_fit(bmX, bmy, ncomp=3, backend="bigmem", scores="r")
fit$algorithmOverview
bigPLSR::pls_fit() can automatically choose an algorithm
based on problem shape and a user-configurable
memory budget:
-
SIMPLS (XtX route) when forming the
p × pcross-product fits in memory. -
SIMPLS (XXt / kernel route) when
XtXdoes not fit butXXt (n × n)does. -
NIPALS (streaming) when neither
XtXnorXXtcomfortably fit.
This selection only applies when algorithm = "auto" (the
default). Any explicit algorithm = overrides the
decision.
Why these choices?
- SIMPLS works entirely from centered cross-products, which is fast
and numerically robust when the target cross-product fits (either
p×porn×n). - Using
XtXis efficient whenpis moderate; usingXXtis efficient for “wide” problems (p ≫ n) but still bounded byn^2memory. - NIPALS avoids materializing any large cross-product and can
stream from
big.memorywith fixed working memory; it is the safe fallback when memory is tight.
The decision rule
Let the memory budget be B bytes (defaults to 8 GB,
configurable via options(bigPLSR.mem_budget_gb = ...)).
With doubles (8 bytes), we estimate the size of each symmetric matrix
as:
need_XtX = 8 * p^2need_XXt = 8 * n^2
Then:
if (can_XtX && shape_XtX) { algo_in <- "simpls"}. # XtX
if (can_XXt && shape_XXt) { algo_in <- "widekernelpls"}. XXt (a.k.a. "kernel" route)
else { algorithm <- "nipals"} # streaming
Configuring the memory budget
# Use 16 GB as selection budget
options(bigPLSR.mem_budget_gb = 16)This does not change R’s actual memory limit; it only controls the selection.
Reproducibility knobs
For tight numerical parity in tests:
set.seed(1)
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
RhpcBLASctl::blas_set_num_threads(1L)
RhpcBLASctl::omp_set_num_threads(1L)
}
# otherwise, you can try environment variables:
# Sys.setenv(OPENBLAS_NUM_THREADS = "1", OMP_NUM_THREADS = "1")Examples
library(bigPLSR)
n <- 2e3; p <- 5e2
X <- matrix(rnorm(n*p), n, p)
y <- X[,1] - 0.5*X[,2] + rnorm(n)
# Auto will likely pick SIMPLS (XtX) here
fit <- pls_fit(X, y, ncomp = 10, algorithm = "auto")
fit$algorithm # "simpls"Wide case:
n <- 200; p <- 4000
X <- matrix(rnorm(n*p), n, p)
y <- rnorm(n)
# If budget is small, auto picks kernel (XXt) or NIPALS
options(bigPLSR.mem_budget_gb = 2) # small budget
fit <- pls_fit(X, y, ncomp = 5, algorithm = "auto")
fit$algorithm # "kernelpls" or "nipals" depending on n^2 vs budgetBig-matrix streaming:
References
- de Jong, S. (1993). SIMPLS: An alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18(3), 251–263.
- Dayal, B., & MacGregor, J. F. (1997). Improved PLS algorithms. Journal of Chemometrics, 11(1), 73–85.
- Rosipal, R., & Trejo, L. J. (2001). Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space. Journal of Machine Learning Research, 2, 97–123.
- Wold, H. (1966, 1985). NIPALS algorithm (original PLS formulation).