SelectBoost.beta algorithms
Frédéric Bertrand
Cedric, Cnam, Paris
frederic.bertrand@lecnam.net
2025-10-30
Source: vignettes/selectboost-algorithms.Rmd
Motivation
SelectBoost.beta re-uses the correlated-resampling
machinery introduced by the original SelectBoost package and combines it
with Beta-regression selectors. This vignette summarises the main
routines and presents pseudo-code for their internal logic. The goal is
to make it easy to re-implement or extend the algorithms in other
contexts.
Building blocks
The following helpers expose the canonical SelectBoost stages.
- sb_normalize() centres and ℓ2-normalises the design matrix columns.
- sb_compute_corr() computes a correlation (or user-supplied association) matrix from the normalised design.
- sb_group_variables() converts the correlation matrix into groups of highly associated predictors for a given threshold c0.
- sb_resample_groups() regenerates correlated predictors for each group by drawing from a multivariate normal approximation and re-normalising. When all groups are singletons it now warns and simply returns repeated copies of the normalised design.
- sb_apply_selector_manual() applies a selector to each resampled design and collects the resulting coefficient vectors. Set keep_template = TRUE (the default) to retain the base fit as column sim0 without recomputing it on the first resample.
- sb_selection_frequency() converts the matrix of coefficients into selection frequencies while respecting the selector’s coefficient convention.
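The first three stages can be illustrated with a minimal base-R sketch. The helpers normalize_cols() and group_by_threshold() are hypothetical stand-ins written for this illustration, not the package's implementations. The sketch assumes unit Euclidean norm scaling and a neighbourhood-style grouping in which the group of predictor j contains every predictor whose absolute correlation with j is at least c0.

```r
# Hypothetical stand-ins for the normalise / correlate / group stages.
# They are NOT the SelectBoost.beta implementations.
normalize_cols <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))            # centre each column
  sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")    # scale each column to unit Euclidean norm
}

group_by_threshold <- function(corr, c0) {
  # Assumed rule: group j holds every predictor with |corr| >= c0 relative to j
  lapply(seq_len(ncol(corr)), function(j) which(abs(corr[, j]) >= c0))
}

set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),     # x1 and x2 share the latent factor z
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))                   # x3 is independent of the others

X_norm <- normalize_cols(X)
corr   <- crossprod(X_norm)                 # equals cor(X): columns are centred, unit norm
groups <- group_by_threshold(corr, c0 = 0.9)
groups
```

With this toy design, x1 and x2 end up in each other's group while x3 remains a singleton, which is the situation the resampling stage is meant to exploit.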
Pseudo-code: manual workflow
The manual SelectBoost workflow follows the same steps regardless of the base selector. Pseudo-code for producing selection frequencies at a single threshold is given below.
Procedure ManualSelectBoost(X, Y, selector, c0, B):
  1. X_norm <- sb_normalize(X)
  2. Corr <- sb_compute_corr(X_norm)
  3. Groups <- sb_group_variables(Corr, c0)
  4. Resamples <- sb_resample_groups(X_norm, Groups, B)
  5. CoefMatrix <- sb_apply_selector_manual(X_norm, Resamples, Y, selector)
  6. Frequencies <- sb_selection_frequency(CoefMatrix, version = "glmnet")
  7. Return Frequencies
In practice, sb_resample_groups() leaves singleton groups
untouched: only groups with two or more predictors receive correlated
draws.
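The resampling idea can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's algorithm: resample_groups() and normalize_cols() are hypothetical helpers, the groups are assumed to be disjoint, and each multi-predictor group is redrawn from a zero-mean multivariate normal whose covariance is the group's observed correlation block.

```r
# Hypothetical helpers; NOT the SelectBoost.beta implementations.
normalize_cols <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))
  sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")
}

resample_groups <- function(X_norm, groups, B) {
  corr <- crossprod(X_norm)
  lapply(seq_len(B), function(b) {
    X_new <- X_norm
    for (g in groups) {
      if (length(g) < 2L) next                        # singletons stay as observed
      # Cholesky factor of the group's correlation block (tiny ridge for stability)
      R <- chol(corr[g, g] + diag(1e-8, length(g)))
      # Independent normal draws times R have covariance t(R) %*% R
      Z <- matrix(rnorm(nrow(X_norm) * length(g)), ncol = length(g)) %*% R
      X_new[, g] <- normalize_cols(Z)                 # re-normalise the new columns
    }
    X_new
  })
}

set.seed(2)
n <- 50
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))
X_norm <- normalize_cols(X)

groups    <- list(c(1L, 2L), 3L)                      # assumed disjoint grouping
resamples <- resample_groups(X_norm, groups, B = 3)
```

Each element of resamples is a full design matrix in which x1 and x2 have been redrawn jointly and x3 is copied unchanged.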
Pseudo-code: correlation grid driver
sb_beta() extends the manual workflow by iterating over
a grid of correlation thresholds. The following pseudo-code matches the
behaviour of the exported function.
Algorithm sb_beta(X, Y, selector, B, step.num, steps.seq, version, squeeze):
  1. If squeeze, transform Y into the open unit interval.
  2. X_norm <- sb_normalize(X)
  3. Corr <- sb_compute_corr(X_norm)
  4. Grid <- {1} ∪ .sb_c0_sequence(Corr, step.num, steps.seq) ∪ {0}
  5. For each c0 in Grid:
     a. Groups <- sb_group_variables(Corr, c0)
     b. If every group has size 1:
          i. CoefMatrix <- selector(X_norm, Y)
        Else:
          i. Resamples <- sb_resample_groups(X_norm, Groups, B)
          ii. For each design b in Resamples (b = 1, ..., B):
                - CoefMatrix[, b] <- selector(design, Y)
     c. Freq[c0, ] <- sb_selection_frequency(CoefMatrix, version)
  6. Attach attributes (B, selector, c0 sequence) and return Freq
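Step 4 builds the threshold grid. The internals of .sb_c0_sequence() are not described here, so the following is only a hypothetical sketch of one plausible construction: take evenly spaced quantiles of the absolute off-diagonal correlations and bracket them with 1 and 0. The function name c0_grid() and the quantile rule are assumptions made for illustration.

```r
# Hypothetical grid builder; the quantile rule is an assumption, not the
# documented behaviour of .sb_c0_sequence().
c0_grid <- function(corr, step.num = 5) {
  a <- abs(corr[upper.tri(corr)])                     # absolute off-diagonal correlations
  probs <- seq(1, 0, length.out = step.num + 2)       # decreasing probabilities
  inner <- quantile(a, probs = probs[2:(step.num + 1)], names = FALSE)
  unique(c(1, inner, 0))                              # bracket with 1 and 0
}

set.seed(3)
X    <- matrix(rnorm(100 * 6), nrow = 100)
corr <- cor(X)
grid <- c0_grid(corr, step.num = 3)
grid
```

The grid runs from 1 (no grouping, since only a variable itself reaches that correlation) down to 0 (every predictor grouped with every other), so successive thresholds perturb the design more and more strongly.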
The selector argument can be any function returning a numeric vector
of coefficients with optional names. When
version = "glmnet", the first entry is interpreted as the
intercept and excluded from the selection frequencies.
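As an illustration of this contract, the sketch below defines a toy selector and a frequency computation in base R. Both toy_selector() and selection_frequency() are hypothetical: the selector fits ordinary least squares on the logit of the response and zeroes weak coefficients, which is a stand-in for a real Beta-regression selector, and the frequency helper assumes the "glmnet" convention of an intercept in the first row.

```r
# Hypothetical selector: returns a named coefficient vector, intercept first.
# It is a crude stand-in for a Beta-regression selector.
toy_selector <- function(X, y) {
  fit  <- lm(qlogis(y) ~ X)                   # OLS on the logit scale
  tab  <- coef(summary(fit))
  beta <- tab[, "Estimate"]
  keep <- abs(tab[, "t value"]) >= 2          # keep coefficients with |t| >= 2
  keep[1] <- TRUE                             # always keep the intercept
  beta[!keep] <- 0
  names(beta) <- c("(Intercept)", colnames(X))
  beta
}

# Hypothetical frequency helper: drop the intercept row, then compute the
# share of resamples in which each coefficient is non-zero.
selection_frequency <- function(coef_matrix) {
  rowMeans(coef_matrix[-1, , drop = FALSE] != 0)
}

set.seed(42)
n <- 100
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- plogis(0.5 + 1.5 * X[, "x1"] + rnorm(n, sd = 0.3))
beta <- toy_selector(X, y)

# Hand-built coefficient matrix: one column per resample
coef_matrix <- cbind(sim1 = c(0.5, 1.2, 0, 0),
                     sim2 = c(0.4, 0.9, 0, 0.1),
                     sim3 = c(0.6, 1.1, 0, 0),
                     sim4 = c(0.5, 1.0, 0, -0.2))
rownames(coef_matrix) <- c("(Intercept)", "x1", "x2", "x3")
freq <- selection_frequency(coef_matrix)      # x1 = 1, x2 = 0, x3 = 0.5
```

Returning exact zeros for unselected predictors is what lets the frequency step count selections by testing for non-zero entries.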
The squeezing step enforces the usual SelectBoost transformation that
pushes all responses inside (0, 1). Keep it enabled unless
you have already pre-processed the outcome: responses exactly equal to
zero or one will cause the selectors to abort.
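For readers who pre-process the outcome themselves, one widely used squeeze for Beta regression is the Smithson and Verkuilen transformation (y (n - 1) + 0.5) / n. Whether the package applies exactly this formula is an assumption here; the helper squeeze_unit() is hypothetical.

```r
# Hypothetical helper: Smithson-Verkuilen style squeeze into (0, 1).
# Assumes the input values already lie in the closed interval [0, 1].
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y     <- c(0, 0.25, 1)
y_adj <- squeeze_unit(y)
y_adj   # boundary values 0 and 1 are moved strictly inside the unit interval
```

The shift shrinks as n grows, so large samples are barely altered while exact zeros and ones still become admissible Beta responses.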
Extending the algorithms
The modular helpers are designed to be recomposed. For example, it is
possible to plug in a custom grouping routine before calling
sb_resample_groups() or to supply a selector that
implements cross-validation or penalisation strategies. Because each
helper only relies on basic R primitives, the pseudo-code above
translates readily into other languages.
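As one possible custom grouping routine, the sketch below clusters predictors hierarchically on the dissimilarity 1 - |correlation| and cuts the tree at height 1 - c0. The function group_by_hclust() is hypothetical, and whether sb_resample_groups() accepts a list of disjoint index vectors in this form is an assumption to check against the package documentation.

```r
# Hypothetical custom grouping routine based on hierarchical clustering.
group_by_hclust <- function(corr, c0) {
  d    <- as.dist(1 - abs(corr))              # dissimilarity between predictors
  tree <- hclust(d, method = "complete")
  cl   <- cutree(tree, h = 1 - c0)            # complete linkage: all pairs in a
                                              # cluster have |corr| >= c0
  split(seq_len(ncol(corr)), cl)              # list of disjoint index vectors
}

set.seed(4)
n <- 50
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))
groups <- group_by_hclust(cor(X), c0 = 0.9)
groups
```

Unlike a neighbourhood rule, this routine yields disjoint groups, which can be convenient when each predictor should be resampled exactly once per design.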
Conference communications
The SelectBoost4Beta concepts described here were showcased by Frédéric Bertrand and Myriam Maumy in 2023 at:
- Joint Statistical Meetings 2023 (Toronto, Canada): “Improving variable selection in Beta regression models using correlated resampling”.
- BioC2023 (Boston, USA): “SelectBoost4Beta: Improving variable selection in Beta regression models”.
These communications detailed how correlation-aware resampling strengthens variable selection performance for Beta regression under strong predictor dependencies.