Motivation

SelectBoost.beta re-uses the correlated-resampling machinery introduced by the original SelectBoost package and combines it with Beta-regression selectors. This vignette summarises the main routines and presents pseudo-code for their internal logic. The goal is to make it easy to re-implement or extend the algorithms in other contexts.

Building blocks

The following helpers expose the canonical SelectBoost stages.

  • sb_normalize() centres and $\ell_2$-normalises the design matrix columns.
  • sb_compute_corr() computes a correlation (or user-supplied association) matrix from the normalised design.
  • sb_group_variables() converts the correlation matrix into groups of highly associated predictors for a given threshold $c_0$.
  • sb_resample_groups() regenerates correlated predictors for each group by drawing from a multivariate normal approximation and re-normalising. When all groups are singletons, it warns and simply returns repeated copies of the normalised design.
  • sb_apply_selector_manual() applies a selector to each resampled design and collects the resulting coefficient vectors. Set keep_template = TRUE (the default) to retain the base fit as column sim0 without recomputing it on the first resample.
  • sb_selection_frequency() converts the matrix of coefficients into selection frequencies while respecting the selector’s coefficient convention.
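
As a rough illustration of the first three stages, the base-R sketch below centres and $\ell_2$-normalises a small simulated design, computes its correlation matrix, and forms one group per predictor from every predictor whose absolute correlation with it reaches the threshold. The names normalize_cols() and group_by_threshold() are hypothetical stand-ins written for this example, not the package's implementations, and the per-predictor grouping rule is an assumption.

```r
# Hypothetical base-R stand-ins; not the package's own implementations.
normalize_cols <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))            # centre each column
  sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")    # scale each column to unit l2 norm
}

group_by_threshold <- function(corr, c0) {
  # Assumed rule: group j holds every predictor with |corr| >= c0 relative to j.
  lapply(seq_len(ncol(corr)), function(j) which(abs(corr[, j]) >= c0))
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
X[, 2] <- X[, 1] + 0.1 * rnorm(50)          # make columns 1 and 2 strongly correlated

X_norm <- normalize_cols(X)
corr   <- crossprod(X_norm)                 # equals cor(X) for centred, unit-norm columns
groups <- group_by_threshold(corr, c0 = 0.9)
```

With this simulated design, predictors 1 and 2 share a group while predictors 3 and 4 remain singletons.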

Pseudo-code: manual workflow

The manual SelectBoost workflow follows the same steps regardless of the base selector. Pseudo-code for producing selection frequencies at a single threshold is given below.

Procedure ManualSelectBoost(X, Y, selector, c0, B):
  1. X_norm <- sb_normalize(X)
  2. Corr <- sb_compute_corr(X_norm)
  3. Groups <- sb_group_variables(Corr, c0)
  4. Resamples <- sb_resample_groups(X_norm, Groups, B)
  5. CoefMatrix <- sb_apply_selector_manual(X_norm, Resamples, Y, selector)
  6. Frequencies <- sb_selection_frequency(CoefMatrix, version = "glmnet")
  7. Return Frequencies

In practice sb_resample_groups() leaves singletons untouched. Only groups with two or more predictors receive correlated draws.
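
The resampling step can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's code: resample_groups_sketch() is a hypothetical name, groups are assumed to be stored as one index vector per predictor (each containing the predictor itself), and every grouped predictor is redrawn separately from a zero-mean multivariate normal fitted to its group before being re-centred and re-normalised.

```r
# Hypothetical sketch; the package's sb_resample_groups() may differ in detail.
resample_groups_sketch <- function(X_norm, groups, B) {
  n <- nrow(X_norm)
  lapply(seq_len(B), function(b) {
    X_new <- X_norm                                      # singletons stay untouched
    for (j in seq_along(groups)) {
      g <- groups[[j]]
      if (length(g) < 2L) next
      R <- chol(cov(X_norm[, g, drop = FALSE]))          # Sigma = t(R) %*% R
      Z <- matrix(rnorm(n * length(g)), nrow = n) %*% R  # rows ~ N(0, Sigma)
      z <- Z[, match(j, g)]                              # keep the draw for predictor j
      z <- z - mean(z)                                   # re-centre
      X_new[, j] <- z / sqrt(sum(z^2))                   # re-normalise to unit l2 norm
    }
    X_new
  })
}

set.seed(42)
X <- matrix(rnorm(120), nrow = 30, ncol = 4)
X[, 2] <- X[, 1] + 0.2 * rnorm(30)
X_norm <- apply(X, 2, function(x) { x <- x - mean(x); x / sqrt(sum(x^2)) })

groups <- list(c(1L, 2L), c(1L, 2L), 3L, 4L)  # predictors 1 and 2 form a correlated group
resamples <- resample_groups_sketch(X_norm, groups, B = 5)
```

Each element of resamples is a design with the same dimensions as X_norm in which columns 3 and 4 are unchanged and columns 1 and 2 are fresh unit-norm draws.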

Pseudo-code: correlation grid driver

sb_beta() extends the manual workflow by iterating over a grid of correlation thresholds. The following pseudo-code matches the behaviour of the exported function.

Algorithm sb_beta(X, Y, selector, B, step.num, steps.seq, version, squeeze):
  1. If squeeze, transform Y into the open unit interval.
  2. X_norm <- sb_normalize(X)
  3. Corr <- sb_compute_corr(X_norm)
  4. Grid <- {1} ∪ .sb_c0_sequence(Corr, step.num, steps.seq) ∪ {0}
  5. For each c0 in Grid:
       a. Groups <- sb_group_variables(Corr, c0)
       b. If every group has size 1:
            i. CoefMatrix <- selector(X_norm, Y)
          Else:
            i. Resamples <- sb_resample_groups(X_norm, Groups, B)
           ii. For b in 1..B:
                  - CoefMatrix[, b] <- selector(Resamples[b], Y)
       c. Freq[c0, ] <- sb_selection_frequency(CoefMatrix, version)
  6. Attach attributes (B, selector, c0 sequence) and return Freq

The selector argument can be any function returning a numeric vector of coefficients with optional names. When version = "glmnet", the first entry is interpreted as the intercept and excluded from the selection frequencies.
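
To make the intercept convention concrete, here is a small base-R sketch. It assumes that a predictor counts as selected whenever its coefficient is non-zero and that the coefficient matrix has one row per coefficient and one column per resample; selection_frequency_sketch() and its drop_intercept argument are hypothetical and only mimic the "glmnet" convention described above.

```r
# Hypothetical helper; assumes "selected" means a non-zero coefficient.
selection_frequency_sketch <- function(coef_matrix, drop_intercept = TRUE) {
  if (drop_intercept) {
    coef_matrix <- coef_matrix[-1, , drop = FALSE]  # first row is the intercept
  }
  rowMeans(coef_matrix != 0)                        # share of resamples with a non-zero coefficient
}

# Made-up coefficients for four resamples, purely for illustration.
coef_matrix <- cbind(
  sim1 = c(`(Intercept)` = 0.4, x1 = 1.2, x2 = 0.0, x3 = 0.0),
  sim2 = c(0.5, 0.9, 0.3, 0.0),
  sim3 = c(0.3, 1.1, 0.0, 0.0),
  sim4 = c(0.4, 1.0, 0.2, 0.0)
)

freq <- selection_frequency_sketch(coef_matrix)
```

In this toy input x1 is selected in every resample, x2 in half of them, and x3 in none; the intercept row does not appear in the result.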

The squeezing step applies the usual SelectBoost transformation, which moves all responses strictly inside (0, 1). Keep it enabled unless the outcome has already been pre-processed; without it, responses exactly equal to 0 or 1 will cause the selectors to abort.
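
For readers who want to pre-process the outcome themselves, one widely used squeeze is the Smithson and Verkuilen style rescaling y' = (y (n - 1) + 0.5) / n. The sketch below implements that formula under the hypothetical name squeeze_unit(); whether the package uses exactly this formula is an assumption, so treat it as an illustration of the idea rather than a description of the internals.

```r
# Hypothetical helper; the package's own squeeze may use a different formula.
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n   # maps [0, 1] into the open interval (0, 1)
}

y    <- c(0, 0.25, 0.5, 1)
y_sq <- squeeze_unit(y)
```

After the transformation the boundary values 0 and 1 become 0.125 and 0.875 for this four-observation example, so a Beta likelihood can be evaluated at every response.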

Extending the algorithms

The modular helpers are designed to be recomposed. For example, it is possible to plug in a custom grouping routine before calling sb_resample_groups() or to supply a selector that implements cross-validation or penalisation strategies. Because each helper only relies on basic R primitives, the pseudo-code above translates readily into other languages.
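
As one possible custom grouping routine, the sketch below clusters predictors by complete-linkage hierarchical clustering on the dissimilarity 1 - |correlation| and cuts the tree at height 1 - c0, so that every pair inside a cluster has absolute correlation of at least c0. The name group_by_hclust() and the list-of-index-vectors output format are assumptions made for this example; check the format that sb_resample_groups() expects before plugging such a routine in.

```r
# Hypothetical custom grouping routine built on stats::hclust().
group_by_hclust <- function(corr, c0) {
  d  <- as.dist(1 - abs(corr))                        # dissimilarity between predictors
  cl <- cutree(hclust(d, method = "complete"), h = 1 - c0)
  # One group per predictor: all members of that predictor's cluster.
  lapply(seq_along(cl), function(j) unname(which(cl == cl[j])))
}

corr <- matrix(c(1.00, 0.95, 0.10,
                 0.95, 1.00, 0.20,
                 0.10, 0.20, 1.00), nrow = 3, byrow = TRUE)

groups <- group_by_hclust(corr, c0 = 0.9)
```

Unlike a simple per-predictor threshold, this rule yields non-overlapping groups, which can be convenient when groups are resampled jointly.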

Conference communications

The SelectBoost4Beta concepts described here were showcased by Frédéric Bertrand and Myriam Maumy in 2023 at:

  • Joint Statistical Meetings 2023 (Toronto, Canada): “Improving variable selection in Beta regression models using correlated resampling”.
  • BioC2023 (Boston, USA): “SelectBoost4Beta: Improving variable selection in Beta regression models”.

These communications detailed how correlation-aware resampling strengthens variable selection performance for Beta regression under strong predictor dependencies.