SelectBoost-Style Variable Selection for Quantile Regression

Frédéric Bertrand

SelectBoost.quantile adapts the SelectBoost idea to sparse quantile regression. The package builds correlation neighborhoods, perturbs correlated predictors with a directional sampler inspired by the original SelectBoost internals, refits penalized quantile regression models on the perturbed designs, and aggregates variable-selection frequencies across a path of correlation thresholds.
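For readers new to quantile regression, the loss that a penalized quantile fit minimizes is the check (pinball) loss. The sketch below uses base R only and is not taken from the package; `pinball_loss` is a hypothetical helper name introduced here for illustration.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0}).
# Hypothetical helper for illustration; not part of SelectBoost.quantile.
pinball_loss <- function(y, y_hat, tau) {
  u <- y - y_hat
  mean(u * (tau - (u < 0)))
}

# Over constant predictions, the loss is minimized near the sample tau-quantile.
set.seed(1)
y <- rnorm(500)
grid <- seq(-2, 2, by = 0.01)
loss <- vapply(grid, function(q) pinball_loss(y, q, tau = 0.75), numeric(1))
best_q <- grid[which.min(loss)]

c(grid_minimizer = best_q, sample_quantile = unname(quantile(y, 0.75)))
```

Under-predictions are weighted by tau and over-predictions by 1 - tau, which is why the minimizer moves with tau.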

The package currently supports:

matrix and formula interfaces,
single- and multi-tau workflows,
tau-aware screening for p > n settings,
stronger penalty tuning through one-standard-error selection and lambda inflation,
complementary-pairs stability selection,
neighborhood capping for high-dimensional correlated designs,
hybrid support extraction that combines path stability and fitted effect size,
benchmark helpers for validation studies.
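Of the features listed above, complementary-pairs stability selection may be the least self-explanatory: subsamples are drawn in disjoint pairs, so each model fitted on one half has a counterpart fitted on rows it never saw. The following base-R sketch only illustrates the sampling step. `complementary_pairs` and its arguments are hypothetical names, and the package's own implementation may differ.

```r
# Illustrative only: `complementary_pairs` is a hypothetical helper,
# not a function exported by SelectBoost.quantile.
complementary_pairs <- function(n, n_pairs, sample_fraction = 0.5) {
  m <- floor(n * sample_fraction)
  stopifnot(m >= 1, 2 * m <= n)  # two disjoint subsamples must fit in n rows
  lapply(seq_len(n_pairs), function(i) {
    perm <- sample.int(n)                 # random ordering of the row indices
    list(first = perm[seq_len(m)],        # first subsample
         second = perm[m + seq_len(m)])   # disjoint second subsample
  })
}

set.seed(1)
pair_list <- complementary_pairs(n = 100, n_pairs = 2, sample_fraction = 0.5)

# Each pair is disjoint and, at fraction 0.5, covers all 100 rows.
overlap <- vapply(pair_list, function(p) length(intersect(p$first, p$second)), integer(1))
covered <- vapply(pair_list, function(p) length(union(p$first, p$second)), integer(1))
```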

The package should still be read as a methodological prototype: the workflow is usable and documented, but the quantile-regression adaptation is still being validated and refined.

Start with vignette("getting-started", package = "SelectBoost.quantile") for the main workflow, then use vignette("validation-study", package = "SelectBoost.quantile") for the shipped benchmark comparisons.

Installation

You can install the development version of SelectBoost.quantile from GitHub with:

remotes::install_github("fbertran/SelectBoost.quantile")

From a local source checkout, use:

devtools::install()

Main workflow

For each correlation threshold c0, selectboost_quantile():

normalizes the design as in the original SelectBoost workflow,
builds correlation neighborhoods,
fits a sign-aligned directional model to each neighborhood,
perturbs the predictors in the sample hyperplane,
refits sparse quantile regression,
aggregates the selection frequencies across perturbations and stability subsamples.

The resulting frequency path can then be summarized, plotted, or thresholded into a stable support.
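The first two steps can be sketched in base R as follows. This is an illustration, not the package's internal code: `correlation_neighborhoods` is a hypothetical helper, and it assumes that normalization means centering each column and scaling it to unit Euclidean norm.

```r
# Illustrative only: `correlation_neighborhoods` is a hypothetical helper,
# not the internal routine used by selectboost_quantile().
correlation_neighborhoods <- function(x, c0) {
  # Center each column, then scale it to unit Euclidean norm (assumed normalization).
  x_centered <- sweep(x, 2, colMeans(x))
  x_norm <- sweep(x_centered, 2, sqrt(colSums(x_centered^2)), "/")
  # For centered unit-norm columns the cross-product is the correlation matrix.
  abs_cor <- abs(crossprod(x_norm))
  diag(abs_cor) <- 1  # guard against rounding so each variable keeps itself
  # Neighborhood of variable j: every k with |cor(j, k)| >= c0.
  groups <- lapply(seq_len(ncol(x)), function(j) which(abs_cor[j, ] >= c0))
  names(groups) <- colnames(x)
  groups
}

set.seed(1)
z <- rnorm(200)
x <- cbind(
  x1 = z + 0.3 * rnorm(200),  # x1 and x2 share the latent factor z
  x2 = z + 0.3 * rnorm(200),
  x3 = rnorm(200)             # unrelated to the other two
)

groups_07 <- correlation_neighborhoods(x, c0 = 0.7)  # x1 and x2 are grouped
groups_1  <- correlation_neighborhoods(x, c0 = 1)    # every neighborhood is a singleton
```

Lowering c0 enlarges the neighborhoods, so more predictors are perturbed together; at c0 = 1 each variable stands alone.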

Example: matrix interface

library(SelectBoost.quantile)

sim <- simulate_quantile_data(
  n = 100,
  p = 20,
  active = 1:4,
  rho = 0.7,
  seed = 1
)

fit <- selectboost_quantile(
  sim$x,
  sim$y,
  tau = 0.5,
  B = 6,
  step_num = 0.5,
  screen = "auto",
  tune_lambda = "cv",
  lambda_rule = "one_se",
  lambda_inflation = 1.25,
  subsamples = 4,
  sample_fraction = 0.5,
  complementary_pairs = TRUE,
  max_group_size = 10,
  seed = 1,
  verbose = FALSE
)

print(fit)
#> SelectBoost-style quantile regression sketch
#>   tau: 0.5 
#>   perturbation replicates: 6 
#>   c0 thresholds: 5 
#>   predictors: 20 
#>   grouping: group_neighbors 
#>   max group size: 10 
#>   screening: none 
#>   stability selection: 4 draws at fraction 0.5 (complementary pairs) 
#>   tuned lambda factor: 0.7789 (cv, one_se)
#>   top mean selection frequencies:
#>    x2    x1    x3    x4   x17    x5 
#> 0.883 0.875 0.771 0.688 0.583 0.579
summary(fit)
#> Tau: 0.5 
#> Stable support threshold: 0.55 
#> Selection metric: hybrid 
#> Variables above the threshold:
#> [1] "x2" "x1" "x3"
#> Top summary scores:
#>    x2    x1    x3    x4   x13   x17    x5   x14   x12   x10 
#> 0.871 0.863 0.686 0.513 0.189 0.164 0.080 0.069 0.044 0.022
support_selectboost_quantile(fit)
#> [1] "x2" "x1" "x3"

plot(fit)

Selection-frequency paths for the six variables with the highest mean selection frequency.
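To make the thresholding step concrete, the summary scores printed above can be cut at the reported threshold by hand. The scores are copied from the `summary(fit)` output; whether the package compares with `>=` or `>` is an assumption, and it makes no difference for these values.

```r
# Summary scores copied from the summary(fit) output shown above.
scores <- c(x2 = 0.871, x1 = 0.863, x3 = 0.686, x4 = 0.513, x13 = 0.189,
            x17 = 0.164, x5 = 0.080, x14 = 0.069, x12 = 0.044, x10 = 0.022)
threshold <- 0.55

# Keep the variables whose score reaches the threshold (assumed comparison: >=).
stable <- names(scores)[scores >= threshold]
stable
#> [1] "x2" "x1" "x3"
```

Note that x4, one of the four simulated active variables, scores 0.513 and falls just short of the 0.55 threshold in this run.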

Example: formula interface and multiple quantiles

dat <- data.frame(y = sim$y, sim$x)

fit_formula <- selectboost_quantile(
  y ~ .,
  data = dat,
  tau = c(0.25, 0.5, 0.75),
  B = 4,
  step_num = 0.5,
  tune_lambda = "bic",
  seed = 2,
  verbose = FALSE
)

print(fit_formula)
#> SelectBoost-style quantile regression sketch
#>   tau: 0.25, 0.50, 0.75 
#>   perturbation replicates: 4 
#>   c0 thresholds: 5 
#>   predictors: 20 
#>   grouping: group_neighbors 
#>   screening: none 
#>   tuned lambda factors: 1.0000, 0.5322, 1.0000 
#>  tau = 0.25: top mean selection frequencies
#>   x1   x2   x3  x20 
#> 0.95 0.85 0.85 0.85 
#>  tau = 0.5: top mean selection frequencies
#>  x17   x3   x4   x5 
#> 0.95 0.90 0.90 0.90 
#>  tau = 0.75: top mean selection frequencies
#>   x1   x3   x4  x16 
#> 0.85 0.80 0.80 0.80
summary(fit_formula)
#> SelectBoost quantile summary
#>   tau values: 0.25, 0.50, 0.75 
#>   selection metric: hybrid 
#>  tau = 0.25: 3 variables above threshold
#>  tau = 0.5: 3 variables above threshold
#>  tau = 0.75: 4 variables above threshold

Penalty tuning

The package exposes the penalty-tuning stage directly through tune_lambda_quantile(). This is useful when the tuning decision itself needs to be inspected.

tuned <- tune_lambda_quantile(
  sim$x,
  sim$y,
  tau = 0.5,
  method = "cv",
  rule = "one_se",
  lambda_inflation = 1.25,
  nlambda = 6,
  folds = 3,
  repeats = 2,
  seed = 3,
  verbose = FALSE
)

print(tuned)
#> Quantile-lasso tuning
#>   tau: 0.5 
#>   method: cv 
#>   rule: one_se 
#>   lambda inflation: 1.25 
#>   folds: 3 
#>   repeats: 2 
#>   selected factor: 0.6866 
#>   score: 0.49975 
#>   standard error: 0.012345
summary(tuned)
#>   tau     factor     score           se   rule lambda_inflation selected
#> 1 0.5 1.00000000 0.5243058 0.0007634017 one_se             1.25    FALSE
#> 2 0.5 0.54928027 0.4997532 0.0123450390 one_se             1.25     TRUE
#> 3 0.5 0.30170882 0.5193880 0.0048581777 one_se             1.25    FALSE
#> 4 0.5 0.16572270 0.5402414 0.0016910207 one_se             1.25    FALSE
#> 5 0.5 0.09102821 0.5487809 0.0123171347 one_se             1.25    FALSE
#> 6 0.5 0.05000000 0.5514630 0.0199834361 one_se             1.25    FALSE
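The printed table is enough to retrace the selection by hand. The sketch below assumes the usual reading of a one-standard-error rule (take the largest factor whose score lies within one standard error of the best score) and assumes that lambda inflation multiplies the chosen factor. Under those assumptions it reproduces the printed `selected factor`; the package's actual code path may differ.

```r
# Tuning table copied from the summary(tuned) output shown above.
tab <- data.frame(
  factor = c(1, 0.54928027, 0.30170882, 0.16572270, 0.09102821, 0.05),
  score  = c(0.5243058, 0.4997532, 0.5193880, 0.5402414, 0.5487809, 0.5514630),
  se     = c(0.0007634017, 0.0123450390, 0.0048581777, 0.0016910207,
             0.0123171347, 0.0199834361)
)
lambda_inflation <- 1.25

# Assumed one-SE rule: among factors scoring within one SE of the best score,
# keep the largest (most heavily penalized) one.
best <- which.min(tab$score)
cutoff <- tab$score[best] + tab$se[best]
one_se_factor <- max(tab$factor[tab$score <= cutoff])

# Assumed inflation step: multiply the chosen factor by lambda_inflation.
selected <- one_se_factor * lambda_inflation
round(selected, 4)
#> [1] 0.6866

# The printed factors also match a log-spaced grid of 6 points from 1 to 0.05.
grid_matches <- isTRUE(all.equal(tab$factor,
                                 exp(seq(0, log(0.05), length.out = 6)),
                                 tolerance = 1e-6))
```

In this run the one-SE candidate coincides with the score minimizer, because the only larger factor (1.0) scores outside the one-SE band.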

Validation workflow

The package includes a simulation and benchmark framework to compare plain quantile lasso, tuned quantile lasso, and the SelectBoost-based quantile workflow. The shipped validation study reports recall, false discovery rate (FDR), and F1 score.

scenarios <- default_quantile_benchmark_scenarios(
  tau = c(0.25, 0.5),
  regimes = c("moderate_corr", "high_dim")
)

bench <- benchmark_quantile_selection(
  scenarios = scenarios,
  replications = 5,
  threshold = 0.55,
  seed = 1,
  verbose = TRUE
)

summary(bench)
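For reference, the three reported metrics can be computed from a selected set and a true support as sketched below. `selection_metrics` is a hypothetical helper and not the package's benchmark code. The example reuses the support from the matrix-interface example and assumes that `active = 1:4` corresponds to the columns named x1 to x4.

```r
# Illustrative only: `selection_metrics` is a hypothetical helper,
# not part of the package's benchmark framework.
selection_metrics <- function(selected, truth) {
  tp <- length(intersect(selected, truth))  # true positives
  fp <- length(setdiff(selected, truth))    # false positives
  fn <- length(setdiff(truth, selected))    # false negatives
  recall <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  fdr <- if (tp + fp > 0) fp / (tp + fp) else 0  # convention: 0 when nothing is selected
  f1 <- if (2 * tp + fp + fn > 0) 2 * tp / (2 * tp + fp + fn) else NA_real_
  c(recall = recall, fdr = fdr, f1 = f1)
}

# Support from the matrix-interface example versus the assumed true support.
metrics <- selection_metrics(
  selected = c("x2", "x1", "x3"),
  truth = paste0("x", 1:4)
)
metrics
```

Here three of the four active variables are recovered with no false positives, giving recall 0.75, FDR 0, and F1 of 6/7 (about 0.857).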

For a reproducible study from a source checkout, run the following from the package root (the script path is relative):

out_dir <- file.path(tempdir(), "SelectBoost.quantile-validation")
system2(
  "Rscript",
  c("inst/scripts/run_quantile_benchmark.R", out_dir, "4", "0.55")
)

The package ships both a getting-started vignette and a validation vignette:

vignette("getting-started", package = "SelectBoost.quantile")
vignette("validation-study", package = "SelectBoost.quantile")