Frédéric Bertrand
SelectBoost.quantile adapts the SelectBoost idea to sparse quantile regression. The package builds correlation neighborhoods, perturbs correlated predictors with a directional sampler inspired by the original SelectBoost internals, refits penalized quantile regression models on the perturbed designs, and aggregates variable-selection frequencies across a path of correlation thresholds.
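The penalized quantile regression that the package refits at each step minimizes a check (pinball) loss plus a sparsity penalty, roughly sum of rho_tau(y - x'b) + lambda * sum |b|. The README does not document the exact objective or solver, so the base-R sketch below covers only the loss itself. `check_loss` is a hypothetical helper, not a package function. The sketch shows why minimizing this loss targets the tau-quantile.

```r
# Hypothetical helper: check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})
check_loss <- function(u, tau) u * (tau - (u < 0))

check_loss(c(-1, 2), tau = 0.5)    # 0.5 1.0
check_loss(c(-1, 2), tau = 0.25)   # 0.75 0.50

# Minimizing the summed loss over a constant recovers the tau-quantile
y <- c(1, 2, 3, 4, 10)
grid <- seq(0, 10, by = 0.5)
total <- sapply(grid, function(q) sum(check_loss(y - q, tau = 0.5)))
grid[which.min(total)]             # 3, the sample median
```

With tau = 0.5 the loss is half the absolute error, so the minimizer is the median. Other values of tau tilt the loss toward lower or upper quantiles.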
The package currently supports:
- matrix and formula interfaces,
- single- and multi-tau workflows,
- tau-aware screening for p > n settings,
- stronger penalty tuning through one-standard-error selection and lambda inflation,
- complementary-pairs stability selection,
- neighborhood capping for high-dimensional correlated designs,
- hybrid support extraction that combines path stability and fitted effect size,
- benchmark helpers for validation studies.
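Complementary-pairs stability selection draws subsamples in disjoint pairs. Each random split of the n observations into two halves yields two subsamples. The sketch below assumes this standard construction (in the style of Shah and Samworth). `complementary_pair_subsamples` is a hypothetical helper, and the package's own implementation may differ in details. The settings `subsamples = 4` and a half-sample fraction mirror the matrix example later in this README.

```r
# Hypothetical helper: complementary-pairs subsampling at fraction 0.5
complementary_pair_subsamples <- function(n, subsamples, seed = NULL) {
  stopifnot(subsamples %% 2 == 0)          # draws come in pairs
  if (!is.null(seed)) set.seed(seed)
  half <- floor(n / 2)
  draws <- vector("list", subsamples)
  for (k in seq_len(subsamples / 2)) {
    perm <- sample.int(n)                  # random permutation of 1..n
    draws[[2 * k - 1]] <- sort(perm[seq_len(half)])         # first half
    draws[[2 * k]]     <- sort(perm[half + seq_len(half)])  # its complement
  }
  draws
}

idx <- complementary_pair_subsamples(100, subsamples = 4, seed = 1)
lengths(idx)                           # 50 50 50 50
length(intersect(idx[[1]], idx[[2]]))  # 0: the two halves of a pair are disjoint
```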
The package should still be read as a methodological prototype: the workflow is usable and documented, but the quantile-regression adaptation is still being validated and refined.
Start with vignette("getting-started", package = "SelectBoost.quantile") for the main workflow, then use vignette("validation-study", package = "SelectBoost.quantile") for the shipped benchmark comparisons.
Installation
You can install the development version of SelectBoost.quantile from GitHub with:
remotes::install_github("fbertran/SelectBoost.quantile")
From a local source checkout, use:
devtools::install()
Main workflow
For each correlation threshold c0, selectboost_quantile():
- normalizes the design as in the original SelectBoost workflow,
- builds correlation neighborhoods,
- fits a sign-aligned directional model to each neighborhood,
- perturbs the predictors in the sample hyperplane,
- refits sparse quantile regression,
- aggregates the selection frequencies across perturbations and stability subsamples.
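The first steps of this loop can be sketched in base R. The sketch assumes three things: that normalization means centering each column and scaling it to unit Euclidean norm, that a neighborhood at threshold c0 contains the predictors whose absolute correlation with the anchor is at least c0 (capped at `max_group_size`), and that sign alignment flips negatively correlated neighbors. `normalize_design` and `correlation_neighborhoods` are hypothetical names, not package functions.

```r
# Hypothetical helper: center, then scale each column to unit Euclidean norm
normalize_design <- function(x) {
  xc <- scale(x, center = TRUE, scale = FALSE)
  sweep(xc, 2, sqrt(colSums(xc^2)), "/")
}

# Hypothetical helper: correlation neighborhood of each predictor at threshold c0
correlation_neighborhoods <- function(x, c0, max_group_size = Inf) {
  r <- crossprod(normalize_design(x))                # equals cor(x) after normalization
  lapply(seq_len(ncol(r)), function(j) {
    nb <- which(abs(r[, j]) >= c0)                    # j itself is always included
    nb <- nb[order(abs(r[nb, j]), decreasing = TRUE)] # strongest links first
    nb <- nb[seq_len(min(length(nb), max_group_size))]
    # sign alignment: neighbors negatively correlated with j get sign -1
    list(members = nb, signs = sign(r[nb, j]))
  })
}

set.seed(1)
z <- rnorm(50)
x <- cbind(a = z + rnorm(50, sd = 0.1), b = -z + rnorm(50, sd = 0.1), c = rnorm(50))
nb <- correlation_neighborhoods(x, c0 = 0.7)
nb[[1]]  # members a and b; b carries sign -1
nb[[3]]  # c is alone in its neighborhood
```

Because the normalized columns have unit norm and mean zero, their cross-product matrix equals the correlation matrix. Neighborhoods therefore depend only on correlations, not on the original scale of the predictors.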
The resulting frequency path can then be summarized, plotted, or thresholded into a stable support.
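Aggregation and thresholding reduce to simple arithmetic on selection indicators. The sketch below uses toy, hypothetical selection indicators (one logical matrix per c0 threshold, with rows as perturbation refits and columns as predictors). It averages them into a frequency path and keeps predictors whose mean frequency reaches 0.55, the threshold shown in the summary output below. This is a simplification: the package's default summary uses a hybrid metric that also accounts for fitted effect size.

```r
# Toy selection indicators (hypothetical, for illustration only)
sel_by_c0 <- list(
  "c0=0.9" = matrix(c(TRUE,  TRUE,  FALSE,
                      TRUE,  TRUE,  FALSE,
                      TRUE,  FALSE, FALSE,
                      TRUE,  TRUE,  TRUE), nrow = 4, byrow = TRUE),
  "c0=0.5" = matrix(c(TRUE,  TRUE,  FALSE,
                      TRUE,  FALSE, FALSE,
                      TRUE,  FALSE, TRUE,
                      FALSE, TRUE,  FALSE), nrow = 4, byrow = TRUE)
)

# Frequency path: rows = c0 thresholds, columns = predictors
freq_path <- t(sapply(sel_by_c0, colMeans))
colnames(freq_path) <- c("x1", "x2", "x3")

# Plain frequency rule: average over the path, then threshold
mean_freq <- colMeans(freq_path)                # x1 0.875, x2 0.625, x3 0.25
stable <- names(mean_freq)[mean_freq >= 0.55]   # "x1" "x2"
```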
Example: matrix interface
library(SelectBoost.quantile)
sim <- simulate_quantile_data(
n = 100,
p = 20,
active = 1:4,
rho = 0.7,
seed = 1
)
fit <- selectboost_quantile(
sim$x,
sim$y,
tau = 0.5,
B = 6,
step_num = 0.5,
screen = "auto",
tune_lambda = "cv",
lambda_rule = "one_se",
lambda_inflation = 1.25,
subsamples = 4,
sample_fraction = 0.5,
complementary_pairs = TRUE,
max_group_size = 10,
seed = 1,
verbose = FALSE
)
print(fit)
#> SelectBoost-style quantile regression sketch
#> tau: 0.5
#> perturbation replicates: 6
#> c0 thresholds: 5
#> predictors: 20
#> grouping: group_neighbors
#> max group size: 10
#> screening: none
#> stability selection: 4 draws at fraction 0.5 (complementary pairs)
#> tuned lambda factor: 0.7789 (cv, one_se)
#> top mean selection frequencies:
#> x2 x1 x3 x4 x17 x5
#> 0.883 0.875 0.771 0.688 0.583 0.579
summary(fit)
#> Tau: 0.5
#> Stable support threshold: 0.55
#> Selection metric: hybrid
#> Variables above the threshold:
#> [1] "x2" "x1" "x3"
#> Top summary scores:
#> x2 x1 x3 x4 x13 x17 x5 x14 x12 x10
#> 0.871 0.863 0.686 0.513 0.189 0.164 0.080 0.069 0.044 0.022
support_selectboost_quantile(fit)
#> [1] "x2" "x1" "x3"
plot(fit)
Selection-frequency paths for the six variables with the highest mean selection frequency.
Example: formula interface and multiple quantiles
dat <- data.frame(y = sim$y, sim$x)
fit_formula <- selectboost_quantile(
y ~ .,
data = dat,
tau = c(0.25, 0.5, 0.75),
B = 4,
step_num = 0.5,
tune_lambda = "bic",
seed = 2,
verbose = FALSE
)
print(fit_formula)
#> SelectBoost-style quantile regression sketch
#> tau: 0.25, 0.50, 0.75
#> perturbation replicates: 4
#> c0 thresholds: 5
#> predictors: 20
#> grouping: group_neighbors
#> screening: none
#> tuned lambda factors: 1.0000, 0.5322, 1.0000
#> tau = 0.25: top mean selection frequencies
#> x1 x2 x3 x20
#> 0.95 0.85 0.85 0.85
#> tau = 0.5: top mean selection frequencies
#> x17 x3 x4 x5
#> 0.95 0.90 0.90 0.90
#> tau = 0.75: top mean selection frequencies
#> x1 x3 x4 x16
#> 0.85 0.80 0.80 0.80
summary(fit_formula)
#> SelectBoost quantile summary
#> tau values: 0.25, 0.50, 0.75
#> selection metric: hybrid
#> tau = 0.25: 3 variables above threshold
#> tau = 0.5: 3 variables above threshold
#> tau = 0.75: 4 variables above threshold
Penalty tuning
The package exposes the penalty-tuning stage directly through tune_lambda_quantile(). This is useful when the tuning decision itself needs to be inspected.
tuned <- tune_lambda_quantile(
sim$x,
sim$y,
tau = 0.5,
method = "cv",
rule = "one_se",
lambda_inflation = 1.25,
nlambda = 6,
folds = 3,
repeats = 2,
seed = 3,
verbose = FALSE
)
print(tuned)
#> Quantile-lasso tuning
#> tau: 0.5
#> method: cv
#> rule: one_se
#> lambda inflation: 1.25
#> folds: 3
#> repeats: 2
#> selected factor: 0.6866
#> score: 0.49975
#> standard error: 0.012345
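The printed selected factor can be reconstructed from the per-factor scores that summary(tuned) lists in this section. The reconstruction assumes two things about the internals: that lower scores are better, and that the one-standard-error rule keeps the largest factor (the strongest penalty) whose score lies within one standard error of the best score. It also assumes lambda inflation is a plain multiplication. The numbers below are copied from that output.

```r
# Scores copied from the summary(tuned) output in this section
factor <- c(1.00000000, 0.54928027, 0.30170882, 0.16572270, 0.09102821, 0.05000000)
score  <- c(0.5243058, 0.4997532, 0.5193880, 0.5402414, 0.5487809, 0.5514630)
se     <- c(0.0007634017, 0.0123450390, 0.0048581777, 0.0016910207,
            0.0123171347, 0.0199834361)

# One-standard-error rule (assumed form): strongest penalty within best score + 1 SE
best <- which.min(score)
cutoff <- score[best] + se[best]
one_se_factor <- max(factor[score <= cutoff])

# Lambda inflation (assumed to be multiplicative)
inflated_factor <- one_se_factor * 1.25

one_se_factor              # 0.5492803
round(inflated_factor, 4)  # 0.6866, the "selected factor" printed above
```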
summary(tuned)
#> tau factor score se rule lambda_inflation selected
#> 1 0.5 1.00000000 0.5243058 0.0007634017 one_se 1.25 FALSE
#> 2 0.5 0.54928027 0.4997532 0.0123450390 one_se 1.25 TRUE
#> 3 0.5 0.30170882 0.5193880 0.0048581777 one_se 1.25 FALSE
#> 4 0.5 0.16572270 0.5402414 0.0016910207 one_se 1.25 FALSE
#> 5 0.5 0.09102821 0.5487809 0.0123171347 one_se 1.25 FALSE
#> 6 0.5 0.05000000 0.5514630 0.0199834361 one_se 1.25 FALSE
Validation workflow
The package includes a simulation and benchmark framework that compares plain quantile lasso, tuned quantile lasso, and the SelectBoost-based quantile workflow. The shipped validation study reports recall, false discovery rate (FDR), and F1 score.
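The three metrics compare a selected support with the true active set. The sketch below uses the standard definitions. Whether the benchmark handles edge cases the same way (for example, an empty selection scored as FDR 0) is an assumption. `selection_metrics` is a hypothetical helper. The usage line scores the stable support from the matrix example against the simulated active set 1:4.

```r
# Hypothetical helper: recall, FDR and F1 of a selected support
selection_metrics <- function(selected, active) {
  stopifnot(length(active) > 0)
  tp <- length(intersect(selected, active))   # true positives
  fp <- length(setdiff(selected, active))     # false positives
  recall <- tp / length(active)
  precision <- if (length(selected) > 0) tp / length(selected) else 0
  fdr <- if (length(selected) > 0) fp / length(selected) else 0
  f1 <- if (precision + recall > 0) 2 * precision * recall / (precision + recall) else 0
  c(recall = recall, fdr = fdr, f1 = f1)
}

m <- selection_metrics(c("x2", "x1", "x3"), paste0("x", 1:4))
m  # recall 0.75, fdr 0, f1 about 0.857
```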
scenarios <- default_quantile_benchmark_scenarios(
tau = c(0.25, 0.5),
regimes = c("moderate_corr", "high_dim")
)
bench <- benchmark_quantile_selection(
scenarios = scenarios,
replications = 5,
threshold = 0.55,
seed = 1,
verbose = TRUE
)
summary(bench)
For a reproducible study from a source checkout, run:
out_dir <- file.path(tempdir(), "SelectBoost.quantile-validation")
system2(
"Rscript",
c("inst/scripts/run_quantile_benchmark.R", out_dir, "4", "0.55")
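)
The package ships both a getting-started vignette and a validation vignette:
vignette("getting-started", package = "SelectBoost.quantile")
vignette("validation-study", package = "SelectBoost.quantile")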
