SelectBoost-style quantile regression — selectboost

selectboost_quantile() adapts the core SelectBoost workflow to sparse quantile regression.

Usage

selectboost_quantile(
  x,
  y = NULL,
  tau = 0.5,
  B = 50,
  c0_seq = NULL,
  step_num = 0.1,
  group = group_neighbors,
  max_group_size = NULL,
  screen = c("auto", "none", "quantile_rank"),
  screen_size = NULL,
  lambda = NULL,
  tune_lambda = c("none", "cv", "bic"),
  lambda_rule = c("min", "one_se"),
  lambda_factors = NULL,
  lambda_inflation = 1,
  nlambda = 20,
  lambda_min_ratio = 0.05,
  folds = 5,
  repeats = 1,
  subsamples = 1,
  sample_fraction = 0.5,
  complementary_pairs = FALSE,
  selector = quantile_lasso_selector,
  standardize = TRUE,
  eps = 1e-06,
  seed = NULL,
  data = NULL,
  subset = NULL,
  na.action = stats::na.fail,
  verbose = interactive(),
  ...
)

Arguments

x: Numeric design matrix or a formula.
y: Numeric response vector when x is a matrix.
tau: Quantile level in (0, 1). Can be a vector.
B: Number of perturbation replicates for each c0 threshold.
c0_seq: Optional decreasing sequence of correlation thresholds. When NULL, it is computed from empirical correlation quantiles using step_num.
step_num: Step size used to build the default c0 path.
group: Grouping rule used to convert the absolute correlation matrix and threshold c0 into a list of neighborhoods, one per variable. Can be a function or the name of one. Functions must accept (abs_corr, c0).
max_group_size: Optional cap on the size of each correlation neighborhood. When supplied, only the strongest absolute correlations are retained within each variable's group.
screen: Screening rule applied before the SelectBoost loop. "auto" enables tau-aware rank screening when p > n, "none" disables screening, and "quantile_rank" always uses the built-in rank-score screen. A custom function can also be supplied; it must accept (x, y, tau, screen_size).
screen_size: Optional number of predictors retained after screening.
lambda: Optional lasso penalty supplied to quantreg::rq.fit.lasso(). A scalar applies a common slope penalty, while a full penalty vector can also be supplied. When tau has length greater than one, lambda can also be a list with one entry per tau.
tune_lambda: One of "none", "cv", or "bic". When not "none", the package tunes a penalty profile once on the original design and reuses it for all perturbations.
lambda_rule: Selection rule used after tuning. "min" takes the best tuning score, while "one_se" applies the one-standard-error rule when tune_lambda = "cv".
lambda_factors: Optional positive multipliers applied to the default quantile-lasso penalty profile during tuning.
lambda_inflation: Optional multiplier applied after tuning to favor a stronger selection penalty.
nlambda: Number of tuning candidates when lambda_factors is NULL.
lambda_min_ratio: Smallest tuning multiplier used to generate the default tuning grid.
folds: Number of cross-validation folds when tune_lambda = "cv".
repeats: Number of repeated fold assignments when tune_lambda = "cv".
subsamples: Number of subsample draws used for stability selection. Values greater than one aggregate selection frequencies across subsamples.
sample_fraction: Fraction of observations drawn in each subsample when subsamples > 1.
complementary_pairs: Should subsamples be generated as complementary pairs?
selector: Function used to fit the sparse quantile model. It must accept (x, y, tau, lambda, ...) and return a named coefficient vector including an intercept.
standardize: Should the selector be fitted on the SelectBoost-normalized design? When TRUE, columns are centered and scaled to unit Euclidean norm before fitting, matching the original SelectBoost package. When FALSE, perturbations are still generated in the normalized space but mapped back to the original scale before model fitting.
eps: Numerical tolerance used to turn coefficients into selections.
seed: Optional random seed for reproducible perturbations and tuning.
data: Optional data frame used when x is a formula.
subset: Optional subset expression used with the formula interface.
na.action: Missing-data handler used with the formula interface.
verbose: Should the routine report progress?
...: Additional arguments forwarded to selector.
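
The contract for a user-supplied group function, (abs_corr, c0) in and one neighborhood per variable out, can be illustrated with a minimal base R sketch. The name group_threshold_sketch is hypothetical, and the return convention (integer index vectors that include the variable itself) is an assumption rather than something this page specifies.

```r
# Hypothetical grouping rule, for illustration only. It is assumed here that
# each neighborhood is an integer index vector containing the variable itself.
group_threshold_sketch <- function(abs_corr, c0) {
  p <- ncol(abs_corr)
  lapply(seq_len(p), function(j) {
    # Variables whose absolute correlation with variable j reaches c0.
    sort(union(j, which(abs_corr[, j] >= c0)))
  })
}

# Toy absolute correlation matrix for three variables.
abs_corr <- matrix(
  c(1.0, 0.9, 0.2,
    0.9, 1.0, 0.4,
    0.2, 0.4, 1.0),
  nrow = 3, byrow = TRUE
)
groups <- group_threshold_sketch(abs_corr, c0 = 0.5)
```

Under these assumptions, such a function would be passed as group = group_threshold_sketch.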

Value

An object of class "selectboost_quantile" with components: frequencies, baseline, baseline_standardized, c0_seq, tau, B, lambda, lambda_tuning, call, and preprocessing metadata.

Details

The procedure runs in five stages:

1. Build a centered, unit-norm design as in SelectBoost::boost.normalize().
2. Compute correlation neighborhoods along a c0 path.
3. Fit a directional distribution to each variable's sign-aligned neighborhood in the sample hyperplane.
4. Draw perturbed predictors from those fitted directional models.
5. Refit penalized quantile regression and aggregate selection frequencies.
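
The first two stages listed above can be sketched in base R. This is an illustration on simulated data, not the package's internal code: the variable names are made up, and a single threshold c0 = 0.8 stands in for the full c0 path.

```r
set.seed(1)
n <- 50
p <- 4
x <- matrix(rnorm(n * p), n, p)
x[, 2] <- x[, 1] + 0.1 * rnorm(n)  # make columns 1 and 2 strongly correlated

# Centering and scaling: each column gets mean zero and unit Euclidean norm.
x_centered <- sweep(x, 2, colMeans(x))
x_norm <- sweep(x_centered, 2, sqrt(colSums(x_centered^2)), "/")

# For centered, unit-norm columns the cross-product is the correlation matrix.
abs_corr <- abs(crossprod(x_norm))

# Neighborhoods at one threshold: variables correlated with j at level c0 or more.
c0 <- 0.8
neighborhoods <- lapply(seq_len(p), function(j) which(abs_corr[, j] >= c0))
```

Lowering c0 enlarges the neighborhoods, so more predictors are perturbed together at later points of the path.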

This version keeps the public API stable while separating the internals into explicit preprocessing, grouping, directional perturbation, and tuning stages.
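
As a rough illustration of the final aggregation stage and the role of eps, the sketch below turns a set of made-up coefficient draws into selection frequencies. The strict comparison against eps and the layout of coef_draws are assumptions; the package reports frequencies per c0 threshold, whereas this toy example covers a single threshold.

```r
# Hypothetical coefficient draws: one row per perturbation replicate,
# one column per predictor (intercept already dropped).
coef_draws <- rbind(
  c(x1 = 0.80, x2 = 0.00, x3 = 1e-09),
  c(x1 = 0.75, x2 = 0.10, x3 = 0.00),
  c(x1 = 0.82, x2 = 0.00, x3 = 0.00),
  c(x1 = 0.79, x2 = -0.05, x3 = 0.00)
)
eps <- 1e-06

# Assumed rule: a predictor is selected when |coefficient| exceeds eps.
selected <- abs(coef_draws) > eps

# Selection frequency: share of replicates in which each predictor is selected.
frequencies <- colMeans(selected)
```

Here x3 is never counted as selected because its only nonzero draw is below eps.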

Examples

sim <- simulate_quantile_data(n = 80, p = 12, active = 1:3, seed = 1)
fit <- selectboost_quantile(sim$x, sim$y, tau = 0.5, B = 8, seed = 1)
print(fit)
#> SelectBoost-style quantile regression sketch
#>   tau: 0.5 
#>   perturbation replicates: 8 
#>   c0 thresholds: 13 
#>   predictors: 12 
#>   grouping: group_neighbors 
#>   screening: none 
#>   lambda: vector[13], range [0.0000, 0.0000] 
#>   top mean selection frequencies:
#>    x1    x3    x2    x7    x9   x12 
#> 0.856 0.702 0.683 0.615 0.615 0.615 
summary(fit, threshold = 0.6)
#> Tau: 0.5 
#> Stable support threshold: 0.6 
#> Selection metric: hybrid 
#> Variables above the threshold:
#> [1] "x1"
#> Top summary scores:
#>    x1    x3    x2   x12    x9   x10    x7    x5    x4    x6 
#> 0.817 0.365 0.245 0.173 0.142 0.112 0.071 0.061 0.000 0.000 

dat <- data.frame(y = sim$y, sim$x)
fit_formula <- selectboost_quantile(
  y ~ .,
  data = dat,
  tau = 0.5,
  B = 4,
  step_num = 0.5,
  seed = 1
)