
selectboost_quantile() adapts the core SelectBoost workflow to sparse quantile regression.

Usage

selectboost_quantile(
  x,
  y = NULL,
  tau = 0.5,
  B = 50,
  c0_seq = NULL,
  step_num = 0.1,
  group = group_neighbors,
  max_group_size = NULL,
  screen = c("auto", "none", "quantile_rank"),
  screen_size = NULL,
  lambda = NULL,
  tune_lambda = c("none", "cv", "bic"),
  lambda_rule = c("min", "one_se"),
  lambda_factors = NULL,
  lambda_inflation = 1,
  nlambda = 20,
  lambda_min_ratio = 0.05,
  folds = 5,
  repeats = 1,
  subsamples = 1,
  sample_fraction = 0.5,
  complementary_pairs = FALSE,
  selector = quantile_lasso_selector,
  standardize = TRUE,
  eps = 1e-06,
  seed = NULL,
  data = NULL,
  subset = NULL,
  na.action = stats::na.fail,
  verbose = interactive(),
  ...
)

Arguments

x

Numeric design matrix or a formula.

y

Numeric response vector when x is a matrix.

tau

Quantile level in (0, 1). Can be a vector.
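As background for tau, quantile regression minimizes the check (pinball) loss, which weights positive residuals by tau and negative residuals by 1 - tau. The helper below is a minimal base R sketch of that loss; check_loss is an illustrative name, not a function exported by the package.

```r
# Check (pinball) loss for residuals u at quantile level tau.
# Illustrative helper, not part of the package API.
check_loss <- function(u, tau) {
  u * (tau - (u < 0))
}

# At tau = 0.9, under-prediction (u > 0) costs more than over-prediction (u < 0)
loss_under <- check_loss(2, tau = 0.9)
loss_over <- check_loss(-2, tau = 0.9)

# The constant that minimizes the summed loss over a sample is a tau-quantile
y <- c(1, 2, 3, 4, 100)
candidates <- sort(unique(y))
total_loss <- vapply(candidates, function(q) sum(check_loss(y - q, 0.5)), numeric(1))
best_constant <- candidates[which.min(total_loss)]
```

With tau = 0.5 the minimizing constant is the sample median, which is why the outlier at 100 has no influence on it.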

B

Number of perturbation replicates for each c0 threshold.

c0_seq

Optional decreasing sequence of correlation thresholds. When NULL, it is computed from empirical correlation quantiles using step_num.

step_num

Step size used to build the default c0 path.
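To make the default c0 path more concrete, the sketch below builds a decreasing sequence of thresholds from the empirical quantiles of the off-diagonal absolute correlations. It is only an approximation of the documented behavior: default_c0_path is a hypothetical helper, and the package's own rule may add endpoints or remove duplicates, so the number of thresholds it returns can differ.

```r
# Hypothetical helper, not the package's internal rule: thresholds are taken
# as quantiles of the off-diagonal absolute correlations, on a probability
# grid with spacing step_num, ordered from the highest to the lowest.
default_c0_path <- function(x, step_num = 0.1) {
  abs_corr <- abs(stats::cor(x))
  off_diag <- abs_corr[upper.tri(abs_corr)]
  probs <- rev(seq(0, 1, by = step_num))
  unname(stats::quantile(off_diag, probs = probs))
}

set.seed(1)
x <- matrix(rnorm(80 * 6), nrow = 80)
c0_path <- default_c0_path(x, step_num = 0.25)
```

A larger step_num gives a coarser path and therefore fewer SelectBoost passes.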

group

Grouping rule used to convert the absolute correlation matrix and threshold c0 into a list of neighborhoods, one per variable. Can be a function or the name of one. Functions must accept (abs_corr, c0).
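A user-supplied grouping rule only needs to follow the (abs_corr, c0) contract described above. The sketch below is one possible rule in base R; threshold_groups is a hypothetical name, and returning integer index vectors is an assumption about what the package expects for each neighborhood.

```r
# Hypothetical grouping rule following the documented (abs_corr, c0) contract.
# Assumption: each neighborhood is returned as a vector of column indices.
threshold_groups <- function(abs_corr, c0) {
  groups <- lapply(seq_len(ncol(abs_corr)), function(j) {
    # Variable j plus every variable whose absolute correlation with j reaches c0
    sort(union(j, which(abs_corr[, j] >= c0)))
  })
  names(groups) <- colnames(abs_corr)
  groups
}

set.seed(1)
z <- rnorm(50)
x <- cbind(
  a = z + rnorm(50, sd = 0.1),
  b = z + rnorm(50, sd = 0.1),
  c = rnorm(50)
)
groups <- threshold_groups(abs(cor(x)), c0 = 0.9)
```

Here a and b share a neighborhood because both are noisy copies of z, while c stays alone.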

max_group_size

Optional cap on the size of each correlation neighborhood. When supplied, only the strongest absolute correlations are retained within each variable's group.

screen

Screening rule applied before the SelectBoost loop. "auto" enables tau-aware rank screening when p > n, "none" disables screening, and "quantile_rank" always uses the built-in rank-score screen. Functions must accept (x, y, tau, screen_size).

screen_size

Optional number of predictors retained after screening.

lambda

Optional lasso penalty supplied to quantreg::rq.fit.lasso(). A scalar applies a common slope penalty, while a full penalty vector can also be supplied. When tau has length greater than one, lambda can also be a list with one entry per tau.

tune_lambda

One of "none", "cv", or "bic". When not "none", the package tunes a penalty profile once on the original design and reuses it for all perturbations.

lambda_rule

Selection rule used after tuning. "min" takes the best tuning score, while "one_se" applies the one-standard-error rule when tune_lambda = "cv".
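The difference between the two rules can be illustrated on a small table of cross-validated losses. This is a generic sketch of the one-standard-error rule, not the package's tuning code; pick_one_se and the cv_loss layout (folds in rows, candidates in columns ordered from weakest to strongest penalty) are assumptions made for the example.

```r
# Hypothetical helper illustrating "min" versus "one_se".
# cv_loss: folds x candidates matrix of held-out losses, with columns
# ordered from the weakest to the strongest penalty (assumed layout).
pick_one_se <- function(cv_loss) {
  mean_loss <- colMeans(cv_loss)
  se_loss <- apply(cv_loss, 2, stats::sd) / sqrt(nrow(cv_loss))
  best <- which.min(mean_loss)
  cutoff <- mean_loss[best] + se_loss[best]
  # Strongest penalty whose mean loss stays within one SE of the minimum
  one_se <- max(which(mean_loss <= cutoff))
  c(min = best, one_se = one_se)
}

cv_loss <- rbind(
  c(1.00, 0.80, 0.85, 1.40),
  c(1.10, 0.90, 0.90, 1.50),
  c(0.90, 0.70, 0.80, 1.30)
)
choice <- pick_one_se(cv_loss)
```

In this toy table the second candidate has the lowest mean loss, while the third is the strongest penalty still within one standard error of it.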

lambda_factors

Optional positive multipliers applied to the default quantile-lasso penalty profile during tuning.

lambda_inflation

Optional multiplier applied after tuning to favor a stronger selection penalty.

nlambda

Number of tuning candidates when lambda_factors is NULL.

lambda_min_ratio

Smallest tuning multiplier used to generate the default tuning grid.
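One plausible reading of nlambda and lambda_min_ratio is a grid of multipliers running from 1 down to lambda_min_ratio. The sketch below assumes log spacing and a largest multiplier of 1; both are assumptions borrowed from common lasso practice and are not confirmed by this page.

```r
# Assumed construction of the default tuning grid (not confirmed by the package):
# nlambda multipliers, log-spaced from 1 down to lambda_min_ratio.
nlambda <- 20
lambda_min_ratio <- 0.05
lambda_grid <- exp(seq(log(1), log(lambda_min_ratio), length.out = nlambda))
```

Supplying lambda_factors directly bypasses any such default grid.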

folds

Number of cross-validation folds when tune_lambda = "cv".

repeats

Number of repeated fold assignments when tune_lambda = "cv".

subsamples

Number of subsample draws used for stability selection. Values greater than one aggregate selection frequencies across subsamples.

sample_fraction

Fraction of observations drawn in each subsample when subsamples > 1.

complementary_pairs

Should subsamples be generated as complementary pairs?
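Complementary-pairs subsampling draws two disjoint subsamples of equal size from one random permutation of the observations. The sketch below shows the idea in base R; draw_complementary_pair is a hypothetical helper and the package may organize its draws differently.

```r
# Hypothetical helper: one complementary pair of subsample indices.
# Both halves come from the same permutation, so they never overlap.
draw_complementary_pair <- function(n, sample_fraction = 0.5) {
  m <- floor(n * sample_fraction)
  stopifnot(m >= 1, 2 * m <= n)
  shuffled <- sample.int(n)
  list(
    first = sort(shuffled[seq_len(m)]),
    second = sort(shuffled[m + seq_len(m)])
  )
}

set.seed(1)
pair <- draw_complementary_pair(n = 80)
```

Disjoint halves require sample_fraction to be at most 0.5, which the helper checks explicitly.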

selector

Function used to fit the sparse quantile model. It must accept (x, y, tau, lambda, ...) and return a named coefficient vector including an intercept.
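A custom selector can be a thin wrapper around any sparse quantile fitter, as long as it follows the documented signature. The sketch below wraps quantreg::rq.fit.lasso(); lasso_selector_sketch is a hypothetical name, the "(Intercept)" label is an assumption about naming, and the example relies on quantreg treating a scalar lambda as leaving the first column unpenalized.

```r
# Hypothetical selector following the documented (x, y, tau, lambda, ...) contract.
# Assumptions: the intercept is labelled "(Intercept)" and extra arguments
# passed through ... are ignored.
lasso_selector_sketch <- function(x, y, tau, lambda, ...) {
  # Prepend an intercept column; rq.fit.lasso() fits the design as given
  fit <- quantreg::rq.fit.lasso(cbind(1, x), y, tau = tau, lambda = lambda)
  coefs <- as.numeric(fit$coefficients)
  var_names <- colnames(x)
  if (is.null(var_names)) var_names <- paste0("x", seq_len(ncol(x)))
  names(coefs) <- c("(Intercept)", var_names)
  coefs
}

# Usage is guarded so the sketch still loads when quantreg is not installed
if (requireNamespace("quantreg", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(60 * 4), nrow = 60, dimnames = list(NULL, paste0("x", 1:4)))
  y <- 2 * x[, 1] + rnorm(60)
  coefs <- lasso_selector_sketch(x, y, tau = 0.5, lambda = 1)
}
```

Such a function would be passed as selector = lasso_selector_sketch.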

standardize

Should the selector be fitted on the SelectBoost-normalized design? When TRUE, columns are centered and scaled to unit Euclidean norm before fitting, matching the original package. When FALSE, perturbations are still generated in the normalized space but mapped back to the original scale before model fitting.
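The normalization described here can be reproduced in a few lines of base R. The sketch below is an illustration under the stated definition (centered columns with unit Euclidean norm); normalize_columns is a hypothetical helper, not the package's internal function.

```r
# Hypothetical helper: center each column, then scale it to unit Euclidean norm.
normalize_columns <- function(x) {
  centered <- sweep(x, 2, colMeans(x), "-")
  norms <- sqrt(colSums(centered^2))
  sweep(centered, 2, norms, "/")
}

set.seed(1)
x <- matrix(rnorm(40 * 3, mean = 5, sd = 2), nrow = 40)
x_norm <- normalize_columns(x)

# With this scaling the cross-product matrix is the correlation matrix
gram <- crossprod(x_norm)
```

This is why correlation thresholds can be read directly from inner products of the normalized columns.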

eps

Numerical tolerance used to turn coefficients into selections.
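The role of eps can be shown on a small, made-up coefficient matrix: a coefficient counts as selected when its absolute value exceeds the tolerance, and selection frequencies are the column means of those indicators. The values below are invented for illustration, and whether the package uses a strict inequality is an assumption.

```r
# Made-up coefficients, one row per perturbation replicate (illustration only)
coefs <- rbind(
  c(`(Intercept)` = 0.2, x1 = 1.3, x2 = 2e-09, x3 = -0.4),
  c(0.1, 1.1, 0.0, 3e-08),
  c(0.3, 0.9, 0.5, -0.2),
  c(0.2, 1.2, 0.0, 0.0)
)
eps <- 1e-06

# Drop the intercept, then flag coefficients that are numerically nonzero
selected <- abs(coefs[, -1, drop = FALSE]) > eps

# Share of replicates in which each predictor was selected
frequencies <- colMeans(selected)
```

Values such as 2e-09 are treated as zero, which matters because interior-point solvers rarely return exact zeros.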

seed

Optional random seed for reproducible perturbations and tuning.

data

Optional data frame used when x is a formula.

subset

Optional subset expression used with the formula interface.

na.action

Missing-data handler used with the formula interface.

verbose

Should the routine report progress?

...

Additional arguments forwarded to selector.

Value

An object of class "selectboost_quantile" with components: frequencies, baseline, baseline_standardized, c0_seq, tau, B, lambda, lambda_tuning, call, and preprocessing metadata.

Details

The workflow proceeds in five steps:

  1. build a centered, unit-norm design as in SelectBoost::boost.normalize(),

  2. compute correlation neighborhoods along a c0 path,

  3. fit a directional distribution to each variable's sign-aligned neighborhood in the sample hyperplane,

  4. draw perturbed predictors from those fitted directional models,

  5. refit penalized quantile regression and aggregate selection frequencies.
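The geometric part of the steps above can be sketched in base R for a single variable. The example covers normalization, one neighborhood at a fixed threshold, and the sign alignment that precedes the directional fit; it stops at the mean direction and does not fit or sample a directional distribution. All names are local to the example and the threshold 0.7 is arbitrary.

```r
set.seed(1)
n <- 60
z <- rnorm(n)
x <- cbind(
  x1 = z + rnorm(n, sd = 0.2),
  x2 = -z + rnorm(n, sd = 0.2),  # strongly but negatively related to x1
  x3 = rnorm(n)
)

# Step 1: center the columns and scale them to unit Euclidean norm
x_norm <- sweep(x, 2, colMeans(x), "-")
x_norm <- sweep(x_norm, 2, sqrt(colSums(x_norm^2)), "/")

# Step 2: neighborhood of x1 at threshold c0 = 0.7
corr <- crossprod(x_norm)
neighbors <- which(abs(corr[, "x1"]) >= 0.7)

# Step 3 (partial): flip neighbors so they all point the same way as x1,
# then summarize the neighborhood by its unit-norm mean direction
signs <- sign(corr[neighbors, "x1"])
aligned <- sweep(x_norm[, neighbors, drop = FALSE], 2, signs, "*")
mean_dir <- rowMeans(aligned)
mean_dir <- mean_dir / sqrt(sum(mean_dir^2))
```

Because every normalized column sums to zero, the mean direction also sums to zero, so it stays in the same hyperplane as the centered predictors.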

This version keeps the public API stable while separating the internals into explicit preprocessing, grouping, directional perturbation, and tuning stages.

Examples

sim <- simulate_quantile_data(n = 80, p = 12, active = 1:3, seed = 1)
fit <- selectboost_quantile(sim$x, sim$y, tau = 0.5, B = 8, seed = 1)
print(fit)
#> SelectBoost-style quantile regression sketch
#>   tau: 0.5 
#>   perturbation replicates: 8 
#>   c0 thresholds: 13 
#>   predictors: 12 
#>   grouping: group_neighbors 
#>   screening: none 
#>   lambda: vector[13], range [0.0000, 0.0000] 
#>   top mean selection frequencies:
#>    x1    x3    x2    x7    x9   x12 
#> 0.856 0.702 0.683 0.615 0.615 0.615 
summary(fit, threshold = 0.6)
#> Tau: 0.5 
#> Stable support threshold: 0.6 
#> Selection metric: hybrid 
#> Variables above the threshold:
#> [1] "x1"
#> Top summary scores:
#>    x1    x3    x2   x12    x9   x10    x7    x5    x4    x6 
#> 0.817 0.365 0.245 0.173 0.142 0.112 0.071 0.061 0.000 0.000 

dat <- data.frame(y = sim$y, sim$x)
fit_formula <- selectboost_quantile(
  y ~ .,
  data = dat,
  tau = 0.5,
  B = 4,
  step_num = 0.5,
  seed = 1
)