SelectBoost for beta-regression models — sb

sb_beta() orchestrates all SelectBoost stages—normalisation, correlation analysis, grouping, correlated resampling, and stability tallying—while using the beta-regression selectors provided by this package. It can operate on point-valued or interval-valued responses and automatically squeezes the outcome into (0, 1) unless instructed otherwise.
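
As a rough illustration of the response handling described above, the following base-R sketch draws pseudo-responses from interval bounds and then squeezes values into the open unit interval. It does not call the package. The helper name squeeze_unit() is hypothetical, the compression formula is the common (y * (n - 1) + 0.5) / n rescaling, and treating "midpoint" as the centre of each interval is an assumption; sb_beta() may differ in detail.

```r
# Hypothetical helper: compress values from [0, 1] into (0, 1).
# Assumption: the common (y * (n - 1) + 0.5) / n rescaling; the package's
# own transformation may differ.
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

# Toy interval-valued responses (made-up bounds, one pair per observation).
Y_low  <- c(0.00, 0.20, 0.55, 0.90)
Y_high <- c(0.10, 0.40, 0.75, 1.00)

set.seed(1)
# "uniform"-style draw: one value per observation between its bounds.
y_uniform <- runif(length(Y_low), min = Y_low, max = Y_high)
# "midpoint"-style pseudo-response (assumed: the centre of each interval).
y_midpoint <- (Y_low + Y_high) / 2

# Boundary values 0 and 1 are pulled strictly inside the unit interval.
y_squeezed <- squeeze_unit(c(0, 0.25, 0.5, 1))
y_squeezed
#> [1] 0.1250 0.3125 0.5000 0.8750
```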

Usage

sb_beta(
  X,
  Y = NULL,
  selector = betareg_step_aic,
  corrfunc = "cor",
  B = 100,
  step.num = 0.1,
  steps.seq = NULL,
  version = c("glmnet", "lars"),
  squeeze = TRUE,
  use.parallel = FALSE,
  seed = NULL,
  verbose = FALSE,
  threshold = 1e-04,
  interval = c("none", "uniform", "midpoint"),
  Y_low = NULL,
  Y_high = NULL,
  ...
)

Arguments

X: Numeric design matrix. Coerced with as.matrix() and normalised via sb_normalize().
Y: Numeric response vector. Values are squeezed to the open unit interval with the standard SelectBoost transformation unless squeeze = FALSE. Optional when interval bounds are supplied.
selector: Selection routine, given either as a function or as a character string naming one. Defaults to betareg_step_aic. When a function is supplied, attach the selector name to it as the "fun.name" attribute.
corrfunc: Correlation function passed to sb_compute_corr().
B: Number of replicates to generate.
step.num: Step length for the automatically generated c0 grid.
steps.seq: Optional user-supplied grid of absolute correlation thresholds.
version: Either "glmnet" (intercept in first row) or "lars".
squeeze: Logical; ensure the response lies in (0, 1).
use.parallel: Logical; enable parallel resampling and selector fits when supported by the current R session.
seed: Optional integer seed for reproducibility. The seed is scoped via withr::with_seed() so the caller's RNG state is restored on exit.
verbose: Logical; emit progress messages.
threshold: Numeric tolerance for considering a coefficient selected.
interval: Interval-resampling mode: "none" reuses Y, whereas "uniform" and "midpoint" draw pseudo-responses between Y_low and Y_high for each replicate.
Y_low, Y_high: Lower and upper interval bounds in [0, 1], one pair per row of X. Used when interval is not "none".
...: Additional arguments forwarded to selector.
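
To illustrate the selector and threshold arguments, here is a minimal base-R sketch of a user-written selector labelled through its "fun.name" attribute. The calling convention (a function of X and Y that returns a named coefficient vector), the name toy_lm_selector, and the rule "selected when the absolute coefficient exceeds the tolerance" are assumptions made for illustration. Ordinary least squares stands in for a real beta-regression selector; consult the selectors shipped with the package for the exact interface.

```r
# Hypothetical selector: fits ordinary least squares and returns the
# coefficients. Assumed interface: function(X, Y, ...) -> named numeric vector.
toy_lm_selector <- function(X, Y, ...) {
  fit <- stats::lm.fit(x = cbind("(Intercept)" = 1, X), y = Y)
  fit$coefficients
}
# Record the selector name, as the selector argument description asks.
attr(toy_lm_selector, "fun.name") <- "toy_lm_selector"

# Simulated toy data (not from the package).
set.seed(123)
X <- matrix(rnorm(60), nrow = 20, dimnames = list(NULL, c("x1", "x2", "x3")))
Y <- plogis(0.5 + 1.5 * X[, "x1"] + rnorm(20, sd = 0.1))

coefs <- toy_lm_selector(X, Y)
# Assumed rule: a coefficient counts as selected when its magnitude
# exceeds the tolerance (1e-04 is the documented default for threshold).
selected <- abs(coefs) > 1e-04
attr(toy_lm_selector, "fun.name")
#> [1] "toy_lm_selector"
```

Least squares never shrinks a coefficient exactly to zero, so every entry of selected is typically TRUE here; a sparse selector would produce a mix of TRUE and FALSE.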

Value

A matrix of class "sb_beta" holding selection frequencies, with one row per c0 level and one column per coefficient. See Details for the recorded attributes.

Details

The returned object carries the following attributes:

"c0.seq" – the grid of absolute-correlation thresholds explored during resampling.
"steps.seq" – the raw sequence (if any) used to construct the grid.
"selector" – the selector identifier (function name or expression).
"B" – number of resampled designs passed to the selector.
"interval" – the interval sampling mode ("none", "uniform", or "midpoint").
"resample_diagnostics" – per-threshold data frames with summary statistics on the cached correlated draws.

These attributes mirror the historical SelectBoost beta implementation so the object can be consumed by existing plotting and reporting utilities.
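
The attributes can be read back with attr(). The sketch below uses a hand-built stand-in rather than a real sb_beta() result: the values are copied from the printed output in the Examples section, the class, the phi|(Intercept) column, and the "resample_diagnostics" attribute are omitted for brevity, and the name res_standin and the 0.8 cutoff are hypothetical.

```r
# Stand-in for an sb_beta() result (values copied from the Examples output).
c0 <- c(1, 0.12616304, 0.11291203, 0.02454839, 0)
res_standin <- structure(
  matrix(
    c(1.0, 1.0, 0.0, 0.0,
      0.4, 1.0, 0.0, 0.0,
      0.2, 0.0, 0.0, 0.2,
      0.2, 0.2, 0.0, 0.2,
      0.2, 0.2, 0.4, 0.4),
    nrow = 5, byrow = TRUE,
    dimnames = list(sprintf("c0 = %.3f", c0), c("x1", "x2", "x3", "x4"))
  ),
  c0.seq = c0,
  steps.seq = c0[2:4],
  B = 5,
  selector = "betareg_step_aic",
  interval = "none"
)

# Read individual attributes.
attr(res_standin, "B")
#> [1] 5
attr(res_standin, "selector")
#> [1] "betareg_step_aic"

# In this example the grid is the inner thresholds bracketed by 1 and 0.
identical(attr(res_standin, "c0.seq"), c(1, attr(res_standin, "steps.seq"), 0))
#> [1] TRUE

# Variables selected in at least 80% of replicates at a given threshold
# (the 0.8 cutoff is an illustrative choice, not a package default).
names(which(res_standin["c0 = 0.126", ] >= 0.8))
#> [1] "x2"
```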

Examples

set.seed(42)
sim <- simulation_DATA.beta(n = 80, p = 4, s = 2)
# increase B for real applications
res <- sb_beta(sim$X, sim$Y, B = 5)
res
#> SelectBoost beta selection frequencies
#> Selector: betareg_step_aic
#> Resamples per threshold: 5
#> Interval mode: none
#> c0 grid: 1.000, 0.126, 0.113, 0.025, 0.000
#> Inner thresholds: 0.126, 0.113, 0.025
#>             x1  x2  x3  x4 phi|(Intercept)
#> c0 = 1.000 1.0 1.0 0.0 0.0               1
#> c0 = 0.126 0.4 1.0 0.0 0.0               1
#> c0 = 0.113 0.2 0.0 0.0 0.2               1
#> c0 = 0.025 0.2 0.2 0.0 0.2               1
#> c0 = 0.000 0.2 0.2 0.4 0.4               1
#> attr(,"c0.seq")
#> [1] 1.00000000 0.12616304 0.11291203 0.02454839 0.00000000
#> attr(,"steps.seq")
#> [1] 0.12616304 0.11291203 0.02454839
#> attr(,"B")
#> [1] 5
#> attr(,"selector")
#> [1] "betareg_step_aic"
#> attr(,"resample_diagnostics")
#> attr(,"resample_diagnostics")$`c0 = 1.000`
#> [1] group                   size                    regenerated            
#> [4] cached                  mean_abs_corr_orig      mean_abs_corr_surrogate
#> [7] mean_abs_corr_cross    
#> <0 rows> (or 0-length row.names)
#> 
#> attr(,"resample_diagnostics")$`c0 = 0.126`
#>      group size regenerated cached mean_abs_corr_orig mean_abs_corr_surrogate
#> 1    x1,x4    2           5  FALSE          0.1275181               0.1112407
#> 2    x3,x4    2           5  FALSE          0.1261630               0.1264908
#> 3 x1,x3,x4    3           5  FALSE          0.1225041               0.1517253
#>   mean_abs_corr_cross
#> 1          0.08777970
#> 2          0.11906132
#> 3          0.07866331
#> 
#> attr(,"resample_diagnostics")$`c0 = 0.113`
#>         group size regenerated cached mean_abs_corr_orig
#> 1 x1,x2,x3,x4    4           5  FALSE         0.08541194
#> 2       x1,x2    2           5  FALSE         0.11291203
#> 3    x1,x3,x4    3           0   TRUE         0.12250411
#>   mean_abs_corr_surrogate mean_abs_corr_cross
#> 1              0.13775478          0.11347522
#> 2              0.08833692          0.05156172
#> 3              0.15172531          0.07866331
#> 
#> attr(,"resample_diagnostics")$`c0 = 0.025`
#>         group size regenerated cached mean_abs_corr_orig
#> 1 x1,x2,x3,x4    4           0   TRUE         0.08541194
#> 2    x1,x2,x4    3           5  FALSE         0.08832617
#> 3    x1,x3,x4    3           0   TRUE         0.12250411
#>   mean_abs_corr_surrogate mean_abs_corr_cross
#> 1               0.1377548          0.11347522
#> 2               0.1198411          0.10854057
#> 3               0.1517253          0.07866331
#> 
#> attr(,"resample_diagnostics")$`c0 = 0.000`
#>         group size regenerated cached mean_abs_corr_orig
#> 1 x1,x2,x3,x4    4           0   TRUE         0.08541194
#>   mean_abs_corr_surrogate mean_abs_corr_cross
#> 1               0.1377548           0.1134752
#> 
#> attr(,"interval")
#> [1] "none"
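
The per-threshold diagnostics printed above are ordinary data frames, so they can be filtered like any other. The sketch below rebuilds the "c0 = 0.113" table by hand from the printed values instead of extracting it from res. The name diag_0113 is hypothetical, and reading cached = TRUE as "draws reused from an earlier threshold" is an interpretation of the output rather than a documented guarantee.

```r
# Stand-in for attr(res, "resample_diagnostics")[["c0 = 0.113"]],
# with values copied from the printed output above.
diag_0113 <- data.frame(
  group = c("x1,x2,x3,x4", "x1,x2", "x1,x3,x4"),
  size = c(4, 2, 3),
  regenerated = c(5, 5, 0),
  cached = c(FALSE, FALSE, TRUE),
  mean_abs_corr_orig = c(0.08541194, 0.11291203, 0.12250411),
  mean_abs_corr_surrogate = c(0.13775478, 0.08833692, 0.15172531),
  mean_abs_corr_cross = c(0.11347522, 0.05156172, 0.07866331),
  stringsAsFactors = FALSE
)

# Groups flagged as cached (no new draws generated at this threshold).
diag_0113$group[diag_0113$cached]
#> [1] "x1,x3,x4"

# Gap between surrogate and original mean absolute correlation, per group.
corr_gap <- diag_0113$mean_abs_corr_surrogate - diag_0113$mean_abs_corr_orig
```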