
Bootstraps the dataset B times and records how often each selector picks each variable. Observations containing NA in either X or Y are removed before resampling. Column names are abbreviated internally and mapped back to the originals in the output, as in compare_selectors_single().
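The resampling scheme described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the function name bootstrap_selection_freq() is hypothetical, and a Gaussian lm() with backward AIC selection via stats::step() stands in for the package's selectors, which are not reproduced here.

```r
# Sketch only: lm() + step() is an assumed stand-in selector, chosen because
# it ships with base R. The function name is hypothetical.
bootstrap_selection_freq <- function(X, Y, B = 50, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Drop observations with NA in either X or Y before resampling
  keep <- stats::complete.cases(X, Y)
  dat <- data.frame(Y = Y[keep], X[keep, , drop = FALSE])
  vars <- setdiff(names(dat), "Y")

  # One counter per candidate variable
  counts <- stats::setNames(integer(length(vars)), vars)

  for (b in seq_len(B)) {
    # Resample rows with replacement
    idx <- sample.int(nrow(dat), replace = TRUE)
    boot <- dat[idx, , drop = FALSE]

    # Fit the full model, then let step() drop terms by AIC
    fit <- stats::step(stats::lm(Y ~ ., data = boot), trace = 0)
    selected <- attr(stats::terms(fit), "term.labels")
    counts[selected] <- counts[selected] + 1L
  }

  # Selection frequency: share of replicates in which each variable was kept
  data.frame(variable = vars, freq = as.numeric(counts) / B)
}

# Usage on simulated data in which only the first predictor matters
set.seed(1)
X_demo <- matrix(rnorm(300), 100, 3)
Y_demo <- 2 * X_demo[, 1] + rnorm(100)
bootstrap_selection_freq(X_demo, Y_demo, B = 10, seed = 1)
```

The sketch does not track failed fits; the documented function additionally reports n_success and n_fail for each selector.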

Usage

compare_selectors_bootstrap(X, Y, B = 50, include_enet = TRUE, seed = NULL)

Arguments

X

Numeric matrix (n × p) of mean-submodel predictors.

Y

Numeric response in (0,1). Values are squeezed into the open interval (0,1) internally.

B

Number of bootstrap replications.

include_enet

Logical; if TRUE, include the ENet selector when the gamlss.lasso package is installed.

seed

Optional RNG seed.
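The description of Y says values are squeezed into (0,1) but does not state the transformation. As an illustration only, one common choice for beta-type responses is the compression (y * (n - 1) + 0.5) / n, which maps exact 0s and 1s strictly inside the interval. The helper name squeeze_unit() is hypothetical, and the package's internal squeeze may differ.

```r
# Assumed transformation, not taken from the package source:
# shrink values towards 0.5 so that 0 and 1 land strictly inside (0, 1).
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

# Boundary values are pulled inside the open interval
squeeze_unit(c(0, 0.25, 1))
```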

Value

Long data frame with columns selector, variable, freq (in [0,1]), n_success, and n_fail. The freq column reports the share of bootstrap replicates in which the corresponding selector selected the variable. Values near 1 indicate stable selection, whereas values near 0 indicate weak evidence for the variable. n_success counts the successful fits that contribute to the frequency estimate (failed replicates are excluded), and n_fail counts the unsuccessful fits. A "failures" attribute attached to the returned data frame lists the replicate indices and error messages for any failed fits.
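A short sketch of how the returned object might be inspected. The data frame res below is a hand-built stand-in with invented numbers that follows the documented column layout. The shape of the "failures" attribute (columns replicate and message) is an assumption made for illustration.

```r
# Hypothetical stand-in for the return value; all numbers are invented.
res <- data.frame(
  selector  = rep(c("AIC", "BIC"), each = 3),
  variable  = rep(c("X1", "X2", "X3"), times = 2),
  freq      = c(1.0, 0.1, 0.5, 1.0, 0.0, 0.2),
  n_success = 9L,
  n_fail    = 1L
)
# Assumed layout of the "failures" attribute
attr(res, "failures") <- data.frame(replicate = 4L,
                                    message = "illustrative error")

# Wide view: one row per variable, one column per selector
wide <- xtabs(freq ~ variable + selector, data = res)

# Variables selected in at least 80% of successful fits by every selector
stable <- rownames(wide)[apply(wide >= 0.8, 1, all)]

# Share of replicates that failed, and the recorded failure details
fail_rate <- unique(res$n_fail / (res$n_success + res$n_fail))
failures <- attr(res, "failures")
```

The 0.8 cutoff is arbitrary; a suitable threshold depends on how conservative the screening should be.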

Examples

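set.seed(1)
# Three standard-normal predictors; only the first one drives the mean
X <- matrix(rnorm(300), 100, 3)
mu <- plogis(X[, 1])
# Beta-distributed response with mean mu and precision 30
Y <- rbeta(100, mu * 30, (1 - mu) * 30)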
freq <- compare_selectors_bootstrap(X, Y, B = 10, include_enet = FALSE)
head(freq)
#>     selector variable freq n_success n_fail
#> X1       AIC       X1  1.0        10      0
#> X2       AIC       X2  0.0        10      0
#> X3       AIC       X3  0.5        10      0
#> X11      BIC       X1  1.0        10      0
#> X21      BIC       X2  0.0        10      0
#> X31      BIC       X3  0.0        10      0
subset(freq, freq > 0.8)
#>     selector variable freq n_success n_fail
#> X1       AIC       X1    1        10      0
#> X11      BIC       X1    1        10      0
#> X12     AICc       X1    1        10      0
#> X13    LASSO       X1    1        10      0
#> X23    LASSO       X2    1        10      0
#> X33    LASSO       X3    1        10      0
#> X14   GLMNET       X1    1        10      0

# \donttest{
# Increase B until the reported frequencies stabilise. For example,
freq_big <- compare_selectors_bootstrap(X, Y, B = 200, include_enet = FALSE)
stats::aggregate(freq ~ selector, freq_big, summary)
#>   selector freq.Min. freq.1st Qu. freq.Median freq.Mean freq.3rd Qu. freq.Max.
#> 1      AIC 0.1700000    0.1750000   0.1800000 0.4500000    0.5900000 1.0000000
#> 2     AICc 0.1150000    0.1750000   0.2350000 0.4500000    0.6175000 1.0000000
#> 3      BIC 0.0150000    0.0350000   0.0550000 0.3566667    0.5275000 1.0000000
#> 4   GLMNET 0.1100000    0.1575000   0.2050000 0.4383333    0.6025000 1.0000000
#> 5    LASSO 1.0000000    1.0000000   1.0000000 1.0000000    1.0000000 1.0000000
# }