Bootstrap selection frequencies across selectors
Source: R/compare_helpers.R
compare_selectors_bootstrap.Rd
Bootstraps the dataset B times and records how often each variable is
selected by each selector. Observations containing NA in either X or Y
are removed prior to resampling. Column names are abbreviated internally and
mapped back to the originals in the output, just as in
compare_selectors_single().
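The preprocessing described above can be sketched in base R. This is only an illustration of complete-case filtering followed by one resample with replacement; the toy data and the names keep, X_cc, Y_cc, idx, X_b, and Y_b are assumptions made for the example, and the package's internal code may differ.

```r
# Toy data with two missing values (illustrative only).
set.seed(1)
X <- matrix(rnorm(60), nrow = 20, ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
Y <- rnorm(20)
X[2, 1] <- NA
Y[5] <- NA

# Keep only rows that are complete in both X and Y.
keep <- stats::complete.cases(X, Y)
X_cc <- X[keep, , drop = FALSE]
Y_cc <- Y[keep]

# One bootstrap replicate: draw n row indices with replacement
# and apply the same indices to X and Y so rows stay paired.
n <- nrow(X_cc)
idx <- sample.int(n, size = n, replace = TRUE)
X_b <- X_cc[idx, , drop = FALSE]
Y_b <- Y_cc[idx]
```

Filtering before resampling means every replicate is drawn from the same set of complete rows, so no replicate can contain an NA.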
Value
A long-format data frame with columns selector, variable, freq (in [0, 1]),
n_success, and n_fail. The freq column reports the share of successful
bootstrap replicates in which a variable was selected by the corresponding
selector. Values near 1 indicate that the variable is selected consistently
across resamples, whereas values near 0 indicate weak evidence for it.
n_success counts the successful fits contributing to the frequency estimate
(failed replicates are excluded), while n_fail records the number of
unsuccessful fits. A "failures" attribute attached to the returned data
frame lists the replicate indices and error messages for any replicates
that failed.
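How freq, n_success, n_fail, and the "failures" attribute relate can be sketched with a toy loop. Everything here is an assumption for illustration: toy_selector() is a hypothetical stand-in (a simple correlation threshold), not one of the package's selectors, and the list layout used for the failures is a guess rather than the package's actual structure.

```r
# Hypothetical selector: keeps variables whose absolute correlation with y
# exceeds a threshold. A stand-in only, not a selector from the package.
toy_selector <- function(X, y, threshold = 0.3) {
  if (stats::sd(y) == 0) stop("constant response")
  abs(stats::cor(X, y))[, 1] > threshold
}

set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("X1", "X2", "X3")))
y <- X[, 1] + rnorm(n)

B <- 50
# One row per replicate; rows stay NA when the replicate fails.
selected <- matrix(NA, nrow = B, ncol = ncol(X),
                   dimnames = list(NULL, colnames(X)))
failures <- list()

for (b in seq_len(B)) {
  idx <- sample.int(n, n, replace = TRUE)
  res <- tryCatch(toy_selector(X[idx, , drop = FALSE], y[idx]),
                  error = function(e) e)
  if (inherits(res, "error")) {
    # Record which replicate failed and why (assumed layout).
    failures[[length(failures) + 1L]] <-
      list(replicate = b, message = conditionMessage(res))
  } else {
    selected[b, ] <- res
  }
}

# Frequencies are averaged over successful replicates only.
ok <- stats::complete.cases(selected)
n_success <- sum(ok)
n_fail <- B - n_success
out <- data.frame(
  selector = "toy",
  variable = colnames(X),
  freq = colMeans(selected[ok, , drop = FALSE]),
  n_success = n_success,
  n_fail = n_fail
)
attr(out, "failures") <- failures
```

On a real result, attr(freq, "failures") retrieves the recorded errors, and a large n_fail relative to B suggests the frequencies rest on fewer replicates than requested.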
Examples
set.seed(1)
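# Three predictors; only the first drives the mean of the response.
X <- matrix(rnorm(300), 100, 3)
mu <- plogis(X[, 1])
# Beta-distributed response in (0, 1) with mean mu and precision 30.
Y <- rbeta(100, mu * 30, (1 - mu) * 30)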
freq <- compare_selectors_bootstrap(X, Y, B = 10, include_enet = FALSE)
head(freq)
#>     selector variable freq n_success n_fail
#> X1       AIC       X1  1.0        10      0
#> X2       AIC       X2  0.0        10      0
#> X3       AIC       X3  0.5        10      0
#> X11      BIC       X1  1.0        10      0
#> X21      BIC       X2  0.0        10      0
#> X31      BIC       X3  0.0        10      0
subset(freq, freq > 0.8)
#>     selector variable freq n_success n_fail
#> X1       AIC       X1    1        10      0
#> X11      BIC       X1    1        10      0
#> X12     AICc       X1    1        10      0
#> X13    LASSO       X1    1        10      0
#> X23    LASSO       X2    1        10      0
#> X33    LASSO       X3    1        10      0
#> X14   GLMNET       X1    1        10      0
# \donttest{
# Increase B until the reported frequencies stabilise. For example,
freq_big <- compare_selectors_bootstrap(X, Y, B = 200, include_enet = FALSE)
stats::aggregate(freq ~ selector, freq_big, summary)
#>   selector freq.Min. freq.1st Qu. freq.Median freq.Mean freq.3rd Qu. freq.Max.
#> 1      AIC 0.1700000    0.1750000   0.1800000 0.4500000    0.5900000 1.0000000
#> 2     AICc 0.1150000    0.1750000   0.2350000 0.4500000    0.6175000 1.0000000
#> 3      BIC 0.0150000    0.0350000   0.0550000 0.3566667    0.5275000 1.0000000
#> 4   GLMNET 0.1100000    0.1575000   0.2050000 0.4383333    0.6025000 1.0000000
#> 5    LASSO 1.0000000    1.0000000   1.0000000 1.0000000    1.0000000 1.0000000
# }
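Because the result is in long format, comparing selectors side by side is often easier after pivoting to a variable-by-selector table. The sketch below does this with stats::xtabs() on a hand-typed copy of the six rows printed by head(freq) above, so that it runs without the package; freq_head and wide are names introduced only for this illustration.

```r
# Hand-typed copy of the six rows shown by head(freq) in the example output.
freq_head <- data.frame(
  selector = rep(c("AIC", "BIC"), each = 3),
  variable = rep(c("X1", "X2", "X3"), times = 2),
  freq = c(1.0, 0.0, 0.5, 1.0, 0.0, 0.0)
)

# Each (variable, selector) pair occurs once, so the cell sums computed
# by xtabs() are simply the selection frequencies.
wide <- stats::xtabs(freq ~ variable + selector, data = freq_head)
wide
```

The same call applied to a full result, with the data argument set to the returned data frame, gives one column per selector.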