Benchmark scaling across data volumes for bigANNOY and direct RcppAnnoy
Source: R/benchmark_interface.R
benchmark_annoy_volume_suite.Rd
Run benchmark_annoy_vs_rcppannoy() over a grid of synthetic data sizes to
study how build time, search time, and index size scale with data volume.
Usage
benchmark_annoy_volume_suite(
n_ref = c(2000L, 5000L, 10000L),
n_query = 200L,
n_dim = c(20L, 50L),
k = 10L,
n_trees = 50L,
metric = "euclidean",
search_k = -1L,
seed = 42L,
build_seed = seed,
build_threads = -1L,
block_size = annoy_default_block_size(),
backend = getOption("bigANNOY.backend", "cpp"),
exact = FALSE,
filebacked = FALSE,
path_dir = tempdir(),
keep_files = FALSE,
output_path = NULL,
load_mode = "eager"
)
Arguments
- n_ref
Integer vector of synthetic reference row counts.
- n_query
Integer vector of synthetic query row counts.
- n_dim
Integer vector of synthetic column counts.
- k
Number of neighbours to return.
- n_trees
Number of Annoy trees to build.
- metric
Annoy metric. One of "euclidean", "angular", "manhattan", or "dot".
- search_k
Annoy search budget.
- seed
Random seed used for synthetic data generation and, by default, for the Annoy build seed.
- build_seed
Optional Annoy build seed. Defaults to seed.
- build_threads
Native Annoy build-thread setting.
- block_size
Build/search block size.
- backend
Requested bigANNOY backend.
- exact
Logical flag controlling whether to benchmark the exact Euclidean baseline with bigKNN when available.
- filebacked
Logical flag; if TRUE, synthetic or dense reference inputs are converted into file-backed big.matrix objects before build.
- path_dir
Directory where temporary Annoy and optional file-backed benchmark files should be written.
- keep_files
Logical flag; if TRUE, leave the generated Annoy index on disk after the benchmark finishes.
- output_path
Optional CSV path where the benchmark summary should be written.
- load_mode
Whether the benchmarked index should be returned metadata-only until first search ("lazy") or eagerly loaded once built ("eager").