
Run the same Annoy build and search task through bigANNOY and through a direct dense RcppAnnoy baseline. The comparison reports both speed metrics and data-volume metrics such as reference bytes, query bytes, and generated index size.
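The sketch below shows roughly what a direct dense RcppAnnoy baseline can look like. It uses only RcppAnnoy's `AnnoyEuclidean` class and is not bigANNOY's internal benchmarking code. It approximates "reference bytes" and "query bytes" as the raw 8-bytes-per-double payload. How bigANNOY itself measures those volumes is not documented on this page and should be treated as an assumption.

```r
# Sketch of a dense RcppAnnoy baseline (not bigANNOY's own implementation).
# Byte counts below are the raw double payload, an illustrative assumption.
library(RcppAnnoy)

set.seed(42)
n_ref <- 2000L; n_query <- 200L; n_dim <- 20L; k <- 10L; n_trees <- 50L
ref   <- matrix(rnorm(n_ref * n_dim), nrow = n_ref)
query <- matrix(rnorm(n_query * n_dim), nrow = n_query)

index <- new(AnnoyEuclidean, n_dim)
index$setSeed(42L)

build_time <- system.time({
  # Annoy item ids are 0-based, so row i is stored as item i - 1
  for (i in seq_len(n_ref)) index$addItem(i - 1L, ref[i, ])
  index$build(n_trees)
})[["elapsed"]]

search_time <- system.time({
  # One row of neighbour ids per query, converted back to 1-based row ids
  nn <- t(vapply(seq_len(n_query), function(j) {
    as.integer(index$getNNsByVector(query[j, ], k)) + 1L
  }, integer(k)))
})[["elapsed"]]

index_path <- tempfile(fileext = ".ann")
index$save(index_path)

baseline <- data.frame(
  build_seconds  = build_time,
  search_seconds = search_time,
  ref_bytes      = 8 * length(ref),    # raw double payload of the reference
  query_bytes    = 8 * length(query),  # raw double payload of the queries
  index_bytes    = file.size(index_path)
)
print(baseline)
unlink(index_path)
```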

Usage

benchmark_annoy_vs_rcppannoy(
  x = NULL,
  query = NULL,
  n_ref = 2000L,
  n_query = 200L,
  n_dim = 20L,
  k = 10L,
  n_trees = 50L,
  metric = "euclidean",
  search_k = -1L,
  seed = 42L,
  build_seed = seed,
  build_threads = -1L,
  block_size = annoy_default_block_size(),
  backend = getOption("bigANNOY.backend", "cpp"),
  exact = TRUE,
  filebacked = FALSE,
  path_dir = tempdir(),
  keep_files = FALSE,
  output_path = NULL,
  load_mode = "eager"
)

Arguments

x

Optional benchmark reference input. Supply NULL to generate a synthetic reference matrix, or provide a numeric matrix, big.matrix, descriptor, descriptor path, or external pointer.

query

Optional benchmark query input. Supply NULL for self-search, or provide a numeric matrix, big.matrix, descriptor, descriptor path, or external pointer.

n_ref

Number of synthetic reference rows to generate when x = NULL.

n_query

Number of synthetic query rows to generate when x = NULL and query is not NULL.

n_dim

Number of synthetic columns to generate when x = NULL.

k

Number of neighbours to return.

n_trees

Number of Annoy trees to build.

metric

Annoy metric. One of "euclidean", "angular", "manhattan", or "dot".

search_k

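Annoy search budget: the number of nodes inspected per query. Larger values usually improve recall at the cost of search time; -1 uses Annoy's default of k * n_trees.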

seed

Random seed used for synthetic data generation and, by default, for the Annoy build seed.

build_seed

Optional Annoy build seed. Defaults to seed.

build_threads

Native Annoy build-thread setting.

block_size

Number of rows processed per block during build and search. Defaults to annoy_default_block_size().

backend

Requested bigANNOY backend. Defaults to the bigANNOY.backend option, or "cpp" when that option is unset.

exact

Logical flag controlling whether to also benchmark an exact Euclidean baseline with bigKNN, when that package is available.

filebacked

Logical flag; if TRUE, synthetic or dense reference inputs are converted into file-backed big.matrix objects before build.

path_dir

Directory where temporary Annoy and optional file-backed benchmark files should be written.

keep_files

Logical flag; if TRUE, leave the generated Annoy index on disk after the benchmark finishes.

output_path

Optional CSV path where the benchmark summary should be written.

load_mode

Either "lazy", which returns the benchmarked index as metadata only and defers loading until the first search, or "eager" (the default), which loads the index as soon as it is built.
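The speed/accuracy trade-off controlled by search_k can be explored outside the benchmark with a small RcppAnnoy sketch. It measures recall against brute-force Euclidean neighbours for several budgets. It assumes that bigANNOY forwards search_k to Annoy unchanged, which this page does not state explicitly.

```r
# Sketch: recall of Annoy's approximate neighbours as search_k grows.
# Uses RcppAnnoy directly; sizes and budgets are illustrative.
library(RcppAnnoy)

set.seed(7)
n_ref <- 1000L; n_dim <- 10L; k <- 10L; n_trees <- 5L
ref <- matrix(rnorm(n_ref * n_dim), nrow = n_ref)
q   <- rnorm(n_dim)

index <- new(AnnoyEuclidean, n_dim)
index$setSeed(42L)
for (i in seq_len(n_ref)) index$addItem(i - 1L, ref[i, ])  # 0-based ids
index$build(n_trees)

# Exact k nearest rows by brute force (squared Euclidean distance, 1-based ids)
exact <- order(colSums((t(ref) - q)^2))[seq_len(k)]

recall_at <- function(search_k) {
  res <- index$getNNsByVectorList(q, k, search_k, TRUE)
  hit <- as.integer(res$item) + 1L
  length(intersect(hit, exact)) / k
}

# -1 asks Annoy for its default budget (k * n_trees nodes)
budgets <- c(-1L, 100L, 1000L, 10000L)
recall_tbl <- data.frame(
  search_k = budgets,
  recall   = vapply(budgets, recall_at, numeric(1))
)
print(recall_tbl)
```

With a budget larger than the total number of leaf items across all trees, Annoy effectively re-ranks every item, so recall should approach 1.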

Value

A list with a two-row summary data frame, one row for bigANNOY and one for direct RcppAnnoy, plus benchmark metadata and any validation report produced for the bigANNOY index.
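A minimal usage sketch, assuming bigANNOY and bigmemory are installed. It passes a file-backed big.matrix to the benchmark through its descriptor path, writes the summary to CSV, and reads it back. The file names ref.bin, ref.desc, and benchmark.csv are illustrative. Because the element names of the returned list are not documented on this page, the sketch only inspects its top level.

```r
# Sketch: benchmark a file-backed reference given by descriptor path.
library(bigmemory)
library(bigANNOY)

set.seed(1)
work_dir <- tempfile("bigannoy-bench-")
dir.create(work_dir)

# Write a small reference matrix to a file-backed big.matrix;
# as.big.matrix() writes the descriptor file into backingpath.
ref_dense <- matrix(rnorm(500 * 8), nrow = 500)
ref_fb <- as.big.matrix(
  ref_dense,
  type           = "double",
  backingfile    = "ref.bin",
  descriptorfile = "ref.desc",
  backingpath    = work_dir
)

csv_path <- file.path(work_dir, "benchmark.csv")
res <- benchmark_annoy_vs_rcppannoy(
  x           = file.path(work_dir, "ref.desc"),  # descriptor path input
  query       = NULL,                             # self-search
  k           = 5L,
  n_trees     = 10L,
  exact       = FALSE,
  path_dir    = work_dir,
  output_path = csv_path
)

# Inspect the top level of the returned list
str(res, max.level = 1)

# The CSV should hold the two-row summary (bigANNOY vs. direct RcppAnnoy)
print(read.csv(csv_path))
```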