
Run a grid of n_trees and search_k settings on the same benchmark dataset, optionally recording recall against the exact bigKNN Euclidean baseline.
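Recall against an exact baseline is commonly scored as recall@k: for each query, the fraction of its true k nearest neighbours that the approximate search also returned, averaged over queries. The sketch below shows that calculation in base R. `recall_at_k` is a hypothetical helper, and the suite's own recall definition (for example, how it treats distance ties) is not spelled out on this page and may differ.

```r
# Hypothetical helper: mean recall@k of approximate neighbour indices
# against exact neighbour indices. Both inputs are integer matrices with
# one row per query and k columns of reference-row indices.
recall_at_k <- function(approx_idx, exact_idx) {
  stopifnot(identical(dim(approx_idx), dim(exact_idx)))
  k <- ncol(exact_idx)
  # Fraction of each query's true neighbours recovered, ignoring order
  per_query <- vapply(
    seq_len(nrow(exact_idx)),
    function(i) length(intersect(approx_idx[i, ], exact_idx[i, ])) / k,
    numeric(1)
  )
  mean(per_query)
}

exact  <- rbind(c(1L, 2L, 3L), c(4L, 5L, 6L))
approx <- rbind(c(1L, 3L, 9L), c(4L, 5L, 6L))
recall_at_k(approx, exact)  # (2/3 + 3/3) / 2 = 0.8333...
```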

Usage

benchmark_annoy_recall_suite(
  x = NULL,
  query = NULL,
  n_ref = 2000L,
  n_query = 200L,
  n_dim = 20L,
  k = 10L,
  n_trees = c(10L, 50L, 100L),
  search_k = c(-1L, 1000L, 5000L),
  metric = "euclidean",
  seed = 42L,
  build_seed = seed,
  build_threads = -1L,
  block_size = annoy_default_block_size(),
  backend = getOption("bigANNOY.backend", "cpp"),
  exact = TRUE,
  filebacked = FALSE,
  path_dir = tempdir(),
  keep_files = FALSE,
  output_path = NULL,
  load_mode = "eager"
)
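A small usage sketch follows. It passes only arguments listed in the signature above and requests a 2 x 2 grid, so according to the Value section the summary should contain four rows. The element names of the returned list are not documented on this page, so the example prints the structure instead of indexing into it. It is wrapped in requireNamespace() so it is skipped when bigANNOY is not installed.

```r
# Small self-search benchmark on synthetic data; skipped if bigANNOY
# is not installed. Sizes are kept small so the grid finishes quickly.
res <- NULL
if (requireNamespace("bigANNOY", quietly = TRUE)) {
  res <- bigANNOY::benchmark_annoy_recall_suite(
    n_ref = 500L,
    n_dim = 10L,
    k = 5L,
    n_trees = c(10L, 50L),
    search_k = c(-1L, 2000L),
    exact = TRUE,  # exact recall baseline only if bigKNN is available
    output_path = file.path(tempdir(), "annoy_recall.csv")
  )
  # Inspect the top-level structure without assuming element names
  str(res, max.level = 1)
}
```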

Arguments

x

Optional benchmark reference input. Supply NULL to generate a synthetic reference matrix, or provide a numeric matrix, big.matrix, descriptor, descriptor path, or external pointer.

query

Optional benchmark query input. Supply NULL for self-search, or provide a numeric matrix, big.matrix, descriptor, descriptor path, or external pointer.

n_ref

Number of synthetic reference rows to generate when x = NULL.

n_query

Number of synthetic query rows to generate when x = NULL and query is not NULL.

n_dim

Number of synthetic columns to generate when x = NULL.
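When x = NULL, the suite generates its own data from n_ref, n_query, n_dim, and seed. The sketch below shows what such a seeded generator can look like. `make_synthetic` is hypothetical, and drawing from a standard normal distribution is an assumption; the page does not say which distribution the suite uses.

```r
# Hypothetical generator mirroring the n_ref / n_query / n_dim / seed
# arguments. The standard-normal distribution is an assumption; the
# suite's actual synthetic generator may use something else.
# Note: set.seed() resets the session's global RNG state.
make_synthetic <- function(n_ref = 2000L, n_query = 200L, n_dim = 20L,
                           seed = 42L) {
  set.seed(seed)
  ref <- matrix(rnorm(n_ref * n_dim), nrow = n_ref, ncol = n_dim)
  query <- matrix(rnorm(n_query * n_dim), nrow = n_query, ncol = n_dim)
  list(ref = ref, query = query)
}

a <- make_synthetic(n_ref = 100L, n_query = 10L, n_dim = 5L)
b <- make_synthetic(n_ref = 100L, n_query = 10L, n_dim = 5L)
dim(a$ref)                # 100 5
identical(a$ref, b$ref)   # TRUE: same seed, same data
```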

k

Number of nearest neighbours to return for each query row.

n_trees

Integer vector of Annoy tree counts to benchmark.

search_k

Integer vector of Annoy search budgets to benchmark.
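The benchmark covers every combination of the n_trees and search_k vectors, so the defaults produce 3 x 3 = 9 configurations. The sketch below enumerates that grid with expand.grid(). It also shows the budget Annoy falls back to when search_k = -1: the Annoy README documents the default as k * n_trees. That bigANNOY passes -1 straight through to this native default is an assumption.

```r
# Every combination of the two vectors is benchmarked, so the default
# arguments yield 3 x 3 = 9 configurations.
k <- 10L
grid <- expand.grid(
  n_trees = c(10L, 50L, 100L),
  search_k = c(-1L, 1000L, 5000L)
)
# Assumption: search_k = -1 falls back to Annoy's native default,
# which the Annoy README gives as k * n_trees.
grid$effective_search_k <- ifelse(grid$search_k == -1L,
                                  k * grid$n_trees, grid$search_k)
nrow(grid)  # 9
```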

metric

Annoy metric. One of "euclidean", "angular", "manhattan", or "dot".
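For reference, the four metric names correspond to the distance or similarity functions sketched below in base R. The angular form sqrt(2 * (1 - cos)) is the one Annoy documents; "dot" ranks by inner product, where larger means closer. These helper functions are illustrative only and are not part of bigANNOY.

```r
# Illustrative distance definitions for the four metric names, written
# for two numeric vectors of equal length.
euclidean_dist <- function(u, v) sqrt(sum((u - v)^2))
manhattan_dist <- function(u, v) sum(abs(u - v))
angular_dist <- function(u, v) {
  cos_uv <- sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
  # max(0, ...) guards against tiny negative values from rounding
  sqrt(max(0, 2 * (1 - cos_uv)))
}
dot_similarity <- function(u, v) sum(u * v)  # larger = more similar

euclidean_dist(c(0, 0), c(3, 4))  # 5
manhattan_dist(c(0, 0), c(3, 4))  # 7
angular_dist(c(1, 0), c(0, 1))    # sqrt(2), orthogonal vectors
dot_similarity(c(1, 2), c(3, 4))  # 11
```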

seed

Random seed used for synthetic data generation and, by default, for the Annoy build seed.

build_seed

Optional Annoy build seed. Defaults to seed.

build_threads

Number of threads passed to Annoy's native index build. In native Annoy, -1 (the default here) means use all available CPU cores.

block_size

Build/search block size.
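This page does not spell out what block_size counts. Assuming it bounds how many rows are processed per chunk during build and search, the partitioning looks like the sketch below. `row_blocks` is a hypothetical helper, not a bigANNOY function.

```r
# Hypothetical helper: split row indices 1..n into consecutive blocks of
# at most block_size rows, as a blocked build/search loop might do.
row_blocks <- function(n, block_size) {
  idx <- seq_len(n)
  unname(split(idx, ceiling(idx / block_size)))
}

blocks <- row_blocks(10L, 4L)
lengths(blocks)  # 4 4 2
```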

backend

Requested bigANNOY backend.

exact

Logical flag; if TRUE, also benchmark the exact Euclidean baseline with bigKNN (when bigKNN is available) and record recall against it.
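An exact Euclidean baseline returns the true k nearest reference rows for each query. The brute-force sketch below computes the same thing in base R for small inputs. It is a stand-in to show what the baseline produces and is not bigKNN's API. It also demonstrates self-search (query = NULL above): each reference row is its own nearest neighbour.

```r
# Brute-force exact Euclidean k-nearest neighbours (small inputs only).
exact_knn <- function(ref, query, k) {
  # Squared distances via ||q||^2 + ||r||^2 - 2 q.r (one row per query)
  d2 <- outer(rowSums(query^2), rowSums(ref^2), "+") -
    2 * tcrossprod(query, ref)
  idx <- apply(d2, 1, function(row) order(row)[seq_len(k)])
  # Reshape so that row i holds the k neighbour indices of query i
  matrix(idx, nrow = nrow(query), ncol = k, byrow = TRUE)
}

set.seed(1)
ref <- matrix(rnorm(50 * 3), nrow = 50)
# Self-search (query = ref): each row's nearest neighbour is itself
nn <- exact_knn(ref, ref, k = 3)
all(nn[, 1] == seq_len(nrow(ref)))  # TRUE
```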

filebacked

Logical flag; if TRUE, synthetic or dense reference inputs are converted into file-backed big.matrix objects before build.
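The conversion to a file-backed big.matrix can be sketched with the bigmemory package's as.big.matrix(), as below. The file names and the fresh temporary directory are illustrative choices; bigANNOY's own conversion may name and place files differently (it writes under path_dir). The example is skipped when bigmemory is not installed.

```r
# Convert a dense matrix into a file-backed big.matrix with bigmemory.
# A fresh directory avoids clashing with backing files from earlier runs.
x <- matrix(rnorm(100 * 4), nrow = 100)
bm <- NULL
if (requireNamespace("bigmemory", quietly = TRUE)) {
  backing_dir <- tempfile("bench_fb_")
  dir.create(backing_dir)
  bm <- bigmemory::as.big.matrix(
    x,
    type = "double",
    backingfile = "bench_ref.bin",
    descriptorfile = "bench_ref.desc",
    backingpath = backing_dir
  )
  # Reading all elements back gives the original values
  stopifnot(isTRUE(all.equal(bm[, ], x)))
}
```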

path_dir

Directory where temporary Annoy and optional file-backed benchmark files should be written.

keep_files

Logical flag; if TRUE, leave the generated Annoy index on disk after the benchmark finishes.

output_path

Optional CSV path where the benchmark summary should be written.
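A CSV written via output_path can be read back with base R for comparison across runs. The round trip below uses a stand-in data frame holding only the grid columns; the real summary's other columns (such as timings or recall) are not named on this page, so none are invented here.

```r
# Round-trip a stand-in summary through CSV with base R.
summary_df <- expand.grid(n_trees = c(10L, 50L), search_k = c(-1L, 1000L))
csv_path <- tempfile(fileext = ".csv")
write.csv(summary_df, csv_path, row.names = FALSE)

back <- read.csv(csv_path)
nrow(back)  # 4
```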

load_mode

Either "lazy", to return the benchmarked index with metadata only and load it on first search, or "eager", to load it as soon as it is built.
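The difference between the two modes resembles the generic lazy-loading pattern below: only metadata (here, a path) is held until the first access, and the loader then runs exactly once. `make_lazy_index` and `fake_loader` are hypothetical; "eager" corresponds to calling get() immediately after the build.

```r
# Hypothetical lazy-loading wrapper: only metadata (the path) is held
# until the first call to get(), which runs the loader exactly once.
make_lazy_index <- function(path, loader) {
  index <- NULL
  list(
    path = path,
    is_loaded = function() !is.null(index),
    get = function() {
      if (is.null(index)) index <<- loader(path)
      index
    }
  )
}

calls <- 0L
fake_loader <- function(path) { calls <<- calls + 1L; list(path = path) }
lazy <- make_lazy_index("index.ann", fake_loader)
lazy$is_loaded()  # FALSE
invisible(lazy$get()); invisible(lazy$get())
calls             # 1
```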

Value

A list with a summary data frame containing one row per (n_trees, search_k) configuration.