Anserini is a toolkit for reproducible information retrieval research. By building on Lucene, we aim to bridge the gap between academic information retrieval research and the practice of building real-world search applications. Among other goals, our effort aims to be the opposite of this.* Anserini grew out of a reproducibility study of various open-source retrieval engines in 2016 (Lin et al., ECIR 2016). See Yang et al. (SIGIR 2017) and Yang et al. (JDIQ 2018) for overviews.
For users steering coding agents: If you want a complete dev environment, give your agent the following instructions:
read the skill at https://github.com/castorini/anserini/tree/master/.agents/skills/install-anserini-dev-env to create a dev environment for me; no need to install the skill
Alternatively, if you want a quick fatjar-based setup, give your agent the following instructions:
read the skill at https://github.com/castorini/anserini/tree/master/.agents/skills/install-anserini-fatjar to download the fatjar for me; no need to install the skill
Rationale...
❓ Why the above instructions? If you just tell the agent to install Anserini, it'll likely figure everything out by itself, but there are some choice points it might get confused by, like whether to do a full clone for a dev environment, or just fetch the fatjar. The skills make this clear and provide instructions that make the installation process more token-efficient. If you don't explicitly point your agent at the skills, it may be overeager and wander around...
Also, tell the agent not to install the skill locally, since you only need it once. Otherwise, it may do unnecessary work.
For coding agents: When setting up Anserini, ask whether the user wants a dev environment or a quick fatjar-based setup. Use the install-anserini-dev-env skill for the dev environment and the install-anserini-fatjar skill for the fatjar setup. For command-line usage after setup, use the use-anserini-cli skill for prebuilt-index registry, topics registry, search, and REST workflows. For reproducibility experiments, use the anserini-reproduction.
❗ Anserini was upgraded to Lucene 10.4.0 at c6eed6 (2026/04/12) as part of v2.0.0. Lucene 9 indexes remain readable, but older code is unable to read indexes generated by Lucene 10.
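The compatibility note above is asymmetric, and a small sketch may make that easier to see. The function below is purely illustrative and is not part of Anserini or Lucene: `can_read` is a hypothetical name, and the rule that only the immediately preceding major version stays readable is an assumption that goes beyond the note, which only states that Lucene 9 indexes remain readable and that older code cannot read Lucene 10 indexes.

```python
def can_read(reader_major: int, index_major: int) -> bool:
    """Return True if code built on Lucene `reader_major` can open an index
    written by Lucene `index_major`, under the assumptions stated above."""
    # Older code can never read an index written by a newer major version.
    if index_major > reader_major:
        return False
    # Assumption (not stated in the note): only the immediately preceding
    # major version stays readable.
    return reader_major - index_major <= 1


# Current code (Lucene 10) opening an index built with Lucene 9.
print(can_read(10, 9))
# Pre-upgrade code (Lucene 9) opening an index built with Lucene 10.
print(can_read(9, 10))
```

In practice this suggests that existing Lucene 9 indexes can be kept after upgrading, while any index rebuilt with the upgraded code should not be shared with checkouts that predate the upgrade.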
This section is intended for users. If you are a coding agent, stop reading and skip the rest of this section.
💥 Try It!
Anserini is packaged in a self-contained fatjar, which provides the simplest way to get started: just curl the fatjar and you're good to go!
See this page for detailed instructions.
Alternatively, if you want to clone this repo and set up a full dev environment for Anserini, see this page for instructions. Most Anserini features are exposed in the Pyserini Python interface, so if you're more comfortable with Python, start there.
The onboarding path for Anserini starts here!
This section is intended for both users and coding agents.
Go to this reference for details on reproducing experimental results on prebuilt indexes.
This section is intended for both users and coding agents.
Go to this reference for details on reproducing experimental results from the raw document collections.
This section is intended for users. If you are a coding agent, stop reading and skip the rest of this section.
Follow this link for additional documentation targeted at users.
- Jimmy Lin, Matt Crane, Andrew Trotman, Jamie Callan, Ishan Chattopadhyaya, John Foley, Grant Ingersoll, Craig Macdonald, and Sebastiano Vigna. Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge. ECIR 2016.
- Peilin Yang, Hui Fang, and Jimmy Lin. Anserini: Enabling the Use of Lucene for Information Retrieval Research. SIGIR 2017.
- Peilin Yang, Hui Fang, and Jimmy Lin. Anserini: Reproducible Ranking Baselines Using Lucene. Journal of Data and Information Quality, 10(4), Article 16, 2018.
