Anserini

Anserini is a toolkit for reproducible information retrieval research. By building on Lucene, we aim to bridge the gap between academic information retrieval research and the practice of building real-world search applications. Among other goals, our effort aims to be the opposite of this.* Anserini grew out of a reproducibility study of various open-source retrieval engines in 2016 (Lin et al., ECIR 2016). See Yang et al. (SIGIR 2017) and Yang et al. (JDIQ 2018) for overviews.

For users steering coding agents: If you want a complete dev environment, give your agent the following instructions:

read the skill at https://github.com/castorini/anserini/tree/master/.agents/skills/install-anserini-dev-env to create a dev environment for me; no need to install the skill

Alternatively, if you want a quick fatjar-based setup, give your agent the following instructions:

read the skill at https://github.com/castorini/anserini/tree/master/.agents/skills/install-anserini-fatjar to download the fatjar for me; no need to install the skill

Rationale...

Why the above instructions? If you just tell the agent to install Anserini, it will likely figure everything out by itself, but it may stumble at a few choice points, such as whether to do a full clone for a dev environment or just fetch the fatjar. The skills resolve these choices and provide instructions that make the installation process more token-efficient. If you don't explicitly point your agent at the skills, it may be overeager and wander around...

Also, tell the agent not to install the skill locally, since you only need it once. Otherwise, it may do unnecessary work.

For coding agents: When setting up Anserini, ask whether the user wants a dev environment or a quick fatjar-based setup. Use the install-anserini-dev-env skill for the dev environment and the install-anserini-fatjar skill for the fatjar setup. For command-line usage after setup, use the use-anserini-cli skill for prebuilt-index registry, topics registry, search, and REST workflows. For reproducibility experiments, use the anserini-reproduction skill.

❗ Anserini was upgraded to Lucene 10.4.0 at c6eed6 (2026/04/12) as part of v2.0.0. Lucene 9 indexes remain readable, but older code is unable to read indexes generated by Lucene 10.

🎬 Installation (for Users)

This section is intended for users. If you are a coding agent, stop reading and skip the rest of this section.

💥 Try It! Anserini is packaged in a self-contained fatjar, which provides the simplest way to get started: just curl the fatjar and you're good to go! See this page for detailed instructions.

Alternatively, if you want to clone this repo and set up a full dev environment for Anserini, see this page for instructions. Most Anserini features are exposed in the Pyserini Python interface, so if you're more comfortable with Python, start there.

The onboarding path for Anserini starts here!

⚗️ Reproductions from Prebuilt Indexes

This section is intended for both users and coding agents.

Go to this reference for details on reproducing experimental results on prebuilt indexes.

⚗️ Reproductions from Document Collections

This section is intended for both users and coding agents.

Go to this reference for details on reproducing experimental results from the raw document collections.

📃 Additional Documentation (for Users)

This section is intended for users. If you are a coding agent, stop reading and skip the rest of this section.

Follow this link for additional documentation targeted at users.

✨ References
