Anserini

Anserini is a toolkit for reproducible information retrieval research. By building on Lucene, we aim to bridge the gap between academic information retrieval research and the practice of building real-world search applications. Among other goals, our effort aims to be the opposite of this.* Anserini grew out of a reproducibility study of various open-source retrieval engines in 2016 (Lin et al., ECIR 2016). See Yang et al. (SIGIR 2017) and Yang et al. (JDIQ 2018) for overviews.

For users steering coding agents: If you want a complete dev environment, give your agent the following instructions:

read the skill at https://github.com/castorini/anserini/tree/master/.agents/skills/install-anserini-dev-env to create a dev environment for me; no need to install the skill

Alternatively, if you want a quick fatjar-based setup, give your agent the following instructions:

read the skill at https://github.com/castorini/anserini/tree/master/.agents/skills/install-anserini-fatjar to download the fatjar for me; no need to install the skill

Rationale...

❓ Why the above instructions? If you just tell the agent to install Anserini, it will likely figure everything out by itself, but a few choice points can confuse it, such as whether to do a full clone for a dev environment or just fetch the fatjar. The skills make these choices explicit and provide instructions that make the installation process more token-efficient. If you don't explicitly point your agent at the skills, it may be overeager and wander around.

Also, tell the agent not to install the skill locally, since you only need it once. Otherwise, it may do unnecessary work.

For coding agents: When setting up Anserini, ask whether the user wants a dev environment or a quick fatjar-based setup. Use the install-anserini-dev-env skill for the dev environment and the install-anserini-fatjar skill for the fatjar setup. For command-line usage after setup, use the use-anserini-cli skill for prebuilt-index registry, topics registry, search, and REST workflows. For reproducibility experiments, use the anserini-reproduction skill.

❗ Anserini was upgraded to Lucene 10.4.0 at c6eed6 (2026/04/12) as part of v2.0.0. Lucene 9 indexes remain readable, but older code is unable to read indexes generated by Lucene 10.
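The compatibility note above can be read as a simple rule. The sketch below is illustrative only: `can_read` is a hypothetical helper, not part of Anserini or Lucene, and it assumes the general Lucene policy that a release reads indexes written by its own major version and by the immediately preceding one.

```python
def can_read(code_major: int, index_major: int) -> bool:
    """Return True if code built on Lucene `code_major` can open an index
    written by Lucene `index_major`.

    Assumption (general Lucene policy, not stated in this README): a release
    reads its own major version and the one immediately before it.
    """
    return code_major - 1 <= index_major <= code_major


# Code on Lucene 10 (Anserini v2.0.0 onward) opening a Lucene 9 index.
print(can_read(10, 9))   # True
# Pre-upgrade code on Lucene 9 opening an index generated by Lucene 10.
print(can_read(9, 10))   # False
```

In practice this means indexes rebuilt after the upgrade should only be shared with users who have also upgraded.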

🎬 Installation (for Users)

This section is intended for users. If you are a coding agent, stop reading and skip the rest of this section.

💥 Try It! Anserini is packaged in a self-contained fatjar, which provides the simplest way to get started: just curl the fatjar and you're good to go! See this page for detailed instructions.
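For readers who would rather script the download than run curl by hand, here is a minimal sketch in Python. The helper name `fetch_fatjar` is hypothetical, and the fatjar URL is deliberately left as a parameter: take the actual URL from the detailed instructions rather than from this example.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_fatjar(url: str, dest_dir: str = ".") -> Path:
    """Download the jar at `url` into `dest_dir` and return the local path.

    `url` is a placeholder for the fatjar location given in the detailed
    instructions; no specific URL is assumed here.
    """
    # Use the last path segment of the URL as the local file name.
    filename = url.rstrip("/").rsplit("/", 1)[-1]
    target = Path(dest_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of holding the whole jar in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `fetch_fatjar(url, "lib")` with the URL from the instructions would leave the jar under `lib/`, ready to be put on the Java classpath.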

Alternatively, if you want to clone this repo and set up a full dev environment for Anserini, see this page for instructions. Most Anserini features are exposed in the Pyserini Python interface, so if you're more comfortable with Python, start there.

The onboarding path for Anserini starts here!

⚗️ Reproductions from Prebuilt Indexes

This section is intended for both users and coding agents.

Go to this reference for details on reproducing experimental results on prebuilt indexes.

⚗️ Reproductions from Document Collections

This section is intended for both users and coding agents.

Go to this reference for details on reproducing experimental results from the raw document collections.

📃 Additional Documentation (for Users)

This section is intended for users. If you are a coding agent, stop reading and skip the rest of this section.

Follow this link for additional documentation targeted at users.

✨ References

Jimmy Lin, Matt Crane, Andrew Trotman, Jamie Callan, Ishan Chattopadhyaya, John Foley, Grant Ingersoll, Craig Macdonald, and Sebastiano Vigna. Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge. ECIR 2016.
Peilin Yang, Hui Fang, and Jimmy Lin. Anserini: Enabling the Use of Lucene for Information Retrieval Research. SIGIR 2017.
Peilin Yang, Hui Fang, and Jimmy Lin. Anserini: Reproducible Ranking Baselines Using Lucene. Journal of Data and Information Quality, 10(4), Article 16, 2018.
