Scopus Bulk Data Downloader

A robust Python tool to bulk-download bibliometric records from the Elsevier Scopus API for any custom keyword or topic. Simply update search_config.json with your own search query, year range, and output filename; no code editing is required.

Built for researchers who need large-scale, reproducible literature datasets from Scopus.

Features

Custom keyword search: Change the query in search_config.json — search any topic, disease, drug, author, journal, or combination
Full coverage: Automatically splits large result sets by Source Type and Document Type to bypass Scopus's 5000-record offset cap
Crash-safe: Saves progress year-by-year; re-running resumes from where it stopped
Rate-limit aware: Exponential back-off on 429/5xx errors; live quota monitoring via response headers
Deduplication: Final CSV is deduplicated by EID before saving
Rich metadata: Captures title, authors, affiliations, keywords, citations, open-access flag, ISSN, publisher, and more
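The rate-limit handling listed above can be pictured with the minimal sketch below. It is an illustration, not the script's actual code. The set of retryable status codes, the 1-second base delay, the 60-second cap and the `X-RateLimit-Remaining` header name are all assumptions; confirm the header name against a real API response. `fetch` is a hypothetical zero-argument callable returning `(status, headers, body)`, which keeps the logic testable without network access.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Assumed set of transient errors worth retrying: rate limit plus common 5xx codes.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

Response = Tuple[int, Mapping[str, str], str]


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential delay before retry number `attempt` (0-based), capped."""
    return min(cap, base * (2 ** attempt))


def remaining_quota(headers: Mapping[str, str]) -> Optional[int]:
    """Read the remaining-quota header if present (header name is an assumption)."""
    value = headers.get("X-RateLimit-Remaining")
    return int(value) if value is not None and value.isdigit() else None


def get_with_backoff(
    fetch: Callable[[], Response],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `fetch`, retrying transient failures up to `max_retries` times."""
    status, headers, body = fetch()
    for attempt in range(max_retries):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(backoff_delay(attempt))  # wait 1 s, 2 s, 4 s, ... before retrying
        status, headers, body = fetch()
    return status, headers, body
```

With `requests`, `response.headers` is a case-insensitive mapping, so the header lookup works regardless of casing; a plain `dict` (as in a unit test) is case-sensitive.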

Requirements

Python 3.8+
A valid Elsevier Scopus API key (free for institutional users)

Install dependencies:

pip install -r requirements.txt

How to Get a Scopus API Key

You need an API key from Elsevier to use this tool. Follow these steps:

1. Check institutional access: your university or institution must have an active Scopus subscription. Most universities do; confirm with your library if unsure.

2. Register on Elsevier Developer Portal

Go to https://dev.elsevier.com/
Click I want an API key
Sign in or create a free Elsevier account

3. Create an application

After logging in, go to My API Key or Create API Key
Enter an application name (e.g. Scopus Bibliometric Downloader)
Select Scopus Search API as the product
Accept the terms and submit

4. Copy your API key

Your key will appear on the dashboard (a long alphanumeric string)
Copy it and paste it into your .env file:

SCOPUS_API_KEY=your_key_here

5. Activate institutional IP (if required)

Some API keys only work from your institution's network or VPN
If you get a 401 or 403 error, connect to your institution's VPN and retry

Note: The free Scopus API key allows up to 20,000 requests per week. For large downloads exceeding this, contact Elsevier at intlsales@elsevier.com to request a quota increase.

Quick Start

Step 1 — Set your API key

Copy .env.example to .env and add your Scopus API key:

cp .env.example .env

Edit .env:

SCOPUS_API_KEY=your_scopus_api_key_here

See the How to Get a Scopus API Key section above for full instructions.
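For illustration, here is a minimal stdlib-only sketch of how a script can pick up the key: an already-exported `SCOPUS_API_KEY` environment variable wins, otherwise the `.env` file is parsed. The functions `parse_env_file` and `load_api_key` are hypothetical; the actual script may rely on a library listed in requirements.txt instead. The sketch also rejects the two placeholder values shown in this README, so a forgotten edit fails early rather than producing 401 errors.

```python
import os
from pathlib import Path
from typing import Dict

# Placeholder values used in this README; treat them as "key not set".
PLACEHOLDERS = {"your_key_here", "your_scopus_api_key_here"}


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. SCOPUS_API_KEY="abc".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path: Path = Path(".env")) -> str:
    """Prefer an exported environment variable, else fall back to .env."""
    key = os.environ.get("SCOPUS_API_KEY")
    if not key and env_path.exists():
        key = parse_env_file(env_path).get("SCOPUS_API_KEY")
    if not key or key in PLACEHOLDERS:
        raise RuntimeError("SCOPUS_API_KEY is missing; add it to .env")
    return key
```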

Step 2 — Set your search query

Edit search_config.json to define what to download:

{
  "base_query": "TITLE-ABS-KEY(\"your keyword here\")",
  "year_start": 1990,
  "year_end": 2026,
  "output_csv": "my_results.csv"
}

| Field | Description |
|---|---|
| `base_query` | Any valid Scopus query string (see examples below) |
| `year_start` | First year to fetch (inclusive) |
| `year_end` | Last year to fetch (inclusive) |
| `output_csv` | Output CSV filename |
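A minimal sketch of loading and validating search_config.json before any API call is made. The field names and types come from the table above; the function names `load_search_config` and `years_to_fetch` are hypothetical and may not match the script. Because both year bounds are inclusive, the range needs a `+ 1`.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Expected keys and types, taken from the field table above.
REQUIRED_FIELDS = {"base_query": str, "year_start": int, "year_end": int, "output_csv": str}


def load_search_config(path: Path = Path("search_config.json")) -> Dict[str, Any]:
    """Load the config and fail early on missing or mistyped fields."""
    config = json.loads(path.read_text(encoding="utf-8"))
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(config.get(field), expected):
            raise ValueError(f"{field!r} must be a {expected.__name__}")
    if config["year_start"] > config["year_end"]:
        raise ValueError("year_start must not be later than year_end")
    return config


def years_to_fetch(config: Dict[str, Any]) -> List[int]:
    """Both bounds are inclusive, hence the + 1."""
    return list(range(config["year_start"], config["year_end"] + 1))
```

With the example config (1990 to 2026), this yields 37 years to process.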

Step 3 — Run

python run_scopus.py

Or run the core script directly (skips the author-integrity check):

python scopus_api.py

Custom Query Examples

Change base_query in search_config.json to search any topic:

{ "base_query": "TITLE-ABS-KEY(\"metabolic syndrome\")" }

{ "base_query": "TITLE-ABS-KEY(\"type 2 diabetes\")" }

{ "base_query": "TITLE-ABS-KEY(\"hypertension\" AND \"cardiovascular\")" }

{ "base_query": "TITLE-ABS-KEY(\"deep learning\" AND \"medical imaging\")" }

{ "base_query": "TITLE-ABS-KEY(\"obesity\") AND SRCTYPE(j)" }

{ "base_query": "AUTH(\"Smith J\") AND AFFILCOUNTRY(India)" }

For full Scopus query syntax, see the Scopus Search Guide.
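If you generate queries programmatically, a small helper avoids hand-escaping quotes. The sketch below is hypothetical: `title_abs_key` builds a `TITLE-ABS-KEY(...)` clause, and `year_query` shows one plausible way a per-year filter could be appended. It assumes a `PUBYEAR IS <year>` clause; the real script might filter years differently (for example via an API date parameter). `json.dumps` produces the escaped form that search_config.json needs.

```python
import json


def title_abs_key(*phrases: str, op: str = "AND") -> str:
    """Quote each phrase and join them inside one TITLE-ABS-KEY() clause."""
    joined = f" {op} ".join(f'"{phrase}"' for phrase in phrases)
    return f"TITLE-ABS-KEY({joined})"


def year_query(base_query: str, year: int) -> str:
    """Restrict a query to one publication year (PUBYEAR clause assumed)."""
    # Wrapping the user's query in parentheses avoids operator-precedence
    # surprises (e.g. with AND NOT) when the year filter is appended.
    return f"({base_query}) AND PUBYEAR IS {year}"


query = title_abs_key("hypertension", "cardiovascular")
print(json.dumps({"base_query": query}))  # paste-ready for search_config.json
print(year_query(query, 2020))
```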

What the Script Does

Reads your query and year range from search_config.json
Loads your API key from .env
For each year, first probes the total record count (a single lightweight request)
If the year has ≤ 4600 records (safely below the ~5000-record offset cap): fetches them directly with offset pagination
If the year has > 4600 records: automatically splits the query by Source Type, then by Document Type, to retrieve every record
Appends results to your CSV year-by-year (crash-safe)
Deduplicates the final CSV by EID
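The probe-then-paginate decision in the steps above can be sketched as follows. The 4600 threshold comes from this README; the page size of 25 and the response shape `{"search-results": {"opensearch:totalResults": "..."}}` are assumptions to check against the script and a live response. `total_results` and `plan_year` are hypothetical names.

```python
from typing import Any, Dict

MAX_OFFSET = 4600  # split threshold quoted in the steps above
PAGE_SIZE = 25     # assumed records per request; the real script may differ


def total_results(payload: Dict[str, Any]) -> int:
    """Read the hit count from a probe response (field names assumed)."""
    return int(payload["search-results"]["opensearch:totalResults"])


def plan_year(total: int, page_size: int = PAGE_SIZE, threshold: int = MAX_OFFSET) -> Dict[str, Any]:
    """Decide between direct offset pagination and splitting the query."""
    if total > threshold:
        # Too many hits to page through safely: hand over to the splitter.
        return {"mode": "split", "offsets": []}
    # One request per page, starting at offsets 0, page_size, 2 * page_size, ...
    return {"mode": "direct", "offsets": list(range(0, total, page_size))}
```

For a year with 60 hits this yields offsets 0, 25 and 50, i.e. three page requests after the probe; a year at exactly 4600 hits still pages directly (184 requests at 25 per page).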

To restart from scratch, delete scopus_progress.json and your output CSV.
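A sketch of how year-by-year resume can work. The layout of scopus_progress.json used here (`{"completed_years": [...]}`) is hypothetical, as are the function names; the real file may store more state. Writing to a temporary file and then swapping it in means a crash mid-write leaves the previous progress file intact.

```python
import json
from pathlib import Path
from typing import List, Set

# Hypothetical file layout: {"completed_years": [1990, 1991, ...]}
PROGRESS_FILE = Path("scopus_progress.json")


def load_completed(path: Path = PROGRESS_FILE) -> Set[int]:
    """Years already written to the CSV; empty on a fresh start."""
    if not path.exists():
        return set()
    data = json.loads(path.read_text(encoding="utf-8"))
    return set(data.get("completed_years", []))


def mark_completed(year: int, path: Path = PROGRESS_FILE) -> None:
    """Record a finished year via a temp file that then replaces the original."""
    done = load_completed(path)
    done.add(year)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"completed_years": sorted(done)}), encoding="utf-8")
    tmp.replace(path)  # swap in the new file in one step


def pending_years(start: int, end: int, path: Path = PROGRESS_FILE) -> List[int]:
    """Inclusive year range minus the years already completed."""
    done = load_completed(path)
    return [year for year in range(start, end + 1) if year not in done]
```

Deleting the progress file makes `pending_years` return the full range again, which matches the restart instruction above.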

Output

A CSV file with the following columns:

| Column | Description |
|---|---|
| `EID` | Unique Scopus record ID |
| `Title` | Article title |
| `Authors` | First author name |
| `Author full names` | All author names |
| `Author(s) ID` | Scopus author IDs |
| `Year` | Publication year |
| `Source title` | Journal or conference name |
| `Cited by` | Citation count |
| `Link` | Scopus record URL |
| `Affiliations (basic)` | Institution names |
| `Authors with affiliations (basic)` | Authors + affiliations |
| `Author Keywords` | Keywords from the authors |
| `Index Keywords` | Scopus-indexed keywords |
| `Publisher` | Publisher name |
| `Language of Original Document` | Language |
| `Abbreviated Source Title` | ISSN / eISSN |
| `Document Type` | Article, Review, Conference Paper, etc. |
| `Publication Stage` | Journal, Book, Conference, etc. |
| `Open Access` | Open access flag |
| `Source` | Scopus source ID |
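The final deduplication by `EID` can be done in a single streaming pass. The sketch below uses only the standard `csv` module and keeps the first row seen for each EID; it is an illustration, and the script itself may do this differently (for example with pandas). `dedupe_csv_by_eid` is a hypothetical name.

```python
import csv
from pathlib import Path


def dedupe_csv_by_eid(src: Path, dst: Path) -> int:
    """Copy src to dst, keeping the first row seen for each EID.

    Returns the number of duplicate rows dropped.
    """
    seen = set()
    dropped = 0
    with src.open(newline="", encoding="utf-8") as fin, \
            dst.open("w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            eid = row["EID"]
            if eid in seen:
                dropped += 1
                continue
            seen.add(eid)
            writer.writerow(row)
    return dropped
```

Duplicates can arise when the same record is returned by more than one request, for example across overlapping pages or after a resumed run re-fetches part of a year.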

How the Offset-Cap Workaround Works

Scopus limits offset-based pagination to ~5000 records per query. This script detects when a year's total exceeds MAX_OFFSET and automatically re-queries with additional filters:

First splits by Source Type (j, p, b, k, r, d)
If still too large, splits further by Document Type (ar, re, cp, …)

Since every record has exactly one Source Type and one Document Type, these splits are exhaustive, so no records are missed.
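The two-level split described above can be sketched as a planner that takes a hit-counting function and returns sub-queries. This is a hypothetical illustration, not the script's code: `count` stands in for a probe request, the source-type codes are the ones listed above, and document-type codes are passed in by the caller because the list above is elided. The returned `unmatched` value is a cheap check of the exhaustiveness claim: it counts records that no listed code captured.

```python
from typing import Callable, List, Sequence, Tuple

# Source-type codes listed in this README.
SOURCE_TYPES = ["j", "p", "b", "k", "r", "d"]


def plan_slices(
    query: str,
    count: Callable[[str], int],
    doc_types: Sequence[str],
    source_types: Sequence[str] = SOURCE_TYPES,
    cap: int = 4600,
) -> Tuple[List[str], int]:
    """Split `query` by SRCTYPE, then by DOCTYPE for any source type over `cap`.

    Returns (slices, unmatched). A non-zero `unmatched` means a code list
    is incomplete and some records would be missed.
    """
    total = count(query)
    if total <= cap:
        return [query], 0
    slices: List[str] = []
    covered = 0
    for st in source_types:
        by_source = f"{query} AND SRCTYPE({st})"
        n = count(by_source)
        if n <= cap:
            if n > 0:
                slices.append(by_source)
            covered += n
            continue
        for dt in doc_types:
            by_doc = f"{by_source} AND DOCTYPE({dt})"
            m = count(by_doc)
            if m > 0:
                slices.append(by_doc)  # not re-checked against cap in this sketch
            covered += m
    return slices, total - covered
```

A document-type slice that still exceeds the cap would need a further split, which this sketch does not attempt.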

File Structure

├── scopus_api.py          # Core download engine
├── run_scopus.py          # Integrity-protected launcher
├── search_config.json     # Your search query and year range ← edit this
├── .env                   # Your API key (never committed)
├── .env.example           # API key template
├── requirements.txt       # Python dependencies
└── .gitignore             # Excludes .env and output files

Author

Robin Tomar

GitHub: https://github.com/imrobintomar
LinkedIn: https://in.linkedin.com/in/robintomar

License

This project is shared for academic and research purposes. Please respect Elsevier's API terms of service and your institution's data use agreement when using Scopus data.
