A robust Python tool to bulk-download bibliometric records from the Elsevier Scopus API for any custom keyword or topic. Simply update search_config.json with your own search query, year range, and output filename. No code editing required.
Built for researchers who need large-scale, reproducible literature datasets from Scopus.
- Custom keyword search: Change the query in search_config.json to search any topic, disease, drug, author, journal, or combination
- Full coverage: Automatically splits large result sets by Source Type and Document Type to bypass Scopus's 5000-record offset cap
- Crash-safe: Saves progress year-by-year; re-running resumes from where it stopped
- Rate-limit aware: Exponential back-off on 429/5xx errors; live quota monitoring via response headers
- Deduplication: Final CSV is deduplicated by EID before saving
- Rich metadata: Captures title, authors, affiliations, keywords, citations, open-access flag, ISSN, publisher, and more
- Python 3.8+
- A valid Elsevier Scopus API key (free for institutional users)
Install dependencies:
pip install -r requirements.txt

You need an API key from Elsevier to use this tool. Follow these steps:
1. Check institutional access
- Your university or institution must have an active Scopus subscription. Most universities do; confirm with your library if unsure.
2. Register on Elsevier Developer Portal
- Go to https://dev.elsevier.com/
- Click I want an API key
- Sign in or create a free Elsevier account
3. Create an application
- After login, go to My API Key or Create API Key
- Enter an application name (e.g. Scopus Bibliometric Downloader)
- Select Scopus Search API as the product
- Accept the terms and submit
4. Copy your API key
- Your key will appear on the dashboard (a long alphanumeric string)
- Copy it and paste it into your .env file:
SCOPUS_API_KEY=your_key_here
5. Activate institutional IP (if required)
- Some API keys only work from your institution's network or VPN
- If you get a 401 or 403 error, connect to your institution's VPN and retry
Note: The free Scopus API key allows up to 20,000 requests per week. For large downloads exceeding this, contact Elsevier at intlsales@elsevier.com to request a quota increase.
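Because the weekly quota is finite, retries on 429/5xx responses are worth spacing out. The following is a minimal sketch of exponential back-off, not the tool's actual code: the function names, the retry count, and the delay values are all assumptions chosen for illustration, and `send` stands in for whatever performs the HTTP request.

```python
import random
import time


def is_retryable(status):
    """429 (rate limited) and any 5xx (server error) are worth retrying."""
    return status == 429 or 500 <= status < 600


def backoff_delays(max_retries=5, base=2.0, cap=60.0):
    """Wait before each retry: base, 2*base, 4*base, ... never above cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_backoff(send, max_retries=5, base=2.0, cap=60.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    `send` is a hypothetical zero-argument callable returning
    (status_code, payload); `sleep` is injectable so the logic can be tested.
    """
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        status, payload = send()
        if not is_retryable(status):
            return status, payload
        if attempt == max_retries:
            break
        # A little random jitter keeps parallel runs from retrying in lockstep.
        sleep(delays[attempt] + random.uniform(0, 1))
    raise RuntimeError(f"giving up after {max_retries} retries (last status {status})")
```

Doubling the delay keeps the number of wasted requests small when the API is throttling, which matters when every request counts against the quota.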
Copy .env.example to .env and add your Scopus API key:
cp .env.example .env

Edit .env:
SCOPUS_API_KEY=your_scopus_api_key_here
See the How to Get a Scopus API Key section above for full instructions.
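For readers curious how a key stored this way can be picked up at runtime, here is a small standard-library sketch. It is an assumption about the general approach, not the tool's implementation (which may well use a package from requirements.txt instead); `read_env_file` and `get_api_key` are hypothetical names.

```python
import os


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("SCOPUS_API_KEY")
    if not key and os.path.exists(path):
        key = read_env_file(path).get("SCOPUS_API_KEY")
    # Reject the placeholder values shown in this README.
    if not key or key in ("your_scopus_api_key_here", "your_key_here"):
        raise SystemExit("SCOPUS_API_KEY is missing; add it to .env")
    return key
```

Failing early with a clear message is friendlier than letting the first API call return a 401.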
Edit search_config.json to define what to download:
{
"base_query": "TITLE-ABS-KEY(\"your keyword here\")",
"year_start": 1990,
"year_end": 2026,
"output_csv": "my_results.csv"
}

| Field | Description |
|---|---|
| base_query | Any valid Scopus query string (see examples below) |
| year_start | First year to fetch (inclusive) |
| year_end | Last year to fetch (inclusive) |
| output_csv | Output CSV filename |
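As an illustration of how these four fields might be read and sanity-checked before any request is sent, consider the sketch below. The field names come from the table above; the function names and the specific checks are assumptions, not the tool's own validation.

```python
import json

REQUIRED_FIELDS = ("base_query", "year_start", "year_end", "output_csv")


def validate_config(cfg):
    """Return a cleaned copy of cfg, or raise ValueError on obvious mistakes."""
    missing = [name for name in REQUIRED_FIELDS if name not in cfg]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if not str(cfg["base_query"]).strip():
        raise ValueError("base_query must not be empty")
    start, end = int(cfg["year_start"]), int(cfg["year_end"])
    if start > end:
        raise ValueError("year_start must not be later than year_end")
    return {**cfg, "year_start": start, "year_end": end}


def load_config(path="search_config.json"):
    """Read the JSON file and validate it in one step."""
    with open(path, encoding="utf-8") as fh:
        return validate_config(json.load(fh))
```

Since both bounds are inclusive, `range(cfg["year_start"], cfg["year_end"] + 1)` yields every year to fetch.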
python run_scopus.py

Or directly (no author integrity check):

python scopus_api.py

Change base_query in search_config.json to search any topic:

{ "base_query": "TITLE-ABS-KEY(\"metabolic syndrome\")" }
{ "base_query": "TITLE-ABS-KEY(\"type 2 diabetes\")" }
{ "base_query": "TITLE-ABS-KEY(\"hypertension\" AND \"cardiovascular\")" }
{ "base_query": "TITLE-ABS-KEY(\"deep learning\" AND \"medical imaging\")" }
{ "base_query": "TITLE-ABS-KEY(\"obesity\") AND SRCTYPE(j)" }
{ "base_query": "AUTH(\"Smith J\") AND AFFILCOUNTRY(India)" }

For full Scopus query syntax, see the Scopus Search Guide.
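To show how a base_query could end up in an actual request, here is a hedged sketch that builds (but does not send) a per-year count probe against the Scopus Search endpoint. The helper names are hypothetical, and the way the year is appended (`PUBYEAR = <year>`) is an assumption about how the tool restricts a query to one year; the script itself may do this differently.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_probe_request(base_query, year, api_key):
    """Build a request that asks for a single record of one publication year."""
    params = {
        # Parenthesise the user's query so AND/OR inside it keep their meaning.
        "query": f"({base_query}) AND PUBYEAR = {year}",
        "count": 1,  # one record is enough; only the total is of interest
        "start": 0,
    }
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    headers = {"X-ELS-APIKey": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def parse_total(payload):
    """Extract the total hit count from a decoded JSON search response."""
    return int(payload["search-results"]["opensearch:totalResults"])
```

Sending the request is left out on purpose; `urllib.request.urlopen(request)` followed by `json.load` on the response would be one way to do it.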
- Reads your query and year range from search_config.json
- Loads your API key from .env
- For each year, probes the total record count first (1 cheap request)
- If the year has ≤ 4600 records: fetches directly with offset pagination
- If the year has > 4600 records: automatically splits by Source Type, then Document Type, to retrieve every record
- Appends results to your CSV year-by-year (crash-safe)
- Deduplicates the final CSV by EID
To restart from scratch, delete scopus_progress.json and your output CSV.
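The year-by-year resume behaviour described above can be sketched roughly as follows. Only the file name scopus_progress.json comes from this README; the JSON layout (`completed_years`), the function names, and the `fetch_year` callback are assumptions made for illustration.

```python
import json
import os

PROGRESS_FILE = "scopus_progress.json"  # name from this README; layout below is assumed


def load_done(path=PROGRESS_FILE):
    """Return the set of years already fetched (empty on a fresh start)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return set(json.load(fh).get("completed_years", []))


def mark_done(year, done, path=PROGRESS_FILE):
    """Record a finished year, writing via a temp file plus atomic rename."""
    done.add(year)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"completed_years": sorted(done)}, fh)
    os.replace(tmp, path)  # a crash never leaves a half-written progress file


def run(year_start, year_end, fetch_year, path=PROGRESS_FILE):
    """Fetch each year once; years recorded in the progress file are skipped."""
    done = load_done(path)
    for year in range(year_start, year_end + 1):
        if year in done:
            continue
        fetch_year(year)  # hypothetical: downloads and appends this year's rows
        mark_done(year, done, path)
    return done
```

A year is marked complete only after its rows are written, so a crash mid-year causes that year to be fetched again on the next run; the final deduplication by EID absorbs any rows written twice.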
A CSV file with the following columns:
| Column | Description |
|---|---|
| EID | Unique Scopus record ID |
| Title | Article title |
| Authors | First author name |
| Author full names | All author names |
| Author(s) ID | Scopus author IDs |
| Year | Publication year |
| Source title | Journal or conference name |
| Cited by | Citation count |
| Link | Scopus record URL |
| Affiliations (basic) | Institution names |
| Authors with affiliations (basic) | Authors + affiliations |
| Author Keywords | Keywords from the authors |
| Index Keywords | Scopus-indexed keywords |
| Publisher | Publisher name |
| Language of Original Document | Language |
| Abbreviated Source Title | ISSN / eISSN |
| Document Type | Article, Review, Conference Paper, etc. |
| Publication Stage | Journal, Book, Conference, etc. |
| Open Access | Open access flag |
| Source | Scopus source ID |
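The final deduplication by EID can be pictured with the standard csv module. This is a minimal sketch under the assumption that the first occurrence of each EID is the one to keep; the function names are hypothetical and the tool itself may use a different library.

```python
import csv


def dedupe_rows(rows, key="EID"):
    """Keep the first row seen for each EID, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        eid = row.get(key)
        if eid in seen:
            continue
        seen.add(eid)
        unique.append(row)
    return unique


def dedupe_csv(path, key="EID"):
    """Rewrite the CSV in place without duplicate EIDs; return the row count."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        rows = dedupe_rows(reader, key)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Duplicates can appear when a year is re-fetched after an interrupted run, so deduplicating once at the end is simpler than checking on every append.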
Scopus limits offset-based pagination to ~5000 records per query. This script detects when a year's total exceeds MAX_OFFSET and automatically re-queries with additional filters:
- First splits by Source Type (j, p, b, k, r, d)
- If still too large, splits further by Document Type (ar, re, cp, …)
Since every record has exactly one Source Type and one Document Type, these splits are exhaustive: no records are missed.
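The two-level split can be sketched as a small planning function. Everything here is illustrative rather than the script's actual logic: `plan_queries` and `count` are hypothetical names, MAX_OFFSET is set to the 4600 threshold mentioned earlier on the assumption that the two are the same value, and DOC_TYPES lists only the three document types named above, not the full set.

```python
MAX_OFFSET = 4600  # assumed to match the per-year threshold quoted in this README
SOURCE_TYPES = ["j", "p", "b", "k", "r", "d"]
DOC_TYPES = ["ar", "re", "cp"]  # illustrative subset; the real list is longer


def plan_queries(query, count, max_offset=MAX_OFFSET):
    """Return sub-queries that together cover `query`.

    `count` is a hypothetical callable mapping a query string to its total
    number of hits (for example, one cheap probe request per call).
    """
    total = count(query)
    if total == 0:
        return []
    if total <= max_offset:
        return [query]  # small enough for plain offset pagination

    plans = []
    for src in SOURCE_TYPES:
        src_query = f"({query}) AND SRCTYPE({src})"
        n = count(src_query)
        if n == 0:
            continue
        if n <= max_offset:
            plans.append(src_query)
            continue
        # Still too large: split this source type by document type as well.
        for doc in DOC_TYPES:
            doc_query = f"{src_query} AND DOCTYPE({doc})"
            if count(doc_query) > 0:
                plans.append(doc_query)
    return plans
```

This sketch stops at two levels, so a single source-type and document-type slice that still exceeds the cap would be truncated; a production version would need to detect and report that case.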
├── scopus_api.py # Core download engine
├── run_scopus.py # Integrity-protected launcher
├── search_config.json # Your search query and year range ← edit this
├── .env # Your API key (never committed)
├── .env.example # API key template
├── requirements.txt # Python dependencies
└── .gitignore # Excludes .env and output files
Robin Tomar
- GitHub: https://github.com/imrobintomar
- LinkedIn: https://in.linkedin.com/in/robintomar
This project is shared for academic and research purposes. Please respect Elsevier's API terms of service and your institution's data use agreement when using Scopus data.