imrobintomar/ScopusAPI


Scopus Bulk Data Downloader

A robust Python tool to bulk-download bibliometric records from the Elsevier Scopus API for any custom keyword or topic. Update search_config.json with your own search query, year range, and output filename; no code editing is required.

Built for researchers who need large-scale, reproducible literature datasets from Scopus.


Features

  • Custom keyword search: Change the query in search_config.json — search any topic, disease, drug, author, journal, or combination
  • Full coverage: Automatically splits large result sets by Source Type and Document Type to bypass Scopus's 5000-record offset cap
  • Crash-safe: Saves progress year-by-year; re-running resumes from where it stopped
  • Rate-limit aware: Exponential back-off on 429/5xx errors; live quota monitoring via response headers
  • Deduplication: Final CSV is deduplicated by EID before saving
  • Rich metadata: Captures title, authors, affiliations, keywords, citations, open-access flag, ISSN, publisher, and more
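
As an illustration of the rate-limit handling listed above, here is a minimal sketch of capped exponential back-off. It is not the code in scopus_api.py: the names backoff_delays, call_with_backoff, and send are hypothetical, and the retry count, base delay, and cap are assumed values.

```python
import random
import time


def is_retryable(status):
    """Treat HTTP 429 and any 5xx status as retryable."""
    return status == 429 or 500 <= status < 600


def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Yield capped exponential delays: base, 2*base, 4*base, ..."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(send, max_retries=5, base=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    for delay in backoff_delays(max_retries, base):
        status, body = send()
        if not is_retryable(status):
            return status, body
        # Small random jitter so parallel clients do not retry in lockstep.
        sleep(delay + random.uniform(0, 0.1 * delay))
    return send()  # final attempt; the caller inspects the status
```

Passing sleep as a parameter keeps the function easy to test without real waiting.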

Requirements

  • Python 3.8+
  • A valid Elsevier Scopus API key (free for institutional users)

Install dependencies:

pip install -r requirements.txt

How to Get a Scopus API Key

You need an API key from Elsevier to use this tool. Follow these steps:

1. Check institutional access

  • Your university or institution must have an active Scopus subscription. Most universities do. Confirm with your library if unsure.

2. Register on Elsevier Developer Portal

3. Create an application

  • After login, go to My API Key or Create API Key
  • Enter an application name (e.g. Scopus Bibliometric Downloader)
  • Select Scopus Search API as the product
  • Accept the terms and submit

4. Copy your API key

  • Your key will appear on the dashboard (a long alphanumeric string)
  • Copy it and paste it into your .env file:
SCOPUS_API_KEY=your_key_here

5. Activate institutional IP (if required)

  • Some API keys only work from your institution's network or VPN
  • If you get a 401 or 403 error, connect to your institution's VPN and retry

Note: The free Scopus API key allows up to 20,000 requests per week. For large downloads exceeding this, contact Elsevier at intlsales@elsevier.com to request a quota increase.


Quick Start

Step 1 — Set your API key

Copy .env.example to .env and add your Scopus API key:

cp .env.example .env

Edit .env:

SCOPUS_API_KEY=your_scopus_api_key_here

See the How to Get a Scopus API Key section above for full instructions.
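
For readers curious how a key in .env can reach the script, the following is a minimal sketch using only the standard library. It assumes a plain KEY=VALUE file; read_env_file and get_api_key are hypothetical names, and the real script may load the file differently (for example with a third-party package).

```python
import os


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("SCOPUS_API_KEY")
    if not key and os.path.exists(path):
        key = read_env_file(path).get("SCOPUS_API_KEY")
    if not key or key == "your_scopus_api_key_here":
        raise RuntimeError("SCOPUS_API_KEY is missing or still the placeholder")
    return key
```

Failing early on the placeholder value avoids spending requests on calls that would be rejected with 401.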


Step 2 — Set your search query

Edit search_config.json to define what to download:

{
  "base_query": "TITLE-ABS-KEY(\"your keyword here\")",
  "year_start": 1990,
  "year_end": 2026,
  "output_csv": "my_results.csv"
}

Field        Description
base_query   Any valid Scopus query string (see examples below)
year_start   First year to fetch (inclusive)
year_end     Last year to fetch (inclusive)
output_csv   Output CSV filename
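
To make the meaning of these fields concrete, here is a small sketch of how a config like this could be validated and expanded into one query per year. It assumes the year restriction is expressed with a Scopus PUBYEAR clause; validate_config and yearly_queries are hypothetical helpers, not functions from the repository.

```python
import json

REQUIRED_FIELDS = {"base_query", "year_start", "year_end", "output_csv"}


def validate_config(cfg):
    """Check that all four fields exist and the year range is ordered."""
    missing = REQUIRED_FIELDS - cfg.keys()
    if missing:
        raise ValueError(f"search_config.json is missing: {sorted(missing)}")
    if cfg["year_start"] > cfg["year_end"]:
        raise ValueError("year_start must not be later than year_end")
    return cfg


def load_config(path="search_config.json"):
    """Read and validate the JSON config file."""
    with open(path, encoding="utf-8") as fh:
        return validate_config(json.load(fh))


def yearly_queries(cfg):
    """Yield (year, query) pairs; both year bounds are inclusive."""
    for year in range(cfg["year_start"], cfg["year_end"] + 1):
        # Assumption: the year filter is appended as a PUBYEAR clause.
        yield year, f'({cfg["base_query"]}) AND PUBYEAR = {year}'
```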

Step 3 — Run

python run_scopus.py

Or directly (no author integrity check):

python scopus_api.py

Custom Query Examples

Change base_query in search_config.json to search any topic:

{ "base_query": "TITLE-ABS-KEY(\"metabolic syndrome\")" }
{ "base_query": "TITLE-ABS-KEY(\"type 2 diabetes\")" }
{ "base_query": "TITLE-ABS-KEY(\"hypertension\" AND \"cardiovascular\")" }
{ "base_query": "TITLE-ABS-KEY(\"deep learning\" AND \"medical imaging\")" }
{ "base_query": "TITLE-ABS-KEY(\"obesity\") AND SRCTYPE(j)" }
{ "base_query": "AUTH(\"Smith J\") AND AFFILCOUNTRY(India)" }

For full Scopus query syntax, see the Scopus Search Guide.
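
The backslash-escaped quotes in the examples above are easy to get wrong by hand. One option, sketched here with a hypothetical title_abs_key helper, is to build the query in Python and let json.dumps produce the escaping; the field values are the ones shown in Step 2.

```python
import json


def title_abs_key(*phrases):
    """Combine quoted phrases with AND inside one TITLE-ABS-KEY clause."""
    quoted = " AND ".join(f'"{p}"' for p in phrases)
    return f"TITLE-ABS-KEY({quoted})"


config = {
    "base_query": title_abs_key("deep learning", "medical imaging"),
    "year_start": 1990,
    "year_end": 2026,
    "output_csv": "my_results.csv",
}

# json.dumps escapes the inner double quotes as \" automatically.
config_text = json.dumps(config, indent=2)
print(config_text)
```

Writing config_text to search_config.json gives a file in the same shape as the Step 2 example.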


What the Script Does

  1. Reads your query and year range from search_config.json
  2. Loads your API key from .env
  3. For each year, probes the total record count first (1 cheap request)
  4. If the year has ≤ 4600 records: fetches directly with offset pagination
  5. If the year has > 4600 records: automatically splits by Source Type, then Document Type, to retrieve every record
  6. Appends results to your CSV year-by-year (crash-safe)
  7. Deduplicates the final CSV by EID
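
Steps 3 to 5 can be sketched as follows. The endpoint URL, the X-ELS-APIKey header, and the opensearch:totalResults field reflect Elsevier's public Scopus Search API as I understand it; the function names are hypothetical, and 4600 is the threshold quoted in the list above. Nothing here sends a request; it only builds one and interprets a parsed JSON response.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_probe_request(query, api_key):
    """Build a one-record request whose only purpose is to read the total."""
    params = urllib.parse.urlencode({"query": query, "count": 1, "start": 0})
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )


def total_results(payload):
    """Extract the total hit count from a parsed JSON search response."""
    return int(payload["search-results"]["opensearch:totalResults"])


def plan_year(total, max_offset=4600):
    """Decide between plain offset pagination and splitting the query."""
    return "direct" if total <= max_offset else "split"
```

A caller would pass the request to urllib.request.urlopen, parse the body with json.loads, and feed the result to total_results.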

To restart from scratch, delete scopus_progress.json and your output CSV.
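
The resume behaviour can be pictured with the sketch below. The real layout of scopus_progress.json is not documented here, so the {"completed_years": [...]} structure and all three function names are assumptions made purely for illustration.

```python
import json
import os


def load_done_years(path="scopus_progress.json"):
    """Return the set of years already finished (empty if no file exists)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return set(json.load(fh).get("completed_years", []))


def mark_year_done(year, path="scopus_progress.json"):
    """Record a finished year, writing via a temp file and atomic rename."""
    done = load_done_years(path)
    done.add(year)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"completed_years": sorted(done)}, fh)
    os.replace(tmp, path)  # a crash mid-write leaves the old file intact


def pending_years(start, end, path="scopus_progress.json"):
    """List years in [start, end] that still need to be downloaded."""
    done = load_done_years(path)
    return [y for y in range(start, end + 1) if y not in done]
```

Under this assumed layout, deleting the progress file makes every year pending again, which matches the restart instruction above.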


Output

A CSV file with the following columns:

Column                              Description
EID                                 Unique Scopus record ID
Title                               Article title
Authors                             First author name
Author full names                   All author names
Author(s) ID                        Scopus author IDs
Year                                Publication year
Source title                        Journal or conference name
Cited by                            Citation count
Link                                Scopus record URL
Affiliations (basic)                Institution names
Authors with affiliations (basic)   Authors + affiliations
Author Keywords                     Keywords from the authors
Index Keywords                      Scopus-indexed keywords
Publisher                           Publisher name
Language of Original Document       Language
Abbreviated Source Title            ISSN / eISSN
Document Type                       Article, Review, Conference Paper, etc.
Publication Stage                   Journal, Book, Conference, etc.
Open Access                         Open access flag
Source                              Scopus source ID
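
Deduplication by the EID column can be reproduced on any CSV with this layout. The sketch below uses only the standard csv module and keeps the first row seen for each EID; dedupe_csv_by_eid is a hypothetical helper, and the script itself may do this differently.

```python
import csv


def dedupe_csv_by_eid(src, dst):
    """Copy src to dst, keeping only the first row for each EID.

    src and dst must be different paths. Returns the number of rows kept.
    """
    seen = set()
    kept = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            eid = row["EID"]
            if eid in seen:
                continue  # duplicate record, e.g. fetched by two sub-queries
            seen.add(eid)
            writer.writerow(row)
            kept += 1
    return kept
```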

How the Offset-Cap Workaround Works

Scopus limits offset-based pagination to ~5000 records per query. This script detects when a year's total exceeds MAX_OFFSET and automatically re-queries with additional filters:

  1. First splits by Source Type (j, p, b, k, r, d)
  2. If still too large, splits further by Document Type (ar, re, cp, …)

Since every record has exactly one Source Type and one Document Type, these splits are exhaustive, so no records are missed.
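
The two-level split can be sketched as below. This is an illustration under stated assumptions, not the repository's code: split_queries and count_fn are hypothetical, count_fn stands in for the cheap probe request, and DOC_TYPES holds only the three codes named above, so a real run would need the full list for the split to be exhaustive.

```python
SOURCE_TYPES = ["j", "p", "b", "k", "r", "d"]
DOC_TYPES = ["ar", "re", "cp"]  # partial: only the codes named in this README


def split_queries(query, count_fn, max_offset=4600):
    """Return (sub_query, total) pairs that cover `query` in smaller pieces.

    `count_fn(query)` is a hypothetical callable returning the hit count.
    """
    total = count_fn(query)
    if total <= max_offset:
        return [(query, total)]

    plans = []
    for src in SOURCE_TYPES:
        src_query = f"({query}) AND SRCTYPE({src})"
        src_total = count_fn(src_query)
        if src_total == 0:
            continue  # nothing to fetch for this source type
        if src_total <= max_offset:
            plans.append((src_query, src_total))
            continue
        # Still too large: split this source type by document type.
        for doc in DOC_TYPES:
            doc_query = f"{src_query} AND DOCTYPE({doc})"
            doc_total = count_fn(doc_query)
            if doc_total:
                plans.append((doc_query, doc_total))
    return plans
```

Comparing the sum of the sub-query totals with the original total is a cheap way to confirm that nothing was dropped.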


File Structure

├── scopus_api.py          # Core download engine
├── run_scopus.py          # Integrity-protected launcher
├── search_config.json     # Your search query and year range ← edit this
├── .env                   # Your API key (never committed)
├── .env.example           # API key template
├── requirements.txt       # Python dependencies
└── .gitignore             # Excludes .env and output files

Author

Robin Tomar


License

This project is shared for academic and research purposes. Please respect Elsevier's API terms of service and your institution's data use agreement when using Scopus data.
