Knowledge Mapper

An interactive visualization that maps your conceptual knowledge across 250,000 Wikipedia articles and 5,000+ Khan Academy videos. Answer questions to watch a real-time heatmap of your strengths and gaps emerge, then get personalized video recommendations to fill them.

Try the live demo | Read the paper

How It Works

  1. Choose a domain (e.g., Physics, Neuroscience, Computer Science) from the landing page
  2. Answer adaptive questions -- each one tests conceptual understanding, terminology, and reasoning
  3. Watch the map update -- correct answers turn nearby regions green; wrong answers reveal red gaps
  4. Get video recommendations -- Khan Academy videos are suggested based on your weakest areas
  5. Explore freely -- zoom, pan, hover video trajectories, and click articles for Wikipedia content

Under the hood, text embedding models place every article, question, and video transcript into a shared high-dimensional vector space; a joint UMAP projection then flattens that space onto a 2D map where related concepts cluster together. Density flattening via approximate optimal transport spreads the points for even spatial coverage. As you answer questions, a Bayesian estimator interpolates your knowledge across the map using radial basis functions.
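
Conceptually, the estimator works like the minimal Python sketch below. The app's real estimator lives in src/learning/ and is written in JavaScript; the kernel width, prior, and prior weight here are illustrative assumptions, not the shipped values.

import numpy as np

def estimate_knowledge(grid_xy, answer_xy, correct,
                       sigma=0.05, prior=0.5, prior_weight=1.0):
    """Kernel-weighted average of answer correctness at each map point,
    shrunk toward a neutral prior where nearby evidence is sparse."""
    # Squared distances from every grid point to every answered question.
    d2 = ((grid_xy[:, None, :] - answer_xy[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2 * sigma ** 2))   # Gaussian radial basis weights
    evidence = w @ correct               # weighted "correct" mass per grid point
    return (evidence + prior_weight * prior) / (w.sum(axis=1) + prior_weight)

# Example: one correct and one incorrect answer on a unit-square map.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                            np.linspace(0, 1, 50)), axis=-1).reshape(-1, 2)
heat = estimate_knowledge(grid,
                          answer_xy=np.array([[0.2, 0.3], [0.7, 0.8]]),
                          correct=np.array([1.0, 0.0]))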

Features

  • 50 knowledge domains including Physics, Biology, Mathematics, Computer Science, Philosophy, and more
  • 2,450 adaptive quiz questions generated via GPT-5-nano from Wikipedia source articles
  • 5,000+ Khan Academy videos with knowledge-gap-based recommendations
  • Real-time heatmap powered by radial basis function interpolation
  • Video trajectories -- hover a video dot to see its topic path across the map
  • Knowledge insights -- see your strongest/weakest concepts and learning suggestions
  • Social sharing -- export your knowledge map as an image with grid lines and colorbar
  • Fully client-side -- no data leaves your browser; progress saved to localStorage

Quick Start

git clone https://github.com/ContextLab/mapper.git
cd mapper
npm install
npm run dev

Open http://localhost:5173/mapper/ in your browser.

Production Build

npm run build   # outputs to dist/
npm run preview # preview the production build locally

Project Structure

mapper/
├── index.html          # HTML entry point (layout, styles, modals)
├── src/                # Application source code
│   ├── app.js          # Entry point: init, routing, event wiring
│   ├── domain/         # Domain data loading and registry
│   ├── learning/       # Adaptive quiz engine + video recommender
│   ├── state/          # Application state and persistence
│   ├── ui/             # UI components (controls, quiz, insights, share, video modal)
│   ├── utils/          # Math, accessibility, feature detection
│   └── viz/            # Canvas rendering (heatmap, minimap, particles)
├── data/               # Pre-computed data bundles
│   ├── domains/        # 50 per-domain JSON bundles + index.json
│   └── videos/         # Video catalog + transcripts + embeddings
├── scripts/            # Python data pipeline
├── tests/              # Unit tests (vitest) + E2E tests (Playwright)
└── public/             # Static assets

Data Pipeline

The scripts/ directory contains the Python pipeline that generates the data powering the frontend:

  1. Embed articles using google/embeddinggemma-300m (768-dim vectors)
  2. Generate questions via GPT-5-nano (50 per domain, 2,450 total)
  3. Embed questions using the same model (for coordinate consistency)
  4. Transcribe videos via Whisper on a GPU cluster (5,400+ Khan Academy transcripts)
  5. Embed transcripts -- both full-document and sliding-window (512 words, 50-word stride; see the first sketch below)
  6. Joint UMAP projection -- project articles + questions + transcripts TOGETHER to 2D (second sketch below)
  7. Density flattening via approximate optimal transport (mu=0.85)
  8. Apply coordinates to all domain bundles and video catalog
  9. Compute bounding boxes from question positions (5th-95th percentile)
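
Steps 1, 3, and 5 look roughly like the following sketch, assuming the sentence-transformers package; article_texts and question_texts stand in for the loaded corpora, and the helper name is illustrative.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")  # 768-dim vectors

def embed_sliding_windows(transcript, window=512, stride=50):
    """Embed overlapping word windows of one transcript."""
    words = transcript.split()
    chunks = [" ".join(words[i:i + window])
              for i in range(0, max(len(words) - window, 0) + 1, stride)]
    return model.encode(chunks)                      # shape: (n_windows, 768)

article_embeddings = model.encode(article_texts)     # step 1: full documents
question_embeddings = model.encode(question_texts)   # step 3: same model, same space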

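Steps 6 and 9 follow the same pattern, assuming umap-learn; transcript_embeddings carries over from the sketch above, and the density-flattening step (7), which would rewarp the resulting coordinates, is omitted here.

import numpy as np
import umap

# Step 6: one joint projection, so articles, questions, and transcripts
# share a single 2D coordinate system.
stacked = np.vstack([article_embeddings, question_embeddings, transcript_embeddings])
xy = umap.UMAP(n_components=2, metric="cosine", random_state=0).fit_transform(stacked)

# Step 9: per-domain bounding box from the 5th-95th percentiles of
# question positions.
n_articles, n_questions = len(article_embeddings), len(question_embeddings)
question_xy = xy[n_articles:n_articles + n_questions]
(x_lo, y_lo), (x_hi, y_hi) = np.percentile(question_xy, [5, 95], axis=0)
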
Testing

npx vitest run        # 75 unit tests (estimator, sampler, recommender)
npx playwright test   # 8 E2E test specs (quiz flow, video recs, sharing)

Citation

@article{manning2025mapper,
  title={Text embedding models yield high-resolution insights into conceptual knowledge},
  author={Manning, Jeremy R},
  year={2025},
  url={https://psyarxiv.com/dh3q2}
}

License

CC BY-NC-SA 4.0 -- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

Contributing

This is a research project from the Contextual Dynamics Lab at Dartmouth College. For questions or collaboration inquiries, please open an issue.
