An interactive visualization that maps your conceptual knowledge across 250,000 Wikipedia articles and 5,000+ Khan Academy videos. Answer questions to watch a real-time heatmap of your strengths and gaps emerge, then get personalized video recommendations to close them.
Try the live demo | Read the paper
- Choose a domain (e.g., Physics, Neuroscience, Computer Science) from the landing page
- Answer adaptive questions -- each one tests conceptual understanding, terminology, and reasoning
- Watch the map update -- correct answers turn nearby regions green; wrong answers reveal red gaps
- Get video recommendations -- Khan Academy videos are suggested based on your weakest areas
- Explore freely -- zoom, pan, hover video trajectories, and click articles for Wikipedia content
Under the hood, text embedding models place every article, question, and video transcript into a shared high-dimensional vector space, then project them onto a 2D map where related concepts cluster together. Density flattening via optimal transport ensures even spatial coverage. As you answer questions, a Bayesian estimator interpolates your knowledge across the map using radial basis functions.
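The interpolation step above can be sketched with a Gaussian radial basis function estimator. This is a minimal illustration, not the app's actual implementation: the function name, the Nadaraya-Watson weighting, and the neutral 0.5 prior are assumptions.

```python
import numpy as np

def estimate_knowledge(grid_xy, answered_xy, scores, sigma=0.1):
    """Interpolate quiz results over 2D map coordinates with Gaussian RBFs.

    grid_xy     : (G, 2) map points to colour in the heatmap
    answered_xy : (N, 2) coordinates of answered questions
    scores      : (N,) 1.0 for a correct answer, 0.0 for incorrect
    sigma       : kernel bandwidth; controls how far an answer "spreads"

    Returns (G,) estimated knowledge in [0, 1], falling back toward a
    neutral 0.5 prior where no answered question is nearby.
    """
    # Squared distances from every grid point to every answered question
    d2 = ((grid_xy[:, None, :] - answered_xy[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))                 # (G, N) RBF weights
    total = w.sum(axis=1)
    # Weighted average of scores, shrunk toward 0.5 when weights vanish
    return (w @ scores + 0.5 * 1e-6) / (total + 1e-6)
```

With this weighting, a grid point on top of a correct answer estimates near 1.0, while a point far from every answered question stays at the 0.5 prior, which is what produces the green/red regions fading to neutral.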
- 50 knowledge domains including Physics, Biology, Mathematics, Computer Science, Philosophy, and more
- 2,450 adaptive quiz questions generated via GPT-5-nano from Wikipedia source articles
- 5,000+ Khan Academy videos with knowledge-gap-based recommendations
- Real-time heatmap powered by radial basis function interpolation
- Video trajectories -- hover a video dot to see its topic path across the map
- Knowledge insights -- see your strongest/weakest concepts and learning suggestions
- Social sharing -- export your knowledge map as an image with grid lines and colorbar
- Fully client-side -- no data leaves your browser; progress saved to localStorage
```
git clone https://github.com/ContextLab/mapper.git
cd mapper
npm install
npm run dev
```

Open http://localhost:5173/mapper/ in your browser.
```
npm run build     # outputs to dist/
npm run preview   # preview the production build locally
```

```
mapper/
├── index.html       # HTML entry point (layout, styles, modals)
├── src/             # Application source code
│   ├── app.js       # Entry point: init, routing, event wiring
│   ├── domain/      # Domain data loading and registry
│   ├── learning/    # Adaptive quiz engine + video recommender
│   ├── state/       # Application state and persistence
│   ├── ui/          # UI components (controls, quiz, insights, share, video modal)
│   ├── utils/       # Math, accessibility, feature detection
│   └── viz/         # Canvas rendering (heatmap, minimap, particles)
├── data/            # Pre-computed data bundles
│   ├── domains/     # 50 per-domain JSON bundles + index.json
│   └── videos/      # Video catalog + transcripts + embeddings
├── scripts/         # Python data pipeline
├── tests/           # Unit tests (vitest) + E2E tests (Playwright)
└── public/          # Static assets
```
The scripts/ directory contains the Python pipeline that generates the data powering the frontend:
- Embed articles using `google/embeddinggemma-300m` (768-dim vectors)
- Generate questions via GPT-5-nano (50 per domain, 2,450 total)
- Embed questions using the same model (for coordinate consistency)
- Transcribe videos via Whisper on a GPU cluster (5,400+ Khan Academy transcripts)
- Embed transcripts -- both full-document and sliding-window (512 words, 50-word stride)
- Joint UMAP projection -- project articles + questions + transcripts TOGETHER to 2D
- Density flattening via approximate optimal transport (`mu=0.85`)
- Apply coordinates to all domain bundles and video catalog
- Compute bounding boxes from question positions (5th-95th percentile)
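The sliding-window transcript embedding above depends on a chunking step. A minimal sketch follows; the function name is hypothetical, and treating the 50-word stride as the step between window starts (rather than the overlap) is an assumption.

```python
def sliding_windows(words, window=512, stride=50):
    """Split a word list into overlapping windows for embedding.

    Yields (start_index, window_words) pairs. Documents shorter than
    one window are emitted whole, and a trailing partial window is
    kept so no words at the end of a transcript are dropped.
    """
    if len(words) <= window:
        yield 0, list(words)
        return
    start = 0
    while start + window <= len(words):
        yield start, words[start:start + window]
        start += stride
    if start < len(words):          # trailing remainder
        yield start, words[start:]
```

Each window would then be embedded independently, giving the fine-grained positions that trace a video's topic path across the map, alongside the single full-document embedding.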
```
npx vitest run        # 75 unit tests (estimator, sampler, recommender)
npx playwright test   # 8 E2E test specs (quiz flow, video recs, sharing)
```

```bibtex
@article{manning2025mapper,
  title={Text embedding models yield high-resolution insights into conceptual knowledge},
  author={Manning, Jeremy R},
  year={2025},
  url={https://psyarxiv.com/dh3q2}
}
```

CC BY-NC-SA 4.0 -- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
This is a research project from the Contextual Dynamics Lab at Dartmouth College. For questions or collaboration inquiries, please open an issue.