An agent skill that studies any repository and produces structured knowledge artifacts. Drop it into Claude Code, Cursor, OpenCode, or any agent that supports the agentskills spec, point it at a codebase, and get back self-contained documentation for each subsystem.
Most agents forget what they read three files ago. This skill fixes that by following a four-phase process:
- Reconnaissance -- scan the repo structure, identify the tech stack, map module boundaries
- Deep-dive study -- trace happy paths, error paths, and edge cases through each subsystem
- Artifact authoring -- fill a structured template covering architecture, key functions, gotchas, and Mermaid diagrams
- Delivery -- hand back self-contained Markdown artifacts that any developer (or agent) can read cold
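To make the reconnaissance phase concrete, here is a minimal Python sketch of what "scan the repo structure, identify the tech stack, map module boundaries" could look like. It is not the skill's implementation: the marker-file table, the ignore list, and the `recon` function name are all assumptions for illustration, and the real checklist lives in references/recon-checklist.md.

```python
from pathlib import Path

# Hypothetical marker files; the skill's own checklist may use other signals.
STACK_MARKERS = {
    "package.json": "Node.js",
    "pyproject.toml": "Python",
    "requirements.txt": "Python",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pom.xml": "Java (Maven)",
}

# Directories that are never module boundaries (assumed list).
IGNORED_DIRS = {"node_modules", "__pycache__", "dist", "build"}


def recon(repo_root: str) -> dict:
    """Scan the top level of a repo and summarize its stack and modules."""
    root = Path(repo_root)
    stack = set()
    modules = []
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.name in STACK_MARKERS:
            stack.add(STACK_MARKERS[entry.name])
        elif (
            entry.is_dir()
            and not entry.name.startswith(".")
            and entry.name not in IGNORED_DIRS
        ):
            # Treat each visible top-level directory as a candidate module.
            modules.append(entry.name)
    return {"stack": sorted(stack), "modules": modules}
```

A real pass would go deeper than the top level, but even this much gives the agent a map to plan the deep-dive phase against.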
The output is a set of knowledge artifacts. Each one covers a single subsystem and stands on its own. No prior context needed.
Typical use cases:
- Onboarding onto an unfamiliar codebase
- Producing documentation for a repo that has none
- Preparing knowledge files so other agents can work on the project without re-reading everything
- Studying a specific subsystem (auth, database layer, API routing, etc.) in depth
npx skills add OthmanAdi/codebase-knowledge-builder
Or manually: copy the skills/codebase-knowledge-builder/ directory into your agent's skills folder.
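The manual route can also be scripted. The sketch below assumes you have a local checkout of this repository; `install_skill` is a hypothetical helper, and the destination folder is a parameter because each agent keeps its skills in a different place (check your agent's documentation for the actual path).

```python
import shutil
from pathlib import Path


def install_skill(repo_checkout: str, agent_skills_dir: str) -> Path:
    """Copy the skill directory into an agent's skills folder."""
    source = Path(repo_checkout) / "skills" / "codebase-knowledge-builder"
    if not (source / "SKILL.md").is_file():
        raise FileNotFoundError(f"no SKILL.md under {source}")
    target = Path(agent_skills_dir) / source.name
    # dirs_exist_ok lets a re-run overwrite an earlier copy (Python 3.8+).
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```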
skills/codebase-knowledge-builder/
    SKILL.md                        # Skill definition and workflow
    references/
        recon-checklist.md          # Phase 1 checklist
        deep-dive-methodology.md    # File reading and tracing strategies
    templates/
        knowledge_artifact.md       # Output template for each subsystem
The SKILL.md stays lean (~80 lines). Detailed methodology lives in references/ and only gets loaded when needed. The template in templates/ defines the exact structure of every knowledge artifact the skill produces.
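If you fork or edit the skill, a quick sanity check on the layout can be handy. This is a minimal sketch, not part of the skill: `check_layout` is a hypothetical helper, the file list is taken from the tree above, and the 100-line budget is an assumed threshold loosely based on the "~80 lines" figure.

```python
from pathlib import Path

# Files named in the directory layout above.
EXPECTED_FILES = [
    "SKILL.md",
    "references/recon-checklist.md",
    "references/deep-dive-methodology.md",
    "templates/knowledge_artifact.md",
]

# Assumed budget; the README only says SKILL.md is roughly 80 lines.
MAX_SKILL_MD_LINES = 100


def check_layout(skill_dir: str) -> list:
    """Return a list of layout problems; an empty list means all good."""
    root = Path(skill_dir)
    problems = [
        f"missing: {rel}" for rel in EXPECTED_FILES if not (root / rel).is_file()
    ]
    skill_md = root / "SKILL.md"
    if skill_md.is_file():
        line_count = len(skill_md.read_text(encoding="utf-8").splitlines())
        if line_count > MAX_SKILL_MD_LINES:
            problems.append(
                f"SKILL.md has {line_count} lines (budget {MAX_SKILL_MD_LINES})"
            )
    return problems
```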
After running the skill on a Node.js API, each artifact includes:
- Architecture overview with design pattern identification
- Key components table (component, file path, responsibility)
- Step-by-step data and control flow
- Key functions table with parameters and return values
- Configuration and environment variable mapping
- Gotchas and pitfalls (race conditions, caching quirks, historical fixes)
- Extension points for adding new functionality
- Mermaid diagrams for visual flow
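As an illustration of one of these sections, the key components table (component, file path, responsibility) could be rendered from structured notes roughly as follows. The `Component` dataclass and `render_components_table` are hypothetical names for this sketch; the actual column wording is defined by templates/knowledge_artifact.md.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    file_path: str
    responsibility: str


def _cell(text: str) -> str:
    # Escape pipes so a value cannot break the Markdown table layout.
    return text.replace("|", "\\|")


def render_components_table(components: list) -> str:
    """Render components as a three-column Markdown table."""
    lines = [
        "| Component | File path | Responsibility |",
        "| --- | --- | --- |",
    ]
    for c in components:
        lines.append(
            f"| {_cell(c.name)} | {_cell(c.file_path)} | {_cell(c.responsibility)} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rows for a Node.js API; not taken from a real artifact.
    print(render_components_table([
        Component("Auth middleware", "src/middleware/auth.js", "Validates session tokens"),
        Component("User router", "src/routes/users.js", "Maps /users endpoints to handlers"),
    ]))
```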
The skill uses progressive disclosure. When an agent triggers it, only the SKILL.md body loads into context (~600 words). The references and template load on demand during each phase. This keeps the context window clean for the actual codebase files being studied.
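The idea behind progressive disclosure can be sketched in a few lines of Python. This models the behavior rather than reproducing how any agent actually loads skills: the `SkillContext` class is hypothetical, and the mapping from phase to file is an assumption inferred from the directory layout.

```python
from pathlib import Path

# Assumed phase-to-file mapping; the delivery phase needs no extra file here.
PHASE_FILES = {
    "recon": "references/recon-checklist.md",
    "deep_dive": "references/deep-dive-methodology.md",
    "authoring": "templates/knowledge_artifact.md",
}


class SkillContext:
    """Tracks which skill files have been pulled into the context window."""

    def __init__(self, skill_dir: str):
        self.skill_dir = Path(skill_dir)
        # Only the SKILL.md body is loaded when the skill is triggered.
        self.loaded = {
            "SKILL.md": (self.skill_dir / "SKILL.md").read_text(encoding="utf-8")
        }

    def enter_phase(self, phase: str) -> str:
        """Load the phase's supporting file on first use and return its text."""
        rel = PHASE_FILES.get(phase)
        if rel is None:
            return ""
        if rel not in self.loaded:
            self.loaded[rel] = (self.skill_dir / rel).read_text(encoding="utf-8")
        return self.loaded[rel]

    def words_in_context(self) -> int:
        """Rough size of everything loaded so far, in words."""
        return sum(len(text.split()) for text in self.loaded.values())
```

Calling `enter_phase("recon")` pulls in only the recon checklist, so the methodology and template cost nothing until their phases start.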
Scratch files (recon_findings.md, per-file notes) are saved during study so the agent doesn't lose findings as it reads more files. The quality checklist at the end catches incomplete sections, missing diagrams, and placeholder text before delivery.
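A mechanical version of that final check might look like the sketch below. It is an assumption-heavy illustration, not the skill's actual checklist: the required section names, the placeholder patterns, and the `check_artifact` function are all made up for this example, and it assumes sections use `## ` headings.

```python
import re

# Assumed section headings; the real ones come from the artifact template.
REQUIRED_SECTIONS = ["Architecture", "Key Components", "Gotchas"]

# Assumed placeholder markers: common TODO tags and {{template}} slots.
PLACEHOLDER_PATTERN = re.compile(r"\b(?:TODO|TBD|FIXME)\b|\{\{.*?\}\}")

# Built at runtime so this example does not embed a literal code fence.
MERMAID_FENCE = "`" * 3 + "mermaid"


def check_artifact(markdown: str) -> list:
    """Return a list of problems found in one knowledge artifact."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    problems = []
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            problems.append(f"missing section: {name}")
        elif not any(body_line.strip() for body_line in sections[name]):
            problems.append(f"empty section: {name}")
    if MERMAID_FENCE not in markdown:
        problems.append("no Mermaid diagram")
    for match in PLACEHOLDER_PATTERN.finditer(markdown):
        problems.append(f"placeholder text: {match.group(0)}")
    return problems
```

An empty result means the artifact passed these checks; anything else is a list the agent can work through before delivery.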
See CONTRIBUTING.md for guidelines on submitting issues and pull requests.