
docs: pip-first README + new docs/CONFIGURATION.md #208

Merged: HumanBean17 merged 2 commits into master from chore/readme-pip-landing on May 23, 2026


@HumanBean17
Owner

Motivation

We just shipped java-codebase-rag to PyPI as v0.1.0 (PR #205). That makes the README the package's long_description — the first thing a newcomer sees on https://pypi.org/project/java-codebase-rag/ before they decide whether to pip install.

The old README was a 785-line operator manual. It led with "Install from source: git clone ... && pip install -e ." even though the package is now on PyPI, and required scrolling through six sections of env vars, graph layer internals, brownfield mode, and ignore-pattern semantics before reaching anything resembling a quickstart. Great as a reference once you're already onboarded; intimidating as a landing page.

What changes

README.md: 785 → 185 lines. Rewritten as a pip-first landing:

  1. Title + the three-paragraph intro (kept verbatim — it sets up correct expectations about the graph-first, deterministic philosophy)
  2. Install: pip install java-codebase-rag + Python 3.11+ + early-stage disclaimer
  3. 5-minute walkthrough: clone the repo (for the tests/bank-chat-system fixture), java-codebase-rag init, meta, python -m search_lancedb for vector search, then --graph-expand to prove Kuzu is wired
  4. Wire into MCP host: Claude Code one-liner using the new java-codebase-rag-mcp console script, plus Claude Desktop inline JSON
  5. Five-tool cheat sheet: search / find / describe / resolve / neighbors with required args only
  6. Configuration: 6-row pointer table into docs/CONFIGURATION.md §1–§5 + CODEBASE_REQUIREMENTS.md
  7. CLI cheat sheet (6 subcommand groups)
  8. Further reading (9 docs)
  9. Install from source demoted to the bottom — for contributors only
  10. Roadmap

docs/CONFIGURATION.md (new, 582 lines) absorbs the long-form material that used to live in README.md §2/§6/§7/§8:

  • §1 Environment variables
  • §2 Project YAML reference (annotated .java-codebase-rag.yml, path expansion, gotchas)
  • §3 Graph layer (ontology, MCP-traversable edges, call-graph notes, injection, chunk enrichment, module vs microservice, re-index callouts for ontology bumps 12→15, capabilities, ranking, debug context)
  • §4 Brownfield overrides (config role/route, cross-service resolution, source stubs, caller-side, limitations, Lance/Kuzu consistency)
  • §5 Ignore patterns

mcp.json.example: leads with "command": "java-codebase-rag-mcp" (the console script entry point from pyproject.toml) instead of the old /ABSOLUTE/PATH/.venv/bin/python server.py invocation.
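
For illustration only, here is a minimal sketch of the shape such a config might take, built in Python and dumped as JSON. The `mcpServers` wrapper follows the Claude Desktop convention; the server name `java-codebase-rag` and the empty `args` list are assumptions, and the real mcp.json.example may carry extra fields.

```python
import json


def build_mcp_config(server_name: str = "java-codebase-rag") -> dict:
    """Return a Claude Desktop-style MCP config that launches the console script."""
    # "command" is the console script named in this PR; the server name and
    # the empty "args" list are illustrative assumptions, not file contents.
    return {
        "mcpServers": {
            server_name: {
                "command": "java-codebase-rag-mcp",
                "args": [],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(), indent=2))
```

The point of the change is visible in the `command` value: no absolute interpreter path and no server.py argument, only the entry point that pip puts on PATH.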

Cross-ref sweep

Fixed every stale README anchor that survived the rewrite:

  • AGENTS.md — "where to look" bullets retargeted
  • CODEBASE_REQUIREMENTS.md — 4 refs (README §3a/b/c, "see README graph section") retargeted to docs/CONFIGURATION.md §3 / §4.3
  • docs/AGENT-GUIDE.md — maintenance re-index callout retargeted
  • docs/MANUAL-VERIFICATION-CHECKLIST.md — "see README CLI reference" → JAVA-CODEBASE-RAG-CLI.md

Historical refs in propose/completed/ and plans/completed/ are intentionally left alone (out of scope, snapshots of past state).
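
A sweep like this can be approximated with a small script. The sketch below is not the tool used for this PR; the regex and the `find_stale_refs` helper are hypothetical, and the pattern only catches "README §..." and "see README" phrasing. It skips README.md itself and the two historical directories named above.

```python
import re
from pathlib import Path

# Matches "README §3a"-style section refs and "see README ..." phrases.
STALE_REF = re.compile(r"README\s*§\s*\w+|see README\b", re.IGNORECASE)
# Historical snapshots that are intentionally left alone.
SKIP_DIRS = ("propose/completed", "plans/completed")


def find_stale_refs(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for each suspicious README ref."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        if rel == "README.md" or any(f"/{d}/" in f"/{rel}" for d in SKIP_DIRS):
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if STALE_REF.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for rel, lineno, line in find_stale_refs(Path(".")):
        print(f"{rel}:{lineno}: {line}")
```

Every hit still needs a human decision about where the reference should point; the script only narrows down where to look.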

What this does NOT touch

  • No code changes (src/, server.py, CLI behaviour, build, tests — all untouched).
  • No pyproject.toml change. The next PyPI release will pick up the new long_description automatically.
  • One pre-existing stale ref in docs/skills/java-codebase-explore.md:208 (says ontology 14, actual is 15) is out of scope for this PR.

Verification

  • All 15 outbound links from the new README.md resolve.
  • All 4 sibling-doc links in docs/CONFIGURATION.md resolve.
  • 5-minute walkthrough commands fact-checked against the current CLI surface and pyproject.toml entry points (no fictional mcp subcommand; python -m search_lancedb works because search_lancedb is a top-level py-module shipped in the wheel).
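
For anyone who wants to repeat the link check, a minimal sketch follows. It is an assumption about how such a check could be done, not the procedure actually used here: the `broken_local_links` helper is hypothetical, it only understands inline `[text](target)` links, and it checks relative file targets only (external URLs and in-page anchors are skipped).

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target). Reference-style links are not handled.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_local_links(md_file: Path) -> list[str]:
    """Return relative link targets in md_file that do not exist on disk."""
    broken = []
    text = md_file.read_text(encoding="utf-8")
    for target in LINK.findall(text):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # external URLs and in-page anchors are out of scope
        rel = target.split("#", 1)[0]  # drop any fragment before the file check
        if not (md_file.parent / rel).exists():
            broken.append(target)
    return broken


if __name__ == "__main__":
    for doc in (Path("README.md"), Path("docs/CONFIGURATION.md")):
        print(doc, broken_local_links(doc))
```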

Draft for now — happy to take review on framing, table shapes, and whether the 5-minute walkthrough hits the right depth.

The previous README (785 lines) doubled as the PyPI long_description AND
the full operator manual. With v0.1.0 on PyPI, newcomers landing on the
PyPI page or GitHub README first saw a 'cd /path; venv; pip install -r
requirements.txt' block that contradicted the 'pip install' they just
ran, then a wall of YAML, brownfield overrides, and graph ontology
before they had indexed anything.

This PR rewrites README.md as a pip-first landing page (~185 lines) and
moves reference content to docs/CONFIGURATION.md (~582 lines).

README.md (new structure):
- Intro (kept verbatim — sets correct GPS-not-reasoning-engine expectations)
- Install: 'pip install java-codebase-rag' as the headline
- 5-minute walkthrough on tests/bank-chat-system: init, meta,
  python -m search_lancedb (proves Lance), then --graph-expand (proves Kuzu)
- Wire to Claude Code / Claude Desktop using the new
  'java-codebase-rag-mcp' console script (no .venv/server.py paths)
- Five-tool cheat sheet (required args only — full schemas in AGENT-GUIDE)
- Configuration: 6-row pointer table into docs/CONFIGURATION.md sections
- CLI cheat sheet (subcommand groups, deeper detail in JAVA-CODEBASE-RAG-CLI.md)
- Further reading
- Install from source (contributors) — demoted from the headline
- Roadmap

docs/CONFIGURATION.md (new file, 582 lines):
- §1 Environment variables
- §2 Project YAML reference (full annotated example)
- §3 Graph layer (node kinds, edges, call-graph notes, injection,
  chunk enrichment, module vs microservice, re-index callouts for
  ontology 12-15, capabilities, ranking, context-debugging)
- §4 Brownfield overrides (config, cross-service resolution,
  source stubs, caller-side, limitations, Lance/Kuzu consistency)
- §5 Ignore patterns

Cross-reference fixes:
- mcp.json.example: lead with 'java-codebase-rag-mcp' console script
  (was '/ABSOLUTE/PATH/TO/.venv/bin/python server.py')
- AGENTS.md: 'where to look' bullet for README rewritten + new
  CONFIGURATION.md bullet
- CODEBASE_REQUIREMENTS.md: 4 'see README' / 'see README §3a/b/c'
  refs retargeted to docs/CONFIGURATION.md sections
- docs/AGENT-GUIDE.md: maintenance rules updated (re-index callout
  now lives in docs/CONFIGURATION.md §3, not README)
- docs/MANUAL-VERIFICATION-CHECKLIST.md: 'see README CLI reference'
  retargeted to docs/JAVA-CODEBASE-RAG-CLI.md

Content shrinkage: README 785 -> 185 lines (76% reduction). Total
content roughly preserved; CONFIGURATION.md inherits §2/§6/§7/§8 and
the YAML reference from §1.

No code changes. No test changes. PyPI long_description (readme =
'README.md' in pyproject.toml) automatically picks up the new
landing on next release.
@HumanBean17
Owner Author

Review: pip-first README + CONFIGURATION.md

Clean docs refactor — content is faithfully migrated, cross-references are consistent, and the new README hits the right depth for a PyPI landing page. A few items worth confirming before merge:

Worth checking

  1. MCP v2 response extras paragraph is gone from both files. Old README §4 had a dense paragraph about hints, pagination echo, resolved_identifier, page-full hints, fuzzy-strategy hints, and pointers to HINTS-V3-PROPOSE.md / HINTS-ROAD-SIGNS-PROPOSE.md / HINTS-V2-PROPOSE.md. The new README delegates to AGENT-GUIDE.md, but the AGENT-GUIDE diff only changes the maintenance callout — it doesn't add this content. If the hints contract details aren't already in AGENT-GUIDE.md, this content is now orphaned.

  2. python -m search_lancedb in the walkthrough assumes the module ships in the wheel. The PR body asserts this works, but worth confirming search_lancedb is actually included in pyproject.toml [tool.setuptools.packages.find] — pip-only users will get an ImportError otherwise.

Minor nits (non-blocking)

  • "Install from source" drops the default embedding model note (sentence-transformers/all-MiniLM-L6-v2). Now only in CONFIGURATION.md §2. Low-risk since pip users get it implicitly, but source-build contributors may not know what model to expect.

  • No TOC in the new README. At 185 lines it's arguably skimmable without one, but a tiny TOC could help scroll-fatigued PyPI readers. Style choice.

Otherwise looks good — ship-ready after the two checks above.

@HumanBean17
Owner Author

Thanks for the careful read. Checked all four items:

1. Hints contract — already in AGENT-GUIDE.md ✅

Not orphaned. docs/AGENT-GUIDE.md already carries the contract:

  • L19 — top-level hints list (≤5 next calls), echoed limit/offset on search/find, advisory-only / ignore-on-failure semantics.
  • L102: neighbors pagination semantics (default limit=25, slicing merged flat + composed edge list).
  • L231 — specialized hint templates (TPL_NEIGHBORS_CALLS_HIGH_FANOUT, TPL_NEIGHBORS_CALLS_HAS_UNRESOLVED) and their single-origin-only firing rule.
  • L237 — full CALLS edge contract including include_unresolved, dedup_calls, edge_filter projection.

The old README §4 paragraph was a duplicate of what AGENT-GUIDE already owned. The propose-doc pointers (HINTS-V3-PROPOSE.md etc.) are findable from docs/AGENT-GUIDE.md and the propose/ directory listing — those are design-history references, not user-facing contract.

2. search_lancedb in the wheel ✅

Confirmed at pyproject.toml:65:

[tool.setuptools]
packages = ["java_codebase_rag"]
py-modules = [
    "ast_java",
    ...
    "search_lancedb",
    ...
]

python -m search_lancedb works for pip-only users.

3. Default embedding model note — agreed, will add

You're right that source-build contributors lose this info. Will add a one-liner under the source-install section pointing at EMBEDDING_MODEL env var + sentence-transformers/all-MiniLM-L6-v2 default.

4. TOC — passing for now

Disagree on this one. 185 lines with ## headers every ~20 lines is below the threshold where a TOC stops being noise and starts being navigation. GitHub renders a sticky outline on the right rail and PyPI renders headings inline — both give skimmers the same affordance a hand-written TOC would. Reconsidering only if the README grows past ~250 lines.

Will push the fix for item 3 as a fixup commit shortly.

Source-build contributors lost this implicit info when the long-form env-var
table moved into docs/CONFIGURATION.md. One-line pointer keeps them oriented
without re-inlining the full reference.

Addresses review feedback on #208.
@HumanBean17
Owner Author

All four items resolved — LGTM, ready to merge.

  1. Hints contract — confirmed in AGENT-GUIDE.md (L19, L102, L231, L237). Not orphaned.
  2. search_lancedb — confirmed in pyproject.toml py-modules. Pip users are fine.
  3. Default embedding model — fixup commit added the one-liner under source-install.
  4. TOC — agreed to disagree, rationale is sound.

Ship it.

@HumanBean17 HumanBean17 marked this pull request as ready for review May 23, 2026 22:25
@HumanBean17 HumanBean17 merged commit c19e611 into master May 23, 2026
1 check passed
@HumanBean17 HumanBean17 deleted the chore/readme-pip-landing branch May 24, 2026 12:08