feat: add kotlin by aeneasr · Pull Request #116 · ory/lumen

aeneasr · 2026-04-09T08:03:19Z

Summary

- Adds Kotlin language support (.kt, .kts) via a tree-sitter chunker.
- Isolates the Kotlin-specific findEnclosingSymbol logic to avoid regressions in Java/JavaScript/TypeScript.
- Benchmark task: kotlin-hard, an ArrayIndexOutOfBoundsException in deep JSON parsing (issue #2994, JsonPath.resize()).
- Preflight probe fix: the baseline check now detects actual tool_use events in the stream-json output rather than text mentions (session context was polluting the check).

Benchmark Results (haiku, Apr 13 — canonical run)

Scenario	Time	Cost	Output Tokens	Quality
baseline	350.1s	$0.5606	15,511	Good
with-lumen	231.4s	$0.3516	13,896	Perfect
delta	-34%	-37%	-10%	↑ Good → Perfect

3 semantic searches surfaced JsonPath.resize() precisely; baseline required 79 tool calls vs 34 with Lumen. Quality improvement confirmed on both haiku and Sonnet runs.
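
The delta row can be reproduced from the baseline and with-lumen rows. Below is a minimal Go sketch of that arithmetic; `pctDelta` is a hypothetical helper written for this note and is not part of the bench harness.

```go
package main

import (
	"fmt"
	"math"
)

// pctDelta returns the relative change from baseline to withLumen in percent,
// rounded to one decimal place. Negative values mean with-lumen was cheaper or faster.
// Hypothetical helper, not taken from the repository.
func pctDelta(baseline, withLumen float64) float64 {
	raw := (withLumen - baseline) / baseline * 100
	return math.Round(raw*10) / 10
}

func main() {
	// Inputs are the figures reported in the results table and the paragraph below it.
	fmt.Printf("time:          %.1f%%\n", pctDelta(350.1, 231.4))
	fmt.Printf("cost:          %.1f%%\n", pctDelta(0.5606, 0.3516))
	fmt.Printf("output tokens: %.1f%%\n", pctDelta(15511, 13896))
	fmt.Printf("tool calls:    %.1f%%\n", pctDelta(79, 34))
}
```

The table rounds these to whole percentages (-34%, -37%, -10%); the tool-call reduction from 79 to 34 works out to roughly -57%.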

Test plan

- go test ./... passes
- E2E snapshots regenerated for class-qualified Kotlin symbols
- Benchmark preflight passes all 3 probes (baseline, with-lumen, hook-firing)
- Kotlin benchmark: Good → Perfect quality improvement confirmed

🤖 Generated with Claude Code

Add class_declaration, object_declaration, and companion_object to findEnclosingSymbol so methods inside classes get qualified as ClassName.methodName. This improves semantic search precision for Java, C#, TypeScript, JavaScript, PHP, and prepares for Kotlin support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add tree-sitter based chunking for Kotlin using go-sitter-forest/kotlin. Supports functions (incl. suspend, inline, extension, operator), classes (data, sealed, abstract, annotation, value, fun interface), objects, companion objects, properties, type aliases, and enum entries. Closes #107 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
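
Routing .kt and .kts files to the Kotlin chunker comes down to an extension lookup. The sketch below illustrates the idea only; `extToLang` and `languageFor` are hypothetical names, and the real extension map in the chunker may be structured differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extToLang maps file extensions to a language key.
// Hypothetical table: only the Kotlin entries are taken from this PR.
var extToLang = map[string]string{
	".kt":   "kotlin",
	".kts":  "kotlin",
	".java": "java",
}

// languageFor returns the language for a path, or false if the extension is unknown.
func languageFor(path string) (string, bool) {
	lang, ok := extToLang[strings.ToLower(filepath.Ext(path))]
	return lang, ok
}

func main() {
	for _, p := range []string{"src/Main.kt", "build.gradle.kts", "README.md"} {
		lang, ok := languageFor(p)
		fmt.Printf("%-18s -> %q (supported=%v)\n", p, lang, ok)
	}
}
```

Because `filepath.Ext` returns only the final suffix, a Gradle script such as build.gradle.kts resolves to ".kts" and is treated as Kotlin.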

Curate a Kotlin benchmark task from ktorio/ktor PR #2295 - a CORS header validation regression where only the first header in a preflight request was being checked due to an early return inside a lambda. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update README.md language count (12 → 13) and add Kotlin row to the supported languages table. Add Kotlin repos to bench-swe source repos. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Baseline: $0.182, 95s. With Lumen: $0.169, 176s. Both found and fixed the ktor CORS header validation bug identically. Cost reduction is modest because the bug is small and well-localized. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace the simple ktor CORS task with a harder bug from kotlinx.serialization#2909: MissingFieldException when deserializing enum field in sealed hierarchy with coerceInputValues + explicitNulls. Fix spans 2 source files across platform-specific decoder internals. Grep score: 8%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kotlinx.serialization enum coercion bug: baseline $0.77/592s vs with-lumen $0.67/348s. Semantic search helps Claude navigate the decoder internals 41% faster with 17% fewer tokens. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove structured logging from command handlers and test utilities. This simplifies the initialization path and removes the logger field from indexerCache, which was only used for background diagnostics.

- Remove newDebugLogger() calls from runIndex and runSearch
- Remove logger parameter from setupIndexer and runIndexer
- Remove log field from indexerCache struct
- Remove log field initializations from all test fixtures
- Update performIndexing to not log signal cancellation

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- Add structured logging to failover embedder for diagnostics
- Update E2E test snapshots for tree-sitter changes
- Add Kotlin language support to chunker tests
- Expand failover embedder test coverage

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

pterm's Fprinto uses \r without clearing to end-of-line, causing old render fragments to leak through when the progress string shrinks between updates. Its cursor.Hide/Show also writes to os.Stdout instead of the configured writer, leaving the cursor hidden after exit. Replace pterm's ProgressbarPrinter with a minimal custom renderer that uses \r\033[K for clean line updates and writes cursor escape sequences directly to the configured writer (stderr). Add defer RestoreCursor() safety nets in cmd/index.go to guarantee cursor restoration on all exit paths including errors and signals. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
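
The rendering approach described in this commit can be sketched as follows. This is an illustration under assumptions, not the renderer that was merged: `lineRenderer`, `frame`, and the method names are hypothetical, and only the escape sequences (`\r\033[K`, plus the standard ANSI cursor hide/show codes) are fixed.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

const (
	clearLine  = "\r\033[K"   // return to column 0, then erase to end of line
	hideCursor = "\033[?25l" // standard ANSI "hide cursor"
	showCursor = "\033[?25h" // standard ANSI "show cursor"
)

// frame builds the bytes for one progress update (hypothetical helper).
func frame(status string) string { return clearLine + status }

// lineRenderer writes every escape sequence to the configured writer,
// never to os.Stdout implicitly.
type lineRenderer struct{ w io.Writer }

func newLineRenderer(w io.Writer) *lineRenderer {
	fmt.Fprint(w, hideCursor)
	return &lineRenderer{w: w}
}

// Update redraws the progress line in place.
func (r *lineRenderer) Update(status string) { fmt.Fprint(r.w, frame(status)) }

// RestoreCursor makes the cursor visible again and ends the progress line.
func (r *lineRenderer) RestoreCursor() { fmt.Fprint(r.w, showCursor+"\n") }

func main() {
	r := newLineRenderer(os.Stderr)
	defer r.RestoreCursor() // safety net for normal returns and panics

	r.Update("indexing 100/100 files (embedding chunks)")
	r.Update("done") // shorter string: \033[K erases the leftover tail
}
```

A deferred call covers returns and panics only; restoring the cursor on SIGINT or SIGTERM additionally needs a signal handler (for example via os/signal) that calls the same restore function before exiting.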

runIndex is the interactive lumen index path (uses tui.Progress) — no slog logger is defined in that scope. The SetLogger call was accidentally left in when embedder logging was added for background/MCP paths only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Benchmarks show ResolveContainment regresses TypeScript performance: with it, Lumen is +53% cost/+21% time; without it, -29% cost/-34% time. Removing parent type chunks destroys embedding context the model needs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Class-qualified method symbols (ClassName.method) change chunk metadata for Java, TypeScript, JavaScript, Kotlin, and C#. Existing V2 indexes would have inconsistent symbol naming. Force clean re-index. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
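
Forcing a clean re-index on a version bump amounts to comparing the version stored with an index against the current constant. A minimal sketch, assuming the stored version is available as an integer; `needsReindex` is a hypothetical helper, and how the version is actually persisted is not shown in this PR.

```go
package main

import "fmt"

// IndexVersion is the current on-disk index format version.
// The value 3 matches the version mentioned in this PR; the storage details are assumed.
const IndexVersion = 3

// needsReindex reports whether an existing index must be rebuilt from scratch.
// Any mismatch triggers a rebuild, so V2 indexes with unqualified symbols are discarded.
func needsReindex(storedVersion int) bool {
	return storedVersion != IndexVersion
}

func main() {
	for _, v := range []int{2, 3} {
		fmt.Printf("stored V%d -> reindex=%v\n", v, needsReindex(v))
	}
}
```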

Updated 15 snapshot files across Java, TypeScript, and JavaScript E2E tests. Symbols now use ClassName.method format (IndexVersion 3). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Rerun the Kotlin benchmark with claude-sonnet-4-6 (Sonnet) to verify chunker and index pipeline correctness after the stale-index investigation. Results: -31% cost, -37% time, and the first quality improvement across all 10 languages (baseline Good → with-lumen Perfect). A single semantic search ("data object parsing JSON deserialization") surfaced JsonPath.resize() at 43 tokens; Claude produced a tighter patch and a more focused regression test than the baseline. Updates README.md and docs/BENCHMARKS.md with Kotlin row in all tables (Full Results, Quality Summary, and Key Findings cost/time/tokens/calls). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous findEnclosingSymbol modification added class_declaration to the parent switch, which Java/JS/TS also use, causing 15 E2E snapshots to change and breaking test assertions across four languages.

This commit reverts to the base behaviour and adds only Kotlin-safe parent types that are exclusive to Kotlin's tree-sitter grammar:

- enum_class_body: body of Kotlin enum classes (Java uses class_body)
- object_declaration: Kotlin top-level/nested object singletons
- companion_object: Kotlin companion objects

class_declaration is deliberately excluded: it is shared with Java/JS/TS, and adding it would silently alter their chunk naming. Methods inside regular Kotlin classes (class_declaration) are therefore not class-qualified; this can be addressed in a follow-up.

Reverts the 15 Java/JS/TS E2E snapshots to their pre-branch state. Reverts test assertions in treesitter_test.go and treesitter_adversarial_test.go for existing languages. Adds new Kotlin-specific tests (extension map, leading comments, comprehensive chunker) with expectations matching achievable behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
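
The qualification rule described in this commit can be illustrated with a toy tree. The sketch below does not use the real tree-sitter bindings: `node`, `kotlinOnlyParents`, `enclosingName`, and `qualifiedSymbol` are hypothetical stand-ins, and how the actual code resolves the name of an enum body or an unnamed companion object is an assumption.

```go
package main

import "fmt"

// node is a simplified stand-in for a syntax-tree node (hypothetical type).
type node struct {
	kind   string
	name   string
	parent *node
}

// kotlinOnlyParents lists the parent kinds named in this commit.
// class_declaration is left out on purpose because Java/JS/TS share it.
var kotlinOnlyParents = map[string]bool{
	"enum_class_body":    true,
	"object_declaration": true,
	"companion_object":   true,
}

// enclosingName returns the parent's own name, falling back to the grandparent
// for anonymous bodies such as enum_class_body (assumed behaviour).
func enclosingName(p *node) string {
	if p.name != "" {
		return p.name
	}
	if p.parent != nil {
		return p.parent.name
	}
	return ""
}

// qualifiedSymbol walks up the tree and prefixes the first Kotlin-only parent found.
func qualifiedSymbol(n *node) string {
	for p := n.parent; p != nil; p = p.parent {
		if kotlinOnlyParents[p.kind] {
			if owner := enclosingName(p); owner != "" {
				return owner + "." + n.name
			}
		}
	}
	return n.name
}

func main() {
	obj := &node{kind: "object_declaration", name: "Registry"}
	lookup := &node{kind: "function_declaration", name: "lookup", parent: obj}

	cls := &node{kind: "class_declaration", name: "Parser"}
	body := &node{kind: "class_body", parent: cls}
	parse := &node{kind: "function_declaration", name: "parse", parent: body}

	fmt.Println(qualifiedSymbol(lookup)) // qualified by the object
	fmt.Println(qualifiedSymbol(parse))  // left unqualified, as described above
}
```

Keeping the allow-list to Kotlin-exclusive node kinds means the existing Java/JS/TS snapshots are unaffected, at the cost of unqualified method names inside regular Kotlin classes.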

The baseline probe checked whether "semantic_search" appeared anywhere in Claude's text output. This was unreliable: the SessionStart hook injects "Call mcp__lumen__semantic_search first" into every session's system context, so Claude mentions the name in its response even when the MCP server is absent from the config. Fix: ask Claude to actually call the tool and use --output-format stream-json. Parse the JSON stream for a "tool_use" event with the tool name. A text mention in an assistant response is a different JSON event ("text") and is not counted. If the tool is not registered, Claude cannot produce a tool_use event for it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
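
The difference between a text mention and an actual tool call can be sketched as a scan over the stream-json lines. The event shape used here (an object per line with `message.content[]` blocks carrying `type` and `name`) is an assumption for illustration, and `sawToolUse` and `sawToolUseInString` are hypothetical helpers, not the probe code from the bench harness.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// streamEvent models only the fields this sketch inspects (assumed shape).
type streamEvent struct {
	Message struct {
		Content []struct {
			Type string `json:"type"`
			Name string `json:"name"`
		} `json:"content"`
	} `json:"message"`
}

// sawToolUse reports whether any line contains a tool_use block for the given tool.
// Text blocks that merely mention the tool name are ignored.
func sawToolUse(r io.Reader, tool string) (bool, error) {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // allow long JSON lines
	for sc.Scan() {
		var ev streamEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // skip lines that are not JSON or have another shape
		}
		for _, c := range ev.Message.Content {
			if c.Type == "tool_use" && c.Name == tool {
				return true, nil
			}
		}
	}
	return false, sc.Err()
}

// sawToolUseInString is a convenience wrapper for in-memory streams.
func sawToolUseInString(stream, tool string) bool {
	ok, err := sawToolUse(strings.NewReader(stream), tool)
	return err == nil && ok
}

func main() {
	const tool = "mcp__lumen__semantic_search"
	mention := `{"type":"assistant","message":{"content":[{"type":"text","text":"I should call mcp__lumen__semantic_search first"}]}}`
	call := `{"type":"assistant","message":{"content":[{"type":"tool_use","name":"mcp__lumen__semantic_search","input":{}}]}}`

	fmt.Println("text mention only:", sawToolUseInString(mention, tool))
	fmt.Println("actual tool call: ", sawToolUseInString(mention+"\n"+call, tool))
}
```

Matching on the block type rather than on a substring is what makes the baseline probe robust to the tool name appearing in injected session context.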

…rmed

Switch the Kotlin benchmark task from issue #2909 (MissingFieldException in sealed hierarchy) to issue #2994 (ArrayIndexOutOfBoundsException in deep JSON data object parsing). The new task targets JsonPath.resize(), a tighter, more deterministic bug that is harder to find by filename alone, making it a better signal for semantic search effectiveness.

Update README.md and docs/BENCHMARKS.md with results from the canonical haiku run (Apr 13), replacing the earlier Sonnet rerun. Kotlin now matches all other tasks (claude-haiku-4-5 execution, Sonnet 4.6 judging), so the footnote is removed.

New numbers:

Scenario	Cost	Time	Output Tokens	Tool Calls
baseline	$0.561	350.1s	15,511	79
with-lumen	$0.352	231.4s	13,896	34 (3 searches)
delta	-37.3%	-33.9%	-10.4%	-57%

Quality: Good → Perfect (confirmed across both Sonnet and haiku runs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

aeneasr and others added 7 commits April 8, 2026 23:44

docs: add Kotlin to supported languages lists

56bd549

aeneasr changed the title from "Add kotlin" to "feat: add kotlin" on Apr 9, 2026

aeneasr and others added 12 commits April 9, 2026 10:21

chore: remove old Kotlin benchmark runs, keep latest results

7d47075

test: regenerate E2E snapshots for class-qualified symbols

d50b8cf

aeneasr wants to merge 19 commits into main from add-kotlin
