
Commit ca69e49

Merge pull request #3 from flatmax/dev3
Dev3
2 parents f978864 + 624080b

33 files changed: 2097 additions & 1002 deletions

README.md

Lines changed: 1 addition & 15 deletions
@@ -2,12 +2,6 @@
 AC⚡DC is an AI pair-programming tool that runs as a terminal application with a browser-based UI. It helps developers navigate codebases, chat with LLMs, and apply structured file edits — all with intelligent prompt caching to minimize costs.

-https://github.com/user-attachments/assets/ece86b13-1d6f-4b1e-a029-f358c50ff858
-
-<details><summary>Slow version</summary>
-
-https://github.com/user-attachments/assets/63e442cf-6d3a-4cbc-a96d-20fe8c4964c8
-
 </details>

## Features
@@ -25,7 +19,7 @@
 - **Voice dictation** via Web Speech API.
 - **Math rendering** — LaTeX expressions in LLM responses render as formatted math via KaTeX (`$$...$$` for display blocks, `$...$` for inline).
 - **Configurable prompt snippets** for common actions.
-- **Full-text search** across the repo with regex, whole-word, and case-insensitive modes.
+- **Full-text search** with a two-panel layout — file picker (left) showing matching files with match counts, and a match context panel (right) with highlighted results and bidirectional scroll sync. Supports regex, whole-word, and case-insensitive modes.
 - **Session history browser** — search, revisit, and reload past conversations.
 - **2D file navigation grid** — open files arrange spatially in a grid overlay. Navigate with `Alt+Arrow` keys for fast directional switching between files without reaching for tabs.
 - **Tree-sitter symbol index** across Python, JavaScript/TypeScript, and C/C++ with cross-file references.
@@ -59,14 +53,6 @@
 ### Code Review

-https://github.com/user-attachments/assets/0e853df6-2d84-4c58-8ea8-95251c4e6822
-
-<details><summary>Slow version</summary>
-
-https://github.com/user-attachments/assets/d923e278-b3ef-46a4-b19e-0d54099bf3a7
-
-</details>
-
 1. Click the review button in the header bar.
 2. Select a commit in the git graph to set the review base.
 3. Click **Start Review** — the repo enters review mode (soft reset).

specs3/1-foundation/communication_layer.md

Lines changed: 3 additions & 1 deletion
@@ -336,14 +336,15 @@ Three top-level service classes, registered via `add_class()`:

 | Method | Signature | Description |
 |--------|-----------|-------------|
-| `LLMService.get_current_state` | `() → {messages, selected_files, excluded_index_files, streaming_active, session_id, repo_name, cross_ref_enabled}` | Full state snapshot |
+| `LLMService.get_current_state` | `() → {messages, selected_files, excluded_index_files, streaming_active, session_id, repo_name, init_complete, mode, cross_ref_ready, cross_ref_enabled, doc_convert_available}` | Full state snapshot |
 | `LLMService.set_selected_files` | `(files) → [string]` | Update file selection |
 | `LLMService.get_selected_files` | `() → [string]` | Current selection |
 | `LLMService.chat_streaming` | `(request_id, message, files?, images?) → {status}` | Start streaming chat |
 | `LLMService.cancel_streaming` | `(request_id) → {status}` | Cancel active stream |
 | `LLMService.new_session` | `() → {session_id}` | Start new session |
 | `LLMService.generate_commit_message` | `(diff_text) → string` | Generate commit message |
 | `LLMService.commit_all` | `() → {status: "started"}` | Stage, generate message, commit — result via `commitResult` broadcast |
+| `LLMService.reset_to_head` | `() → {status, system_event_message}` | Reset git to HEAD, record system event in conversation context and history |
 | `LLMService.get_context_breakdown` | `() → {model, total_tokens, blocks, breakdown, ...}` | Token/tier breakdown |
 | `LLMService.check_review_ready` | `() → {clean, message?}` | Check for clean tree |
 | `LLMService.get_commit_graph` | `(limit?, offset?, include_remote?) → {commits, branches, has_more}` | Delegates to Repo |
@@ -402,6 +403,7 @@ Three top-level service classes, registered via `add_class()`:
 | `Collab.deny_client` | `(client_id) → {ok, client_id}` | Deny and disconnect a pending connection |
 | `Collab.get_connected_clients` | `() → [{client_id, ip, role, is_localhost}]` | List all connected clients |
 | `Collab.get_collab_role` | `() → {role, is_localhost, client_id}` | Calling client's own role |
+| `Collab.get_share_info` | `() → {ips: [string], port: int}` | Routable LAN IPs and WebSocket port for share URL construction |

 ### Browser Methods (Server → Client)

specs3/1-foundation/configuration.md

Lines changed: 10 additions & 6 deletions
@@ -36,6 +36,8 @@ The `min_cacheable_tokens` is model-aware — per Anthropic's prompt caching doc
 - **4096 tokens** for Claude Opus 4.5/4.6, Haiku 4.5
 - **1024 tokens** for Claude Sonnet and other Claude models

+The version matching uses string-contains checks on the lowercased model name, matching both dash-separated and dot-separated version patterns (e.g., `"4-5"` and `"4.5"` both match). Non-Claude models default to 1024.
+
 The `cache_min_tokens` config value (default: 1024) can override upward but never below the model's hard minimum. Example: Opus 4.6 → `max(1024, 4096) × 1.1 = 4505`. Sonnet → `max(1024, 1024) × 1.1 = 1126`.

 A fallback `cache_target_tokens` property (without model reference) computes `cache_min_tokens × cache_buffer_multiplier` (default: 1126) for callers that don't have a model reference.
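The threshold arithmetic can be sketched as follows. This is a minimal illustration, not the actual config API: `hard_minimum` and `cache_target_tokens` are assumed names, and the version list mirrors only the models named above.

```python
def hard_minimum(model: str) -> int:
    """Model-aware hard minimum per the spec (illustrative, assumed logic)."""
    m = model.lower()
    # Dash- and dot-separated version forms both match.
    opus_45_46 = "opus" in m and any(v in m for v in ("4-5", "4.5", "4-6", "4.6"))
    haiku_45 = "haiku" in m and ("4-5" in m or "4.5" in m)
    if opus_45_46 or haiku_45:
        return 4096
    return 1024  # Sonnet, other Claude models, and non-Claude models

def cache_target_tokens(model: str, cache_min_tokens: int = 1024,
                        buffer_multiplier: float = 1.1) -> int:
    # Config can raise the threshold but never drop below the model's hard minimum.
    return int(max(cache_min_tokens, hard_minimum(model)) * buffer_multiplier)

assert cache_target_tokens("claude-opus-4-6") == 4505    # max(1024, 4096) × 1.1
assert cache_target_tokens("claude-sonnet-4-5") == 1126  # max(1024, 1024) × 1.1
```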
@@ -149,15 +151,17 @@ Users who customized managed files directly (instead of using `system_extra.md`)

 ## Token Counter Data Sources

-The token counter uses `litellm`'s model registry to determine model-specific limits:
+The token counter uses hardcoded model-family defaults for limits and `tiktoken` for tokenization:

 | Property | Source | Fallback |
 |----------|--------|----------|
-| Tokenizer | `tiktoken.get_encoding()` for the configured model | ~4 characters per token estimate |
-| `max_input_tokens` | `litellm` model info based on model name | Hardcoded defaults by model family |
-| `max_output_tokens` | `litellm` model info | Hardcoded defaults by model family |
+| Tokenizer | `tiktoken.get_encoding("cl100k_base")` | ~4 characters per token estimate |
+| `max_input_tokens` | Hardcoded: 1,000,000 for all currently supported models (Claude, GPT-4, GPT-3.5) | 1,000,000 |
+| `max_output_tokens` | Hardcoded: 8,192 for Claude models, 4,096 for others | 4,096 |
 | `max_history_tokens` | Computed: `max_input_tokens / 16` | — |

+**Note:** The implementation does not query `litellm`'s model registry at runtime. All limits are hardcoded constants in `token_counter.py`. The `cl100k_base` encoding is used for all models regardless of provider.
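The data sources in the table can be sketched as below. Only the constants and the `cl100k_base`/character-estimate fallback come from the spec; the `TokenCounter` class shape is assumed, not the actual `token_counter.py` API.

```python
try:
    import tiktoken
    _ENC = tiktoken.get_encoding("cl100k_base")  # same encoding for every model
except Exception:
    _ENC = None  # fall back to a rough character-based estimate

class TokenCounter:
    """Illustrative sketch of the hardcoded limits described above."""

    def __init__(self, model: str):
        m = model.lower()
        self.max_input_tokens = 1_000_000            # all supported model families
        self.max_output_tokens = 8_192 if "claude" in m else 4_096
        self.max_history_tokens = self.max_input_tokens // 16

    def count(self, text: str) -> int:
        if _ENC is not None:
            return len(_ENC.encode(text))
        return len(text) // 4  # ~4 characters per token
```

The fallback means token counting degrades gracefully when `tiktoken` (or its encoding data) is unavailable, at the cost of accuracy.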
## Settings Service (RPC)
A whitelisted set of config types can be read, written, and reloaded:
@@ -228,12 +232,12 @@ These files can still be edited directly on disk in the config directory.
 ## `.ac-dc/` Directory

-A per-repository working directory at `{repo_root}/.ac-dc/`. Created on first run and added to `.gitignore`.
+A per-repository working directory at `{repo_root}/.ac-dc/`. Created on first run by `ConfigManager._init_ac_dc_dir()` and added to `.gitignore`. The `images/` subdirectory is also created at this time (not lazily by the history store).

 | File | Purpose | Lifecycle |
 |------|---------|-----------|
 | `history.jsonl` | Persistent conversation history | Append-only |
 | `symbol_map.txt` | Current symbol map | Rebuilt on startup and before each LLM request |
 | `snippets.json` | Per-repo prompt snippets override (optional, all modes) | User-managed |
-| `images/` | Persisted chat images | Write on paste, read on session load |
+| `images/` | Persisted chat images | Created by ConfigManager on init; write on paste, read on session load |
 | `doc_cache/` | Disk-persisted document outline cache (keyword-enriched) | Auto-managed |

specs3/1-foundation/repository_operations.md

Lines changed: 18 additions & 6 deletions
@@ -87,12 +87,24 @@ Per-file addition/deletion counts from `git diff --numstat` (both staged and uns

 ### Commit Flow (UI-Driven)

-1. Stage all changes (`stage_all`)
-2. Get staged diff (`get_staged_diff`)
-3. Send diff to LLM to generate commit message
-4. Commit with generated message (`commit`)
-5. Display commit message as assistant message in chat
-6. Refresh file tree
+1. User clicks 💾 in action bar → `LLMService.commit_all()`
+2. Server captures current session ID **synchronously before launching the background task**, returns `{status: "started"}` immediately. The session ID is captured early so the commit event is persisted to the correct session even if `_session_id` is replaced by `_restore_last_session()` during a concurrent server restart.
+3. Background task: stage all changes (`stage_all`)
+4. Get staged diff (`get_staged_diff`)
+5. Send diff to LLM to generate commit message
+6. Commit with generated message (`commit`)
+7. Record a **system event message** (`role: "user"`, `system_event: true`) in conversation context and persistent history, using the captured session ID
+8. Broadcast `commitResult` to all clients (displays as system event card in chat)
+9. Clients refresh file tree
+
+### Reset Flow (UI-Driven)
+
+1. User clicks ⚠️ in action bar → confirmation dialog
+2. On confirm → `LLMService.reset_to_head()`
+3. Server delegates to `Repo.reset_hard()`
+4. Record a **system event message** (`role: "user"`, `system_event: true`) in conversation context and persistent history
+5. Return result with `system_event_message` field
+6. Client displays system event card and refreshes file tree
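The early session-ID capture in the commit flow can be illustrated with a minimal asyncio sketch. Class and field names (`LLMService`, `_session_id`) follow the spec; the task wiring and event shape are assumed.

```python
import asyncio

class LLMService:
    """Sketch of commit_all's synchronous session-ID capture (assumed shape)."""

    def __init__(self):
        self._session_id = "session-A"
        self.recorded = []  # (session_id, event) pairs persisted to history

    def commit_all(self):
        # Capture the session ID synchronously, BEFORE the background task runs.
        captured = self._session_id
        task = asyncio.ensure_future(self._commit_task(captured))
        # The real RPC returns only {status: "started"}; the task is returned
        # here so the sketch can await it.
        return {"status": "started"}, task

    async def _commit_task(self, session_id):
        # stage_all / get_staged_diff / LLM message generation happen here
        await asyncio.sleep(0)
        # Persist against the *captured* ID, not self._session_id, which may
        # have been replaced (e.g. by _restore_last_session()) in the meantime.
        self.recorded.append((session_id, "commit system event"))

async def main():
    svc = LLMService()
    status, task = svc.commit_all()
    svc._session_id = "session-B"  # simulate a concurrent session swap
    await task
    return status, svc.recorded

status, recorded = asyncio.run(main())
```

Even though `_session_id` changed to `"session-B"` while the task was in flight, the recorded event still carries `"session-A"`, which is the correctness property the spec calls out.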
## Search

specs3/2-code-analysis/document_mode.md

Lines changed: 5 additions & 0 deletions
@@ -518,6 +518,7 @@ KeyBERT depends on `sentence-transformers` which downloads the configured model
 - `KeyBERT` is imported inside `__init__` or on first call
 - If `keybert` is not installed, a warning is logged and headings are emitted without keywords
 - The model is initialized once and reused across all files in an indexing run
+- Before loading the model, the enricher probes the Hugging Face local cache via `huggingface_hub.try_to_load_from_cache()` to determine whether the sentence-transformer model needs downloading. If the probe returns `None` (model not cached), a "Downloading…" progress message is shown; otherwise a "Loading from cache…" message is shown. The probe is non-critical — if it fails, initialization proceeds normally with a generic "Loading…" message

 ### Graceful Degradation in Packaged Releases

@@ -599,6 +600,8 @@ For comparison, tree-sitter indexing of a full repo takes 1-5s. Document indexin
 **Threaded cache writes:** During the background enrichment phase, a `ThreadPoolExecutor(max_workers=4)` overlaps disk sidecar writes with the CPU-bound keyword extraction for the next file. Since enrichment is CPU-bound (sentence-transformer embedding) and cache writes are I/O-bound, this keeps disk I/O off the critical path. The sentence-transformer itself is **not** run in threads — Python's GIL prevents CPU-bound threading from providing speedup, and the model's ~420MB memory footprint makes process-based parallelism impractical. The real speed win comes from batched extraction: `KeywordEnricher.enrich()` sends all sections to KeyBERT in a single `extract_keywords()` call, which lets the underlying transformer batch-encode embeddings in one forward pass (2-4× faster than per-heading calls).

+**Structure-only extraction method:** `DocIndex._extract_outlines_structure_only()` is a separate code path from `_extract_outlines()` that accepts any cached outline regardless of keyword model — it passes `keyword_model=None` to the cache `get()` call, which skips the model-match check. This means an outline enriched with an old model, or an unenriched outline, will be accepted and reused. Only files whose mtime has changed are re-parsed. This method is used by mode switching and chat requests (via `_stream_chat`) to avoid blocking on keyword enrichment during user-facing operations.
+
 **Two-phase indexing principle:** Structural extraction (headings, links, section sizes) is always **synchronous and instant** (<5ms per file via regex). Keyword enrichment is always **asynchronous and never blocks** any user-facing operation. This separation eliminates all blocking edge cases:

- Mode switches are instant — unenriched outlines are available immediately
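The model-agnostic cache acceptance behind the structure-only path can be sketched as below. `OutlineCache` is a hypothetical stand-in; the spec does not show the real cache API, only that `keyword_model=None` skips the model-match check while mtime changes still force a re-parse.

```python
class OutlineCache:
    """Hypothetical sketch of the outline cache's acceptance rules."""

    def __init__(self):
        self._entries = {}  # path -> (mtime, keyword_model, outline)

    def put(self, path, mtime, keyword_model, outline):
        self._entries[path] = (mtime, keyword_model, outline)

    def get(self, path, mtime, keyword_model=None):
        entry = self._entries.get(path)
        if entry is None or entry[0] != mtime:
            return None  # file changed on disk -> caller must re-parse
        if keyword_model is not None and entry[1] != keyword_model:
            return None  # enrichment came from a different model -> reject
        return entry[2]  # keyword_model=None: accept any enrichment state

cache = OutlineCache()
cache.put("README.md", 100.0, "old-model", {"headings": ["Intro"]})

# Strict lookup (enrichment path) rejects the outline from the old model:
assert cache.get("README.md", 100.0, keyword_model="new-model") is None
# Structure-only lookup accepts it immediately, so mode switches never block:
assert cache.get("README.md", 100.0, keyword_model=None) == {"headings": ["Intro"]}
```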
@@ -851,6 +854,8 @@ The reference index is built in two passes:
 1. **Collect**: iterate over all `DocOutline` objects, extracting every `DocLink` with its `source_heading` and `target_heading` fields. Build a mapping: `(source_path, source_heading) → [(target_path, target_heading)]`
 2. **Resolve**: for each link, look up the target path's `DocOutline` and resolve the `target_heading` anchor to a `DocHeading` node. Increment that heading's `incoming_ref_count`. Record the resolved link as a `DocSectionRef` on the source heading's `outgoing_refs` list

+**Image link resolution shortcut:** Image links (`is_image=True`) whose targets were already resolved to repo-relative paths by the markdown extractor's path-extension scan skip the `_resolve_link()` step entirely — their `target_file_part` is used directly as the resolved path. This avoids double-resolution (the markdown extractor already resolved relative paths against the source file's directory).
+
This two-pass approach ensures all outlines are available before resolution begins (a link from doc A to doc B requires B's outline to resolve the heading anchor).
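The collect/resolve passes can be sketched with simplified dict-based outlines. The real `DocOutline`/`DocLink`/`DocHeading` objects are richer; this shows only the ordering guarantee (every outline exists before any resolution happens).

```python
from collections import defaultdict

# Simplified stand-ins for DocOutline objects (assumed shapes, not the spec's types).
outlines = {
    "a.md": {"headings": {"intro": {"incoming_ref_count": 0}},
             "links": [("a.md", "intro", "b.md", "setup")]},
    "b.md": {"headings": {"setup": {"incoming_ref_count": 0}}, "links": []},
}

# Pass 1 — Collect: gather every link before resolving anything, so a link
# from doc A to doc B never depends on the order outlines were extracted in.
collected = defaultdict(list)
for outline in outlines.values():
    for src_path, src_heading, tgt_path, tgt_heading in outline["links"]:
        collected[(src_path, src_heading)].append((tgt_path, tgt_heading))

# Pass 2 — Resolve: look up each target heading and bump its ref count,
# recording the resolved link on the source side.
outgoing_refs = defaultdict(list)
for (src_path, src_heading), targets in collected.items():
    for tgt_path, tgt_heading in targets:
        target = outlines[tgt_path]["headings"].get(tgt_heading)
        if target is not None:
            target["incoming_ref_count"] += 1
            outgoing_refs[(src_path, src_heading)].append((tgt_path, tgt_heading))

assert outlines["b.md"]["headings"]["setup"]["incoming_ref_count"] == 1
```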
## Design Decisions

specs3/2-code-analysis/symbol_index.md

Lines changed: 6 additions & 2 deletions
@@ -132,11 +132,15 @@ The MATLAB extractor (`MatlabExtractor`) uses **regex-based parsing** — no tre

 **Function body analysis:**

-For each function, the extractor also scans the body text to extract:
+Before pattern matching, a copy of the source text is preprocessed to strip line comments (`%...`) and string literals (`'...'` and `"..."`) to avoid false positives in call site and variable detection.
+
+For each function, the extractor scans the preprocessed body text to extract:
 - **Call sites** — identifiers followed by `(`, excluding MATLAB keywords. Produced as `CallSite` objects for the reference index
 - **Local variables** — LHS identifiers in assignments (`x = ...` or `[a, b] = ...`), excluding parameters, output args, and keywords. Attached as `variable`-kind children of the function symbol
 - **Read variables** — identifiers that appear in the body but are never assigned locally and are not parameters, outputs, or MATLAB builtins. Also attached as `variable`-kind children

+**Builtin exclusion:** A large set of common MATLAB builtins (`disp`, `fprintf`, `zeros`, `ones`, `plot`, `figure`, `fopen`, `exist`, `isa`, `class`, `double`, etc. — approximately 80 entries) and keywords (`if`, `for`, `end`, etc.) are excluded from both call site detection and read-variable detection to reduce noise in the symbol map.
+
 **Nesting and `end` tracking:**

 MATLAB uses `end` to close `function`, `classdef`, `if`, `for`, `while`, `switch`, `try`, and `parfor` blocks. The `_find_end()` helper scans forward from a block-opening line, tracking nesting depth, to find the matching `end`. This determines function/class extent for:
@@ -148,7 +152,7 @@ MATLAB uses `end` to close `function`, `classdef`, `if`, `for`, `while`, `switch

 **Output arguments as return type:** If a function declares output arguments (`function [a, b] = myFunc(...)`), the output names are joined with `, ` and stored as `return_type` on the symbol (e.g., `"a, b"`).

-**Comment and string stripping:** Before pattern matching, line comments (`%...`) and string literals (`'...'` and `"..."`) are stripped from a copy of the source text to avoid false positives in call site and variable detection.
+**Comment and string stripping:** Line comments (`%...`) and string literals (`'...'` and `"..."`) are stripped at two levels: (1) a global copy of the source text is preprocessed for top-level pattern matching (`_FUNC_RE`, `_CLASS_RE`, `_VAR_RE`), and (2) each function's body text is independently stripped within `_extract_calls`, `_extract_local_vars`, and `_extract_read_vars` before scanning for identifiers. The per-function-body stripping ensures that comments and strings inside function bodies don't produce false positives even when the body text is sliced from the original (unstripped) source.

 **Builtin exclusion:** A large set of common MATLAB builtins (`disp`, `fprintf`, `zeros`, `plot`, etc.) and keywords (`if`, `for`, `end`, etc.) are excluded from read-variable detection to reduce noise in the symbol map.
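The strip-before-scan idea can be illustrated with a rough regex sketch. These are not the extractor's actual patterns, and MATLAB's transpose operator (`'`) is ignored here for simplicity.

```python
import re

def strip_comments_and_strings(src: str) -> str:
    """Illustrative MATLAB comment/string stripper (not MatlabExtractor's code)."""
    # Remove string literals first, so a '%' inside a string is not
    # mistaken for a comment marker. (Offsets are not preserved here.)
    src = re.sub(r"'[^'\n]*'", "''", src)   # single-quoted strings
    src = re.sub(r'"[^"\n]*"', '""', src)   # double-quoted strings
    src = re.sub(r"%[^\n]*", "", src)       # line comments
    return src

body = "y = zeros(3); % calls disp()\nmsg = 'plot(x)';\nplot(y)\n"
cleaned = strip_comments_and_strings(body)
calls = set(re.findall(r"\b([A-Za-z_]\w*)\s*\(", cleaned))
# 'disp' appears only in a comment and 'plot(x)' only inside a string,
# so after stripping, only the real call sites remain: zeros and plot.
```

Scanning the stripped copy is what keeps commented-out calls and quoted code out of the call-site and variable results.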
