refactor: enforce Turn-as-Unit SSOT + 200-line file discipline by yishuiliunian · Pull Request #183 · AgentsMesh/Loopal

yishuiliunian · 2026-05-26T13:26:41Z

Summary

- Move TurnTracker into loopal-context so TurnStore mutators become pub(crate): the runtime cannot bypass the tracker to mutate turn state, and the compiler enforces this.
- Retire the ContextStore::from_messages wire-format entry; tests now seed history via loopal_test_support::seed_history. Adds tests/architecture_boundary_test.rs to grep for cross-layer wire-type leaks.
- Split 10 source files exceeding 200 lines into directory modules; strip //! / /// docs that restate signatures (HARD RULE: code is SSOT). All src/**/*.rs are now ≤200 lines.

Changes

Architecture enforcement (8-phase plan):

- loopal-context::turn_tracker now owns the only mutation path; TurnStore::{start_turn, append_step, ...} are pub(crate).
- Wire-only mutations (microcompact, condense_server_blocks) flow through TurnTracker::with_wire_mut.
- Re-derive open ToolBatch step on TurnTracker::new / replace_store so resume-mid-batch correctly routes update_tool_state.
- Drop dead modules: provider-api Middleware trait, ContextStore pipeline/config_refresh/degradation/ingestion middleware, governance compensation.
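The compile-time enforcement described above can be sketched in a single file. This is only an illustration under assumptions: the `context` module stands in for the loopal-context crate, `pub(super)` stands in for `pub(crate)`, and the field layout and `turn_count` reader are hypothetical.

```rust
// Hypothetical single-file stand-in: the `context` module plays the role of the
// loopal-context crate, so `pub(super)` here corresponds to `pub(crate)` there.
mod context {
    pub mod turn_store {
        #[derive(Debug, Default)]
        pub struct TurnStore {
            turns: Vec<String>,
        }

        impl TurnStore {
            // Mutator: reachable only from inside `context`.
            pub(super) fn start_turn(&mut self, trigger: &str) -> usize {
                self.turns.push(trigger.to_string());
                self.turns.len() - 1
            }

            // Reader: part of the public surface.
            pub fn turn_count(&self) -> usize {
                self.turns.len()
            }
        }
    }

    pub mod turn_tracker {
        use super::turn_store::TurnStore;

        #[derive(Debug, Default)]
        pub struct TurnTracker {
            store: TurnStore,
        }

        impl TurnTracker {
            // The only mutation path visible to code outside `context`.
            pub fn try_start_turn(&mut self, trigger: &str) -> usize {
                self.store.start_turn(trigger)
            }

            pub fn store(&self) -> &TurnStore {
                &self.store
            }
        }
    }
}
```

Code outside `context` can call `tracker.try_start_turn(..)` and read through `tracker.store()`, but a direct call to `TurnStore::start_turn` from there is rejected by the compiler because the method is not visible.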

File-size + comment cleanup:

- Directory modules: turn_event_store, turn_store, turn_tracker, turn_projection, request_turns, compaction, compact_rehydrate, ingestion, turn_degradation, resolver.
- Sibling extractions: SettleSignal, mcp_settle, session_start_prompt.

Net: 137 files, +1581 / -4485 (-2904).

Test plan

- CI passes (bazel build / test / clippy / rustfmt)
- New tests/architecture_boundary_test.rs enforces wire-type containment
- tests/agent_loop/compaction_run_e2e_test.rs exercises microcompact + resume invariants

First step of the Turn-as-Unit refactor. Establishes the new domain entity that will eventually replace the wire-format-leaked `loopal-message::Message`.
Motivation: see the Provider Boundary principle in /Users/stone/.claude/plans/breezy-tickling-cerf.md. Wire-format concepts (MessageRole, ContentBlock, {role, content[]} structs) leaked across all layers (domain / IPC / storage / view). This crate is the new domain entity that replaces Message.
Scope of PR-1:
* New crate `loopal-turn` (no consumers yet, introduced independently)
* ADT modules:
  - `content.rs`: TextBlock, ToolCall, ToolResult, ThinkingBlock, ServerToolPair (server tool call+result fused: Anthropic I5 by type)
  - `step.rs`: TurnStep enum (LlmCall / ToolBatch / Compaction / Injection), AssistantOutput (tool_calls Vec order = LLM stream order, encodes I4), OrderedToolBatch + ToolBatchItem (call+state fused: Anthropic I2 by type)
  - `turn.rs`: Turn { id, trigger, body, outcome }, TurnTrigger enum, TurnOutcome with InProgress/Complete/Idle/Error/Cancelled
  - `event.rs`: TurnEvent (TurnStarted/StepAppended/StepUpdated/TurnEnded) for event-sourced persistence (impl in PR-3)
Tests: serde round-trip, parallel tool order preservation, event variant serialization. All pass; clippy + rustfmt clean.
Out of scope (later PRs):
- TurnRepo impl + jsonl persistence: PR-3 (storage rewrite)
- AgentLoopRunner integration: PR-4 (runtime switch)
- Provider build_body from Turn: PR-5a
- Protocol/IPC type rename (ProjectedMessage → ProjectedTurn): PR-5b
- `loopal-message` crate deletion: PR-6

Trait surface: start_turn / append_step / update_tool_state / end_turn / load_turns / snapshot_turn. Each operation also pushes a TurnEvent for event-sourcing replay (used by jsonl persistence in a later PR).
InMemoryTurnRepo provides:
- Thread-safe (Arc<RwLock<_>>) state
- TurnAlreadyEnded guard prevents writes after the outcome is set
- StepNotToolBatch guard for update_tool_state on the wrong step kind
The jsonl/file-backed impl is deferred to PR-3 (loopal-storage rewrite). 6 unit tests cover lifecycle + error paths.
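A minimal sketch of the two guards, assuming simplified types: turns are addressed by `usize` index, a `ToolBatch` step is just a list of item states, and event pushing is left out. The `open_turn` helper and the `TurnNotFound` / `IndexOutOfRange` variants are hypothetical additions for the illustration.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
enum ToolState { Pending, Done }

#[derive(Debug, Clone, PartialEq)]
enum Step {
    LlmCall(String),
    ToolBatch(Vec<ToolState>),
}

#[derive(Debug, Clone, PartialEq)]
enum Outcome { InProgress, Complete }

#[derive(Debug, PartialEq)]
enum RepoError { TurnNotFound, TurnAlreadyEnded, StepNotToolBatch, IndexOutOfRange }

#[derive(Debug, Clone)]
struct Turn { steps: Vec<Step>, outcome: Outcome }

// Shared, thread-safe state: clones of the repo see the same turns.
#[derive(Clone, Default)]
struct InMemoryTurnRepo {
    turns: Arc<RwLock<Vec<Turn>>>,
}

// Hypothetical helper: look up a turn and refuse writes once its outcome is set.
fn open_turn(turns: &mut [Turn], id: usize) -> Result<&mut Turn, RepoError> {
    let turn = turns.get_mut(id).ok_or(RepoError::TurnNotFound)?;
    if turn.outcome != Outcome::InProgress {
        return Err(RepoError::TurnAlreadyEnded);
    }
    Ok(turn)
}

impl InMemoryTurnRepo {
    fn start_turn(&self) -> usize {
        let mut turns = self.turns.write().unwrap();
        turns.push(Turn { steps: Vec::new(), outcome: Outcome::InProgress });
        turns.len() - 1
    }

    fn append_step(&self, turn_id: usize, step: Step) -> Result<usize, RepoError> {
        let mut turns = self.turns.write().unwrap();
        let turn = open_turn(turns.as_mut_slice(), turn_id)?;
        turn.steps.push(step);
        Ok(turn.steps.len() - 1)
    }

    fn update_tool_state(
        &self,
        turn_id: usize,
        step: usize,
        item: usize,
        state: ToolState,
    ) -> Result<(), RepoError> {
        let mut turns = self.turns.write().unwrap();
        let turn = open_turn(turns.as_mut_slice(), turn_id)?;
        let target = turn.steps.get_mut(step).ok_or(RepoError::IndexOutOfRange)?;
        // Only a ToolBatch step carries per-item tool state.
        let items = match target {
            Step::ToolBatch(items) => items,
            Step::LlmCall(_) => return Err(RepoError::StepNotToolBatch),
        };
        *items.get_mut(item).ok_or(RepoError::IndexOutOfRange)? = state;
        Ok(())
    }

    fn end_turn(&self, turn_id: usize) -> Result<(), RepoError> {
        let mut turns = self.turns.write().unwrap();
        open_turn(turns.as_mut_slice(), turn_id)?.outcome = Outcome::Complete;
        Ok(())
    }
}
```

Routing every write through one lookup helper keeps the "no writes after the outcome is set" rule in a single place rather than repeating it in each operation.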

Adds TurnStore as the Turn-based equivalent of ContextStore:
- TurnStore: stores Vec<Turn> in memory and tracks the in-progress turn
- Public API: start_turn / append_step / end_current_turn / from_turns
- Keeps the budget, last_actual_input_tokens, and last_assistant_activity_at fields
- Coexists with ContextStore: the parallel structure is established first, and later PRs migrate callers step by step
Tests: 7 unit tests cover the lifecycle + crash recovery paths.

Introduces the turn_projection module as the single entry point for wire-format projection:
- project_turn_to_messages(&Turn) -> Vec<Message>
- project_turns_to_messages(&[Turn]) -> Vec<Message>
The projection rules make the 5 Anthropic API invariants fall out of the types:
- I1 alternation: an LlmCall is always followed by a ToolBatch (if it has tool_calls)
- I2 id pairing: ToolBatchItem binds call and state together
- I3 tool_result before text: a ToolBatch projects to a standalone user message
- I4 parallel ordering: the Vec order of OrderedToolBatch.items fixes the ToolCall order
- I5 server pairing: a ServerToolPair projects to a paired ServerToolUse + ServerToolResult
8 unit tests cover:
- empty / user trigger / llm text / parallel tool ordering
- server pair / compaction / injection origin mapping / cancelled state
In the later PR-5a this module will move inside the provider adapter, as the prototype of build_anthropic_messages_json / build_openai_messages_json.
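A rough sketch of how such a projection could produce I1 to I4 from the shape of the data. All types here are simplified assumptions (a turn is a user text plus steps, tool output is a plain string); only the function name `project_turn_to_messages` comes from the commit message.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Role { User, Assistant }

#[derive(Debug, Clone, PartialEq)]
enum Block {
    Text(String),
    ToolUse { id: String, name: String },
    ToolResult { id: String, output: String },
}

#[derive(Debug, Clone, PartialEq)]
struct Message { role: Role, blocks: Vec<Block> }

struct ToolCall { id: String, name: String }

// Call and result live in one item, so a result without its call cannot exist (I2).
struct ToolBatchItem { call: ToolCall, output: String }

enum TurnStep {
    LlmCall { text: String, tool_calls: Vec<ToolCall> },
    ToolBatch { items: Vec<ToolBatchItem> },
}

struct Turn { user_text: String, steps: Vec<TurnStep> }

fn project_turn_to_messages(turn: &Turn) -> Vec<Message> {
    let mut out = vec![Message {
        role: Role::User,
        blocks: vec![Block::Text(turn.user_text.clone())],
    }];
    for step in &turn.steps {
        match step {
            // One assistant message per LlmCall; tool_use blocks keep Vec order (I4).
            TurnStep::LlmCall { text, tool_calls } => {
                let mut blocks = Vec::new();
                if !text.is_empty() {
                    blocks.push(Block::Text(text.clone()));
                }
                blocks.extend(tool_calls.iter().map(|c| Block::ToolUse {
                    id: c.id.clone(),
                    name: c.name.clone(),
                }));
                out.push(Message { role: Role::Assistant, blocks });
            }
            // One standalone user message per ToolBatch (I1, I3).
            TurnStep::ToolBatch { items } => {
                let blocks = items
                    .iter()
                    .map(|i| Block::ToolResult { id: i.call.id.clone(), output: i.output.clone() })
                    .collect();
                out.push(Message { role: Role::User, blocks });
            }
        }
    }
    out
}
```

Because the result id is read from the item's own `call`, the projected `tool_result` always matches a `tool_use` id that was emitted in the same order.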

…3/7)
Adds TurnEventStore, which persists Turns in event-sourcing style, one TurnEvent per line.
- The turns.jsonl file lives at sessions/<id>/turns.jsonl, alongside the existing messages.jsonl
- append_event: appends TurnStarted / StepAppended / StepUpdated / TurnEnded
- load_turns: folds events → Vec<Turn>. A turn missing TurnEnded is closed automatically as Cancelled (CrashRecovery), and its Pending/Running tool items are marked Cancelled as well, so the loaded Vec<Turn> cannot break downstream invariants
6 tests cover:
- roundtrip append/load
- fold of a single turn / ordering of multiple turns
- step update patching tool state
- crash recovery (missing TurnEnded)
- missing file returns empty
The later PR-4 makes the runtime write both turns.jsonl and messages.jsonl for parallel validation; PR-6 deletes MessageStore and messages.jsonl to complete the boundary cleanup.
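The fold with crash recovery might look roughly like the following. It is a sketch under simplifying assumptions: each appended step is a single tool item, ids are plain integers, events are already parsed (no jsonl I/O), and `fold_events` is a hypothetical name for the core of `load_turns`.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ToolState { Pending, Running, Done, Cancelled }

#[derive(Debug, Clone, PartialEq)]
enum Outcome { InProgress, Complete, Cancelled }

#[derive(Debug, Clone, PartialEq)]
struct Turn { id: u32, tools: Vec<ToolState>, outcome: Outcome }

#[derive(Debug, Clone)]
enum TurnEvent {
    TurnStarted { id: u32 },
    StepAppended { id: u32, tool: ToolState },
    StepUpdated { id: u32, index: usize, tool: ToolState },
    TurnEnded { id: u32 },
}

fn fold_events(events: &[TurnEvent]) -> Vec<Turn> {
    let mut turns: Vec<Turn> = Vec::new();
    for event in events {
        match event {
            TurnEvent::TurnStarted { id } => {
                turns.push(Turn { id: *id, tools: Vec::new(), outcome: Outcome::InProgress });
            }
            TurnEvent::StepAppended { id, tool } => {
                if let Some(turn) = turns.iter_mut().find(|t| t.id == *id) {
                    turn.tools.push(tool.clone());
                }
            }
            TurnEvent::StepUpdated { id, index, tool } => {
                let slot = turns
                    .iter_mut()
                    .find(|t| t.id == *id)
                    .and_then(|t| t.tools.get_mut(*index));
                if let Some(slot) = slot {
                    *slot = tool.clone();
                }
            }
            TurnEvent::TurnEnded { id } => {
                if let Some(turn) = turns.iter_mut().find(|t| t.id == *id) {
                    turn.outcome = Outcome::Complete;
                }
            }
        }
    }
    // Crash recovery: a turn that never saw TurnEnded is closed as Cancelled,
    // and any tool item still in flight is cancelled with it.
    for turn in &mut turns {
        if turn.outcome == Outcome::InProgress {
            turn.outcome = Outcome::Cancelled;
            for state in &mut turn.tools {
                if matches!(state, ToolState::Pending | ToolState::Running) {
                    *state = ToolState::Cancelled;
                }
            }
        }
    }
    turns
}
```

Doing the recovery pass after the fold, rather than while reading, means a log that simply ends mid-turn needs no special casing in the event loop.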

The runtime now maintains two persisted copies in parallel, messages.jsonl (existing) and turns.jsonl (new):
- AgentLoopRunner adds current_turn_id + current_step_index tracking state
- SessionManager holds a TurnEventStore and adds a record_turn_event API
- turn_record.rs: start_turn_record / append_step_record / end_turn_record helpers
- turn_trigger_map.rs: Envelope → TurnTrigger mapping (Human/Cron/System)
The 10 save_message paths each gain a corresponding turn event:
- ingest.rs: TurnStarted (trigger taken from the envelope)
- llm_record.rs: StepAppended(LlmCall) with thinking/text/tool_calls/server_pairs
- tools_finalize.rs: StepAppended(ToolBatch) with Done items
- tools_inject.rs/emit_all_interrupted: StepAppended(ToolBatch) with Cancelled items
- compaction_run.rs: StepAppended(Compaction) with summary + ack
- compact_rehydrate.rs: StepAppended(Compaction) with the list of rehydrated files
- stop_feedback.rs: StepAppended(Injection { StopFeedback })
- governance/system_note.rs: StepAppended(Injection { Governance/SystemNote/... })
Turn boundaries are managed explicitly by transition():
- WaitingForInput/Finished triggers TurnEnded(Complete)
- transition_error() triggers TurnEnded(Error)
- If an InProgress turn still exists before the next ingest, it is closed as Cancelled (ParentTurnAborted)
Full //... build + test: all 89 tests PASS; clippy + rustfmt clean. For now the two formats coexist; once PR-6 deletes MessageStore, turns.jsonl becomes the only SSOT.

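…oads (PR-5b/7)
Renames IPC schema fields to fix a boundary leak:
- InboxEnqueued.message_id → envelope_id
- InboxConsumed.message_id → envelope_id
- UserMessageQueued.message_id → envelope_id
The value carried was always envelope.id; the old name implied that Message is a first-class entity, which is the wire format leaking into the IPC schema. The rename makes the IPC layer speak in terms of envelopes and removes the terminology clash in which runtime internals and the cross-process protocol contract used the same word. 15 files were updated in bulk (pattern destructuring, struct initializers, and test assertions all follow). Loopal is not a public SDK and IPC is an internal protocol, so worktree isolation plus same-commit deploy allows a hard cut. All 89 tests PASS; clippy + rustfmt clean.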

…e (PR-5b/7)
Completely removes loopal-protocol's dependency on loopal-message, fixing the wire-format leak in the IPC schema layer.
- Deletes protocol/projection.rs and moves project_messages to loopal-context (new message_projection module); the implementation layer handles message-aware projection
- Deletes From<&MessageSource> for MessageOrigin from protocol/envelope.rs and moves it to runtime/message_build.rs as the plain function message_origin_for
- Deletes 4 protocol cross-crate projection tests (they depended on loopal-message)
- Updates callers: bootstrap/attach_mode.rs, bootstrap/sub_agent_resume.rs, and tui/resume_display.rs now import project_messages from loopal-context
All PR-5b acceptance grep tests pass:
- crates/loopal-protocol/Cargo.toml does not reference loopal-message ✓
- crates/loopal-protocol/BUILD.bazel does not depend on //crates/loopal-message ✓
- crates/loopal-protocol/src/ does not import MessageRole / ContentBlock ✓
Note: the ProjectedMessage / ProjectedToolCall types stay in loopal-protocol as IPC schema. They are plain String/JSON shapes with no wire-format dependency. The real message-shaped → turn-shaped rename (ProjectedMessage → ProjectedTurn) is folded into a later PR together with the TUI hydration path switch. All 89 tests PASS; clippy + rustfmt clean.

…a/7)
Adds ChatParams.turns: Vec<Turn> as the domain-shaped input field (messages is kept for backward compatibility). The Anthropic adapter gains an internal build_messages_json_from_turns function that folds Turns → Anthropic wire JSON directly, skipping the intermediate normalize/finalize/sanitize steps; the 5 invariants are guaranteed by the Turn types.
- ChatParams gains the turns field; all 14 literal initializers add turns: vec![]
- AgentLoopRunner holds an in-memory TurnStore (loopal-context), and the turn_record helpers push to turn_store in sync
- llm_params.rs: prepare_chat_params feeds turn_store.turns() into ChatParams
- anthropic/send.rs: build_request_body takes the turn-based path if turns is non-empty, else falls back to the messages path (keeps the one-shot single-message call in smart_compact_llm compatible)
- openai/google: for now, turns are converted back to messages via project_turns_to_messages and fed to the existing build_input/build_contents; later PRs implement a direct turn-based fold for each
All 89 tests PASS; clippy + rustfmt clean. Anthropic is the hot path, so the turn-based path takes effect directly; OpenAI/Google go through the bridge path to stay consistent.

Adds a boundary job to CI that enforces the boundaries PR-5b already cleaned up:
- crates/loopal-protocol/BUILD.bazel does not depend on loopal-message
- crates/loopal-protocol/src does not import loopal_message
- protocol src does not reference MessageRole / ContentBlock (McpContentBlock is exempt)
- the InboxEnqueued/InboxConsumed/UserMessageQueued variants of event_payload no longer use a message_id field
Later PRs will add stricter checks (domain layer / provider crate boundaries) step by step as the boundary acceptance grep tests are satisfied. This CI regression guard ensures the IPC boundary fix from PR-5b cannot be silently broken on main.

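resume_session now reads turns.jsonl first to restore Vec<Turn>, then projects it to Vec<Message> for existing callers. messages.jsonl serves only as a legacy fallback when turns.jsonl is missing. Sub-agent load_messages has the same semantics. With this, the storage-layer SSOT officially switches to turns.jsonl:
- Writes: the runtime writes both files (dual-write since PR-4)
- Reads: resume prefers turns; messages is a legacy fallback only
- Projection: inside the runtime, loopal-context::project_turns_to_messages restores a message-shaped view for callers that are not yet migrated
In a later PR, once all callers consume Vec<Turn>, the messages.jsonl write can be removed entirely and message_store retired. All 89 tests PASS; clippy + rustfmt clean.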

Finally eliminates the loopal-message crate as a cross-cutting wire-format dependency:
- Message / MessageRole / ContentBlock / ImageSource / normalize_messages move to loopal-provider-api/src/wire/ (provider-shared schema layer)
- MessageOrigin moves to loopal-turn/src/origin.rs (domain audit metadata)
- The whole crates/loopal-message/ directory is deleted
- All 12 BUILD.bazel files that depended on the crate now depend on loopal-provider-api or loopal-turn
- In all 70+ files, `use loopal_message::X` becomes `use loopal_provider_api::X` (wire types) or `use loopal_turn::MessageOrigin` (audit)
PR-6 acceptance grep tests verify:
- Test 1: `ls crates/ | grep '^loopal-message$'` → empty ✓
- Test 3: `loopal-protocol/{Cargo.toml,BUILD.bazel,src/}` does not reference loopal-message ✓
- Test 4: the whole codebase has no `loopal-message` BUILD dep or `use loopal_message::` ✓ (only comments in envelope.rs / origin.rs mention the historical name; these are not real references)
Test 2 (the domain layer does not reference type names such as MessageRole/ContentBlock) is met in spirit: the types now live in loopal-provider-api (the schema crate), and there is no longer a standalone loopal-message wire-format crate; this is the "boundary fix" of plan PR-6. The remaining textual grep hits (such as MessageRole in domain-layer crates) come from the implementation layer's message-shaped view, which is still alive; that layer will be retired gradually by the later hydrate refactor (adding an Image type to the Turn model + replacing ContextStore with TurnStore). Current dual state: turns.jsonl is the write SSOT (PR-4), and resume also reads turns first (earlier in PR-6). All 88 tests PASS (previously 89; the one that disappeared is loopal-message's own test target, removed with the crate); clippy + rustfmt clean.

Adds CI grep checks to prevent any form of loopal-message regression:
- the crates/loopal-message directory must not exist
- no src file may contain `use loopal_message::`
- no BUILD.bazel may depend on `//crates/loopal-message`
PR-6 acceptance test 1 (crate deleted) is now enforced in CI. Together with PR-5b's protocol boundary check + envelope_id check, the overall wire-format boundary cannot regress.

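The previous CI boundary job included several checks that the build already catches:
- checking that the crates/loopal-message directory does not exist → if the crate does not exist, any import fails to compile immediately, and bazel also fails as soon as it cannot find the target
- checking that no source contains use loopal_message:: → same as above
- checking that no BUILD.bazel depends on //crates/loopal-message → same as above
These greps add zero visibility beyond a build failure. Removed. The CI boundary job now keeps only the two kinds of violation that the build cannot catch but grep can:
1. protocol/src must not pub use or type-alias MessageRole / ContentBlock from elsewhere (it compiles, but the boundary leak comes back)
2. event_payload field names must not change back from envelope_id to message_id (the rename is legal Rust, but it undoes the IPC schema cleanup)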

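The boundary job added earlier expressed architectural constraints as shell grep, which does not match the all-bazel style of the other CI steps. The regexes were fragile (whitespace, line breaks, or same-named variables could all cause misjudgments), and the job could not be run locally, only tried by pushing to CI. The real enforcement points for an architecture boundary are crate boundaries + pub visibility + code review; the role of CI is build / test / lint. If a boundary needs to be pinned down in the future, it should be written as a bazel rust_test target (runnable locally, part of fail-fast, reproducible as a unit test), not as grep run from YAML.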
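One way a boundary check could be expressed as an ordinary Rust test helper instead of shell grep is sketched below. The function name `find_wire_type_leaks` is hypothetical, and the helper scans an in-memory string; a real test would presumably read the crate's source files first. Matching whole identifier tokens sidesteps the substring problem noted above (for example `McpContentBlock` versus `ContentBlock`).

```rust
// Returns (1-based line number, identifier) for every forbidden identifier
// that appears as a whole token in `source`.
fn find_wire_type_leaks(source: &str, forbidden: &[&str]) -> Vec<(usize, String)> {
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        // Split on anything that cannot be part of a Rust identifier.
        for token in line.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
            if forbidden.contains(&token) {
                hits.push((idx + 1, token.to_string()));
            }
        }
    }
    hits
}
```

A test would then assert that the returned list is empty for each file under the protected directory, which makes a violation reproducible locally with its line number.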

Fixes a reversed dependency: loopal-provider previously depended on loopal-context only to reuse project_turns_to_messages (the OpenAI/Google bridge), but provider is the infrastructure layer and should not reach back into context (domain/middleware). Moves project_turn_to_messages / project_turns_to_messages from loopal-context::turn_projection to loopal-provider-api::wire::turn_projection (the schema crate, where both Turn and Message are available) so that all callers import from provider-api:
- loopal-provider/src/{openai,google}/mod.rs: switched to loopal_provider_api::
- loopal-runtime/src/session.rs: same
- Removes the //crates/loopal-context dependency from loopal-provider/BUILD.bazel
- Moves turn_projection_test.rs along to provider-api/tests
Dependency direction restored: loopal-provider → loopal-provider-api → loopal-turn ✓ (previously it was reversed: provider → context → provider-api → turn). All 88 tests PASS; clippy + rustfmt clean.

…e 5)
Fixes the anti-pattern of expressing a tagged union through empty field values. Before:
    TurnStep::Compaction(CompactionRecord {
        summary_text, ack_text,          // filled when compaction_run emits
        rehydrated: Vec<RehydratedFile>, // filled when compact_rehydrate emits
        ...
    })
The two meanings were told apart by empty fields (summary_text empty ↔ rehydrate only), which breaks type self-description. After:
    TurnStep::CompactionSummary(CompactionSummary { summary_text, ack_text, ... })
    TurnStep::CompactionRehydrate(CompactionRehydrate { files })
Each variant expresses a single meaning. Anthropic build_request_body and turn_projection also branch per variant, so the logic is more direct. Downstream fold/replay no longer needs an "if the field is empty, skip" check. All 88 tests PASS; clippy + rustfmt clean.
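As an illustration of the per-variant branching, here is a small sketch. The struct fields are reduced to the ones named in the commit message, `files` is assumed to be a list of paths, and `describe` is a hypothetical consumer standing in for the projection code.

```rust
#[derive(Debug, Clone, PartialEq)]
struct CompactionSummary { summary_text: String, ack_text: String }

#[derive(Debug, Clone, PartialEq)]
struct CompactionRehydrate { files: Vec<String> }

#[derive(Debug, Clone, PartialEq)]
enum TurnStep {
    CompactionSummary(CompactionSummary),
    CompactionRehydrate(CompactionRehydrate),
}

// Hypothetical consumer: each arm handles exactly one meaning, and the
// compiler rejects the match if a variant is left unhandled.
fn describe(step: &TurnStep) -> String {
    match step {
        TurnStep::CompactionSummary(s) => {
            format!("summary: {} / ack: {}", s.summary_text, s.ack_text)
        }
        TurnStep::CompactionRehydrate(r) => format!("rehydrated {} file(s)", r.files.len()),
    }
}
```

With the earlier single-struct shape, the same function would have needed to inspect whether `summary_text` was empty to decide which case it was looking at.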

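Previously, when turn_projection handled TurnTrigger:
- Cron was projected but lost its prefix (the original ingest wrote `[scheduled] {text}`)
- Agent / Channel went through the UserInput branch, losing all structured address information
- GoalContinuation / BackgroundHook were projected to None. Both are ephemeral_in_history but must still enter the LLM context, and the original ingest did write a user message for them. None here meant that after resume, every turn triggered by a hook or goal continuation vanished from the LLM context: a behavior regression.
Fixes:
1. TurnTrigger gains two variants, Agent { from, content } and Channel { channel, from, content }, which keep the structured routing information
2. The Cron / GoalContinuation / BackgroundHook fields are unified as { envelope_id, content }
3. turn_projection and anthropic/request_turns apply the build_user_message prefix rules when projecting: `[scheduled]`, `[from: addr]`, `[from: #chan/from]`
4. GoalContinuation/BackgroundHook project to a visible user message (no longer None)
5. turn_trigger_map (envelope → trigger) is adjusted to match
6. 5 round-trip tests cover the projected prefix and origin of each trigger
All 88 tests PASS; clippy + rustfmt clean.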
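The prefix rules could be sketched as a single match over the trigger. This is an assumption-laden illustration: `project_trigger_text` is a hypothetical name, the exact spacing between prefix and content is guessed from the `[scheduled] {text}` example, and GoalContinuation/BackgroundHook are assumed to project their content without a prefix.

```rust
#[derive(Debug, Clone)]
enum TurnTrigger {
    UserInput { content: String },
    Cron { envelope_id: String, content: String },
    Agent { from: String, content: String },
    Channel { channel: String, from: String, content: String },
    GoalContinuation { envelope_id: String, content: String },
    BackgroundHook { envelope_id: String, content: String },
}

// Every trigger yields visible user text; none of them disappears from context.
fn project_trigger_text(trigger: &TurnTrigger) -> String {
    match trigger {
        TurnTrigger::UserInput { content } => content.clone(),
        TurnTrigger::Cron { content, .. } => format!("[scheduled] {content}"),
        TurnTrigger::Agent { from, content } => format!("[from: {from}] {content}"),
        TurnTrigger::Channel { channel, from, content } => {
            format!("[from: #{channel}/{from}] {content}")
        }
        // Assumed: projected as-is, without a prefix.
        TurnTrigger::GoalContinuation { content, .. }
        | TurnTrigger::BackgroundHook { content, .. } => content.clone(),
    }
}
```

Returning `String` rather than `Option<String>` makes the "no trigger projects to None" rule part of the signature in this sketch.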

Previously tools_finalize folded each ToolResult block backwards into a ToolBatch step, with the ToolCall name/input filled by empty-string / Null placeholders. That violated the design intent of ToolBatchItem (call+result bound at the type level to lock the I2 invariant). The call information of a ToolBatch in turns.jsonl was effectively garbage. New design (event sourcing used properly):
1. On entry, execute_tools emits one ToolBatch step in which every item is Pending but carries the full ToolCall (name + input from the LLM stream). The step_index is cached in AgentLoopRunner.current_tool_batch_step.
2. tools_finalize no longer emits a new ToolBatch step. Each ToolResult block is mapped back by tool_use_id to its position in batch.items[i], and a StepUpdated patches the state to Done.
3. emit_all_interrupted likewise sends StepUpdated to mark each item Cancelled (UserInterrupt).
4. close_tool_batch_record clears the in-flight index.
5. TurnStore gains update_tool_state; TurnStoreError gains 3 index-out-of-range / step-mismatch variants.
6. turn_record gains start/update/close_tool_batch_record helpers.
Effect: ToolBatchItem.call in turns.jsonl is real (name + input complete), and the I2/I4 invariants also hold on the event sequence itself (StepUpdated can only patch state; the call field is immutable). All 88 tests PASS; clippy + rustfmt clean.
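The "open as Pending, then patch by tool_use_id" flow can be sketched as follows. The types are simplified assumptions (string input and output), and `open` / `patch_state` are hypothetical method names; event emission is omitted.

```rust
#[derive(Debug, Clone, PartialEq)]
struct ToolCall { id: String, name: String, input: String }

#[derive(Debug, Clone, PartialEq)]
enum ToolState { Pending, Done(String), Cancelled }

#[derive(Debug, Clone, PartialEq)]
struct ToolBatchItem { call: ToolCall, state: ToolState }

#[derive(Debug, Default)]
struct OrderedToolBatch { items: Vec<ToolBatchItem> }

impl OrderedToolBatch {
    // Emitted once on entry: every item is Pending but already carries its full call.
    fn open(calls: Vec<ToolCall>) -> Self {
        let items = calls
            .into_iter()
            .map(|call| ToolBatchItem { call, state: ToolState::Pending })
            .collect();
        Self { items }
    }

    // A later update may only replace `state`; `call` and item order never change.
    // Returns the patched position, or None if no item has that id.
    fn patch_state(&mut self, tool_use_id: &str, state: ToolState) -> Option<usize> {
        let index = self.items.iter().position(|item| item.call.id == tool_use_id)?;
        self.items[index].state = state;
        Some(index)
    }
}
```

Results may arrive in any order, but because each patch lands at the item's original position, the batch keeps the order in which the LLM issued the calls.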

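…ue 2)
Previously the turn_record helpers handled the turn_store (in-memory) write and the turns.jsonl write independently; if either failed they only logged warn! without rolling back, so the two views drifted apart silently:
- the Vec<Turn> rebuilt by fold(turns.jsonl) disagreed with ALR.turn_store
- and in turn the LLM context (projected from turn_store) disagreed with what was seen after resume
New fail-closed pattern (an extension of the precedent already in governance/system_note.rs): mutate in-memory first → persist the event immediately → roll back in-memory on failure. Specific changes:
- start_turn_record: pops the in-memory turn when persistence fails; returns Option<TurnId>
- append_step_record: pops the in-memory step when persistence fails; increments current_step_index only on success (so step_index aligns exactly with the jsonl)
- update_tool_batch_item_state: snapshots the old state first and rolls back to it, best-effort, when persistence fails
- end_turn_record: does not roll back when persistence fails (too invasive on the shutdown path); it relies on resume's CrashRecovery semantics to close a turn missing TurnEnded as Cancelled, so both sides still converge
- Adds a persist_event() helper that wraps record_turn_event + warn logging in one place
Benefit: turn_store and turns.jsonl can be verified consistent via fold() at any moment, and the SSOT assumption of the resume path ("the jsonl is the truth") actually holds. All 88 tests PASS; clippy + rustfmt clean.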
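The fail-closed sequence (mutate, persist, roll back on failure) can be illustrated with a toy tracker. Everything here is a simplified assumption: steps and events are strings, `FlakyLogger` is a hypothetical test double, and the `TurnEventLogger` trait shape is a guess at a minimal interface rather than the project's actual one.

```rust
trait TurnEventLogger {
    fn persist(&mut self, event: &str) -> Result<(), String>;
}

// Hypothetical test double that can be switched into a failing mode.
struct FlakyLogger {
    fail: bool,
    written: Vec<String>,
}

impl TurnEventLogger for FlakyLogger {
    fn persist(&mut self, event: &str) -> Result<(), String> {
        if self.fail {
            return Err("disk full".to_string());
        }
        self.written.push(event.to_string());
        Ok(())
    }
}

#[derive(Default)]
struct TurnTracker {
    steps: Vec<String>,
}

impl TurnTracker {
    fn try_append_step(
        &mut self,
        step: &str,
        logger: &mut dyn TurnEventLogger,
    ) -> Result<usize, String> {
        // 1. Mutate in-memory state first.
        self.steps.push(step.to_string());
        let index = self.steps.len() - 1;
        // 2. Persist the event immediately.
        if let Err(err) = logger.persist(&format!("StepAppended({index}, {step})")) {
            // 3. On failure, undo the in-memory change so both views still agree.
            self.steps.pop();
            return Err(err);
        }
        Ok(index)
    }
}
```

After a failed append the next successful one reuses the same index, so step indices in memory stay aligned with the persisted log.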

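MessageOrigin is the audit metadata of the wire-format Message (it marks how a message entered the conversation). It previously lived in loopal-turn (the domain crate), where its name implied that Message is a first-class domain citizen. It moves to loopal-provider-api::wire::origin, in the same schema crate as Message / ContentBlock; loopal-turn no longer exports any message-shaped type and is now a pure domain crate in fact as well as in name. Callers (~12 files) change use loopal_turn::MessageOrigin → loopal_provider_api. All 88 tests PASS; clippy + rustfmt clean.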

…sue 8)
Previously two projection modules were both called "projection" but sat at different layers:
- loopal-provider-api::wire::turn_projection: Turn (domain) → Message (wire)
- loopal-context::message_projection: Message (wire) → ProjectedMessage (display)
turn_projection already lives under wire/ and its function is named project_turns_to_messages, so its direction is self-evident. The name message_projection did not state a direction, and project_messages is a generic name. The rename makes the direction explicit:
- file message_projection.rs → display_projection.rs
- function project_messages → project_messages_to_display
Callers (4 files: tui/resume_display, bootstrap/attach_mode, bootstrap/sub_agent_resume, lib re-export) follow. All 88 tests PASS; clippy + rustfmt clean.

…Issue 9)
AgentLoopRunner previously had 17 fields, 4 of them turn-related (current_turn_id / current_step_index / current_tool_batch_step / turn_store). Any new turn-related field would keep polluting the ALR struct. This extracts TurnTracker (loopal-runtime/src/agent_loop/turn_tracker.rs), merging the 4 fields into a single ALR.turns field. TurnTracker is pure data plus one reset helper method (no business logic; the turn_record helpers remain on the ALR impl and access it via self.turns.X). Callers are replaced mechanically (self.current_X → self.turns.current_X, and self.turn_store → self.turns.store), ~27 touch points. ALR now has 14 fields, and turn state is concentrated in one semantically clear sub-struct; the next field of this kind, such as parent_turn_id (sub-agent), only touches TurnTracker. All 88 tests PASS; clippy + rustfmt clean.

… (Issue 10)
The largest cleanup. Architecture audit Issue 10 pointed out that ChatParams carrying both messages and turns is a dual SSOT that violates the single-writer principle; in addition, the Provider trait's finalize_messages and a large amount of message-shaped internal modeling were leftovers of the message-fallback path. Core changes:
1. The ChatParams.messages field is deleted; turns is the only conversation-history input
2. The Provider::finalize_messages trait method is deleted (the user-tail logic moves into anthropic build_messages_json_from_turns, decided by model.supports_prefill + continuation_intent)
3. The anthropic/finalize.rs file is deleted; anthropic/request.rs keeps only build_tools
4. OpenAI / Google / OpenAI-compat call project_turns_to_messages internally and then go through the existing build_input / build_contents / build_messages; no double clone of ChatParams, a single projection
5. ChatParams::new signature: (model, turns, system_prompt) replaces the old (..., messages, ...)
6. ToolResult.images: Vec<String> → Vec<ToolImageBlock> (with SessionResource / Inline variants); turn_projection preserves images; anthropic tool_result_to_json emits string or array content depending on the image count
7. New hydrate_turn_images helper: operates on a Turn's ToolBatch items rather than on messages
8. llm.rs / turn_exec.rs: prepare_chat_params no longer takes a messages parameter; hydration goes through the turn path
9. MCP sampling adapter: the role+text history is concatenated into a single_user_prompt Turn
10. One-shot callers (classifier, hooks, smart_compact_llm, provider_resolver_impl, test harness) switch to Turn::single_user_prompt
11. Deletes 7 obsolete tests targeting build_messages / finalize_messages and rewrites read_image_e2e_test to use the turn path
Issue 7 (normalize_messages) is no longer a real problem: it is still used on the OpenAI/Google bridge path as a shared wire-format helper, so keeping it in the wire module is reasonable. All 88 tests PASS; clippy + rustfmt clean. Architecture audit Issue 10 is closed.

…ocs (Issues 12-14)
Issue 12, TurnTracker field encapsulation:
- The 4 fields (current_turn_id / current_step_index / current_tool_batch_step / store) become private; external code accesses them only through readers (current_turn_id() / current_tool_batch_step() / store() / store_mut())
- Mutators are consolidated into TurnTracker's own try_start_turn / try_append_step / try_update_tool_state / try_end_turn / mark_tool_batch_open / close_tool_batch methods
- Adds the TurnEventLogger trait: TurnTracker calls logger.persist(event) for fail-closed persistence, and on the runtime side JsonlLogger wraps SessionManager to write turns.jsonl
- turn_record.rs shrinks to a thin adapter on ALR that does a split borrow to decouple the logger from &mut self.turns
- Callers follow the reader pattern (self.turns.current_turn_id().is_some() and so on)
Benefit: the single-writer contract is upgraded from "convention" to "enforced by types". New turn fields do not pollute ALR, and the anti-pattern of mutators scattered across ALR is gone.
Issue 13, Injection flattening:
- TurnStep::Injection(InjectedMessage) → TurnStep::Injection { kind, text }
- Deletes the InjectedMessage struct (redundant naming, and a name containing "Message" does not fit a domain crate)
- Callers (stop_feedback, governance/system_note, request_turns, turn_projection, test) follow
Issue 14: removes the MessageSource reference from the doc at turn.rs:99 (domain crate docs should not be coupled to protocol crate enum variant names).
All 88 tests PASS; clippy + rustfmt clean.

The ContextStore dual-write is being migrated to a TurnStore-derived projection. This commit lays down the API + resume plumbing; the remaining push_X callsites stay as transitional dual-writes until TurnTrigger::UserInput is extended to carry image attachments (currently only carried by ContextStore via push_user in ingest, so naive auto-refresh would wipe images on the first projection).
What this commit does:
- `ContextStore::refresh_view(turns)`: new projection target; future callers will swap dual-writes for this single sync point.
- `SessionManager::resume_session` now returns the recovered `Vec<Turn>` alongside the projected messages. Fixes a pre-existing bug where the resumed runner had an empty TurnStore (the next LLM call would have sent zero history because `prepare_chat_params` builds from TurnStore, not ContextStore).
- `AgentLoopParams.initial_turns` + builder setter feed the recovered turns into `AgentLoopRunner::new`, which constructs the TurnStore via `TurnStore::from_turns` when seeded.
- `TurnTracker::replace_store` swaps the inner store atomically and resyncs `current_turn_id` / `current_step_index` to the new state (used by `handle_resume_session`'s in-place session swap).
- Test fixtures open a synthetic `TurnTrigger::Resume` turn so direct ALR-method tests (`record_assistant_message`, `execute_tools` …) have a turn to append steps to. `Resume` projects to no user message, so ContextStore stays empty just as before.
- Marker comments at the six dual-write sites explain they are transitional and reference `ContextStore::refresh_view`'s doc.
What's deferred to a follow-up commit:
- Extend `TurnTrigger::UserInput` to carry image attachments.
- Remove `push_user/push_assistant/push_tool_results` calls from the six runtime sites.
- Add auto-refresh inside `turn_record` adapters so TurnStore writes automatically resync the projected view.

Removes the last "image attachments only live in ContextStore.messages" data-loss point in the Turn projection. After this commit, the runtime can derive ContextStore.messages entirely from TurnStore for the ingest path; only compaction's set_boundary anchor (which references ContextStore.id by uuid) keeps the remaining dual-write.
Changes:
- `TurnTrigger::UserInput` gains `images: Vec<ToolImageBlock>` with serde default + skip_if_empty so old `turns.jsonl` round-trips.
- `envelope_to_trigger` converts `ImageAttachment → ToolImageBlock::Inline`.
- `project_trigger` for UserInput emits a Text block + per-image Image block in the projected user Message.
- `project_compaction_rehydrate` collapses N files into a single (assistant tool_use*, user tool_result*) pair, which matches the pre-refactor wire shape and keeps the message count stable.
- `AgentLoopRunner::start_turn_record` promoted to `pub` so test fixtures and the IPC layer can open synthetic turns without going through ingestion.
Test-only ADT migration: every `TurnTrigger::UserInput { ... }` construction site adds `images: Vec::new()`.
What's still deferred:
- Compaction `set_boundary` boundary anchor uses `Message.id`; the projection emits id-less synthetic messages. Until the `CompactionSummary` step carries the persisted summary id, `push_assistant/push_user` must remain on the compact_rehydrate + compaction_run paths.
- Auto-refresh in `turn_record` adapters is deferred behind that ADT extension; without it, refresh would wipe the boundary-anchor msg.

Two related cleanups identified by arch_check round 3:
## Issue 17+21: TurnTracker state collapse
Removes `current_turn_id` and `current_step_index` from TurnTracker; both were duplicates of state already authoritative in TurnStore.
Before: TurnTracker maintained its own `current_turn_id` mirror that needed manual sync in `try_start_turn`, `try_end_turn`, `replace_store`. `current_step_index` was incremented manually but TurnStore.append_step already returns the assigned index. Easy to drift; every state-machine extension required maintaining two copies.
After: TurnTracker holds only `store: TurnStore` and `current_tool_batch_step` (the one field that is genuinely tracker-specific: a transient in-flight marker, never persisted). All `current_turn_id` reads go through `store.current_turn_id()`; step indices come from `store.append_step`'s return value.
Side effect: fixes a latent bug where `try_start_turn` rollback popped the turn from `turns` but left `store.current_turn_id` pointing at the removed turn. The new `TurnStore::rollback_last_turn` clears both atomically. Same for `rollback_last_step` replacing the ad-hoc `turns_mut().pop()` pattern.
## Issue 20: drop dead LlmRequestSnapshot fields
The struct's 4 fields (`model`/`max_tokens`/`tool_count`/`message_count`) were all write-only. Projection only reads `response`; nothing else references `request_snapshot`. `max_tokens` and `message_count` were even hardcoded to 0 at the construction site. Inlined `model: String` directly into `TurnStep::LlmCall`, deleted the `LlmRequestSnapshot` struct. Wire-format change to turns.jsonl; alpha stage permits hard cut.

Two fixes from arch_check round 4:
## Issue 25: rollback_last_turn requires expected id
`TurnStore::rollback_last_turn` previously popped the trailing turn unconditionally. A mistaken caller (or future refactor) could call it in the wrong state and silently drop unrelated turn data. Now takes `&TurnId` and asserts the current turn matches: programmer errors panic fast in dev, with no chance of silent data loss. `try_start_turn` passes the just-returned id so the contract holds.
## Issue 26: try_update_tool_state returns Result
Previously had three early-return paths (NoCurrentTurn, NoToolBatchOpen, store failure) that only emitted `warn!`. Callers in `tools_finalize` / `tools_inject` had no way to observe whether the in-memory + persisted state actually changed. Introduced `TurnTrackerError` (manual Display/Error impl, no thiserror dep added) covering the three precondition failures plus `PersistFailed` for the post-mutation rollback case. The `update_tool_batch_item_state` adapter logs at the boundary so callers still don't need to thread the error upward, but the failure is now structurally observable in tests and traces.
## Issue 29: re-evaluated as non-issue
The auditor flagged `resume_session` returning `(Session, Vec<Turn>, Vec<Message>)` as redundant. Closer reading: legacy sessions written before turn-event dual-write have data in `messages.jsonl` that is NOT derivable from `turns.jsonl` (which is empty in that path). The triple return covers two distinct sources of truth, not one with a derived echo. Doc-comment updated to spell out the legacy-vs-new fork.
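A small sketch of the Issue 25 contract, assuming a toy store in which a turn is just its id. The field layout and the `start_turn` helper are hypothetical; only `rollback_last_turn` taking `&TurnId` and asserting on mismatch comes from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TurnId(u64);

#[derive(Debug, Default)]
struct TurnStore {
    turns: Vec<TurnId>,
    current_turn_id: Option<TurnId>,
    next_id: u64,
}

impl TurnStore {
    fn start_turn(&mut self) -> TurnId {
        let id = TurnId(self.next_id);
        self.next_id += 1;
        self.turns.push(id);
        self.current_turn_id = Some(id);
        id
    }

    // The caller must name the turn it believes it is rolling back.
    // A mismatch is treated as a programmer error and panics.
    fn rollback_last_turn(&mut self, expected: &TurnId) {
        assert_eq!(
            self.current_turn_id.as_ref(),
            Some(expected),
            "rollback_last_turn called for a turn that is not current"
        );
        assert_eq!(self.turns.last(), Some(expected));
        // Clear both pieces of state together so neither can dangle.
        self.turns.pop();
        self.current_turn_id = None;
    }
}
```

The same shape would apply to a step-level rollback: pass the id the caller already holds and assert before popping.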

…ndling
Audit round 5 (Issue 32): `try_append_step` returned `Option<u32>` while sibling `try_update_tool_state` already returned `Result`. All 6 callers (`llm_record`, `stop_feedback`, `compact_rehydrate`, `compaction_run`, `governance/system_note`, `tools.rs`) ignored the None silently: a persist failure or "no current turn" precondition violation would diverge the in-memory turn log from messages.jsonl without any structural signal.
Changes:
- `try_append_step` → `Result<u32, TurnTrackerError>`
- `append_step_record` (adapter) → `Result<u32, TurnTrackerError>`
- `start_tool_batch_record` → `Result<Option<u32>, TurnTrackerError>` (None preserves the existing "empty input is no-op" semantic)
- All 6 callers log `Err` with context via `tracing` instead of silently dropping the failure
Note on the dual-write order: the ContextStore `push_*` still happens even on append_step Err because compaction's `set_boundary` still anchors on `Message.id` (deferred Issue 19). Skipping the push would diverge ContextStore from messages.jsonl, which is currently the boundary anchor's source of truth. The error log surfaces the divergence so observers can see it; the structural fix waits on `CompactionSummary` carrying the persisted summary id.
`tools.rs::execute_tools` likewise logs (rather than aborts) on `start_tool_batch_record` failure to keep the tool pipeline running for tests that bypass `ingest_message`. Subsequent `update_tool_batch_item_state` calls will themselves return `NoToolBatchOpen` and log; the cascade is observable, just no longer silent.

Audit round 6 findings:
## Issue 33: emit_all_interrupted / finalize_tool_results assume batch open
Both call `update_tool_batch_item_state` in a loop without checking `current_tool_batch_step`. In the (rare) path where `start_tool_batch_record` failed earlier (logged, but execution continued), each update would land in `try_update_tool_state`'s `NoToolBatchOpen` arm and warn-per-item with no useful effect. Wrap the loop in `current_tool_batch_step().is_some()` so the batch-open precondition is checked exactly once at the boundary instead of being repeatedly tested inside the tracker.
## Issue 34: rollback_last_step lacks the precondition guard rollback_last_turn already has
`TurnStore::rollback_last_turn` was tightened in round 4 to take `&TurnId` and panic on mismatch; its sibling `rollback_last_step` was left as a silent no-op when `current_turn_id` is `None`. The sole caller in `try_append_step` always has a valid `turn_id` (since `append_step` just returned `Ok`), but a stray future caller in the wrong state would silently skip the rollback while the event log persist failure left the state diverged. Bring `rollback_last_step` up to the same panic-guarded contract: takes `&TurnId`, asserts current state matches before popping. The existing caller threads through the `turn_id` it already had.

…users, activity stamp)
Four fixes from the max-recall code review pass:
1. **ingest.rs A2**: the `start_turn_record` return was discarded. On `TurnStarted` persist failure the turn rolled back but ingest still wrote messages.jsonl and pushed to ContextStore, producing an orphan user message visible nowhere on resume. Now: bail early, emit an Error event, no dual-write happens.
2. **turn_store.rs A1/D1**: `end_current_turn` called `current_turn_id.take()` before `find()`. A `TurnNotFound` mid-method left `current_turn_id = None` while the InProgress turn remained in the vec, breaking the single-writer invariant. Reordered to locate-first, clear-after-success.
3. **anthropic/request_turns.rs B1**: `build_messages_json_from_turns` had no consecutive-same-role merge. A cancelled turn ending with a User (tool_result Cancelled) followed by a new UserInput turn produced adjacent User messages, which Anthropic 400s. Added a `merge_adjacent_same_role` pass after the per-turn fold, mirroring what `normalize_messages` does for the OpenAI/Google adapters. Unit tests cover the merge + idempotent-on-alternating cases.
4. **store/mod.rs E1**: `refresh_view` previously did not restore `last_assistant_activity_at`, which would let microcompact skip stale tool-result scrubbing on resumed sessions. Now derives the stamp from the latest turn containing an LlmCall step (`Turn.started_at`) so the field tracks reality rather than wall clock; this fixes the issue without using `SystemTime::now()` as a stand-in for "some time in the past".
Code is SSOT: removed two paragraph-style doc comments that were explaining away architectural choices rather than driving them.
Deferred: B2 (empty turns → empty messages on legacy resume) is blocked on the same Message.id anchor migration as the rest of the ContextStore retirement; D4 (assert! in rollback) is reachable only by single-thread programmer error, no observable trigger.
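The adjacent-same-role merge from fix 3 could be sketched like this. `WireMessage` with string blocks is a simplified stand-in for the Anthropic JSON messages the real pass operates on; only the function name comes from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role { User, Assistant }

#[derive(Debug, Clone, PartialEq)]
struct WireMessage { role: Role, blocks: Vec<String> }

// Folds runs of consecutive messages with the same role into one message,
// concatenating their blocks in order.
fn merge_adjacent_same_role(messages: Vec<WireMessage>) -> Vec<WireMessage> {
    let mut merged: Vec<WireMessage> = Vec::with_capacity(messages.len());
    for msg in messages {
        if let Some(prev) = merged.last_mut() {
            if prev.role == msg.role {
                prev.blocks.extend(msg.blocks);
                continue;
            }
        }
        merged.push(msg);
    }
    merged
}
```

On an already alternating sequence the function returns its input unchanged, which is the idempotence property the unit tests mentioned above check.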

Pre-Turn sessions on disk have only messages.jsonl; resume_session was returning empty turns + legacy messages. The runtime seeds TurnStore from turns and ContextStore from messages, so empty turns meant the LLM saw zero history on the first request after resume, even though messages.jsonl had the full prior conversation. From the user's POV this looked like "agent forgot everything after restart."
Added a `legacy_messages_to_turns` converter and wired it into the legacy branch of `resume_session`. The mapping:
- User text msg → new Turn(UserInput, content, images)
- User tool_result-only msg → ToolBatch step on current turn, items paired with prior LlmCall tool_uses by id
- Assistant msg → LlmCall step on current turn, server tool pairs reconstructed from ServerToolUse + ServerToolResult
- System msg → dropped (Anthropic / new model treat system_prompt separately anyway)
- Orphan tool_result → Cancelled item with stub ToolCall (name="unknown") so downstream invariants still hold
Lossy edges:
- Original Turn timestamps are unrecoverable (uses Utc::now())
- LlmCall.model is empty (the original model wasn't persisted per-msg)
- thinking signatures preserved when present, dropped when missing (matches `record_assistant_message`'s policy)
Tests cover: empty input, single user msg, user+assistant pair, tool_use/tool_result round-trip with id pairing, orphan tool_result fallback.
Also stripped a leftover paragraph-style doc comment on resume_session that explained legacy-vs-new branching; the code now is the doc.

…pat)

Reverts the legacy_messages_to_turns converter (commit 20c7592). Per project policy: alpha-stage session jsonl format may be replaced directly with no migration tool / dual-rail / fallback. Old sessions created before the Turn refactor simply don't resume. That's intentional, not a bug to fix.

Removed:

- `crates/loopal-runtime/src/legacy_message_to_turn.rs` (324 LOC + tests)
- `mod legacy_message_to_turn` from `lib.rs`
- `messages.jsonl` fallback branch in `SessionManager::resume_session`
- `messages.jsonl` fallback in `SessionManager::load_messages`
- 6 tests in `session_manager_test.rs` that verified the old save_message → resume_session round-trip contract
- 1 test in `session_test.rs` (test_save_message_and_resume)
- `e2e_compact_resume_test.rs` (2 tests verifying message_store-based compact boundary marker survives resume; the marker mechanism keyed on Message.id is obsolete once turns.jsonl is the only resume source)

Kept:

- The `merge_adjacent_same_role` unit tests in request_turns.rs (the multi-pair-across-role-transitions regression test added to verify the algorithm doesn't drop messages; algorithm proven correct)
- `MessageStore::save_message` / `append_message` / `append_entry`: still called by ingest/llm_record etc. for the dual-write transitional period. They write but no longer affect resume.

Per "no backward compat" policy, finish what previous commits half-did. messages.jsonl is gone as a persistence target; turns.jsonl is the single source of truth. ContextStore.messages is a pure projected view of TurnStore, refreshed automatically by `turn_record` adapters.

Removed (production code):

- `ContextStore::push_user / push_assistant / push_tool_results`: dual-write mutators with no remaining callers
- `ContextStore::set_boundary` + `sanitize_tool_pairs` plumbing: relied on the Message.id anchor that messages.jsonl held; not needed once turns.jsonl is SSOT
- `ContextStore::replace_messages`: last writer was set_boundary
- `SessionManager::save_message / clear_history / mark_compact_boundary / rewind_to`: all write-only since resume no longer reads messages.jsonl
- `SessionManager.message_store` field: no remaining users
- `loopal-storage::messages` / `entry` / `replay` modules: the entire legacy persistence path (`MessageStore`, `TaggedEntry`, `Marker`, `replay` function). Used to back the marker-based history-rewrite contract that the new model replaces with TurnStep events
- `agent_loop::message_build` module: `build_user_message` was the ingest-side translator from Envelope to Message; ingest no longer builds a Message at all

Auto-refresh wired in:

- `turn_record.rs` adapters (start_turn_record, append_step_record, update_tool_batch_item_state, end_turn_record) now call `ContextStore::refresh_view` after every successful TurnStore write, so the projected view stays in lockstep with the authoritative log

Operational behavior:

- `ControlCommand::Clear` now clears both TurnStore and ContextStore (was: ContextStore + marker in messages.jsonl)
- `ControlCommand::Rewind` now truncates TurnStore via new `TurnStore::truncate_turns(keep)`, refreshes the projected view, and emits Rewound. The `rewind` boundary-detection module is gone: turn index IS the boundary
- Compaction's persistence path is simplified: `mark_compact_boundary` + `set_boundary` + `save_message(summary)` + `save_message(ack)` all gone. Only the `CompactionSummary` TurnStep is written. NB: this exposes a pre-existing architectural gap: wire-build does not yet honor the boundary, so compaction currently adds tokens instead of removing them. Tracked as separate work; the cleanup unblocks it

Test fixtures:

- All test runner fixtures (`make_runner`, `make_runner_with_channels`, `make_runner_with_mock_provider`, `make_multi_runner_with_intents`, `make_interactive_multi_runner`, `make_runner_with_intents` in try_recover, the harness `wire()` in loopal-test-support) now open a synthetic turn after construction. Empty seed → Resume turn (zero projection); single user seed → matching UserInput turn. Without this, `append_step_record` would hit `NoCurrentTurn` because tests bypass `ingest_message`

Deleted obsolete tests:

- `crates/loopal-context/tests/suite/store_test.rs` (push_X / set_boundary tests)
- `crates/loopal-runtime/tests/suite/session_test::test_save_message_and_resume` and `session_manager_test`'s save_message / clear_history / mark_compact_boundary suite: they verified the now-removed message_store round-trip
- 9 `crates/loopal-tui/tests/suite/e2e_compact_*` files: they all set up via `save_message` + `push_user` and asserted the old marker-based resume semantics
- `crates/loopal-storage/tests/suite/{entry,messages,replay}_test.rs`: tests for deleted modules
- 2 tests in `input_edge_test.rs` / `input_emit_fail_edge_test.rs` that pushed messages directly into ContextStore to exercise Clear / Compact handlers

Followups (out of scope; tracked as known gaps):

- Compaction at TurnStore level: TurnStep::CompactionSummary needs to also act as a boundary in wire-build (currently it just appends a summary while prior turns still flow to the LLM)
- Microcompact still operates on ContextStore.messages in-place; that effect is wiped on the next refresh_view. Same architectural gap as compaction
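The "truncate the authoritative log, then re-project" flow behind Rewind can be sketched as below. Everything here is a hypothetical stand-in: `ContextView` plays the role of the projected `ContextStore.messages`, `rewind` is an invented helper, and reading `keep` as "number of leading turns to retain" is an assumption about `truncate_turns`, not something the commit message spells out.

```rust
// Illustrative stand-ins for the real TurnStore / ContextStore.
#[derive(Debug, Clone, PartialEq)]
struct Turn {
    user_input: String,
    assistant_reply: Option<String>,
}

#[derive(Debug, Default)]
struct TurnStore {
    turns: Vec<Turn>,
}

impl TurnStore {
    fn push(&mut self, turn: Turn) {
        self.turns.push(turn);
    }

    /// Assumed semantics: retain the first `keep` turns, drop the rest.
    fn truncate_turns(&mut self, keep: usize) {
        self.turns.truncate(keep);
    }
}

/// A projection that is always rebuilt from the store, never mutated directly.
#[derive(Debug, Default)]
struct ContextView {
    messages: Vec<String>,
}

impl ContextView {
    fn refresh_view(&mut self, store: &TurnStore) {
        self.messages = store
            .turns
            .iter()
            .flat_map(|turn| {
                let mut out = vec![format!("user: {}", turn.user_input)];
                if let Some(reply) = &turn.assistant_reply {
                    out.push(format!("assistant: {}", reply));
                }
                out
            })
            .collect();
    }
}

/// Hypothetical Rewind handler: write to the log first, then re-derive the view.
fn rewind(store: &mut TurnStore, view: &mut ContextView, keep: usize) {
    store.truncate_turns(keep);
    view.refresh_view(store);
}
```

Because the view is recomputed from the store on every write, any in-place edit to the view (the microcompact gap noted above) is lost on the next refresh, which is exactly the behavior the followups call out.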

…e-size discipline

Architecture enforcement (8-phase plan):

- Move TurnTracker into loopal-context so TurnStore mutators become pub(crate); runtime crates can no longer bypass the tracker to mutate turn state.
- Retire ContextStore::from_messages wire-format entry; tests now seed history via loopal_test_support::seed_history.
- Add tests/architecture_boundary_test.rs grepping for cross-layer wire-type leaks (MessageRole/ContentBlock outside provider/display/context layers).
- Drop dead modules: provider-api Middleware trait, ContextStore pipeline / config_refresh / degradation / ingestion middleware, governance/compensation.
- Re-derive open ToolBatch on TurnTracker::new / replace_store so resume-mid-batch routes update_tool_state correctly.
- Wire-only mutations (microcompact, condense_server_blocks) flow through TurnTracker::with_wire_mut so SSOT contract is enforced for ephemeral paths.

File-size + comment discipline:

- Split 10 files exceeding 200 lines into directory modules: turn_event_store, turn_store, turn_tracker, turn_projection, request_turns, compaction, compact_rehydrate, ingestion, turn_degradation, resolver.
- Extract sibling modules where structural seams existed: SettleSignal, mcp_settle, session_start_prompt.
- Strip module-level //! blocks and /// docs that restate signatures (HARD RULE: code is SSOT; comments explain *why*, never *what*).
- All src .rs files now ≤200 lines.

Net: 137 files changed, +1581 / -4485 (-2904).
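A grep-style boundary check of the kind `tests/architecture_boundary_test.rs` is described as performing might look roughly like this. It is a sketch under stated assumptions: the function name `find_wire_type_leaks`, the path-fragment allow-list, and the in-memory `(path, source)` input are all hypothetical, and the real test presumably walks the source tree on disk.

```rust
// Wire-format identifiers named in the commit message.
const FORBIDDEN: [&str; 2] = ["MessageRole", "ContentBlock"];

// Hypothetical allow-list: path fragments for layers permitted to use wire types.
const ALLOWED_LAYERS: [&str; 3] = ["provider", "display", "context"];

/// Scan `(path, source)` pairs and report every forbidden identifier that
/// appears in a file outside the allowed layers, as "path:line: identifier".
fn find_wire_type_leaks(files: &[(&str, &str)]) -> Vec<String> {
    let mut leaks = Vec::new();
    for (path, source) in files {
        // Files inside an allowed layer may mention wire types freely.
        if ALLOWED_LAYERS.iter().any(|layer| path.contains(*layer)) {
            continue;
        }
        for (idx, line) in source.lines().enumerate() {
            for ident in FORBIDDEN {
                if line.contains(ident) {
                    // Line numbers are reported 1-based.
                    leaks.push(format!("{}:{}: {}", path, idx + 1, ident));
                }
            }
        }
    }
    leaks
}
```

A test built on this would assert that the returned list is empty, so a new leak fails CI with the offending path and line. Plain substring matching is coarse (it also flags comments and string literals), which is one reason to treat this as a sketch rather than a description of the actual test.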

yishuiliunian added 30 commits May 24, 2026 01:04

yishuiliunian added 6 commits May 24, 2026 14:47

yishuiliunian merged commit 37bc534 into main May 26, 2026
4 checks passed

yishuiliunian deleted the feat/loopal-turn-pr1 branch May 26, 2026 13:46

yishuiliunian merged 36 commits into main from feat/loopal-turn-pr1
