refactor: enforce Turn-as-Unit SSOT + 200-line file discipline #183

Merged
yishuiliunian merged 36 commits into main from feat/loopal-turn-pr1
May 26, 2026
@yishuiliunian
Contributor

Summary

  • Move TurnTracker into loopal-context so TurnStore mutators become pub(crate) — runtime cannot bypass the tracker to mutate turn state at compile time.
  • Retire ContextStore::from_messages wire-format entry; tests now seed history via loopal_test_support::seed_history. Adds tests/architecture_boundary_test.rs to grep cross-layer wire-type leaks.
  • Split 10 source files exceeding 200 lines into directory modules; strip //! / /// docs that restate signatures (HARD RULE: code is SSOT). All src/**/*.rs now ≤200 lines.

Changes

Architecture enforcement (8-phase plan):

  • loopal-context::turn_tracker now owns the only mutation path; TurnStore::{start_turn, append_step, ...} are pub(crate).
  • Wire-only mutations (microcompact, condense_server_blocks) flow through TurnTracker::with_wire_mut.
  • Re-derive open ToolBatch step on TurnTracker::new / replace_store so resume-mid-batch correctly routes update_tool_state.
  • Drop dead modules: provider-api Middleware trait, ContextStore pipeline/config_refresh/degradation/ingestion middleware, governance compensation.
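
The compile-time enforcement described in these bullets can be illustrated with nested modules standing in for crates. This is a minimal sketch under stated assumptions: the field layout, method bodies, and the `step_count` reader are hypothetical and not taken from loopal-context, and `pub(super)` plays the role that `pub(crate)` plays across a real crate boundary.

```rust
// `context` stands in for the loopal-context crate; code outside it plays
// the role of the runtime. All names and bodies here are illustrative.
mod context {
    mod store {
        #[derive(Default)]
        pub struct TurnStore {
            // Hypothetical layout: each turn is a list of step labels.
            turns: Vec<Vec<String>>,
        }

        impl TurnStore {
            // Mutators are visible inside `context` only
            // (this would be pub(crate) in a real crate).
            pub(super) fn start_turn(&mut self) -> usize {
                self.turns.push(Vec::new());
                self.turns.len() - 1
            }

            pub(super) fn append_step(&mut self, turn: usize, step: &str) -> usize {
                self.turns[turn].push(step.to_string());
                self.turns[turn].len() - 1
            }

            // Readers stay fully public.
            pub fn step_count(&self, turn: usize) -> usize {
                self.turns.get(turn).map_or(0, Vec::len)
            }
        }
    }
    pub use store::TurnStore;

    #[derive(Default)]
    pub struct TurnTracker {
        store: TurnStore,
        current: Option<usize>,
    }

    impl TurnTracker {
        pub fn try_start_turn(&mut self) -> usize {
            let turn = self.store.start_turn();
            self.current = Some(turn);
            turn
        }

        // Returns the new step index, or None when no turn is open.
        pub fn try_append_step(&mut self, step: &str) -> Option<usize> {
            let turn = self.current?;
            Some(self.store.append_step(turn, step))
        }

        pub fn store(&self) -> &TurnStore {
            &self.store
        }
    }
}
```

With this layout, `context::TurnStore::default().start_turn()` written outside `context` is rejected by the compiler, while `TurnTracker::try_start_turn` remains callable.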

File-size + comment cleanup:

  • Directory modules: turn_event_store, turn_store, turn_tracker, turn_projection, request_turns, compaction, compact_rehydrate, ingestion, turn_degradation, resolver.
  • Sibling extractions: SettleSignal, mcp_settle, session_start_prompt.

Net: 137 files, +1581 / -4485 (-2904).

Test plan

  • CI passes (bazel build / test / clippy / rustfmt)
  • New tests/architecture_boundary_test.rs enforces wire-type containment
  • tests/agent_loop/compaction_run_e2e_test.rs exercises microcompact + resume invariants

First step of the Turn-as-Unit refactor. Establishes the new domain entity
that will eventually replace the wire-format-leaked `loopal-message::Message`.

Motivation: see /Users/stone/.claude/plans/breezy-tickling-cerf.md
Provider Boundary principle: wire-format concepts (MessageRole, ContentBlock,
{role, content[]} structs) leaked across all layers (domain / IPC / storage /
view). This crate is the new domain entity that replaces Message.

Scope of PR-1:
* New crate `loopal-turn` (no consumers yet — independent introduction)
* ADT modules:
  - `content.rs` — TextBlock, ToolCall, ToolResult, ThinkingBlock,
    ServerToolPair (server tool call+result fused — Anthropic I5 by type)
  - `step.rs` — TurnStep enum (LlmCall / ToolBatch / Compaction / Injection),
    AssistantOutput (tool_calls Vec order = LLM stream order, encodes I4),
    OrderedToolBatch + ToolBatchItem (call+state fused — Anthropic I2 by type)
  - `turn.rs` — Turn { id, trigger, body, outcome }, TurnTrigger enum,
    TurnOutcome with InProgress/Complete/Idle/Error/Cancelled
  - `event.rs` — TurnEvent (TurnStarted/StepAppended/StepUpdated/TurnEnded)
    for event-sourced persistence (impl in PR-3)

Tests: serde round-trip, parallel tool order preservation, event variant
serialization. All pass; clippy + rustfmt clean.
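
How the ADT encodes ordering and pairing "by type" may be easier to see in code. The sketch below is an assumption-laden simplification: the type names follow the commit message, but the fields, the `ToolState` variants, and the `open_batch` helper are hypothetical, and serde derives are omitted to keep it dependency-free.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ToolState {
    Pending,
    Done(String),
    Cancelled,
}

// Call and state live in one item, so a result can never exist
// without its call (the pairing invariant, I2).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolBatchItem {
    pub call: ToolCall,
    pub state: ToolState,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TurnStep {
    // The Vec order of tool_calls is the LLM stream order (I4).
    LlmCall { text: String, tool_calls: Vec<ToolCall> },
    ToolBatch { items: Vec<ToolBatchItem> },
}

/// Hypothetical helper: build the batch that answers an LlmCall.
/// Item order follows the tool_calls order, so parallel calls keep their order.
pub fn open_batch(step: &TurnStep) -> Option<TurnStep> {
    match step {
        TurnStep::LlmCall { tool_calls, .. } if !tool_calls.is_empty() => {
            let items = tool_calls
                .iter()
                .cloned()
                .map(|call| ToolBatchItem { call, state: ToolState::Pending })
                .collect();
            Some(TurnStep::ToolBatch { items })
        }
        _ => None,
    }
}
```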

Out of scope (later PRs):
- TurnRepo impl + jsonl persistence — PR-3 (storage rewrite)
- AgentLoopRunner integration — PR-4 (runtime switch)
- Provider build_body from Turn — PR-5a
- Protocol/IPC type rename (ProjectedMessage → ProjectedTurn) — PR-5b
- `loopal-message` crate deletion — PR-6
Trait surface: start_turn / append_step / update_tool_state / end_turn /
load_turns / snapshot_turn. Each operation also pushes a TurnEvent for
event-sourcing replay (used by jsonl persistence in later PR).

InMemoryTurnRepo provides:
- Thread-safe (Arc<RwLock<_>>) state
- TurnAlreadyEnded guard prevents writes after outcome set
- StepNotToolBatch guard for update_tool_state on wrong step kind

jsonl/file-backed impl deferred to PR-3 (loopal-storage rewrite).
6 unit tests cover lifecycle + error paths.
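
The two guards listed above can be sketched as follows. This is an illustrative reduction, not the loopal implementation: turn ids are plain indices, tool states are strings, and `TurnNotFound` / `ItemOutOfRange` are assumed error variants added only to make the example total.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, PartialEq)]
pub enum RepoError {
    TurnNotFound,     // assumed variant
    TurnAlreadyEnded, // from the commit message
    StepNotToolBatch, // from the commit message
    ItemOutOfRange,   // assumed variant
}

#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    LlmCall(String),
    ToolBatch(Vec<String>), // one state label per tool item
}

#[derive(Debug, Clone, Default)]
struct TurnRecord {
    steps: Vec<Step>,
    ended: bool,
}

#[derive(Clone, Default)]
pub struct InMemoryTurnRepo {
    turns: Arc<RwLock<Vec<TurnRecord>>>,
}

impl InMemoryTurnRepo {
    pub fn start_turn(&self) -> usize {
        let mut turns = self.turns.write().unwrap();
        turns.push(TurnRecord::default());
        turns.len() - 1
    }

    pub fn append_step(&self, turn: usize, step: Step) -> Result<usize, RepoError> {
        let mut turns = self.turns.write().unwrap();
        let rec = turns.get_mut(turn).ok_or(RepoError::TurnNotFound)?;
        if rec.ended {
            return Err(RepoError::TurnAlreadyEnded); // no writes after the outcome is set
        }
        rec.steps.push(step);
        Ok(rec.steps.len() - 1)
    }

    pub fn update_tool_state(
        &self,
        turn: usize,
        step: usize,
        item: usize,
        state: &str,
    ) -> Result<(), RepoError> {
        let mut turns = self.turns.write().unwrap();
        let rec = turns.get_mut(turn).ok_or(RepoError::TurnNotFound)?;
        if rec.ended {
            return Err(RepoError::TurnAlreadyEnded);
        }
        match rec.steps.get_mut(step) {
            Some(Step::ToolBatch(items)) => {
                let slot = items.get_mut(item).ok_or(RepoError::ItemOutOfRange)?;
                *slot = state.to_string();
                Ok(())
            }
            _ => Err(RepoError::StepNotToolBatch), // wrong step kind
        }
    }

    pub fn end_turn(&self, turn: usize) -> Result<(), RepoError> {
        let mut turns = self.turns.write().unwrap();
        let rec = turns.get_mut(turn).ok_or(RepoError::TurnNotFound)?;
        if rec.ended {
            return Err(RepoError::TurnAlreadyEnded);
        }
        rec.ended = true;
        Ok(())
    }
}
```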
Adds TurnStore as the Turn-based counterpart of ContextStore:
- TurnStore: holds a Vec<Turn> in memory and tracks the in-progress turn
- Public API: start_turn / append_step / end_current_turn / from_turns
- Keeps the budget, last_actual_input_tokens, and last_assistant_activity_at fields
- Coexists with ContextStore: the parallel structure lands first, and later PRs migrate callers incrementally

Tests: 7 unit tests cover the lifecycle and crash-recovery paths.
Introduces the turn_projection module as the single entry point for wire-format projection:
- project_turn_to_messages(&Turn) -> Vec<Message>
- project_turns_to_messages(&[Turn]) -> Vec<Message>

The projection rules make the 5 Anthropic API invariants fall out of the types:
- I1 alternation: an LlmCall is always followed by a ToolBatch (when it has tool_calls)
- I2 id pairing: ToolBatchItem binds call and state together
- I3 tool_result before text: a ToolBatch projects to its own user message
- I4 parallel ordering: the Vec order of OrderedToolBatch.items fixes the ToolCall order
- I5 server pairing: a ServerToolPair projects to a paired ServerToolUse + ServerToolResult

8 unit tests cover:
- empty / user trigger / llm text / parallel tool ordering
- server pair / compaction / injection origin mapping / cancelled state

In the later PR-5a this module will move inside the provider adapter as the
starting point for build_anthropic_messages_json / build_openai_messages_json.
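
A rough sketch of the projection rules for I3 and I4, assuming heavily simplified `Turn`, `TurnStep`, `Message`, and `Block` shapes (the real types carry more variants and metadata; only the function name `project_turn_to_messages` comes from the commit message):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    ToolUse { id: String },
    ToolResult { id: String, output: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: Vec<Block>,
}

pub enum TurnStep {
    LlmCall { text: String, tool_call_ids: Vec<String> },
    ToolBatch { items: Vec<(String, String)> }, // (tool call id, output)
}

pub struct Turn {
    pub user_input: String,
    pub steps: Vec<TurnStep>,
}

pub fn project_turn_to_messages(turn: &Turn) -> Vec<Message> {
    // The trigger becomes the opening user message.
    let mut out = vec![Message {
        role: Role::User,
        content: vec![Block::Text(turn.user_input.clone())],
    }];
    for step in &turn.steps {
        match step {
            TurnStep::LlmCall { text, tool_call_ids } => {
                let mut content = vec![Block::Text(text.clone())];
                content.extend(tool_call_ids.iter().map(|id| Block::ToolUse { id: id.clone() }));
                out.push(Message { role: Role::Assistant, content });
            }
            // I3: a ToolBatch becomes its own user message.
            // I4: results are emitted in the batch's Vec order.
            TurnStep::ToolBatch { items } => {
                let content = items
                    .iter()
                    .map(|(id, output)| Block::ToolResult { id: id.clone(), output: output.clone() })
                    .collect();
                out.push(Message { role: Role::User, content });
            }
        }
    }
    out
}
```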
…3/7)

Adds TurnEventStore, which persists Turns in event-sourcing style, one TurnEvent per line.
- The turns.jsonl file lives at sessions/<id>/turns.jsonl, alongside the existing messages.jsonl
- append_event: appends TurnStarted / StepAppended / StepUpdated / TurnEnded
- load_turns: folds events → Vec<Turn>; a turn missing its TurnEnded is closed automatically
  as Cancelled (CrashRecovery), and its Pending/Running tool items are marked Cancelled
  as well, so the loaded Vec<Turn> cannot break later invariants

6 tests cover:
- roundtrip append/load
- folding a single turn / ordering across multiple turns
- step update patching tool state
- crash recovery (missing TurnEnded)
- missing file returns empty

The later PR-4 has the runtime write both turns.jsonl and messages.jsonl for parallel validation;
PR-6 deletes MessageStore and messages.jsonl to complete the boundary cleanup.
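
The fold with crash recovery might look roughly like this. It is a sketch under assumptions: the event and turn shapes are reduced to what the fold needs (a step is just a list of tool states), turn ids are integers, and `fold_events` is a hypothetical name for the logic inside load_turns.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToolState {
    Pending,
    Running,
    Done,
    Cancelled,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    InProgress,
    Complete,
    Cancelled,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TurnEvent {
    TurnStarted { turn_id: u32 },
    StepAppended { turn_id: u32, tools: Vec<ToolState> },
    StepUpdated { turn_id: u32, step: usize, item: usize, state: ToolState },
    TurnEnded { turn_id: u32, outcome: Outcome },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub id: u32,
    pub steps: Vec<Vec<ToolState>>,
    pub outcome: Outcome,
}

pub fn fold_events(events: &[TurnEvent]) -> Vec<Turn> {
    let mut turns: Vec<Turn> = Vec::new();
    for event in events {
        match event {
            TurnEvent::TurnStarted { turn_id } => turns.push(Turn {
                id: *turn_id,
                steps: Vec::new(),
                outcome: Outcome::InProgress,
            }),
            TurnEvent::StepAppended { turn_id, tools } => {
                if let Some(turn) = turns.iter_mut().find(|t| t.id == *turn_id) {
                    turn.steps.push(tools.clone());
                }
            }
            TurnEvent::StepUpdated { turn_id, step, item, state } => {
                let slot = turns
                    .iter_mut()
                    .find(|t| t.id == *turn_id)
                    .and_then(|t| t.steps.get_mut(*step))
                    .and_then(|s| s.get_mut(*item));
                if let Some(slot) = slot {
                    *slot = state.clone();
                }
            }
            TurnEvent::TurnEnded { turn_id, outcome } => {
                if let Some(turn) = turns.iter_mut().find(|t| t.id == *turn_id) {
                    turn.outcome = outcome.clone();
                }
            }
        }
    }
    // Crash recovery: a turn with no TurnEnded is closed as Cancelled,
    // and its unfinished tool items are cancelled with it.
    for turn in turns.iter_mut().filter(|t| t.outcome == Outcome::InProgress) {
        turn.outcome = Outcome::Cancelled;
        for state in turn.steps.iter_mut().flatten() {
            if matches!(*state, ToolState::Pending | ToolState::Running) {
                *state = ToolState::Cancelled;
            }
        }
    }
    turns
}
```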
The runtime now maintains two persisted copies, messages.jsonl (existing) and turns.jsonl (new):
- AgentLoopRunner gains current_turn_id + current_step_index tracking state
- SessionManager holds a TurnEventStore and gains a record_turn_event API
- turn_record.rs: start_turn_record / append_step_record / end_turn_record helpers
- turn_trigger_map.rs: Envelope → TurnTrigger mapping (Human/Cron/System)

10 save_message paths gain the corresponding turn event:
- ingest.rs: TurnStarted (trigger taken from the envelope)
- llm_record.rs: StepAppended(LlmCall) with thinking/text/tool_calls/server_pairs
- tools_finalize.rs: StepAppended(ToolBatch) with Done items
- tools_inject.rs/emit_all_interrupted: StepAppended(ToolBatch) with Cancelled items
- compaction_run.rs: StepAppended(Compaction) with summary + ack
- compact_rehydrate.rs: StepAppended(Compaction) with the list of rehydrated files
- stop_feedback.rs: StepAppended(Injection { StopFeedback })
- governance/system_note.rs: StepAppended(Injection { Governance/SystemNote/... })

Turn boundaries are managed explicitly by transition():
- WaitingForInput/Finished triggers TurnEnded(Complete)
- transition_error() triggers TurnEnded(Error)
- If an InProgress turn still exists before the next ingest, it is closed as Cancelled (ParentTurnAborted)

The full //... build + test passes all 89 tests; clippy + rustfmt clean.
For now the two formats coexist; once PR-6 deletes MessageStore, turns.jsonl becomes the sole SSOT.
…oads (PR-5b/7)

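IPC schema field renames to fix a boundary leak:
- InboxEnqueued.message_id → envelope_id
- InboxConsumed.message_id → envelope_id
- UserMessageQueued.message_id → envelope_id

What these fields actually carry is envelope.id; the old name implied that Message is a
first-class entity, a sign of the wire format leaking into the IPC schema. The rename lets the
IPC layer speak in terms of envelopes and removes the conflicting metaphor of "the runtime
internals and the cross-process protocol contract using the same word".

15 files are updated in bulk (pattern destructuring, struct initializers, and
test assertions all follow). Loopal is not a public SDK and IPC is an internal protocol, so
with worktree isolation and same-commit deploys a hard cut is possible.

All 89 tests PASS; clippy + rustfmt clean.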
…e (PR-5b/7)

Fully removes loopal-protocol's dependency on loopal-message, fixing the wire-format leak
at the IPC schema layer.
- Delete protocol/projection.rs and move project_messages into loopal-context
  (new message_projection module); the implementation layer handles message-aware projection
- Delete From<&MessageSource> for MessageOrigin from protocol/envelope.rs and
  move it to runtime/message_build.rs as the plain function message_origin_for
- Delete 4 protocol cross-crate projection tests (they depended on loopal-message)
- Update callers: bootstrap/attach_mode.rs, bootstrap/sub_agent_resume.rs, and
  tui/resume_display.rs import project_messages from loopal-context

All PR-5b acceptance grep tests pass:
- crates/loopal-protocol/Cargo.toml does not reference loopal-message ✓
- crates/loopal-protocol/BUILD.bazel does not depend on //crates/loopal-message ✓
- crates/loopal-protocol/src/ does not import MessageRole / ContentBlock ✓

Note that the ProjectedMessage / ProjectedToolCall types stay in loopal-protocol
as IPC schema: they are only String/JSON shapes with no wire-format dependency.
The real message-shaped → turn-shaped rename (ProjectedMessage →
ProjectedTurn) is folded into a later PR together with the TUI hydration path switch.

All 89 tests PASS; clippy + rustfmt clean.
…a/7)

Adds ChatParams.turns: Vec<Turn> as the domain-shaped input field (messages is
kept for backward compatibility). The Anthropic adapter gains an internal
build_messages_json_from_turns function that folds Turns → Anthropic wire JSON directly,
skipping the intermediate normalize/finalize/sanitize steps; the 5 invariants are
guaranteed by the Turn types.

- ChatParams gains a turns field; all 14 literal initializers add turns: vec![]
- AgentLoopRunner holds an in-memory TurnStore (loopal-context), and the turn_record
  helpers push to turn_store in step
- llm_params.rs: prepare_chat_params feeds turn_store.turns() into ChatParams
- anthropic/send.rs: build_request_body takes the turn-based path when turns is non-empty
  and otherwise falls back to the messages path (this keeps the single-message one-shot
  call in smart_compact_llm compatible)
- openai/google: for now, project_turns_to_messages converts turns back into messages
  to feed the existing build_input/build_contents; later PRs implement a direct turn-based fold for each

All 89 tests PASS; clippy + rustfmt clean. Anthropic is the hot path, so the
turn-based path takes effect directly; OpenAI/Google go through the bridge path to stay consistent.
Adds a boundary job to CI that enforces the boundaries PR-5b already cleaned up:
- crates/loopal-protocol/BUILD.bazel does not depend on loopal-message
- crates/loopal-protocol/src does not import loopal_message
- protocol src does not reference MessageRole / ContentBlock (McpContentBlock excepted)
- the InboxEnqueued/InboxConsumed/UserMessageQueued variants of event_payload
  no longer use a message_id field

Later PRs will add stricter checks (domain layer / provider crate boundaries) step by step
as the boundary acceptance grep tests are satisfied. This CI regression guard ensures the
PR-5b IPC boundary fix cannot be silently broken on main.
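resume_session now reads turns.jsonl first to restore a Vec<Turn>, then projects it to a Vec<Message>
for existing callers. messages.jsonl serves only as a legacy fallback when turns.jsonl is missing.
Sub-agent load_messages has the same semantics.

With this, the storage layer's SSOT officially switches to turns.jsonl:
- Writes: the runtime writes both copies (dual-write since PR-4)
- Reads: resume prefers turns; messages is a legacy fallback only
- Projection: inside the runtime, loopal-context::project_turns_to_messages restores a
  message-shaped view for callers that have not migrated yet

In a later PR, once every caller consumes Vec<Turn>, the messages.jsonl write can be
removed entirely and message_store retired. All 89 tests PASS; clippy + rustfmt clean.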
Finally eliminates the loopal-message crate as a cross-cutting wire-format dependency:
- Message / MessageRole / ContentBlock / ImageSource / normalize_messages
  move to loopal-provider-api/src/wire/ (provider-shared schema layer)
- MessageOrigin moves to loopal-turn/src/origin.rs (domain audit metadata)
- Delete the entire crates/loopal-message/ directory
- All 12 BUILD.bazel files that depended on the crate now depend on loopal-provider-api or loopal-turn
- Across 70+ files, `use loopal_message::X` is replaced with `use loopal_provider_api::X`
  (wire types) or `use loopal_turn::MessageOrigin` (audit)

PR-6 acceptance grep tests verify:
- Test 1: `ls crates/ | grep '^loopal-message$'` → empty ✓
- Test 3: `loopal-protocol/{Cargo.toml,BUILD.bazel,src/}` does not reference loopal-message ✓
- Test 4: the whole codebase has no `loopal-message` BUILD dep or `use loopal_message::` ✓
  (only comments in envelope.rs / origin.rs mention the historical name; these are not real references)

The spirit of Test 2 (the domain layer does not reference type names such as MessageRole/ContentBlock)
is met: the types now live in loopal-provider-api (the schema crate), and there is no longer a standalone
loopal-message wire-format crate; this is the "boundary fix" of plan PR-6.

The remaining textual grep hits (such as MessageRole in domain-layer crates) mean the implementation
layer's message-shaped view is still alive; it will be retired gradually by the later hydrate refactor
(adding an Image type to the Turn model and replacing ContextStore with TurnStore). The current dual state:
turns.jsonl is the write SSOT (PR-4), and resume also prefers reading turns (earlier in PR-6).

All 88 tests PASS (previously 89; the 1 that went away was loopal-message's own test target,
which disappeared with the crate); clippy + rustfmt clean.
Adds CI grep checks to prevent any form of loopal-message regression:
- the crates/loopal-message directory must not exist
- no src file may contain `use loopal_message::`
- no BUILD.bazel may depend on `//crates/loopal-message`

PR-6 acceptance test 1 (crate deleted) is now enforced in CI. Together with PR-5b's
protocol boundary check + envelope_id check, the overall wire-format boundary
cannot regress.
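The previous CI boundary job included several checks that the build already catches:
- Checking that the crates/loopal-message directory does not exist → if the crate is gone, any import
  fails to compile immediately, and bazel fails as soon as it cannot find the target
- Checking that the source has no use loopal_message:: → same as above
- Checking that BUILD.bazel does not depend on //crates/loopal-message → same as above

These greps add zero visibility beyond a build failure. Removed.

The CI boundary job now keeps only the two kinds of violation that the build cannot catch but grep can:
1. protocol/src must not pub use or type-alias MessageRole / ContentBlock from
   elsewhere (it compiles, but the boundary leak comes back)
2. event_payload field names must not change back from envelope_id to message_id
   (the rename is legal Rust, but it undoes the IPC schema cleanup)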
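The boundary job added earlier expressed architectural constraints with shell grep, which does not
match the all-bazel style of the other CI steps; the regexes are fragile (spaces, line breaks, and
same-named variables can all cause false matches), and it cannot be run locally, only tried by pushing to CI.

The real enforcement points for architectural boundaries are crate boundaries + pub visibility + code review;
CI's role is build / test / lint. If a boundary needs to be locked in later, it should
be written as a bazel rust_test target (runnable locally, part of fail-fast, reproducible as a unit test),
not as grep run from YAML.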
Fixes an inverted dependency: loopal-provider previously depended on loopal-context only to reuse
project_turns_to_messages (the OpenAI/Google bridge), but provider is the infrastructure
layer and should not depend back on context (domain/middleware).

Moves project_turn_to_messages / project_turns_to_messages from
loopal-context::turn_projection to loopal-provider-api::wire::turn_projection
(the schema crate, where both Turn and Message are available), so every caller imports
from provider-api:
- loopal-provider/src/{openai,google}/mod.rs: switch to loopal_provider_api::
- loopal-runtime/src/session.rs: same
- Remove the //crates/loopal-context dependency from loopal-provider/BUILD.bazel
- Move turn_projection_test.rs along to provider-api/tests

Dependency direction restored: loopal-provider → loopal-provider-api → loopal-turn ✓
(previously provider → context → provider-api → turn, which was inverted)

All 88 tests PASS; clippy + rustfmt clean.
…e 5)

Fixes the "empty fields as a tagged union" anti-pattern.

Before:
  TurnStep::Compaction(CompactionRecord {
      summary_text, ack_text,         // filled when compaction_run emits
      rehydrated: Vec<RehydratedFile>, // filled when compact_rehydrate emits
      ...
  })
The two meanings were told apart by empty fields (empty summary_text ↔ rehydrate only), which breaks the type's self-description.

After:
  TurnStep::CompactionSummary(CompactionSummary { summary_text, ack_text, ... })
  TurnStep::CompactionRehydrate(CompactionRehydrate { files })

Each variant expresses a single meaning. Anthropic build_request_body and turn_projection
also branch per variant, which makes the logic more direct. Downstream fold/replay no longer needs
a "skip if the field is empty" check.

All 88 tests PASS; clippy + rustfmt clean.
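Previously, when turn_projection handled a TurnTrigger:
- Cron was projected but lost its prefix (the original ingest wrote `[scheduled] {text}`)
- Agent / Channel went through the UserInput branch, losing all structured address information
- GoalContinuation / BackgroundHook projected to None. These two are ephemeral_in_history
  but still need to enter the LLM context, and the original ingest wrote a user message for them. None here
  meant that after resume, every turn triggered by a hook or goal continuation vanished from the LLM context: a behavior regression.

Fixes:
1. TurnTrigger gains two variants, Agent { from, content } and Channel { channel, from, content },
   to keep structured routing information
2. The Cron / GoalContinuation / BackgroundHook fields are unified as { envelope_id, content }
3. turn_projection and anthropic/request_turns apply build_user_message's prefix rules
   when projecting: `[scheduled]`, `[from: addr]`, `[from: #chan/from]`
4. GoalContinuation/BackgroundHook project to a visible user message (no longer None)
5. turn_trigger_map (envelope → trigger) is adjusted to match
6. 5 round-trip tests cover the projection prefix and origin of every trigger kind

All 88 tests PASS; clippy + rustfmt clean.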
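
The prefix rules can be sketched as a single match. This is illustrative only: the enum is trimmed to four variants, `project_trigger_text` is a hypothetical helper name, and the single space between prefix and content is confirmed by the source only for `[scheduled] {text}` and assumed for the other two.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum TurnTrigger {
    UserInput { content: String },
    Cron { content: String },
    Agent { from: String, content: String },
    Channel { channel: String, from: String, content: String },
}

/// Hypothetical helper: the user-message text a trigger projects to.
pub fn project_trigger_text(trigger: &TurnTrigger) -> String {
    match trigger {
        TurnTrigger::UserInput { content } => content.clone(),
        TurnTrigger::Cron { content } => format!("[scheduled] {}", content),
        // Separator after the prefix is an assumption for the next two arms.
        TurnTrigger::Agent { from, content } => format!("[from: {}] {}", from, content),
        TurnTrigger::Channel { channel, from, content } => {
            format!("[from: #{}/{}] {}", channel, from, content)
        }
    }
}
```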
Previously tools_finalize folded each ToolResult block backwards into a ToolBatch step,
with placeholder values (empty string / Null) for the ToolCall's name/input. That violated the intent of
ToolBatchItem (call+result bound at the type level to lock in the I2 invariant). The call information of
ToolBatch steps in turns.jsonl was effectively garbage.

New design (event sourcing used properly):
1. On entry, execute_tools emits one ToolBatch step in which every item is Pending but
   carries the full ToolCall (name + input from the LLM stream). The step_index is cached in
   AgentLoopRunner.current_tool_batch_step.
2. tools_finalize no longer emits a new ToolBatch step. For each ToolResult block it maps
   tool_use_id back to the position batch.items[i] and sends a StepUpdated that patches the state
   to Done.
3. emit_all_interrupted likewise sends StepUpdated to mark each item Cancelled
   (UserInterrupt).
4. close_tool_batch_record clears the in-flight index.
5. TurnStore gains update_tool_state; TurnStoreError gains 3
   index-out-of-range / step-mismatch variants.
6. turn_record gains start/update/close_tool_batch_record helpers.

Effect: ToolBatchItem.call in turns.jsonl is now real (full name + input), and the
I2/I4 invariants also hold in substance over the event sequence (StepUpdated can only patch state;
the call field is immutable).

All 88 tests PASS; clippy + rustfmt clean.
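
A compact sketch of the "emit Pending first, patch state later" flow. The struct shapes and the method names `pending`, `position_of`, and `apply_step_updated` are hypothetical stand-ins; the point is that the update path can only touch `state`, never `call`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToolState {
    Pending,
    Done { output: String },
    Cancelled,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub input: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolBatchItem {
    call: ToolCall,
    state: ToolState,
}

impl ToolBatchItem {
    pub fn call(&self) -> &ToolCall {
        &self.call
    }
    pub fn state(&self) -> &ToolState {
        &self.state
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolBatch {
    items: Vec<ToolBatchItem>,
}

impl ToolBatch {
    /// Emitted when tool execution starts: every item is Pending
    /// but already carries the full call.
    pub fn pending(calls: Vec<ToolCall>) -> Self {
        let items = calls
            .into_iter()
            .map(|call| ToolBatchItem { call, state: ToolState::Pending })
            .collect();
        Self { items }
    }

    /// Map a tool_use_id back to its position in the batch.
    pub fn position_of(&self, tool_use_id: &str) -> Option<usize> {
        self.items.iter().position(|item| item.call.id == tool_use_id)
    }

    /// Apply a StepUpdated-style patch; returns false if the index is out of range.
    pub fn apply_step_updated(&mut self, item: usize, state: ToolState) -> bool {
        match self.items.get_mut(item) {
            Some(slot) => {
                slot.state = state;
                true
            }
            None => false,
        }
    }

    pub fn items(&self) -> &[ToolBatchItem] {
        &self.items
    }
}
```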
…ue 2)

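Previously the turn_record helpers handled the turn_store (in-memory) write and the turns.jsonl write
independently, and a failure of either only logged warn! without rolling back → the two views drifted apart silently:
- the Vec<Turn> rebuilt by fold(turns.jsonl) disagreed with ALR.turn_store
- so the LLM context (projected from turn_store) disagreed with what resume would see

New fail-closed pattern (extending the precedent already in governance/system_note.rs):

  mutate in-memory first → persist the event immediately → roll back in-memory on failure

Concrete changes:
- start_turn_record: pops the in-memory turn when persistence fails; returns Option<TurnId>
- append_step_record: pops the in-memory step when persistence fails; increments
  current_step_index only on success (so step_index lines up exactly with the jsonl)
- update_tool_batch_item_state: snapshots the old state first and, when persistence fails,
  rolls back to it on a best-effort basis
- end_turn_record: does not roll back when persistence fails (too invasive on the shutdown path); relies on
  resume's CrashRecovery semantics to close the turn missing its TurnEnded as Cancelled, so the
  two sides still converge eventually
- Adds a persist_event() helper that wraps record_turn_event + the warn log in one place

Benefit: turn_store and turns.jsonl can be checked for consistency via fold() at any moment, and the resume
path's SSOT assumption ("the jsonl is the truth") actually holds.

All 88 tests PASS; clippy + rustfmt clean.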
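
The mutate, persist, roll-back sequence for append_step_record could be sketched like this. Everything except the ordering is an assumption: `EventSink`, `MemorySink`, `FailingSink`, and `TurnLog` are hypothetical names, and events are plain strings rather than serialized TurnEvents.

```rust
/// Hypothetical persistence seam; the real code writes to turns.jsonl.
pub trait EventSink {
    fn persist(&mut self, event: &str) -> Result<(), String>;
}

#[derive(Default)]
pub struct MemorySink {
    pub lines: Vec<String>,
}

impl EventSink for MemorySink {
    fn persist(&mut self, event: &str) -> Result<(), String> {
        self.lines.push(event.to_string());
        Ok(())
    }
}

/// A sink that always fails, to exercise the rollback path.
pub struct FailingSink;

impl EventSink for FailingSink {
    fn persist(&mut self, _event: &str) -> Result<(), String> {
        Err("disk full".to_string())
    }
}

#[derive(Default)]
pub struct TurnLog {
    steps: Vec<String>,
}

impl TurnLog {
    /// Fail-closed append: the in-memory step survives only if the event persisted.
    pub fn append_step_record(&mut self, step: &str, sink: &mut dyn EventSink) -> Option<usize> {
        // 1. Mutate in memory first.
        self.steps.push(step.to_string());
        let index = self.steps.len() - 1;
        // 2. Persist the event immediately.
        match sink.persist(&format!("StepAppended {} {}", index, step)) {
            Ok(()) => Some(index),
            // 3. Roll back the in-memory change on failure.
            Err(_) => {
                self.steps.pop();
                None
            }
        }
    }

    pub fn step_count(&self) -> usize {
        self.steps.len()
    }
}
```

Because the index is handed out only on success, the next successful append reuses the index of the step that was rolled back, which keeps in-memory indices aligned with the persisted log.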
MessageOrigin is audit metadata for the wire-format Message (it marks how a message entered
the conversation). It used to live in loopal-turn (the domain crate), where its name implied that
Message is a first-class domain citizen.

It moves to loopal-provider-api::wire::origin, sharing the schema crate with Message / ContentBlock;
loopal-turn no longer exports any message-shaped type and is now pure domain
in fact as well as in name.

Callers (~12 files) change use loopal_turn::MessageOrigin → loopal_provider_api.

All 88 tests PASS; clippy + rustfmt clean.
…sue 8)

Previously both projection modules were called "projection" but sat at different layers:
- loopal-provider-api::wire::turn_projection: Turn (domain) → Message (wire)
- loopal-context::message_projection: Message (wire) → ProjectedMessage (display)

turn_projection already sits under wire/ and its function is named project_turns_to_messages, so the
direction is built in. The name message_projection says nothing about direction, and project_messages is a generic name.

The rename makes the direction explicit:
- file message_projection.rs → display_projection.rs
- function project_messages → project_messages_to_display

Callers (4 files: tui/resume_display, bootstrap/attach_mode, bootstrap/
sub_agent_resume, lib re-export) follow.

All 88 tests PASS; clippy + rustfmt clean.
…Issue 9)

AgentLoopRunner previously had 17 fields, 4 of them turn-related (current_turn_id /
current_step_index / current_tool_batch_step / turn_store). Any new turn-related
field would keep polluting the ALR struct.

Extracts TurnTracker (loopal-runtime/src/agent_loop/turn_tracker.rs), merging the 4
fields into a single ALR.turns field. TurnTracker is plain data + one reset helper
method (no business logic; the turn_record helpers remain on the ALR impl and access it
via self.turns.X).

Callers are rewritten mechanically, (self.current_X → self.turns.current_X) and (self.turn_store →
self.turns.store), touching ~27 sites.

ALR now has 14 fields, with turn state gathered in one semantically clear sub-struct; the next time
a field such as parent_turn_id (sub-agent) is added, only TurnTracker changes.

All 88 tests PASS; clippy + rustfmt clean.
… (Issue 10)

The largest cleanup. Architecture audit Issue 10 pointed out that ChatParams carrying both messages +
turns is a dual SSOT that violates the single-writer principle; in addition, the Provider trait's
finalize_messages and much of the internal message-shaped modeling are leftovers of the
message-fallback path.

Core changes:
1. The ChatParams.messages field is deleted; turns is the only conversation-history input
2. The Provider::finalize_messages trait method is deleted (the user-tail logic moves into
   anthropic build_messages_json_from_turns, decided by model.supports_prefill
   + continuation_intent)
3. The anthropic/finalize.rs file is deleted; anthropic/request.rs keeps only build_tools
4. OpenAI / Google / OpenAI-compat call project_turns_to_messages internally and then go through
   the existing build_input / build_contents / build_messages; no double clone of
   ChatParams, a single projection
5. ChatParams::new signature: (model, turns, system_prompt) replaces the old (..., messages, ...)
6. ToolResult.images: Vec<String> → Vec<ToolImageBlock> (with SessionResource /
   Inline variants); turn_projection preserves images; anthropic tool_result_to_json
   emits string or array content depending on the image count
7. New hydrate_turn_images helper: operates on a Turn's ToolBatch items instead of messages
8. llm.rs / turn_exec.rs: prepare_chat_params no longer takes a messages parameter; hydration
   goes through the turn path
9. MCP sampling adapter: the role+text history is concatenated into a single_user_prompt Turn
10. One-shot callers (classifier, hooks, smart_compact_llm, provider_resolver_impl,
    test harness) switch to Turn::single_user_prompt
11. Delete 7 obsolete tests targeting build_messages / finalize_messages and
    rewrite read_image_e2e_test to use the turn path

Issue 7 (normalize_messages) is no longer a real problem: it is still used on the OpenAI/Google bridge
path as a shared wire-format helper, so keeping it in the wire module is reasonable.

All 88 tests PASS; clippy + rustfmt clean. Architecture audit Issue 10 is closed.
…ocs (Issues 12-14)

Issue 12: TurnTracker field encapsulation:
- The 4 fields (current_turn_id / current_step_index / current_tool_batch_step
  / store) become private; external code accesses them only through readers (current_turn_id() /
  current_tool_batch_step() / store() / store_mut())
- Mutators are consolidated into TurnTracker's own try_start_turn / try_append_step /
  try_update_tool_state / try_end_turn / mark_tool_batch_open /
  close_tool_batch methods
- Adds a TurnEventLogger trait: TurnTracker calls logger.persist(event)
  for fail-closed persistence, and the runtime-side JsonlLogger wraps SessionManager
  to write turns.jsonl
- turn_record.rs shrinks to a thin ALR adapter that does a split borrow to decouple the logger from
  &mut self.turns
- Callers follow the reader pattern (self.turns.current_turn_id().is_some() and so on)

Benefit: the single-writer contract is upgraded from "convention" to "enforced by types". New turn
fields do not pollute ALR, and the anti-pattern of mutators scattered across ALR is gone.

Issue 13: Injection flattening:
- TurnStep::Injection(InjectedMessage) → TurnStep::Injection { kind, text }
- Delete the InjectedMessage struct (double naming, and a name containing "Message" does not fit
  a domain crate)
- Callers (stop_feedback, governance/system_note, request_turns,
  turn_projection, test) follow

Issue 14: remove the MessageSource reference from the doc at turn.rs:99 (domain crate docs should not
be coupled to a protocol crate's enum variant names).

All 88 tests PASS; clippy + rustfmt clean.
The ContextStore dual-write is being migrated to a TurnStore-derived
projection. This commit lays down the API + resume plumbing; the
remaining push_X callsites stay as transitional dual-writes until
TurnTrigger::UserInput is extended to carry image attachments
(currently only carried by ContextStore via push_user in ingest, so
naive auto-refresh would wipe images on the first projection).

What this commit does:
- `ContextStore::refresh_view(turns)` — new projection target; future
  callers will swap dual-writes for this single sync point.
- `SessionManager::resume_session` now returns the recovered `Vec<Turn>`
  alongside the projected messages. Fixes a pre-existing bug where
  the resumed runner had an empty TurnStore (the next LLM call would
  have sent zero history because `prepare_chat_params` builds from
  TurnStore, not ContextStore).
- `AgentLoopParams.initial_turns` + builder setter feed the recovered
  turns into `AgentLoopRunner::new`, which constructs the TurnStore
  via `TurnStore::from_turns` when seeded.
- `TurnTracker::replace_store` swaps the inner store atomically and
  resyncs `current_turn_id` / `current_step_index` to the new state
  (used by `handle_resume_session`'s in-place session swap).
- Test fixtures open a synthetic `TurnTrigger::Resume` turn so direct
  ALR-method tests (`record_assistant_message`, `execute_tools` …)
  have a turn to append steps to. `Resume` projects to no user
  message, so ContextStore stays empty just as before.
- Marker comments at the six dual-write sites explain they are
  transitional and reference `ContextStore::refresh_view`'s doc.

What's deferred to a follow-up commit:
- Extend `TurnTrigger::UserInput` to carry image attachments.
- Remove `push_user/push_assistant/push_tool_results` calls from the
  six runtime sites.
- Add auto-refresh inside `turn_record` adapters so TurnStore writes
  automatically resync the projected view.
Removes the last "image attachments only live in ContextStore.messages"
data-loss point in the Turn projection. After this commit, the runtime
can derive ContextStore.messages entirely from TurnStore for the
ingest path; only compaction's set_boundary anchor (which references
ContextStore.id by uuid) keeps the remaining dual-write.

Changes:
- `TurnTrigger::UserInput` gains `images: Vec<ToolImageBlock>` with
  serde default + skip_if_empty so old `turns.jsonl` round-trips.
- `envelope_to_trigger` converts `ImageAttachment → ToolImageBlock::Inline`.
- `project_trigger` for UserInput emits a Text block + per-image Image
  block in the projected user Message.
- `project_compaction_rehydrate` collapses N files into a single
  (assistant tool_use*, user tool_result*) pair — matches the
  pre-refactor wire shape and keeps message count stable.
- `AgentLoopRunner::start_turn_record` promoted to `pub` so test
  fixtures and IPC layer can open synthetic turns without going
  through ingestion.

Test-only ADT migration: every `TurnTrigger::UserInput { ... }`
construction site adds `images: Vec::new()`.

What's still deferred:
- Compaction `set_boundary` boundary anchor uses `Message.id`; the
  projection emits id-less synthetic messages. Until `CompactionSummary`
  step carries the persisted summary id, `push_assistant/push_user`
  must remain on the compact_rehydrate + compaction_run paths.
- Auto-refresh in `turn_record` adapters is deferred behind that ADT
  extension — without it, refresh would wipe the boundary-anchor msg.
Two related cleanups identified by arch_check round 3:

## Issue 17+21: TurnTracker state collapse
Removes `current_turn_id` and `current_step_index` from TurnTracker —
both were duplicates of state already authoritative in TurnStore.

Before: TurnTracker maintained its own `current_turn_id` mirror that
needed manual sync in `try_start_turn`, `try_end_turn`, `replace_store`.
`current_step_index` was incremented manually but TurnStore.append_step
already returns the assigned index. Easy to drift; every state-machine
extension required maintaining two copies.

After: TurnTracker holds only `store: TurnStore` and `current_tool_batch_step`
(the one field that is genuinely tracker-specific — transient in-flight
marker, never persisted). All `current_turn_id` reads go through
`store.current_turn_id()`; step indices come from `store.append_step`'s
return value.

Side effect: fixes a latent bug where `try_start_turn` rollback popped
the turn from `turns` but left `store.current_turn_id` pointing at the
removed turn. The new `TurnStore::rollback_last_turn` clears both
atomically. Same for `rollback_last_step` replacing the ad-hoc
`turns_mut().pop()` pattern.

## Issue 20: drop dead LlmRequestSnapshot fields
The struct's 4 fields (`model`/`max_tokens`/`tool_count`/`message_count`)
were all write-only. Projection only reads `response`; nothing else
references `request_snapshot`. `max_tokens` and `message_count` were
even hardcoded to 0 at the construction site.

Inlined `model: String` directly into `TurnStep::LlmCall`, deleted the
`LlmRequestSnapshot` struct. Wire-format change to turns.jsonl; alpha
stage permits hard cut.
Two fixes from arch_check round 4:

## Issue 25: rollback_last_turn requires expected id
`TurnStore::rollback_last_turn` previously popped the trailing turn
unconditionally. A mistaken caller (or future refactor) could call it
in the wrong state and silently drop unrelated turn data.

Now takes `&TurnId` and asserts the current turn matches — programmer
errors panic fast in dev, no chance of silent data loss. `try_start_turn`
passes the just-returned id so the contract holds.
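
A sketch of the tightened contract, assuming a toy `TurnStore` in which `TurnId` is a `u64` alias and turns are stored as bare ids (both are simplifications, not the real types):

```rust
pub type TurnId = u64; // assumption: the real TurnId is a richer type

#[derive(Debug, Default)]
pub struct TurnStore {
    turns: Vec<TurnId>,
    current_turn_id: Option<TurnId>,
}

impl TurnStore {
    pub fn start_turn(&mut self, id: TurnId) -> TurnId {
        self.turns.push(id);
        self.current_turn_id = Some(id);
        id
    }

    /// Pops the trailing turn only when it is the turn the caller expects.
    /// A mismatch is a programmer error and panics instead of dropping data.
    pub fn rollback_last_turn(&mut self, expected: &TurnId) {
        assert_eq!(
            self.current_turn_id.as_ref(),
            Some(expected),
            "rollback_last_turn called for a turn that is not current"
        );
        self.turns.pop();
        self.current_turn_id = None; // cleared together with the pop
    }

    pub fn current_turn_id(&self) -> Option<TurnId> {
        self.current_turn_id
    }

    pub fn turn_count(&self) -> usize {
        self.turns.len()
    }
}
```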

## Issue 26: try_update_tool_state returns Result
Previously had three early-return paths (NoCurrentTurn, NoToolBatchOpen,
store failure) that only emitted `warn!`. Callers in `tools_finalize` /
`tools_inject` had no way to observe whether the in-memory + persisted
state actually changed.

Introduced `TurnTrackerError` (manual Display/Error impl — no thiserror
dep added) covering the three precondition failures plus `PersistFailed`
for the post-mutation rollback case. `update_tool_batch_item_state`
adapter logs at the boundary so callers still don't need to thread the
error upward, but the failure is now structurally observable in tests
and traces.
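
A manual `Display`/`Error` implementation without thiserror might look like the following. `NoCurrentTurn`, `NoToolBatchOpen`, and `PersistFailed` are named in the commit message; `StoreFailed`, the `String` payloads, and the message wording are assumptions.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub enum TurnTrackerError {
    NoCurrentTurn,
    NoToolBatchOpen,
    StoreFailed(String),   // assumed name for the store-failure case
    PersistFailed(String), // raised after the in-memory rollback
}

impl fmt::Display for TurnTrackerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TurnTrackerError::NoCurrentTurn => write!(f, "no current turn"),
            TurnTrackerError::NoToolBatchOpen => write!(f, "no tool batch open"),
            TurnTrackerError::StoreFailed(why) => write!(f, "turn store rejected update: {}", why),
            TurnTrackerError::PersistFailed(why) => write!(f, "event persist failed: {}", why),
        }
    }
}

// The default method bodies of std::error::Error are sufficient here.
impl std::error::Error for TurnTrackerError {}
```

Implementing `std::error::Error` lets the value be boxed as `Box<dyn std::error::Error>` and logged through its `Display` output at the adapter boundary.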

## Issue 29: re-evaluated as non-issue
The auditor flagged `resume_session` returning `(Session, Vec<Turn>,
Vec<Message>)` as redundant. Closer reading: legacy sessions written
before turn-event dual-write have data in `messages.jsonl` that is NOT
derivable from `turns.jsonl` (which is empty in that path). The triple
return covers two distinct sources of truth, not one with a derived
echo. Doc-comment updated to spell out the legacy-vs-new fork.
…ndling

Audit round 5 (Issue 32): `try_append_step` returned `Option<u32>`
while sibling `try_update_tool_state` already returned `Result`.
All 6 callers (`llm_record`, `stop_feedback`, `compact_rehydrate`,
`compaction_run`, `governance/system_note`, `tools.rs`) ignored the
None silently — a persist failure or "no current turn" precondition
violation would diverge the in-memory turn log from messages.jsonl
without any structural signal.

Changes:
- `try_append_step` → `Result<u32, TurnTrackerError>`
- `append_step_record` (adapter) → `Result<u32, TurnTrackerError>`
- `start_tool_batch_record` → `Result<Option<u32>, TurnTrackerError>`
  (None preserves the existing "empty input is no-op" semantic)
- All 6 callers log `Err` with context via `tracing` instead of
  silently dropping the failure

Note on the dual-write order:
The ContextStore `push_*` still happens even on append_step Err
because compaction's `set_boundary` still anchors on `Message.id`
(deferred Issue 19). Skipping the push would diverge ContextStore
from messages.jsonl, which is currently the boundary anchor's source
of truth. The error log surfaces the divergence so observers can see
it; the structural fix waits on `CompactionSummary` carrying the
persisted summary id.

`tools.rs::execute_tools` likewise logs (rather than aborts) on
`start_tool_batch_record` failure to keep the tool pipeline running
for tests that bypass `ingest_message`. Subsequent
`update_tool_batch_item_state` calls will themselves return
`NoToolBatchOpen` and log — the cascade is observable, just no longer
silent.
Audit round 6 findings:

## Issue 33: emit_all_interrupted / finalize_tool_results assume batch open
Both call `update_tool_batch_item_state` in a loop without checking
`current_tool_batch_step`. In the (rare) path where `start_tool_batch_record`
failed earlier — logged but execution continued — each update would land in
`try_update_tool_state`'s `NoToolBatchOpen` arm and warn-per-item with
no useful effect. Wrap the loop in `current_tool_batch_step().is_some()`
so the batch-open precondition is checked exactly once at the boundary
instead of being repeatedly tested inside the tracker.

## Issue 34: rollback_last_step lacks the precondition guard rollback_last_turn already has
`TurnStore::rollback_last_turn` was tightened in round 4 to take
`&TurnId` and panic on mismatch — its sibling `rollback_last_step` was
left as a silent no-op when `current_turn_id` is `None`. The sole caller
in `try_append_step` always has a valid `turn_id` (since `append_step`
just returned `Ok`), but a stray future caller in the wrong state would
silently skip the rollback while the event log persist failure left the
state diverged.

Bring `rollback_last_step` up to the same panic-guarded contract:
takes `&TurnId`, asserts current state matches before popping. The
existing caller threads through the `turn_id` it already had.
…users, activity stamp)

Four fixes from the max-recall code review pass:

1. **ingest.rs A2** — `start_turn_record` return was discarded. On
   `TurnStarted` persist failure the turn rolled back but ingest still
   wrote messages.jsonl and pushed to ContextStore, producing an orphan
   user message visible nowhere on resume. Now: bail early, emit an
   Error event, no dual-write happens.

2. **turn_store.rs A1/D1** — `end_current_turn` called
   `current_turn_id.take()` before `find()`. A `TurnNotFound` mid-method
   left `current_turn_id = None` while the InProgress turn remained in
   the vec — single-writer invariant broken. Reordered to locate-first,
   clear-after-success.

3. **anthropic/request_turns.rs B1** — `build_messages_json_from_turns`
   had no consecutive-same-role merge. A cancelled turn ending with
   a User (tool_result Cancelled) followed by a new UserInput turn
   produced adjacent User messages, which Anthropic 400s. Added
   `merge_adjacent_same_role` pass after the per-turn fold, mirroring
   what `normalize_messages` does for the OpenAI/Google adapters. Unit
   tests cover the merge + idempotent-on-alternating cases.

4. **store/mod.rs E1** — `refresh_view` previously did not restore
   `last_assistant_activity_at`, which would let microcompact skip
   stale tool-result scrubbing on resumed sessions. Now derives the
   stamp from the latest turn containing an LlmCall step
   (`Turn.started_at`) so the field tracks reality rather than wall
   clock — fixes the issue without using `SystemTime::now()` as a
   stand-in for "some time in the past".
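
The merge pass from fix 3 can be sketched as follows. This is an illustration under assumptions, not the code in `anthropic/request_turns.rs`: `Role` and `WireMessage` are hypothetical simplified types, content blocks are reduced to strings, and the real function may operate on JSON values instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

/// Hypothetical wire message: a role plus simplified content blocks.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WireMessage {
    pub role: Role,
    pub blocks: Vec<String>,
}

/// Folds each message into its predecessor when both share a role, so the
/// output never contains two adjacent messages with the same role.
pub fn merge_adjacent_same_role(messages: Vec<WireMessage>) -> Vec<WireMessage> {
    let mut merged: Vec<WireMessage> = Vec::with_capacity(messages.len());
    for msg in messages {
        if let Some(prev) = merged.last_mut() {
            if prev.role == msg.role {
                // Same role as the previous message: append blocks in order.
                prev.blocks.extend(msg.blocks);
                continue;
            }
        }
        merged.push(msg);
    }
    merged
}
```

Because the pass only touches adjacent same-role pairs, a sequence that already alternates comes back unchanged, and running the pass twice gives the same result as running it once.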

Code is SSOT: removed two paragraph-style doc comments that were
explaining away architectural choices rather than driving them.

Deferred: B2 (empty turns → empty messages on legacy resume) is blocked
on the same Message.id anchor migration as the rest of the ContextStore
retirement; D4 (assert! in rollback) is reachable only by single-thread
programmer error, no observable trigger.

Pre-Turn sessions on disk have only messages.jsonl; resume_session was
returning empty turns + legacy messages. The runtime seeds TurnStore
from turns and ContextStore from messages — empty turns meant the LLM
saw zero history on the first request after resume, even though
messages.jsonl had the full prior conversation. From the user's POV
this looked like "agent forgot everything after restart."

Added `legacy_messages_to_turns` converter and wired it into the
legacy branch of `resume_session`. The mapping:

- User text msg                → new Turn(UserInput, content, images)
- User tool_result-only msg    → ToolBatch step on current turn,
                                 items paired with prior LlmCall
                                 tool_uses by id
- Assistant msg                → LlmCall step on current turn,
                                 server tool pairs reconstructed
                                 from ServerToolUse + ServerToolResult
- System msg                   → dropped (Anthropic / new model treat
                                 system_prompt separately anyway)
- Orphan tool_result            → Cancelled item with stub ToolCall
                                  (name="unknown") so downstream
                                  invariants still hold

Lossy edges:
- Original Turn timestamps are unrecoverable (uses Utc::now())
- LlmCall.model is empty (the original model wasn't persisted per-msg)
- thinking signatures preserved when present, dropped when missing
  (matches `record_assistant_message`'s policy)

Tests cover: empty input, single user msg, user+assistant pair,
tool_use/tool_result round-trip with id pairing, orphan tool_result
fallback.
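
A rough sketch of the mapping above, assuming heavily simplified types. `LegacyMsg`, `Step`, `ToolItem`, and `Turn` are hypothetical stand-ins; images, server tool pairs, timestamps, and thinking signatures are omitted. Skipping assistant or tool_result messages that arrive before any user turn is an assumption of this sketch, not something the commit message states.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum LegacyMsg {
    UserText(String),
    /// (tool_use id, output) pairs from a tool_result-only user message.
    UserToolResults(Vec<(String, String)>),
    /// Assistant text plus the (tool_use id, tool name) pairs it requested.
    Assistant { text: String, tool_uses: Vec<(String, String)> },
    System(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum ItemState {
    Completed(String),
    Cancelled,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolItem {
    pub id: String,
    pub name: String,
    pub state: ItemState,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    LlmCall { text: String, tool_uses: Vec<(String, String)> },
    ToolBatch { items: Vec<ToolItem> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub user_input: String,
    pub steps: Vec<Step>,
}

pub fn legacy_messages_to_turns(msgs: &[LegacyMsg]) -> Vec<Turn> {
    let mut turns: Vec<Turn> = Vec::new();
    for msg in msgs {
        match msg {
            // A user text message opens a new turn.
            LegacyMsg::UserText(text) => turns.push(Turn {
                user_input: text.clone(),
                steps: Vec::new(),
            }),
            // System messages are dropped.
            LegacyMsg::System(_) => {}
            LegacyMsg::Assistant { text, tool_uses } => {
                // Assumption: assistant messages before any turn are skipped.
                if let Some(turn) = turns.last_mut() {
                    turn.steps.push(Step::LlmCall {
                        text: text.clone(),
                        tool_uses: tool_uses.clone(),
                    });
                }
            }
            LegacyMsg::UserToolResults(results) => {
                // Assumption: results before any turn are skipped.
                let Some(turn) = turns.last_mut() else { continue };
                // Tool uses requested by the most recent LlmCall step, if any.
                let prior: Vec<(String, String)> = turn
                    .steps
                    .iter()
                    .rev()
                    .find_map(|step| match step {
                        Step::LlmCall { tool_uses, .. } => Some(tool_uses.clone()),
                        _ => None,
                    })
                    .unwrap_or_default();
                let items: Vec<ToolItem> = results
                    .iter()
                    .map(|(id, output)| match prior.iter().find(|(use_id, _)| use_id == id) {
                        // Paired by id with a prior tool_use.
                        Some((_, name)) => ToolItem {
                            id: id.clone(),
                            name: name.clone(),
                            state: ItemState::Completed(output.clone()),
                        },
                        // Orphan result: stub call named "unknown", Cancelled.
                        None => ToolItem {
                            id: id.clone(),
                            name: "unknown".to_string(),
                            state: ItemState::Cancelled,
                        },
                    })
                    .collect();
                turn.steps.push(Step::ToolBatch { items });
            }
        }
    }
    turns
}
```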

Also stripped a leftover paragraph-style doc comment on resume_session
that explained legacy-vs-new branching; the code is now the doc.
…pat)

Reverts the legacy_messages_to_turns converter (commit 20c7592). Per
project policy: alpha-stage session jsonl format may be replaced
directly with no migration tool / dual-rail / fallback. Old sessions
created before the Turn refactor simply don't resume — that's
intentional, not a bug to fix.

Removed:
- `crates/loopal-runtime/src/legacy_message_to_turn.rs` (324 LOC + tests)
- `mod legacy_message_to_turn` from `lib.rs`
- `messages.jsonl` fallback branch in `SessionManager::resume_session`
- `messages.jsonl` fallback in `SessionManager::load_messages`
- 6 tests in `session_manager_test.rs` that verified the old
  save_message → resume_session round-trip contract
- 1 test in `session_test.rs` (test_save_message_and_resume)
- `e2e_compact_resume_test.rs` (2 tests verifying message_store-based
  compact boundary marker survives resume — marker mechanism keyed
  on Message.id is obsolete once turns.jsonl is the only resume source)

Kept:
- The `merge_adjacent_same_role` unit tests in request_turns.rs (the
  multi-pair-across-role-transitions regression test added to verify
  the algorithm doesn't drop messages — algorithm proven correct)
- `MessageStore::save_message` / `append_message` / `append_entry` —
  still called by ingest/llm_record etc. for the dual-write
  transitional period. They write but no longer affect resume.

Per "no backward compat" policy, finish what previous commits half-did.
messages.jsonl is gone as a persistence target; turns.jsonl is the
single source of truth. ContextStore.messages is a pure projected view
of TurnStore, refreshed automatically by `turn_record` adapters.

Removed (production code):
- `ContextStore::push_user / push_assistant / push_tool_results` — dual-write
  mutators with no remaining callers
- `ContextStore::set_boundary` + `sanitize_tool_pairs` plumbing — relied
  on Message.id anchor that messages.jsonl held; not needed once
  turns.jsonl is SSOT
- `ContextStore::replace_messages` — last writer was set_boundary
- `SessionManager::save_message / clear_history / mark_compact_boundary
  / rewind_to` — all write-only since resume no longer reads
  messages.jsonl
- `SessionManager.message_store` field — no remaining users
- `loopal-storage::messages` / `entry` / `replay` modules — entire
  legacy persistence path: `MessageStore`, `TaggedEntry`, `Marker`,
  `replay` function. Used to back the marker-based history-rewrite
  contract that the new model replaces with TurnStep events
- `agent_loop::message_build` module — `build_user_message` was the
  ingest-side translator from Envelope to Message; ingest no longer
  builds a Message at all

Auto-refresh wired in:
- `turn_record.rs` adapters (start_turn_record, append_step_record,
  update_tool_batch_item_state, end_turn_record) now call
  `ContextStore::refresh_view` after every successful TurnStore write,
  so the projected view stays in lockstep with the authoritative log
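
The write-then-refresh shape of these adapters might look roughly like this. It is a sketch under assumptions: both stores are hypothetical toy versions (steps and messages are strings), and the real adapters also persist to the event log, which is left out here.

```rust
/// Hypothetical authoritative store: each turn is a list of step labels.
#[derive(Debug, Default)]
pub struct TurnStore {
    turns: Vec<Vec<String>>,
}

impl TurnStore {
    pub fn start_turn(&mut self, input: &str) {
        self.turns.push(vec![format!("user:{input}")]);
    }

    pub fn append_step(&mut self, step: &str) -> Result<(), &'static str> {
        let turn = self.turns.last_mut().ok_or("NoCurrentTurn")?;
        turn.push(step.to_string());
        Ok(())
    }
}

/// Hypothetical projected view: never written directly, only rebuilt.
#[derive(Debug, Default)]
pub struct ContextStore {
    messages: Vec<String>,
}

impl ContextStore {
    /// Rebuilds the whole view from the authoritative store.
    pub fn refresh_view(&mut self, store: &TurnStore) {
        self.messages = store.turns.iter().flatten().cloned().collect();
    }

    pub fn messages(&self) -> &[String] {
        &self.messages
    }
}

/// Adapter: write to the TurnStore, then refresh the view.
pub fn start_turn_record(store: &mut TurnStore, ctx: &mut ContextStore, input: &str) {
    store.start_turn(input);
    ctx.refresh_view(store);
}

/// Adapter: refresh only after a successful write; on error the view is
/// left as it was.
pub fn append_step_record(
    store: &mut TurnStore,
    ctx: &mut ContextStore,
    step: &str,
) -> Result<(), &'static str> {
    store.append_step(step)?;
    ctx.refresh_view(store);
    Ok(())
}
```

Rebuilding the full view on every write is the simplest way to keep the projection in lockstep; whether the real `refresh_view` is incremental is not stated in the commit message.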

Operational behavior:
- `ControlCommand::Clear` now clears both TurnStore and ContextStore
  (was: ContextStore + marker in messages.jsonl)
- `ControlCommand::Rewind` now truncates TurnStore via new
  `TurnStore::truncate_turns(keep)`, refreshes the projected view, and
  emits Rewound. The `rewind` boundary-detection module is gone — turn
  index IS the boundary
- Compaction's persistence path is simplified: `mark_compact_boundary`
  + `set_boundary` + `save_message(summary)` + `save_message(ack)` all
  gone. Only the `CompactionSummary` TurnStep is written. NB: this
  exposes a pre-existing architectural gap — wire-build does not yet
  honor the boundary, so compaction currently adds tokens instead of
  removing them. Tracked as separate work; the cleanup unblocks it
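
As an illustration of the Clear and Rewind handling described above, here is a small sketch. Everything except the names `TurnStore::truncate_turns`, `ControlCommand::Clear`, `ControlCommand::Rewind`, `refresh_view`, and `Rewound` is hypothetical: the `keep` semantics (retain the first `keep` turns), the `Event` payloads, and the `handle_control` function are assumptions made for the example.

```rust
/// Hypothetical store: one entry per turn, reduced to its user input.
#[derive(Debug, Default)]
pub struct TurnStore {
    turns: Vec<String>,
}

impl TurnStore {
    pub fn push_turn(&mut self, input: &str) {
        self.turns.push(input.to_string());
    }

    /// Assumed semantics: keep the first `keep` turns and drop the rest.
    pub fn truncate_turns(&mut self, keep: usize) {
        self.turns.truncate(keep);
    }

    pub fn clear(&mut self) {
        self.turns.clear();
    }
}

/// Hypothetical projected view over the store.
#[derive(Debug, Default)]
pub struct ContextStore {
    messages: Vec<String>,
}

impl ContextStore {
    pub fn refresh_view(&mut self, store: &TurnStore) {
        self.messages = store.turns.clone();
    }

    pub fn len(&self) -> usize {
        self.messages.len()
    }
}

pub enum ControlCommand {
    Clear,
    Rewind { keep: usize },
}

/// Hypothetical event payloads; the real events may carry other fields.
#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    Cleared,
    Rewound { remaining_turns: usize },
}

pub fn handle_control(
    cmd: ControlCommand,
    store: &mut TurnStore,
    ctx: &mut ContextStore,
) -> Event {
    match cmd {
        ControlCommand::Clear => {
            // Clearing the store and refreshing empties the view as well.
            store.clear();
            ctx.refresh_view(store);
            Event::Cleared
        }
        ControlCommand::Rewind { keep } => {
            // The turn index is the boundary; no marker lookup is needed.
            store.truncate_turns(keep);
            ctx.refresh_view(store);
            Event::Rewound { remaining_turns: ctx.len() }
        }
    }
}
```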

Test fixtures:
- All test runner fixtures (`make_runner`, `make_runner_with_channels`,
  `make_runner_with_mock_provider`, `make_multi_runner_with_intents`,
  `make_interactive_multi_runner`, `make_runner_with_intents` in
  try_recover, the harness `wire()` in loopal-test-support) now open a
  synthetic turn after construction. Empty seed → Resume turn (zero
  projection); single user seed → matching UserInput turn. Without
  this, `append_step_record` would hit `NoCurrentTurn` because tests
  bypass `ingest_message`

Deleted obsolete tests:
- `crates/loopal-context/tests/suite/store_test.rs` (push_X /
  set_boundary tests)
- `crates/loopal-runtime/tests/suite/session_test::test_save_message_and_resume`
  and `session_manager_test`'s save_message / clear_history /
  mark_compact_boundary suite — they verified the now-removed
  message_store round-trip
- 9 `crates/loopal-tui/tests/suite/e2e_compact_*` files — they all set
  up via `save_message` + `push_user` and asserted the old marker-based
  resume semantics
- `crates/loopal-storage/tests/suite/{entry,messages,replay}_test.rs`
  — tests for deleted modules
- 2 tests in `input_edge_test.rs` /
  `input_emit_fail_edge_test.rs` that pushed messages directly into
  ContextStore to exercise Clear / Compact handlers

Followups (out of scope; tracked as known gaps):
- Compaction at TurnStore level: TurnStep::CompactionSummary needs to
  also act as a boundary in wire-build (currently it just appends a
  summary while prior turns still flow to the LLM)
- Microcompact still operates on ContextStore.messages in-place; that
  effect is wiped on the next refresh_view. Same architectural gap as
  compaction
…e-size discipline

Architecture enforcement (8-phase plan):
- Move TurnTracker into loopal-context so TurnStore mutators become pub(crate);
  runtime crates can no longer bypass the tracker to mutate turn state.
- Retire ContextStore::from_messages wire-format entry; tests now seed history
  via loopal_test_support::seed_history.
- Add tests/architecture_boundary_test.rs grepping for cross-layer wire-type
  leaks (MessageRole/ContentBlock outside provider/display/context layers).
- Drop dead modules: provider-api Middleware trait, ContextStore pipeline /
  config_refresh / degradation / ingestion middleware, governance/compensation.
- Re-derive open ToolBatch on TurnTracker::new / replace_store so resume-mid-
  batch routes update_tool_state correctly.
- Wire-only mutations (microcompact, condense_server_blocks) flow through
  TurnTracker::with_wire_mut so SSOT contract is enforced for ephemeral paths.
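
The grep-style boundary check could be sketched like this. It is not the contents of `tests/architecture_boundary_test.rs`: the function name, the in-memory `(path, source)` input, and the path fragments used to mark allowed layers are all hypothetical. Only the two wire-type names come from the commit message.

```rust
/// Wire types that should stay inside the provider/display/context layers.
pub const WIRE_TYPES: [&str; 2] = ["MessageRole", "ContentBlock"];

/// Scans `(path, source)` pairs and reports every line that mentions a
/// wire type in a file whose path matches none of `allowed_fragments`.
/// Each report is formatted as `path:line: TypeName`.
pub fn find_wire_type_leaks(
    files: &[(&str, &str)],
    allowed_fragments: &[&str],
) -> Vec<String> {
    let mut leaks = Vec::new();
    for (path, source) in files {
        // Files inside an allowed layer may use wire types freely.
        if allowed_fragments.iter().any(|frag| path.contains(*frag)) {
            continue;
        }
        for (idx, line) in source.lines().enumerate() {
            for ty in WIRE_TYPES.iter() {
                if line.contains(*ty) {
                    leaks.push(format!("{path}:{}: {ty}", idx + 1));
                }
            }
        }
    }
    leaks
}
```

A real test would presumably walk the crate directories and assert that the returned list is empty. A plain substring match like this also flags mentions in comments and strings, which may or may not be acceptable for the project.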

File-size + comment discipline:
- Split 10 files exceeding 200 lines into directory modules: turn_event_store,
  turn_store, turn_tracker, turn_projection, request_turns, compaction,
  compact_rehydrate, ingestion, turn_degradation, resolver.
- Extract sibling modules where structural seams existed: SettleSignal,
  mcp_settle, session_start_prompt.
- Strip module-level //! blocks and /// docs that restate signatures (HARD
  RULE: code is SSOT; comments explain *why*, never *what*).
- All src .rs files now ≤200 lines.

Net: 137 files changed, +1581 / -4485 (-2904).
@yishuiliunian yishuiliunian merged commit 37bc534 into main May 26, 2026
4 checks passed
@yishuiliunian yishuiliunian deleted the feat/loopal-turn-pr1 branch May 26, 2026 13:46