Test coverage gap: README claims vs actual verification #26

## Problem

The README makes feature claims that are supposed to be verified through the docs-as-specs pipeline (README feature → spec → test), but not every claim is covered end to end.

## Current State (Jan 2026)

After running `/validate-docs`, the coverage is better than expected:

| Category | Coverage |
|----------|----------|
| README features with specs | 6/7 (86%) |
| Spec assertions with tests | 34/35 (97%) |

## Critical Gap: Agent Workflow

The README prominently features this workflow:

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Spawn     │────►│    Work     │────►│     PR      │────►│    Close    │
│   (main)    │     │  (k8s ns)   │     │  (GitHub)   │     │  (summary)  │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
```

**This has NO spec and NO tests.** It is the headline feature, yet it is completely unverified.
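
A spec for this workflow would first have to pin down which stage transitions are allowed. Below is a minimal sketch of that ordering as a checkable model, assuming the four stages are strictly sequential as the arrows suggest. The `Stage` enum and both helper functions are hypothetical names for illustration, not part of the codebase.

```python
from enum import Enum


class Stage(Enum):
    """Stages from the README diagram; values are the labels drawn under each box."""
    SPAWN = "main"
    WORK = "k8s ns"
    PR = "GitHub"
    CLOSE = "summary"


# Assumption: each stage hands off only to the one drawn immediately after it.
ORDER = list(Stage)


def is_valid_transition(current: Stage, nxt: Stage) -> bool:
    """True if `nxt` is the stage directly after `current` in the diagram."""
    return ORDER.index(nxt) == ORDER.index(current) + 1


def is_valid_run(stages: list[Stage]) -> bool:
    """True if `stages` starts at Spawn, ends at Close, and never skips a step."""
    return (
        bool(stages)
        and stages[0] is Stage.SPAWN
        and stages[-1] is Stage.CLOSE
        and all(is_valid_transition(a, b) for a, b in zip(stages, stages[1:]))
    )
```

Each assertion in a future `agent-workflow.md` spec could then map to one transition, which keeps the test count traceable in the same way as the existing specs.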

## Other Gaps

- [ ] **chat.md**: "Messages persist across page reloads" - no test
- [ ] **Terminology**: Specs say "Sessions", some UI/tests say "INBOX"
- [ ] **Skipped tests**: Several spec behaviors have skipped tests due to flakiness
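
The terminology drift can be surfaced mechanically before deciding which term wins. Below is a rough sketch, assuming the spec and test sources have already been read into strings keyed by file name. The `find_term_usage` helper is hypothetical, and the match is case-sensitive on whole words only.

```python
import re

# The two competing terms named in the gap list.
TERMS = ("Sessions", "INBOX")


def find_term_usage(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each term to the sorted file names whose text contains it as a whole word."""
    usage: dict[str, list[str]] = {term: [] for term in TERMS}
    for name, text in sorted(files.items()):
        for term in TERMS:
            if re.search(rf"\b{re.escape(term)}\b", text):
                usage[term].append(name)
    return usage
```

A file that shows up under both terms is the most likely place where a spec and its test disagree.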

## Infrastructure Claims (Out of Scope for E2E)

These README claims are about infrastructure, not user behavior:

- "Workers continue after browser closes" - DBOS guarantee
- "Task state persists across restarts" - DBOS guarantee  
- "Workers in isolated K8s namespaces" - Deployment architecture

These should be verified via integration tests, not E2E tests.

## Action Items

1. [ ] Create `docs/specs/agent-workflow.md`
2. [ ] Add message persistence test to chat spec coverage
3. [ ] Fix the skipped tests or remove the spec assertions they cover
4. [ ] Standardize terminology (Sessions vs INBOX)

## What IS Working Well

The docs-as-specs approach is solid:
- `sessions.md` → 21/21 assertions tested (100%)
- `layout.md` → 7/7 assertions tested (100%)
- `chat.md` → 6/7 assertions tested (86%)

The pipeline works; we just need to extend it to the agent workflow.
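
The per-spec counts roll up to the 34/35 total reported under Current State. Here is a quick arithmetic check using only the numbers listed in this issue (the `coverage` helper is illustrative):

```python
# (tested, total) assertion counts per spec, copied from the list in this section.
SPECS = {
    "sessions.md": (21, 21),
    "layout.md": (7, 7),
    "chat.md": (6, 7),
}


def coverage(tested: int, total: int) -> int:
    """Coverage as a whole-number percentage, rounded to the nearest integer."""
    return round(100 * tested / total)


tested = sum(t for t, _ in SPECS.values())
total = sum(n for _, n in SPECS.values())
print(f"{tested}/{total} ({coverage(tested, total)}%)")  # prints "34/35 (97%)"
```

The single untested assertion is the `chat.md` message-persistence behavior listed under Other Gaps.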
