Integration-Automation · JE-Chen · May 21, 2026 · May 20, 2026 · May 20, 2026 · May 20, 2026
diff --git a/.claude/agents/deck-design.md b/.claude/agents/deck-design.md
diff --git a/.claude/agents/language-vocabulary-check.md b/.claude/agents/language-vocabulary-check.md
diff --git a/.claude/agents/paper-summary-author.md b/.claude/agents/paper-summary-author.md
@@ -130,13 +130,67 @@ For each paper that is on-topic for the user's actual intent (see "Off-topic pap
    - `core_observation` — single most important takeaway, gets its own slide
    - `limitations` — author-acknowledged limits
    - `future_work` — author-stated future work
+   - **`figures` — MANDATORY when the paper has any figure.** A thesis-style deck without figures is half the deliverable. See the "Figure extraction" step below.
 
    Always set provenance fields:
    ```python
    model="<your model id> (LLM-as-agent, read N-page PDF)"
    raw_text_chars=<extracted length>
    ```
 
+   ### Figure extraction (mandatory before authoring `figures=`)
+
+   The `figures` field expects `(caption, image_path, description bullets)` tuples
+   pointing at PNGs already on disk. Render them BEFORE the regen script runs:
+
+   ```python
+   from autopapertoppt.intelligence.pdf_assets import extract_figures
+   figures = extract_figures(
+       Path("exports/<run>/pdfs/<key>.pdf"),
+       Path("exports/<run>/figures/<key>/"),
+   )
+   ```
+
+   PyMuPDF (`fitz`) is a default install dependency, so no optional extras are needed.
+   Each extracted figure is named `p{NN}-{idx}-{caption-slug}.png` so the regen
+   script can reference it stably via a small helper:
+
+   ```python
+   _FIGURES_ROOT = ROOT / "exports" / _RUN_DIR_NAME / "figures"
+   def _fig(paper_key, filename):
+       return str(_FIGURES_ROOT / paper_key / filename)
+   ```
+
+   **Curate the output** — `extract_figures` is greedy (renders every figure-
+   sized region of every page). Inspect the PNGs and **include every figure
+   that meaningfully advances the paper's story**, not just 2-3 token ones.
+   A thesis-style deck has room for the full visual narrative:
+   - Motivation chart (the wall / gap / scaling problem)
+   - Background diagram (architecture / pipeline context)
+   - System overview / workflow (almost always Fig 1 or 2 of the paper)
+   - Worked example / illustrative diagram
+   - Key technique diagram (verification, attention, etc.)
+   - Headline result chart
+   - Ablation / parameter sweep
+   - Per-device or per-task result chart
+   - Optional: timeline / taxonomy / qualitative example
+
+   Skip noise — placeholder logo regions, tiny header strips, low-resolution
+   thumbnails, exact duplicates that appear twice in the paper. **Quantity
+   alone isn't quality; relevance is.**
+
+   When the curated figure count plus the rich-tier body content will exceed
+   the default 25-slide cap (`ExportOptions.max_slides_per_paper`), set the
+   cap to `0` in your regen script's `ExportOptions(...)` call so the cap is
+   disabled — figures are part of the deliverable, not optional polish.
+   `scripts/regen_speculative_decoding_zh_tw.py` does this (Xu's EdgeLLM
+   deck ends up at 27 slides with 8 curated figures).
+
+   Worked example: `scripts/_extract_speculative_figures.py` extracts every
+   figure from 4 PDFs into `exports/speculative-decoding-zh-tw/figures/<key>/`;
+   `scripts/regen_speculative_decoding_zh_tw.py::_fig()` references the
+   curated subset. Use this as the template.
+
 3. **Copy URL / DOI / arxiv_id VERBATIM from the search xlsx — never from memory.** Publisher URL paths cannot be guessed:
    - AAAI uses numeric IDs like `v40i5.37389`, not author slugs
    - IEEE uses an opaque `arnumber`
@@ -252,6 +306,7 @@ When the user says "search X and make a [lang] PPT", run the runbook below strai
 - Do NOT add `-rich` to filenames. Overwrite the lightweight emit at the canonical `<key>.pptx`.
 - Do NOT exceed 4 entries in `contributions_detailed`. The slide overshoots the footer guard above that.
 - Do NOT add `--lightweight` or `--no-pdf` to the CLI invocation "for speed" when the user asked for a deck. Those flags produce a non-deliverable. See "Default CLI invocation" above.
+- Do NOT omit `figures=` from a rich `PaperSummary` when the paper has any figure. A thesis-style deck without the paper's system diagram or key chart is half a deliverable. See "Figure extraction" under the per-paper procedure.
 - Do NOT leave irrelevant downloads in the run directory. The search engine is keyword-based, so off-topic papers will slip in. Once you classify a paper as off-topic, delete its `exports/<run>/pdfs/<key>.pdf` and `exports/<run>/<key>.pptx`. Keep the aggregate xlsx / bib intact — they are the **honest record** of what the search returned. See "Pruning irrelevant downloads" below.
 
 ## Pruning irrelevant downloads (mandatory before handing the deck back)

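Review note on the figure-extraction hunk above: a minimal sketch of the two mechanical steps it describes, namely referencing figures by their `p{NN}-{idx}-{caption-slug}.png` names and deciding when to disable the slide cap. The names `parse_figure_name`, `fig_path`, and `slide_cap` are hypothetical, the regex digit widths are an assumption, and counting one slide per curated figure is also an assumption; the real `_fig()` helper and `ExportOptions` may behave differently.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Assumed shape of the `p{NN}-{idx}-{caption-slug}.png` naming scheme.
_FIG_NAME = re.compile(r"^p(?P<page>\d+)-(?P<idx>\d+)-(?P<slug>.+)\.png$")


class FigureName(NamedTuple):
    page: int
    idx: int
    slug: str


def parse_figure_name(filename: str) -> Optional[FigureName]:
    """Split an extracted-figure filename into page, index, and caption slug."""
    match = _FIG_NAME.match(filename)
    if match is None:
        return None
    return FigureName(int(match["page"]), int(match["idx"]), match["slug"])


def fig_path(figures_root: Path, paper_key: str, filename: str) -> str:
    """Build the path a regen script would reference, rejecting odd names early."""
    if parse_figure_name(filename) is None:
        raise ValueError(f"not an extracted-figure filename: {filename!r}")
    return str(figures_root / paper_key / filename)


def slide_cap(body_slides: int, curated_figures: int, default_cap: int = 25) -> int:
    """Return 0 (cap disabled) when body plus figure slides exceed the default cap.

    Assumes each curated figure occupies one slide.
    """
    if body_slides + curated_figures > default_cap:
        return 0
    return default_cap
```

With the numbers quoted in the hunk (a 27-slide deck containing 8 curated figures, so 19 body slides under the one-slide-per-figure assumption), `slide_cap(19, 8)` returns `0`, which matches the guidance to disable the cap rather than drop figures.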
diff --git a/.claude/agents/slide-deck-rules.md b/.claude/agents/slide-deck-rules.md
@@ -6,6 +6,15 @@ tools: Read, Grep, Glob
 
 You are the slide-deck rules reference for AutoPaperToPPT. When invoked, return the relevant rule(s) for the change being made and flag any direct violations you can spot in the diff. The actual overflow inspection lives in the sibling `slide-overflow-check` subagent — don't re-implement it here.
 
+**Scope split** — this agent owns *geometry* and *content safety*
+(slide dimensions, footer guard, truncation caps, per-slide content
+caps, semantic shape names, i18n keys, rendering-tier dispatch). The
+sibling `deck-design` subagent owns *visual identity* (typography per
+language, brand palette, accent geometry, "looks AI-generated"
+anti-patterns). Both apply to any change to
+`autopapertoppt/exporters/pptx.py` — consult the appropriate one for
+the concern at hand.
+
 ## Slide Deck Rules
 
 The pptx exporter is the most visually-sensitive surface in the project. Several non-obvious rules keep its output safe for a thesis-defence audience.

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -4,8 +4,9 @@
 > recent Aider, and several other tools auto-load `AGENTS.md`; keep them in
 > sync when you change rules. Detailed rules now live in `.claude/agents/`
 > as subagents (`code-quality-reviewer`, `compliance-auditor`,
-> `slide-deck-rules`, `env-vars`, plus the task-running agents `dod-verify`,
-> `paper-summary-author`, `post-author-audit`, `slide-overflow-check`).
+> `slide-deck-rules`, `deck-design`, `env-vars`, `language-vocabulary-check`,
+> plus the task-running agents `dod-verify`, `paper-summary-author`,
+> `post-author-audit`, `slide-overflow-check`).
 
 ## Project Overview
 
@@ -106,15 +107,65 @@ window open during an IEEE / Scholar / paywalled-PDF step, the path is broken
 — surface it, don't trust the results. Full rule + audit checklist:
 `compliance-auditor` subagent.
 
+## Dark-Mode Contract: Every Text Run Sets an Explicit Colour (HARD RULE)
+
+Dark mode is the project's default pptx render path. The post-build
+recolour pass swaps light-palette RGB values to their dark-palette
+equivalents — but it can only swap colours it can read. **A text run
+with `run.font.color.rgb = None` inherits the slide-master's theme
+colour, renders as near-black on the dark slide background, and is
+invisible.** Every text-adding helper in `autopapertoppt/exporters/pptx.py`
+MUST therefore assign `run.font.color.rgb = _BRAND_*` (one of the four
+palette constants) after creating or overwriting a run. Never leave the
+colour at its default; never pass `colour=None` to `_add_textbox`;
+never write `RGBColor(0, 0, 0)` — use `_BRAND_DARK` instead.
+
+The `_swap_text_colors` pass in the dark-mode post-build now also
+promotes any leftover `rgb is None` or `(0, 0, 0)` runs to `#E5E7EB`
+near-white as a second layer of defence. The regression test
+`tests/test_exporters.py::test_pptx_dark_mode_has_no_invisible_runs`
+walks every run on every slide and fails if any non-empty run lacks an
+explicit non-black colour. Full rule + the audit script + the
+two-layer defence rationale live in `.claude/agents/deck-design.md`
+"Dark-mode contract".
+
+**Mirror rule — light-on-light contrast.** Any new light-fill RGB
+introduced in `pptx.py` (e.g. a callout / KPI / RQ-box background)
+MUST also have an entry in `_LIGHT_TO_DARK_FILL`; otherwise the fill
+stays near-white in dark mode while its text gets re-coloured to
+near-white → invisible. Regression test
+`test_pptx_dark_mode_no_light_text_on_light_fill` walks every shape
+and fails when both the fill and the text luminance exceed 0.7 × 255
+(about 178) in a default-dark-mode render.
+
+**No red text.** ``_BRAND_ACCENT`` (= ``#C0392B`` warm red) is BANNED
+as a TEXT colour across both light and dark modes. Red text reads
+as error / warning in slide conventions and pattern-matches strongly
+to AI-generated KPI emphasis. The sanctioned text-emphasis colour is
+**``_BRAND_HIGHLIGHT``** (teal-700, ``#0E7490``) — pair with
+``run.font.bold = True``. Use ``_BRAND_GREY`` for caption / placeholder /
+chrome text so headlines stay headlines. Variety rule: KPI value + RQ
+question use teal; figure caption + figure-unavailable use grey — do
+not collapse all four to the same colour. The dark-mode pass swaps
+teal-700 → teal-400 (``#2DD4BF``) via ``_LIGHT_TO_DARK_TEXT``; the
+audit script's ``_ACCEPTED_DARK_RUN_COLORS`` set knows about both.
+Regression test ``test_pptx_no_red_text_runs`` walks every run on a
+default-rendered deck and fails if any run uses ``#C0392B``. The red
+constant stays in the palette in case a future non-text accent shape
+(sparkline, status badge) wants it. Full rule + per-call-site palette
+mapping in ``.claude/agents/deck-design.md`` "No red text contract (HARD)".
+
 ## Where the detailed rules live
 
 | Topic | Subagent (in `.claude/agents/`) |
 |---|---|
 | Design patterns, SOLID, performance, async, unit tests, full linter rule set | `code-quality-reviewer` |
 | Core-vs-source-plugin boundary, network safety, browser-automation hard rule, path safety, suppression conventions, bandit skip config | `compliance-auditor` |
 | pptx exporter geometry, rendering tiers, truncation caps, semantic shape names, i18n, LLM-as-agent vs Python pipeline | `slide-deck-rules` |
+| pptx visual identity (typography per language, brand palette, accent geometry, master-slide expectations, "looks AI-generated" anti-patterns) | `deck-design` |
 | Env vars + Python / `.venv` toolchain reference | `env-vars` |
 | Definition-of-Done gate runner | `dod-verify` |
 | LLM-as-agent thesis-style authoring (PDF → rich PaperSummary) | `paper-summary-author` |
 | URL-fabrication / off-topic audits after authoring | `post-author-audit` |
 | Slide-overflow regression check | `slide-overflow-check` |
+| Language-correct vocabulary (no S-Chinese loan words in zh-tw, no T-Chinese in zh-cn, etc.) | `language-vocabulary-check` |
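Review note on the dark-mode contract added to `CLAUDE.md` above: the three colour rules (no unset or black runs, no light text on light fill, no red text) can be sketched in plain Python without touching `python-pptx`. The hex values are the ones quoted in the hunk; every function and constant name below is hypothetical, the luminance formula (Rec. 601 luma) is an assumption because the diff does not say which one the tests use, and the mapping table holds only the one teal swap the hunk mentions.

```python
from typing import Dict, Optional, Tuple

RGB = Tuple[int, int, int]

# Hex values quoted in the CLAUDE.md hunk; the names are illustrative only.
NEAR_WHITE: RGB = (0xE5, 0xE7, 0xEB)  # fallback for unset / black runs
BANNED_RED: RGB = (0xC0, 0x39, 0x2B)  # banned as a text colour
TEAL_700: RGB = (0x0E, 0x74, 0x90)    # light-mode emphasis colour
TEAL_400: RGB = (0x2D, 0xD4, 0xBF)    # its dark-mode equivalent

# Only the teal swap is documented in the hunk; the real table is larger.
LIGHT_TO_DARK_TEXT: Dict[RGB, RGB] = {TEAL_700: TEAL_400}


def violates_text_contract(rgb: Optional[RGB]) -> bool:
    """First layer: a run must carry an explicit, non-black, non-red colour."""
    return rgb is None or rgb == (0, 0, 0) or rgb == BANNED_RED


def dark_text_colour(rgb: Optional[RGB]) -> RGB:
    """Second layer: promote unset or black runs, then apply the palette swap."""
    if rgb is None or rgb == (0, 0, 0):
        return NEAR_WHITE
    return LIGHT_TO_DARK_TEXT.get(rgb, rgb)


def luma(rgb: RGB) -> float:
    """Rec. 601 luma on the 0-255 scale (assumed formula)."""
    red, green, blue = rgb
    return 0.299 * red + 0.587 * green + 0.114 * blue


def is_light_on_light(fill: RGB, text: RGB, threshold: float = 0.7 * 255) -> bool:
    """Mirror rule: flag shapes whose fill and text are both above the threshold."""
    return luma(fill) > threshold and luma(text) > threshold
```

For example, a white callout fill that has no dark-mode mapping keeps `(255, 255, 255)` while its text becomes `NEAR_WHITE`, so `is_light_on_light` flags it; giving the fill a dark equivalent clears the check.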
diff --git a/README.md b/README.md
@@ -294,6 +294,7 @@ py -m autopapertoppt --paper "https://arxiv.org/abs/1706.03762" `
 | `--paywall-threshold` | Fraction of paywalled results that triggers the confirmation prompt. Default 0.30. |
 | `--yes` | Skip the paywall prompt and proceed. |
 | `--max-slides` | Per-paper slide cap (default 25; pass 0 for unlimited). |
+| `--light-mode` | Render the pptx with a white background + navy text. Default is dark mode (dark background + near-white text) — pass this for projectors in well-lit rooms or when the deck will be printed. |
 | `--quiet` | Suppress per-paper printout. |
 
 ### Environment variables

diff --git a/autopapertoppt/cli.py b/autopapertoppt/cli.py
@@ -236,6 +236,17 @@ def build_parser() -> argparse.ArgumentParser:
             "Default: claude-opus-4-7 (or AUTOPAPERTOPPT_LLM_MODEL)."
         ),
     )
+    parser.add_argument(
+        "--light-mode",
+        action="store_true",
+        help=(
+            "Render the pptx with the classic white slide background + "
+            "navy text. Default is dark mode (dark slide background, "
+            "near-white text) — pass this flag for projectors in "
+            "well-lit rooms or when the deck will be printed / read on "
+            "paper."
+        ),
+    )
     parser.add_argument(
         "--no-pdf",
         dest="download_pdf",
@@ -358,6 +369,7 @@ async def _run(args: argparse.Namespace) -> int:
         include_abstract=not args.no_abstract,
         language=args.lang,
         max_slides_per_paper=args.max_slides,
+        dark_mode=not args.light_mode,
     )
     needs_pptx = EXPORT_PPTX in formats
     # ``--pdf`` already supplies the PDF — the paywall gate is irrelevant

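Review note on the `cli.py` hunks above: the flag is opt-in for light mode while the option it feeds defaults to dark, so the value is inverted on the way in. A self-contained sketch of that wiring follows; `DeckOptions` is a hypothetical stand-in for `ExportOptions`, and the `type=int` on `--max-slides` is an assumption based on the README table.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeckOptions:
    """Hypothetical stand-in for ExportOptions, limited to the two fields shown."""

    max_slides_per_paper: Optional[int] = 25
    dark_mode: bool = True


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="deck-sketch")
    # store_true leaves the attribute False unless the flag is present.
    parser.add_argument("--light-mode", action="store_true",
                        help="Render with a white background and navy text.")
    parser.add_argument("--max-slides", type=int, default=25,
                        help="Per-paper slide cap; 0 disables the cap.")
    return parser


def options_from_args(argv: List[str]) -> DeckOptions:
    args = build_parser().parse_args(argv)
    return DeckOptions(
        max_slides_per_paper=args.max_slides,
        # Dark mode is the default, so the CLI flag is negated here.
        dark_mode=not args.light_mode,
    )
```

Keeping the dataclass field positive (`dark_mode`) and the flag negative (`--light-mode`) means callers that construct the options directly get the dark default without passing anything.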
diff --git a/autopapertoppt/core/models.py b/autopapertoppt/core/models.py
@@ -391,6 +391,13 @@ class ExportOptions:
     #: render the full deck regardless of size; ``None`` is treated
     #: identically to the default.
     max_slides_per_paper: int | None = 25
+    #: When True (default), the pptx exporter applies a dark-mode
+    #: palette post-build: dark slide background, light text, dark
+    #: table-row stripe. Set False (or pass ``--light-mode`` on the
+    #: CLI / tick the "Light mode" box in the GUI Deck tab) to keep
+    #: the classic white-background light deck — useful on projectors
+    #: in well-lit rooms or when the audience reads the deck on paper afterwards.
+    dark_mode: bool = True
 
     def __post_init__(self) -> None:
         if not self.formats: