Problem
The CI build (build.yml) averages ~14.5 minutes wall time. All steps run sequentially in a single job despite many being independent. No artifact caching is used beyond mise tool binaries.
Measured step timings (5 recent successful runs)
| Step | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg |
|---|---|---|---|---|---|---|
| Free Disk Space | 66s | 79s | 23s | 35s | 35s | 48s |
| Install mise | 3s | 13s | 8s | 3s | 3s | 6s |
| Setup Node.js | 4s | 4s | 5s | 4s | 4s | 4s |
| Install dependencies | 77s | 78s | 66s | 76s | 76s | 75s |
| build | 635s | 761s | 751s | 738s | 738s | 725s |
| Upload artifact | 15s | 15s | 15s | 14s | 14s | 15s |
The build step (avg 725s / 12min) is the overwhelming bottleneck.
What runs inside `mise run build` (all sequential)
```
1. //agent:quality (~30-40s: ruff lint, ruff format, ty typecheck, pytest)
2. //cdk:build (~500-600s total)
   ├── :compile (~30-40s: tsc --build)
   ├── :test (~90-370s: jest, 99 suites, 1789 tests)
   ├── :eslint (~30-60s: eslint --fix)
   └── :synth:quiet (~120-180s: cdk synth with esbuild bundling per Lambda)
3. //cli:build (~20-30s: compile + test + eslint)
4. //docs:build (~30-60s: sync-starlight + astro build)
5. //docs:sync (~5s: DUPLICATE, already runs as a dep of //docs:build)
```
Dependency graph (what actually depends on what)
```
            ┌── agent:quality ──────────────┐
            ├── cdk:eslint ─────────────────┤
install ────┼── cdk:test ───────────────────┼── (all pass) ── upload
            ├── cli:build ──────────────────┤
            ├── docs:build ─────────────────┤
            └── cdk:compile → cdk:synth ────┘
```
Only `cdk:synth` depends on `cdk:compile`. Everything else is independent.
Optimization priorities (ranked by impact)
P0: Cache `node_modules` and `.venv` (~60-70s saved)
```yaml
- uses: actions/cache@v4
  with:
    path: |
      node_modules
      agent/.venv
    key: deps-${{ runner.os }}-${{ hashFiles('yarn.lock', 'agent/uv.lock') }}
    restore-keys: deps-${{ runner.os }}-
```
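Turns the ~75s cold install into a ~5s cache restore. The first run after a lockfile change misses the exact key and still has to install, though `restore-keys` lets it start from the most recent cache instead of from scratch.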
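A minimal sketch of how the restore becomes a skipped install: give the cache step an `id` and gate the install step on the action's `cache-hit` output, which is `'true'` only on an exact key match (a `restore-keys` prefix match reports it as not `'true'`). The `mise run install` command is a hypothetical placeholder for whatever the current "Install dependencies" step runs.

```yaml
- uses: actions/cache@v4
  id: deps-cache
  with:
    path: |
      node_modules
      agent/.venv
    key: deps-${{ runner.os }}-${{ hashFiles('yarn.lock', 'agent/uv.lock') }}
    restore-keys: deps-${{ runner.os }}-
# Exact hit: lockfiles unchanged and the restored dirs are complete, so skip install.
# Prefix hit or miss: install runs on top of whatever was restored, and
# actions/cache saves a new entry under the exact key in its post-job step.
- name: Install dependencies
  if: steps.deps-cache.outputs.cache-hit != 'true'
  run: mise run install  # hypothetical: substitute the existing install command
```

Skipping the install entirely assumes it has no side effects outside `node_modules` and `agent/.venv` (for example a postinstall hook writing elsewhere). If it does, run it unconditionally and let the warm directories make it fast.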
P1: Parallelize independent jobs (~5-7min wall time saved)
Split the single `build` job into parallel GHA jobs:
```yaml
jobs:
  install:
    # checkout + install + cache
  agent-quality:
    needs: install
    # ruff, ty, pytest
  cdk-compile-synth:
    needs: install
    # tsc → cdk synth → upload artifact
  cdk-test:
    needs: install
    # jest (the heaviest step)
  cdk-eslint:
    needs: install
    # eslint
  cli-build:
    needs: install
    # compile + test + eslint
  docs-build:
    needs: install
    # sync + astro build
```
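Critical path drops from ~14.5min to: install (5s cached) → cdk:compile (40s) → cdk:synth (150s) → upload (15s) = ~3.5min. This assumes cdk-test (~90s after the `beforeAll` optimization in PR #195) finishes before compile + synth. It also excludes per-job setup such as checkout, mise, and the Free Disk Space step (avg 48s) that cdk-compile-synth still needs, which is why the projection below lands at ~4-5min.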
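GHA jobs run on separate runners, so `needs: install` only orders the jobs. It does not share files. Each downstream job has to check out the repo and restore the P0 cache itself. A sketch of one such job, assuming `ubuntu-latest`, `jdx/mise-action` to install mise, and that the per-package tasks can be invoked as `mise run //cdk:test`. The task names come from the list above, but the invocation form is an assumption. The existing Setup Node.js step is omitted on the assumption that mise provides Node. Keep it if mise does not.

```yaml
jobs:
  cdk-test:
    needs: install
    runs-on: ubuntu-latest  # assumption: same runner label as the install job
    steps:
      - uses: actions/checkout@v4
      - uses: jdx/mise-action@v2  # installs mise and the pinned tools
      # Restore exactly what the install job saved (same key, computed after checkout).
      # Failing on a miss surfaces an evicted or broken cache instead of running
      # tests without dependencies.
      - uses: actions/cache/restore@v4
        with:
          path: |
            node_modules
            agent/.venv
          key: deps-${{ runner.os }}-${{ hashFiles('yarn.lock', 'agent/uv.lock') }}
          fail-on-cache-miss: true
      - run: mise run //cdk:test  # assumed invocation of the cdk:test task
```

The same step list with a different final `run:` covers agent-quality, cdk-eslint, cli-build and docs-build. Only cdk-compile-synth also needs the Free Disk Space step and the artifact upload.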
P2: Cache Jest transform output (~30-60s saved on test step)
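Add `cacheDirectory` to the Jest config:
```json
{
  "cacheDirectory": "<rootDir>/.jest-cache"
}
```
Also add `.jest-cache/` to `.gitignore`, so the end-of-job `git diff --staged` self-mutation check does not pick it up if that check stages untracked files.
Then in CI:
```yaml
- uses: actions/cache@v4
  with:
    path: cdk/.jest-cache
    key: jest-${{ runner.os }}-${{ hashFiles('cdk/yarn.lock') }}-${{ github.sha }}
    restore-keys: |
      jest-${{ runner.os }}-${{ hashFiles('cdk/yarn.lock') }}-
      jest-${{ runner.os }}-
```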
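Reuse across branches is safe because Jest keys its transform cache by a hash of each file's content and the transform config, so unchanged files hit the cache regardless of which branch produced it. GitHub Actions cache scoping still limits what is reachable: a run can restore caches created on its own branch or on the base/default branch, but not caches from sibling feature branches.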
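A possible refinement, sketched under the assumption that `actions/cache@v4` saves only in a post-job step gated on job success: split it into `actions/cache/restore` and `actions/cache/save`, so a failing test run still saves its freshly warmed Jest cache. The `mise run //cdk:test` invocation is the same assumption as elsewhere in this issue.

```yaml
- uses: actions/cache/restore@v4
  id: jest-cache
  with:
    path: cdk/.jest-cache
    key: jest-${{ runner.os }}-${{ hashFiles('cdk/yarn.lock') }}-${{ github.sha }}
    restore-keys: |
      jest-${{ runner.os }}-${{ hashFiles('cdk/yarn.lock') }}-
      jest-${{ runner.os }}-
- run: mise run //cdk:test  # assumed invocation of the cdk:test task
# Save even when tests fail, so the next run starts warm. Skip on an exact hit
# (a re-run of the same commit), where this key already exists.
- uses: actions/cache/save@v4
  if: always() && steps.jest-cache.outputs.cache-hit != 'true'
  with:
    path: cdk/.jest-cache
    key: ${{ steps.jest-cache.outputs.cache-primary-key }}
```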
P3: Cache TypeScript incremental build (~10-20s saved)
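```yaml
- uses: actions/cache@v4
  with:
    path: |
      cdk/tsconfig.tsbuildinfo
      cli/tsconfig.tsbuildinfo
    key: tsc-${{ runner.os }}-${{ hashFiles('cdk/src/**', 'cli/src/**') }}
    restore-keys: tsc-${{ runner.os }}-
```
Caveat: `.tsbuildinfo` records what tsc believes it has already emitted. On a fresh checkout the emitted `.js`/`.d.ts` files are absent, so restoring only the build info can lead tsc to skip emitting them. Cache the output directories alongside it, or verify that the outputs are regenerated, before relying on this.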
P4: Remove duplicate `//docs:sync` (~5s saved, trivial)
`mise.toml` root `tasks.build` calls `//docs:sync` explicitly after `//docs:build`, but `//docs:build` already depends on `:sync`. Remove the duplicate.
P5 (future): Jest sharding for test parallelism
Only needed if tests remain the critical path after P0-P2. With our `beforeAll` optimization (PR #195), tests are already ~90s. Sharding would bring that to ~30-40s but adds matrix complexity.
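If sharding does become necessary, Jest 28+ has a built-in `--shard=<index>/<count>` option that pairs with a GitHub Actions matrix. This sketch assumes three shards, that the checkout, mise and dependency-cache steps stay as in the other jobs (elided here), and that Jest is invoked directly with `npx` rather than through the mise task.

```yaml
jobs:
  cdk-test:
    needs: install
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false  # let every shard report its own failures
      matrix:
        shard: [1, 2, 3]
    steps:
      # ... checkout, mise, and dependency-cache restore as in the other jobs ...
      - name: Jest shard ${{ matrix.shard }}/3
        working-directory: cdk
        run: npx jest --shard=${{ matrix.shard }}/3
```

Each shard pays the per-job setup again, which is the "matrix complexity" cost. If this is combined with the P2 Jest cache, include `matrix.shard` in the cache key so the shards don't all try to save the same entry.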
Projected improvement
| Scenario | Current | Cache only (P0+P2+P3) | Full parallel + cache (P0-P4) |
|---|---|---|---|
| CI wall time | ~14.5min | ~12-13min | ~4-5min |
| Critical path | 14.5min (serial) | ~12-13min (serial) | compile → synth → upload (~3.5min) |
| Billed minutes | ~14.5 | ~12-13 | ~20 (more jobs, but each shorter) |
Acceptance criteria
- `node_modules` + `.venv` cached via `actions/cache`
- Parallel jobs with a `needs:` graph
- Jest `cacheDirectory` set + cached in CI
- `.tsbuildinfo` cached
- Duplicate `//docs:sync` removed
- `cdk.out/` artifact upload still produces correct output
Notes
- P1 (parallelism) requires careful handling of the `self_mutation` / patch detection, which currently runs `git diff --staged` at the end of the single job. With multiple jobs, each job could mutate files independently (e.g., `eslint --fix`, docs sync). We need a final "check mutations" job that either re-checks or collects patches.
- The `compute_type` matrix currently only has `[agentcore]`. If more types are added later, each gets its own parallel run, so this architecture scales well.
- The `Free Disk Space` step (avg 48s) is required because CDK synth + Docker image bundling can exhaust the default runner disk. With parallelism, only the `cdk-compile-synth` job needs this step.
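One way to implement the "collects patches" option, sketched with `actions/upload-artifact@v4` and `actions/download-artifact@v4`. The `patch-*` artifact names are hypothetical. This only fails the build on a mutation. It does not reproduce anything else the current `self_mutation` step may do, such as pushing a fix commit. Each job that can mutate files stages its changes and uploads a patch only when the diff is non-empty. A final job then fails if any patch arrived.

```yaml
jobs:
  cdk-eslint:
    needs: install
    runs-on: ubuntu-latest
    steps:
      # ... checkout, mise, cache restore, then the eslint --fix task ...
      - name: Capture mutations
        run: |
          git add -A
          # Write a patch only if something changed (--quiet exits 1 on a diff).
          if ! git diff --staged --quiet; then
            git diff --staged --binary > "$RUNNER_TEMP/${{ github.job }}.patch"
          fi
      - uses: actions/upload-artifact@v4
        with:
          name: patch-${{ github.job }}
          path: ${{ runner.temp }}/${{ github.job }}.patch
          if-no-files-found: ignore  # no patch means no mutation

  check-mutations:
    needs: [agent-quality, cdk-compile-synth, cdk-test, cdk-eslint, cli-build, docs-build]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: patch-*
          path: patches
          merge-multiple: true
      - name: Fail if any job mutated files
        run: |
          status=0
          for p in patches/*.patch; do
            [ -s "$p" ] || continue  # glob matched nothing: no job uploaded a patch
            echo "::error title=Self-mutation::$(basename "$p") is not empty"
            cat "$p"
            status=1
          done
          exit "$status"
```

The "re-checks" alternative is simpler: end each mutating job with `git add -A && git diff --staged --exit-code`, so that job fails in place. The trade-off is that there is no single place that collects every job's patch.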