
Improve calibration: AGI-conditional geography, expanded targets, pipeline fixes #671

Open
baogorek wants to merge 2 commits into main from fix/pipeline-resilience


@baogorek
Collaborator

@baogorek baogorek commented Mar 31, 2026

Summary

  • AGI-conditional geographic assignment: Route top-10% AGI households to congressional districts (CDs) in proportion to each CD's AGI target instead of uniformly at random. This keeps the optimizer from zeroing out extreme-income records that land in low-AGI districts, which was destroying population targets.
  • Expanded calibration targets: Re-enable AGI (district/state/national), person_count by AGI, EITC, retirement contributions (401k, IRA, Roth), Social Security subcategories, and JCT tax expenditure targets (SALT, charitable, mortgage interest, medical, QBI). Remove stale "poisoned" comments.
  • Modal pipeline improvements: LPT (Longest Processing Time) scheduling for load-balanced worker partitioning; unified single-phase execution replaces separate state/district/city phases; state work items weighted by CD count.
  • NYC city dataset simplification: Direct county FIPS filtering replaces probabilistic get_county_filter_probability approach. Removes ~90 lines of dead code from block_assignment.py.
  • Matrix builder bug fixes: Count cache key now includes reform_id (was incorrectly sharing cached counts across reforms); reform household variable assembly no longer overwrites with zeros when reform data is missing for a state.
  • ETL guardrails: Post-insert verification that tax expenditure targets persist with correct reform_id; database validation rejects tax expenditure vars with reform_id=0 in root stratum.
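
As an illustration of the LPT scheduling mentioned above, here is a minimal sketch, not the code in this PR. It assumes each work item carries a numeric weight (for state items, the CD count) and that the goal is to balance total weight across workers. The function name `lpt_partition` and the example weights are hypothetical.

```python
import heapq


def lpt_partition(weights, n_workers):
    """Greedy LPT: hand items out in descending weight order,
    always to the worker with the smallest load so far.

    `weights` maps a work-item name to its weight (hypothetical shape).
    Returns one list of item names per worker.
    """
    # Min-heap of (current_load, worker_index).
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(n_workers)]

    # Sort by weight descending; break ties by name for determinism.
    for name, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        buckets[worker].append(name)
        heapq.heappush(heap, (load + weight, worker))
    return buckets


# Illustrative weights only: state items weighted by CD count.
work = {"CA": 52, "TX": 38, "FL": 28, "NY": 26, "WY": 1, "VT": 1}
buckets = lpt_partition(work, n_workers=2)
loads = [sum(work[name] for name in bucket) for bucket in buckets]
print(buckets)  # [['CA', 'NY'], ['TX', 'FL', 'VT', 'WY']]
print(loads)    # [78, 68]
```

LPT is a greedy heuristic, so the split is not guaranteed to be optimal, but placing the heaviest items first tends to avoid one worker ending up with several large states.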

Test plan

  • CI passes (unit tests, formatting)
  • make data completes without errors
  • Spot-check that, under AGI-conditional assignment, the number of extreme-income records in each CD correlates with that CD's AGI target
  • Verify expanded targets are picked up by the matrix builder (no "variable not found" warnings)

🤖 Generated with Claude Code

baogorek and others added 2 commits March 31, 2026 16:49
…rgets

- Switch loss_type from "relative" to "capped_relative" in L0 optimizer.
  Caps relative error at ±10 (max loss 100 per target) to prevent extreme
  PUF-inflated targets from hijacking gradients.
- Disable state_income_tax targets: ETL hardcodes $0 for WA and NH, but
  PolicyEngine computes non-zero tax (WA capital gains, NH interest/dividends).
  The $0 targets produced catastrophic loss crushing WA/NH weights to zero.
- Enable district AGI, AGI bins, and JCT reform targets in config.
- Result: 15,900 targets, median error 0.6% at epoch 500.
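
For reference, the difference between the two loss types can be sketched as follows. This is an assumption-laden illustration, not the L0 optimizer's actual code: it assumes the relative error is `(estimate - target) / target` with nonzero targets, and that the per-target loss is the squared (optionally clipped) relative error, which matches the "±10, max loss 100" figures above. The function names are hypothetical.

```python
import numpy as np


def relative_loss(estimate, target):
    """Squared relative error per target (assumed form, nonzero targets)."""
    rel = (estimate - target) / target
    return rel ** 2


def capped_relative_loss(estimate, target, cap=10.0):
    """Same as above, but the relative error is clipped to [-cap, cap],
    so a single target can contribute at most cap**2 to the loss."""
    rel = (estimate - target) / target
    return np.clip(rel, -cap, cap) ** 2


target = np.array([100.0, 100.0])
estimate = np.array([150.0, 5100.0])  # second estimate is 50x off

print(relative_loss(estimate, target))         # [2.5e-01 2.5e+03]
print(capped_relative_loss(estimate, target))  # [  0.25 100.  ]
```

One property of this form worth keeping in mind: once a target's relative error exceeds the cap, the clipped term is constant, so that target contributes no gradient until its error falls back inside the cap.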

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Route top-10% AGI households to congressional districts proportional
to CD AGI targets instead of uniform random assignment. This prevents
the optimizer from having to zero out extreme-income records in
low-AGI districts, which was destroying population targets.

Also reverts loss_type from "capped_relative" back to "relative"
to restore AGI gradient signal.
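
The routing idea can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the implementation in this PR: it assumes an unweighted top-10% AGI cutoff, a uniform draw over CDs for everyone else, and a hypothetical `assign_districts` helper that takes per-household AGI and per-CD AGI targets as arrays.

```python
import numpy as np


def assign_districts(agi, cd_agi_targets, top_share=0.10, seed=0):
    """Assign each household a CD index.

    Households below the top-`top_share` AGI cutoff get a uniform random CD;
    households at or above it are drawn with probability proportional to
    each CD's AGI target. Returns (cd_index_per_household, is_top_mask).
    """
    rng = np.random.default_rng(seed)
    agi = np.asarray(agi, dtype=float)
    targets = np.asarray(cd_agi_targets, dtype=float)
    n_cds = targets.size

    # Baseline: uniform random assignment for every household.
    cds = rng.integers(0, n_cds, size=agi.size)

    # Unweighted cutoff for the top share of AGI (an assumption).
    threshold = np.quantile(agi, 1.0 - top_share)
    is_top = agi >= threshold

    # Re-draw the high-AGI households in proportion to CD AGI targets.
    probs = targets / targets.sum()
    cds[is_top] = rng.choice(n_cds, size=int(is_top.sum()), p=probs)
    return cds, is_top


# Synthetic data: three CDs, one of which holds almost all targeted AGI.
agi = np.random.default_rng(1).lognormal(mean=11.0, sigma=1.0, size=10_000)
cds, is_top = assign_districts(agi, cd_agi_targets=[1.0, 1.0, 98.0])
print(np.bincount(cds[is_top], minlength=3))   # high-AGI counts per CD
print(np.bincount(cds[~is_top], minlength=3))  # remaining counts per CD
```

With targets this skewed, nearly all high-AGI households should land in the third CD while the rest stay roughly evenly spread, which is the pattern the spot-check in the test plan looks for.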

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@baogorek baogorek force-pushed the fix/pipeline-resilience branch from d9f5d1c to 4d7c227 March 31, 2026 20:50
@baogorek baogorek changed the title from "Fix calibration: AGI-conditional geography + relative loss" to "Improve calibration: AGI-conditional geography, expanded targets, pipeline fixes" Mar 31, 2026
@baogorek
Collaborator Author

baogorek commented Apr 1, 2026

I was planning to request @juaristi22 as the primary reviewer, but I'll also add you, @MaxGhenis, since I see you have PRs in progress. @anth-volk, this is an FYI.
