Problem
`NYC.h5` is currently built using "congressional district stacking": it filters to the 13 CDs that overlap NYC, then probabilistically scales weights by P(NYC county | CD). This approach is deprecated because:
- CDs are redrawn every decade, so the hardcoded `NYC_CDS` list of 13 CDs is fragile
- NYC is not a collection of CDs: CDs straddle NYC boundaries, which forces probabilistic weight scaling
- We now have census blocks: `GeographyAssignment.county_fips` gives us the 5-digit county FIPS, derived from `block_geoid[:5]`
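For reference, a census block GEOID is a 15-digit string made of a 2-digit state FIPS, a 3-digit county FIPS, a 6-digit tract code and a 4-digit block code, so its first five characters are the state+county FIPS. A minimal sketch of that derivation; the helper name `county_fips_from_block` and the sample GEOIDs are illustrative only, not taken from the codebase:

```python
def county_fips_from_block(block_geoid: str) -> str:
    """Return the 5-digit state+county FIPS prefix of a 15-digit block GEOID.

    Hypothetical helper: the real code simply slices block_geoid[:5].
    """
    # Block GEOID layout: SS (state) + CCC (county) + TTTTTT (tract) + BBBB (block)
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:5]


# Hypothetical block in New York County (Manhattan): state 36, county 061
print(county_fips_from_block("360610001001000"))  # → 36061
```

Because the county is read straight off the block, no CD lookup is involved and the result does not change when CDs are redrawn.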
Solution
Replace CD stacking with a direct county FIPS filter:
```python
# Bronx, Kings (Brooklyn), New York (Manhattan), Queens, Richmond (Staten Island)
NYC_COUNTY_FIPS = {"36005", "36047", "36061", "36081", "36085"}
```
Each clone IS or IS NOT in NYC based on its assigned block's county, so no probabilistic scaling is needed. This is simpler and more accurate (NYC membership is exact rather than an expected value over a CD), and it doesn't depend on congressional district boundaries.
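A small illustration of the deterministic membership test, using NumPy and made-up clone weights and county assignments (the arrays below are hypothetical, not real data). Under the old scheme a clone in a straddling CD kept a fraction P(NYC county | CD) of its weight wherever it actually lived; here it keeps all of it or none:

```python
import numpy as np

# NYC's five counties (boroughs), as defined in this PR
NYC_COUNTY_FIPS = {"36005", "36047", "36061", "36081", "36085"}

# Hypothetical clones: weights and the county FIPS of each clone's assigned block
weights = np.array([10.0, 20.0, 30.0, 40.0])
county_fips = np.array([
    "36061",  # New York County (Manhattan): in NYC
    "36119",  # Westchester County: outside NYC
    "36047",  # Kings County (Brooklyn): in NYC
    "36059",  # Nassau County: outside NYC
])

# Each clone is either in NYC or not, so the mask is exact (no probabilities)
in_nyc = np.isin(county_fips, sorted(NYC_COUNTY_FIPS))
nyc_weights = np.where(in_nyc, weights, 0.0)
print(nyc_weights.tolist())  # → [10.0, 0.0, 30.0, 0.0]
```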
Changes
- Add a `county_fips_filter` parameter to `build_h5()` that zeros out weights for clones outside the target counties
- Update `build_cities()` to use `county_fips_filter=NYC_COUNTY_FIPS` instead of `cd_subset` + `county_filter`
- Remove `NYC_COUNTIES` (enum name set) and `NYC_CDS` (13 hardcoded CD codes)
- Remove the now-unused `get_county_filter_probability()` and `get_filtered_block_distribution()` from `block_assignment.py`
- Update `modal_app/worker_script.py` accordingly
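A rough sketch of the weight-zeroing step behind the new `county_fips_filter` parameter. The standalone helper `apply_county_fips_filter` and its argument names are hypothetical; the real `build_h5()` signature and internal array names may differ:

```python
from typing import AbstractSet, Optional

import numpy as np


def apply_county_fips_filter(
    weights: np.ndarray,
    county_fips: np.ndarray,
    county_fips_filter: Optional[AbstractSet[str]] = None,
) -> np.ndarray:
    """Zero out weights for clones whose block county is not in county_fips_filter.

    Hypothetical helper illustrating build_h5(county_fips_filter=...).
    """
    if county_fips_filter is None:
        # No filter requested: keep every clone's weight unchanged
        return weights
    if weights.shape != county_fips.shape:
        raise ValueError("weights and county_fips must align clone-by-clone")
    keep = np.isin(county_fips, sorted(county_fips_filter))
    # np.where builds a new array, so the caller's weights are not mutated
    return np.where(keep, weights, 0.0)


NYC_COUNTY_FIPS = {"36005", "36047", "36061", "36081", "36085"}
w = apply_county_fips_filter(
    np.array([5.0, 7.0, 9.0]),
    np.array(["36085", "34017", "36005"]),  # Richmond NY, Hudson NJ, Bronx NY
    NYC_COUNTY_FIPS,
)
print(w.tolist())  # → [5.0, 0.0, 9.0]
```

Defaulting the filter to `None` means callers that don't pass it get their weights back untouched.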