
[TEST] Push workflow dry run - DO NOT MERGE #668

Draft
anth-volk wants to merge 17 commits into main from test/push-workflow-dry-run

Conversation

@anth-volk
Collaborator

DO NOT MERGE

This PR exists solely to test the push.yaml workflow from PR #667.

The push workflow has been modified to trigger on pull_request instead of push so it runs on this PR. It includes the per-dataset Modal build steps, integration tests after each stage, approval gate, and pipeline dispatch.

Contains a deliberate lint failure (DO_NOT_MERGE_THIS_PR in __init__.py) to prevent accidental merge.

What to look for

  • The [TEST] Push workflow dry run workflow appears in the Actions tab
  • The lint step, build steps, and test steps are each visible as separate sections
  • The $GITHUB_STEP_SUMMARY timing table populates after each build step
  • The existing PR checks (lint) fail due to the deliberate formatting violation

Close this PR after testing is complete.

🤖 Generated with Claude Code

@anth-volk anth-volk force-pushed the test/push-workflow-dry-run branch 2 times, most recently from 8716cb0 to e22ea43 on March 31, 2026 17:25
anth-volk and others added 11 commits March 31, 2026 19:44
Fixes #638. The versioning workflow used a PAT (POLICYENGINE_GITHUB)
to push the "Update package version" commit, which broke when the
token expired. Switch to a GitHub App token via
actions/create-github-app-token@v1, matching the pattern used in
policyengine-api-v2-alpha.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
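The App-token pattern described above typically looks like the following workflow fragment. This is a hedged sketch: the secret names (APP_ID, APP_PRIVATE_KEY) are placeholders, not the repository's actual secret names.

```yaml
- name: Generate GitHub App token
  id: app-token
  uses: actions/create-github-app-token@v1
  with:
    app-id: ${{ secrets.APP_ID }}
    private-key: ${{ secrets.APP_PRIVATE_KEY }}

- name: Checkout with App token
  uses: actions/checkout@v4
  with:
    token: ${{ steps.app-token.outputs.token }}
```

Unlike a PAT, the App token is minted fresh on each run, so there is no expiry to rotate.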
Split tests into unit/ (self-contained, synthetic data, mocks) and
integration/ (requires built H5 datasets). Unit sub-folders use no
test_ prefix (datasets/, calibration/) to avoid confusion with
integration tests.

- Move 30+ unit test files to tests/unit/ with calibration/ and
  datasets/ sub-directories
- Move 11 integration test files to tests/integration/
- Merge test_dataset_sanity.py into per-dataset integration files
  (test_cps.py, test_enhanced_cps.py, test_sparse_enhanced_cps.py)
- Rename calibration integration tests to match dataset names
  (test_source_imputed_cps_masking.py, _consistency.py)
- Move top-level tests/ files into appropriate unit/ or integration/
- Add integration conftest.py with skip logic and shared fixtures
- Update pyproject.toml testpaths and add pytest-cov dependency
- Update modal_app/data_build.py TEST_MODULES
- Add make test-unit and make test-integration targets
- Fix Path(__file__) references in moved test files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
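The integration conftest's skip logic could be built around a predicate like the one below. This is a hedged sketch, not the actual conftest.py: the function name and storage layout are illustrative.

```python
# Hedged sketch of the skip decision an integration conftest.py could
# use: integration tests only make sense when built H5 datasets exist.
from pathlib import Path


def datasets_present(storage: Path) -> bool:
    """True when at least one built .h5 dataset exists under `storage`."""
    return any(storage.glob("*.h5"))


# In conftest.py, pytest_collection_modifyitems would attach a skip
# marker to every collected item when datasets_present(...) is False.
```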
New queue-based H5 build system that replaces the partition-based
N-worker model with one-container-per-item processing.

- generate_work_items(scope, db_path): auto-generates work item lists
  filtered by scope (all/national/state/congressional/local/test).
  Test scope builds national + NY + NV-01 only.
- build_single_area(): Modal function (1 CPU, 16GB) that processes
  exactly one work item per container via worker_script.py
- queue_coordinator(): spawns up to 50 single-item workers, collects
  results. No multi-threading, no chunking.
- main_queue entrypoint for CLI access
- Wire scope parameter from pipeline.yaml through run_pipeline()
- Fall back to legacy coordinate_publish for scope=all

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
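The scope filtering behind generate_work_items() can be pictured as below. This is an illustrative sketch under assumptions: the WorkItem shape and area codes are hypothetical, only the scope names and the test-scope contents (national + NY + NV-01) come from the commit message.

```python
# Hedged sketch of scope filtering for the queue-based build.
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    kind: str  # "national" | "state" | "congressional" | "local"
    area: str  # e.g. "US", "NY", "NV-01"


def filter_by_scope(items: list[WorkItem], scope: str) -> list[WorkItem]:
    """Return the subset of work items matching the requested scope."""
    if scope == "all":
        return items
    if scope == "test":
        # Test scope builds national + NY + NV-01 only.
        wanted = {("national", "US"), ("state", "NY"),
                  ("congressional", "NV-01")}
        return [i for i in items if (i.kind, i.area) in wanted]
    return [i for i in items if i.kind == scope]
```

Each surviving item is then handed to build_single_area(), one container per item.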
Add testing standards: unit vs integration placement rules, per-dataset
naming convention, make test-unit/test-integration commands. Add CI/CD
overview documenting the four workflow files and their triggers. Update
Python version from 3.11 to 3.12-3.13.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delete 7 deprecated/redundant workflow files and replace with 3
consolidated workflows matching the policyengine-api-v2-alpha pattern.

pr.yaml: fork check, lint, uv.lock, changelog, unit tests with
Codecov (informational), smoke test. Runs in ~2-3 minutes.

push.yaml: two paths — version bump commits publish to PyPI; all
other commits run per-dataset Modal builds with integration tests
after each stage, then manual approval gate, then pipeline dispatch.

Also adds:
- .codecov.yml with informational-only coverage reporting
- --script mode to data_build.py for per-dataset Modal execution
- SCRIPT_SHORT_NAMES mapping for human-friendly script names
- run_single_script() Modal function for single-dataset builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
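The SCRIPT_SHORT_NAMES dispatch for --script mode could look roughly like this. A hedged sketch only: the mapping entries and paths are illustrative, not the real contents of data_build.py.

```python
# Hedged sketch of mapping human-friendly --script names to build scripts.
SCRIPT_SHORT_NAMES = {
    "cps": "policyengine_us_data/datasets/cps/cps.py",
    "puf": "policyengine_us_data/datasets/puf/puf.py",
    "enhanced_cps": "policyengine_us_data/datasets/cps/enhanced_cps.py",
}


def resolve_script(short_name: str) -> str:
    """Map a short name to the script path it runs, or exit with a hint."""
    try:
        return SCRIPT_SHORT_NAMES[short_name]
    except KeyError:
        raise SystemExit(
            f"unknown script {short_name!r}; "
            f"choose from {sorted(SCRIPT_SHORT_NAMES)}"
        )
```

Each workflow matrix entry would then invoke one short name per Modal container.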
- Pass HUGGING_FACE_TOKEN to unit test step in pr.yaml so tests that
  transitively import huggingface.py can collect without crashing
- Fix test_etl_national_targets.py: remove nonexistent
  TAX_EXPENDITURE_REFORM_ID import, use reform_id > 0 filter instead
  (mirrors fix from unmerged PR #664)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove test_reproducibility.py: imports modules that never existed
  (enhanced_cps.imputation, enhanced_cps.reweight, scripts). Broken
  since PR #117 (July 2025), never caught because old testpaths
  didn't include top-level tests/.
- Fix test_legacy_target_overview_without_reform_id: create builder
  before installing legacy view so __init__'s create_or_replace_views
  doesn't overwrite it. Clear column cache so builder re-detects
  missing reform_id. Mirrors fix from unmerged PR #665.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
download_private_prerequisites.py downloads files (puf_2015.csv,
demographics_2015.csv, soi.csv, np2023_d5_mid.csv, policy_data.db)
to the local filesystem, which vanishes when the container exits.
In --script mode, each script runs in a separate container, so
subsequent scripts couldn't find the prerequisites.

Fix: save prerequisite files to the checkpoint volume after download,
and restore them before running any other script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
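The save/restore idea can be sketched as two small copy helpers. This is a hedged illustration: function names and directory layout are assumptions, only the prerequisite filenames come from the commit message.

```python
# Hedged sketch: persist downloaded prerequisites to a checkpoint
# directory (the Modal volume in practice) and restore them before
# any later --script invocation runs in a fresh container.
import shutil
from pathlib import Path

PREREQUISITES = ["puf_2015.csv", "demographics_2015.csv", "soi.csv"]


def save_prerequisites(local_dir: Path, checkpoint_dir: Path) -> None:
    """Copy each downloaded prerequisite into the checkpoint directory."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    for name in PREREQUISITES:
        src = local_dir / name
        if src.exists():
            shutil.copy2(src, checkpoint_dir / name)


def restore_prerequisites(checkpoint_dir: Path, local_dir: Path) -> None:
    """Copy checkpointed prerequisites back to the local filesystem."""
    local_dir.mkdir(parents=True, exist_ok=True)
    for name in PREREQUISITES:
        src = checkpoint_dir / name
        if src.exists():
            shutil.copy2(src, local_dir / name)
```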
Tests must run where the data lives. The previous push.yaml ran
pytest on the GH runner where H5 files don't exist, causing tests
to silently skip via conftest logic.

Fix: run tests inside the same Modal container that built the data.

data_build.py changes:
- SCRIPT_TESTS mapping: which integration tests go with which build
- --run-tests flag: runs pytest after build in the same container
- --test flag: runs standalone integration tests on Modal with all
  checkpointed data restored
- run_integration_test() function for tests not tied to a build step

push.yaml changes:
- Phase-based matrix jobs instead of sequential steps
- Phase 1 (uprating, acs, irs_puf): parallel, independent
- Phase 2 (cps, puf): parallel, needs phase1
- Phase 3 (extended_cps): needs phase2
- Phase 4 (enhanced_cps, stratified_cps): parallel, needs phase3
- Phase 5 (source_imputed, small_enhanced): parallel, needs phase4
- Each matrix entry: one modal run --script X --run-tests call
- Remaining tests (census_cps, database_build): parallel after phase4
- Image cache no longer busted between steps (clean runner per job)
- Exit codes propagate from Modal to GH Actions naturally

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
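The SCRIPT_TESTS pairing can be pictured as a lookup plus an in-container pytest call. Hedged sketch only: the mapping entries are illustrative, not the actual contents of data_build.py.

```python
# Hedged sketch of pairing build scripts with the integration tests
# that run in the same container, right after the build.
import subprocess
import sys

SCRIPT_TESTS = {
    "cps": ["tests/integration/test_cps.py"],
    "enhanced_cps": ["tests/integration/test_enhanced_cps.py"],
}


def run_tests_for(script: str) -> int:
    """Run the integration tests paired with a build script, if any.

    Returns pytest's exit code, which propagates to GH Actions.
    """
    tests = SCRIPT_TESTS.get(script, [])
    if not tests:
        return 0
    return subprocess.call([sys.executable, "-m", "pytest", *tests])
```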
@anth-volk anth-volk force-pushed the test/push-workflow-dry-run branch from e22ea43 to cfc5eee on March 31, 2026 17:54
Three bugs fixed:

1. Restore to STORAGE_FOLDER, not source tree: Dataset classes
   resolve file_path via the installed package's STORAGE_FOLDER
   (in .venv/site-packages/), but restore_from_checkpoint wrote
   to the source-tree relative path. Now restores to both locations
   so both subprocess scripts and Dataset class lookups find files.

2. Add volume.reload() in run_single_script and run_integration_test:
   Without reload, containers see stale volume state and miss files
   written by prior --script calls.

3. Preserve full path in checkpoint keys: get_checkpoint_path now
   uses the full relative path (e.g., calibration/policy_data.db)
   instead of just the filename, preventing potential collisions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
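Fix 3 boils down to keying checkpoints by the full relative path. A minimal sketch, assuming a signature for get_checkpoint_path that the commit message only names:

```python
# Hedged sketch of the collision fix: keep the full relative path in
# the checkpoint key so calibration/policy_data.db and a top-level
# policy_data.db map to distinct checkpoint entries.
from pathlib import Path


def get_checkpoint_path(checkpoint_root: Path, relative_path: str) -> Path:
    """Checkpoint location for a file, keyed by its full relative path."""
    return checkpoint_root / relative_path
```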
@anth-volk anth-volk force-pushed the test/push-workflow-dry-run branch from cfc5eee to fda9ff0 on March 31, 2026 19:27
Each matrix job was rebuilding the Modal image (~2 min each) because
modal run uploads and hashes local files per invocation. Fix: deploy
once at the start with a unique app name (policyengine-us-data-ci-{run_id}),
then all matrix jobs call the deployed functions via Function.from_name().

- App name is now configurable via MODAL_APP_NAME env var
- deploy-modal job: deploys once, outputs app name
- All phase/test jobs: use Function.from_name() instead of modal run
- cleanup-modal job: stops the deployed app after completion
- Matrix jobs no longer need repo checkout (just pip install modal)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
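The per-run app naming could be derived as below. A hedged sketch: the MODAL_APP_NAME override and the policyengine-us-data-ci-{run_id} pattern come from the commit message, while GITHUB_RUN_ID is the standard GH Actions variable; the function name is hypothetical.

```python
# Hedged sketch of deriving a unique Modal app name per CI run.
import os


def ci_app_name(default: str = "policyengine-us-data") -> str:
    """App name: explicit override, else unique per CI run, else default."""
    explicit = os.environ.get("MODAL_APP_NAME")
    if explicit:
        return explicit
    run_id = os.environ.get("GITHUB_RUN_ID")
    if run_id:
        return f"policyengine-us-data-ci-{run_id}"
    return default
```

A unique name per run keeps concurrent workflow runs from stopping each other's deployed apps during cleanup.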
@anth-volk anth-volk force-pushed the test/push-workflow-dry-run branch from fda9ff0 to 20758cb on March 31, 2026 19:57
Function.from_name().remote() doesn't stream container logs by
default. Wrap all calls in modal.enable_output() context manager
to stream stdout/stderr from Modal containers to GH Actions logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anth-volk anth-volk force-pushed the test/push-workflow-dry-run branch 3 times, most recently from d9c5698 to e50f20e on March 31, 2026 21:46
Revert the modal deploy + Function.from_name() approach — it didn't
stream container logs. Go back to modal run per matrix job, which
streams logs natively. Accept the image rebuild overhead for now
while we investigate the cache miss root cause.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anth-volk anth-volk force-pushed the test/push-workflow-dry-run branch from e50f20e to cce4ea7 on March 31, 2026 22:35
anth-volk and others added 2 commits April 1, 2026 16:30
Following the policyengine-api-v2 simulation-api pattern:
- Replace uv sync (creates .venv) with uv pip install --system
  (installs to system Python directly)
- Replace all "uv run python" subprocess calls with plain "python"
  across data_build.py, local_area.py, pipeline.py, and
  remote_calibration_runner.py

This eliminates the venv/PATH mismatch that caused
ModuleNotFoundError for policyengine_core, and prevents
workers from reinstalling 209 packages on every container start.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anth-volk anth-volk force-pushed the test/push-workflow-dry-run branch from cce4ea7 to 24f0374 on April 1, 2026 14:31