feat: scaffold Week 3 assignment — validated ingestion pipeline #2
Merged
3 commits
File renamed without changes.
# Data Track — Week 3 Assignment

**Build a Validated Ingestion Pipeline** · Total: 100 points · Passing: 60

---

## Why no task folders?

Previous assignments split work across `task-1/`, `task-2/`, and so on. This assignment drops that structure on purpose: small Python projects usually keep related modules together at the root, and you find your way around by reading the code, not by opening numbered folders.

Every file you need to touch is listed below, in the order you should work through them.

---

## Where to start

Work through the files in this order. Each one maps to a task in the assignment chapter.

| Step | File | Task in the chapter | Points |
|---|---|---|---|
| 1 | `models.py` | Task 4: Pydantic Validation | — |
| 2 | `ingest_api.py` | Task 1: Error Handling + Task 2: API Ingestion | — |
| 3 | `ingest_files.py` | Task 3: File Reading | — |
| 4 | `validate.py` | Task 4: Pydantic Validation | — |
| 5 | `database.py` | Task 5: Database Storage | — |
| 6 | `pipeline.py` | Task 6: Pipeline Orchestration | 70 total |
| 7 | `output/azure_compare.md` | Task 7: Azure CLI + Portal | 15 |
| 8 | `AI_DEBUG.md` | Task 8: AI Debug Report | 15 |

Open each file and read its docstrings and TODO comments, which explain exactly what to implement. Start with `models.py` and `ingest_api.py`; `pipeline.py` is the last thing you wire together.

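Step 2 (`ingest_api.py`) asks for a `fetch_with_retry` helper with exponential backoff in front of the Open-Meteo API call, and the scoring ladder looks for `time.sleep` backoff in that file. Below is a minimal standard-library sketch. The signature, the defaults, and passing in the `fetch` and `sleep` functions are assumptions for illustration; the TODOs in `ingest_api.py` may ask for a different signature (for example, a URL instead of a callable).

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[], bytes],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(); after each failure wait base_delay * 2**attempt seconds, then retry.

    Hypothetical signature. Re-raises the last error once max_retries is used up.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except (urllib.error.URLError, TimeoutError):
            # HTTPError subclasses URLError, so HTTP error statuses are retried too.
            if attempt >= max_retries:
                raise  # out of retries: let the caller log or report the error
            sleep(base_delay * 2**attempt)  # 1 s, 2 s, 4 s, ... with the defaults
            attempt += 1


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Plain GET request that fetch_with_retry can wrap."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

In the pipeline you would call it as `fetch_with_retry(lambda: fetch_url(url))`, where `url` is the Open-Meteo endpoint the template gives you. Injecting `sleep` keeps tests fast: pass `delays.append` instead of `time.sleep` to record the waits without actually waiting. Retrying every `HTTPError` is a simplification; a 404 will not fix itself, so you may prefer to retry only timeouts, 429, and 5xx responses.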

---

## Repository layout

```text
.
├── data/
│   └── weather_stations.csv        # input dataset — do not edit
├── output/
│   ├── azure_compare.md            # Task 7: fill in your 3 comparison sentences
│   └── azure_resource_groups.json  # Task 7: generated by your Python script
├── models.py                       # Step 1 — Pydantic model (Task 4)
├── ingest_api.py                   # Step 2 — fetch_with_retry + API call (Tasks 1–2)
├── ingest_files.py                 # Step 3 — CSV reader (Task 3)
├── validate.py                     # Step 4 — batch validation (Task 4)
├── database.py                     # Step 5 — SQLite tables + upsert (Task 5)
├── pipeline.py                     # Step 6 — orchestrator that calls everything (Task 6)
├── AI_DEBUG.md                     # Step 8 — your debugging log (Task 8)
├── requirements.txt
├── .env.example
├── .hyf/
│   └── test.sh                     # auto-grader — read this to see exactly what is checked
└── .github/workflows/
    └── grade-assignment.yml        # runs .hyf/test.sh on every PR
```
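The layout notes that `output/azure_resource_groups.json` is generated by your Python script. For Task 7 that means calling the Azure Resource Manager (ARM) API with a Bearer token and saving the response. Below is a minimal standard-library sketch, assuming you are already signed in with `az login`. It reads the token and subscription ID through the Azure CLI (`az account get-access-token` and `az account show`) and calls the resource-groups list endpoint with `api-version=2021-04-01`. The helper names are hypothetical, so follow the Task 7 instructions wherever they differ.

```python
import json
import subprocess
import urllib.request
from pathlib import Path

# Resource Groups "List" endpoint of the ARM REST API (api-version is an assumption).
ARM_URL = (
    "https://management.azure.com/subscriptions/{sub}/resourcegroups"
    "?api-version=2021-04-01"
)


def az(*args: str) -> str:
    """Run an Azure CLI command and return its trimmed stdout (requires `az login`)."""
    result = subprocess.run(["az", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def build_request(subscription_id: str, token: str) -> urllib.request.Request:
    """Build the ARM GET request with the Bearer token in the Authorization header."""
    return urllib.request.Request(
        ARM_URL.format(sub=subscription_id),
        headers={"Authorization": f"Bearer {token}"},
    )


def save_resource_groups(payload: dict, path: Path) -> None:
    """Write the ARM response (a dict with a "value" list) as pretty-printed JSON."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def main() -> None:
    """Fetch the resource groups and save them; run from the repository root."""
    token = az("account", "get-access-token", "--query", "accessToken", "-o", "tsv")
    subscription_id = az("account", "show", "--query", "id", "-o", "tsv")
    request = build_request(subscription_id, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    save_resource_groups(payload, Path("output/azure_resource_groups.json"))
```

Call `main()` once from the repository root. `build_request` and `save_resource_groups` touch neither the network nor the CLI, which makes them easy to check on their own. Treat the token like a password: it belongs in the `Authorization` header only and should never be printed or committed.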

Files the pipeline generates at runtime (gitignored):
- `weather.db` — SQLite database
- `output/error_report.json` — invalid records from validation

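The grader's 40/70 check expects `output/error_report.json` to be a JSON list of objects with `index`, `source`, `raw_record`, and `error_details` fields. The sketch below shows one way to build that list from a batch of raw records. It swaps in a plain-Python `check_reading` function for the Pydantic `WeatherReading` model so it runs without dependencies; the record fields (`station_id`, `temperature_c`) and the temperature bounds are hypothetical. In `validate.py` you would construct the model instead and catch `pydantic.ValidationError`, whose `.errors()` method returns the detailed error list.

```python
import json
from pathlib import Path
from typing import Any


def check_reading(record: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the Pydantic model: return a cleaned record or raise ValueError."""
    station_id = str(record.get("station_id") or "").strip()
    if not station_id:
        raise ValueError("station_id is missing")
    try:
        temperature = float(record["temperature_c"])
    except (KeyError, TypeError, ValueError):
        raise ValueError(
            f"temperature_c is not a number: {record.get('temperature_c')!r}"
        ) from None
    if not -90.0 <= temperature <= 60.0:  # hypothetical plausibility bounds
        raise ValueError(f"temperature_c out of range: {temperature}")
    return {"station_id": station_id, "temperature_c": temperature}


def validate_batch(
    records: list[dict[str, Any]], source: str
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records into valid rows and error-report entries (one per bad record)."""
    valid: list[dict[str, Any]] = []
    errors: list[dict[str, Any]] = []
    for index, record in enumerate(records):
        try:
            valid.append(check_reading(record))
        except ValueError as exc:
            errors.append(
                {
                    "index": index,  # position in the input batch
                    "source": source,  # e.g. "api" or "csv"
                    "raw_record": record,  # unmodified input, for debugging
                    "error_details": str(exc),
                }
            )
    return valid, errors


def write_error_report(errors: list[dict[str, Any]], path: Path) -> None:
    """Write the error entries as a JSON list (an empty list if nothing failed)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(errors, indent=2, default=str), encoding="utf-8")
```

Keeping `raw_record` untouched is the point of the report: it lets you see exactly what the messy input looked like before validation rejected it.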

---

## Run the pipeline

From the repository root:

```bash
python3 -m pip install -r requirements.txt
python3 -m pipeline
```

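`python3 -m pipeline` pulls from two sources: the Open-Meteo API (Step 2) and `data/weather_stations.csv` (Step 3, `ingest_files.py`). For the file side, here is a minimal sketch with the standard `csv` module. The function name is hypothetical, and it deliberately keeps every value as a raw string so that validation, not the reader, decides which rows are malformed.

```python
import csv
from pathlib import Path
from typing import Any, Iterator


def read_station_rows(path: Path) -> Iterator[dict[str, Any]]:
    """Yield each CSV row as a dict keyed by the (stripped) header names.

    Values stay as raw strings so validation, not the reader, decides what is bad.
    """
    # utf-8-sig also accepts plain UTF-8 and drops a BOM if one is present.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            cleaned: dict[str, Any] = {}
            for key, value in row.items():
                if key is None:
                    # More values than headers: DictReader collects the extras in a list.
                    cleaned["_extra"] = value
                elif isinstance(value, str):
                    cleaned[key.strip()] = value.strip()
                else:
                    # Fewer values than headers: DictReader fills the gap with None.
                    cleaned[key.strip()] = value
            yield cleaned
```

Usage: `rows = list(read_station_rows(Path("data/weather_stations.csv")))`, then hand the rows to the batch validation in `validate.py`. Short and overlong rows are kept rather than dropped, so they can surface in the error report.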

---

## Check your score locally

Run the same checks the auto-grader runs on every PR push:

```bash
bash .hyf/test.sh
cat .hyf/score.json
```

---

## Scoring ladder (Tasks 1–6)

Points are awarded incrementally, so partial work earns partial credit:

| Score | What the grader checks |
|---|---|
| 10/70 | All required files exist: `models.py`, `ingest_api.py`, `ingest_files.py`, `validate.py`, `database.py`, `pipeline.py`, `.env.example` |
| 20/70 | `python3 -m pipeline` runs without crashing |
| 40/70 | `output/error_report.json` is a valid JSON list whose objects have `index`, `source`, `raw_record`, and `error_details` fields; `weather.db` has rows in `weather_readings` |
| 50/70 | Pipeline is idempotent: a second run leaves the same row count in `weather_readings` (upsert working) |
| 70/70 | Code uses: `@field_validator` + `@classmethod` in `models.py`, `?` placeholders in `database.py`, an `ON CONFLICT ... DO UPDATE SET` upsert in `database.py`, `time.sleep` backoff in `ingest_api.py` |

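Two rungs of the ladder test the storage layer: 50/70 runs the pipeline twice and expects the same row count, and 70/70 looks for `?` placeholders and an `ON CONFLICT` upsert in `database.py`. The sketch below shows both patterns against SQLite using the standard `sqlite3` module. The `weather_readings` table name comes from the grader description, but its columns and the `(station_id, observed_at)` uniqueness key are assumptions. `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import Any, Iterable


def create_tables(conn: sqlite3.Connection) -> None:
    """Create weather_readings; the UNIQUE key is what ON CONFLICT matches on."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS weather_readings (
            station_id    TEXT NOT NULL,
            observed_at   TEXT NOT NULL,
            temperature_c REAL,
            UNIQUE (station_id, observed_at)
        )
        """
    )


def upsert_readings(conn: sqlite3.Connection, readings: Iterable[dict[str, Any]]) -> None:
    """Insert new readings and update existing ones, so re-runs add no duplicates."""
    # The ? placeholders let sqlite3 bind each value; nothing is formatted into the SQL.
    conn.executemany(
        """
        INSERT INTO weather_readings (station_id, observed_at, temperature_c)
        VALUES (?, ?, ?)
        ON CONFLICT (station_id, observed_at)
        DO UPDATE SET temperature_c = excluded.temperature_c
        """,
        [(r["station_id"], r["observed_at"], r["temperature_c"]) for r in readings],
    )
    conn.commit()


def row_count(conn: sqlite3.Connection) -> int:
    """Count rows, the same quantity the idempotency check compares across runs."""
    return conn.execute("SELECT COUNT(*) FROM weather_readings").fetchone()[0]
```

Running `upsert_readings` twice with the same batch leaves `row_count` unchanged, which is what the 50/70 check relies on. A new temperature for an existing `(station_id, observed_at)` pair updates that row in place instead of adding another. Without the `UNIQUE` constraint there would be nothing to conflict on, and every run would append duplicates.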

---

## For instructors / track maintainers

This repo is the upstream template. At the start of each cohort, generate a cohort repo under `HackYourAssignment` (**Use this template → Create a new repository**, owner = `HackYourAssignment`, name = `c<NN>-data-week3`). Students fork that cohort repo and open PRs back to it; the auto-grader runs on every push.

Edits to the assignment, dataset, or grader belong here on the template, not on cohort copies.

> 👩‍🎓 **Students:** if you landed here, you are in the wrong place. Go to your cohort repo under [`HackYourAssignment`](https://github.com/HackYourAssignment). Your teacher posts the exact link in your cohort channel.