feat: scaffold Week 3 assignment — validated ingestion pipeline#2

Merged: lassebenni merged 3 commits into main from feat/scaffold-week3-assignment on May 18, 2026

Conversation

@lassebenni
Collaborator

Summary

  • Scaffolds the Week 3 assignment repo with a flat project structure that matches the assignment chapter's task numbering exactly (no task-1/task-2/ confusion)
  • The autograder (.hyf/test.sh) covers the full scoring ladder: 10 → 20 → 40 → 50 → 70 pts for the pipeline, plus 15 pts for Azure and 15 pts for AI debug
  • Verified end-to-end: working solution scores 80/100, bare scaffold scores 20/100
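
The scoring arithmetic above can be sketched in a few lines. This is a hypothetical reconstruction, not the grader itself: the real logic lives in .hyf/test.sh, and the function names, the idea that rungs are earned consecutively, and the per-task caps are all assumptions made for illustration.

```python
# Hypothetical reconstruction of the scoring arithmetic; names are illustrative.
PIPELINE_LADDER = [10, 20, 40, 50, 70]  # cumulative pipeline points per rung
AZURE_MAX = 15
AI_DEBUG_MAX = 15


def pipeline_points(rungs_passed: int) -> int:
    """Return cumulative pipeline points after `rungs_passed` consecutive rungs."""
    if rungs_passed <= 0:
        return 0
    return PIPELINE_LADDER[min(rungs_passed, len(PIPELINE_LADDER)) - 1]


def total_score(rungs_passed: int, azure: int, ai_debug: int) -> int:
    """Sum the three components, capping the written tasks at their maxima."""
    return (
        pipeline_points(rungs_passed)
        + min(azure, AZURE_MAX)
        + min(ai_debug, AI_DEBUG_MAX)
    )
```

Under these assumptions, `total_score(5, azure=0, ai_debug=10)` gives the 80/100 reported for the working solution (70 pipeline + 0 Azure + 10 AI debug).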

What's included

  • models.py, ingest_api.py, ingest_files.py, validate.py, database.py, pipeline.py — student starters with raise NotImplementedError TODOs
  • data/weather_stations.csv — messy dataset with 6 intentional validation failures and 4 valid rows (including a duplicate that exercises the upsert path)
  • output/azure_compare.md — blank template (students must write >1200 chars to earn 15 pts)
  • AI_DEBUG.md — four-section template at the root (students must write >1800 chars for full marks)
  • .hyf/test.sh — autograder with incremental scoring, idempotency gate, and code-pattern introspection
  • .devcontainer/devcontainer.json — Python 3.11 + Azure CLI pre-installed
  • AZURE_LOGIN.md — Codespaces login guide for Task 7
  • .github/workflows/grade-assignment.yml — CI that runs the grader on every PR

Test plan

  • Bare scaffold: 20/100 (files exist + AI_DEBUG sections), pass=false
  • Working solution: 80/100 (70/70 pipeline + 10/15 AI_DEBUG), pass=true
  • Idempotency: second pipeline run leaves the same row count
  • Code patterns: @field_validator, @classmethod, parameterized ?, ON CONFLICT, time.sleep all detected
  • Azure task: requires az login, correctly scores 0 without credentials
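
For readers unfamiliar with the idempotency gate, the sketch below shows one way a parameterized `ON CONFLICT` upsert keeps the row count stable across reruns. The table name, columns, and `load` helper are assumptions for illustration and are not taken from database.py.

```python
import sqlite3

# Table and column names are illustrative, not taken from database.py.
SCHEMA = """
CREATE TABLE IF NOT EXISTS readings (
    station TEXT NOT NULL,
    observed_at TEXT NOT NULL,
    temperature_c REAL,
    PRIMARY KEY (station, observed_at)
)
"""

# Parameterized placeholders (?) keep values out of the SQL string itself.
UPSERT = """
INSERT INTO readings (station, observed_at, temperature_c)
VALUES (?, ?, ?)
ON CONFLICT (station, observed_at)
DO UPDATE SET temperature_c = excluded.temperature_c
"""


def load(conn, rows):
    """Upsert all rows and return the resulting row count."""
    conn.execute(SCHEMA)
    conn.executemany(UPSERT, rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]


conn = sqlite3.connect(":memory:")
rows = [
    ("Amsterdam", "2026-05-01T12:00", 14.5),
    ("Utrecht", "2026-05-01T12:00", 15.0),
    ("Amsterdam", "2026-05-01T12:00", 14.8),  # duplicate key: updates, no new row
]
first_count = load(conn, rows)
second_count = load(conn, rows)  # rerun: same count as the first run
```

Because the duplicate key updates the existing row instead of inserting a new one, both runs leave two rows, which is the property the idempotency check looks for.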

🤖 Generated with Claude Code

Remove the task-1/ and task-2/ folder split. All pipeline files now live
at the repo root, matching the Deliverables layout in the assignment chapter
exactly. Students no longer see "task-1/" and wonder if that maps to
"Task 1" in the assignment instructions.

- Moved task-1/{models,ingest_api,ingest_files,validate,database,pipeline}.py → root
- Moved task-1/data/ → data/
- Moved task-1/output/ → output/
- Moved task-1/.env.example and requirements.txt → root
- Moved task-2/AI_DEBUG.md → root
- Updated .gitignore, devcontainer.json, .hyf/test.sh, and README accordingly
…o all Python files

- README opens with a 'Why no task folders?' explanation and a step-by-step
  table (Step 1 = models.py through Step 6 = pipeline.py) so students know
  where to start without numbered folders to lean on
- Every Python file now has a 2-3 line header comment naming the step, the
  task, and the role that file plays in the pipeline
- Scoring ladder rewritten as a table for scannability
- Student-redirect callout moved to the bottom (instructors read the top)

Copilot AI left a comment

Pull request overview

This PR scaffolds the Week 3 validated ingestion pipeline assignment as a flat root-level project and updates the grader/docs to match that structure.

Changes:

  • Adds starter Python modules for API ingestion, CSV ingestion, validation, SQLite storage, and orchestration.
  • Adds assignment templates/data for Azure comparison and AI debugging.
  • Updates the autograder, README paths, devcontainer install path, and ignore rules for the root-level layout.

Reviewed changes

Copilot reviewed 9 out of 15 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| models.py | Adds the starter WeatherReading Pydantic model. |
| ingest_api.py | Adds API fetch/retry starter functions. |
| ingest_files.py | Adds CSV ingestion starter function. |
| validate.py | Adds batch validation starter function. |
| database.py | Adds SQLite helper starter functions. |
| pipeline.py | Adds the orchestration scaffold and expected pipeline steps. |
| data/weather_stations.csv | Adds the messy input dataset. |
| output/azure_compare.md | Adds the Azure comparison response template. |
| AI_DEBUG.md | Adds the AI debugging report template. |
| .hyf/test.sh | Updates grading paths and scoring checks for the flat structure. |
| .gitignore | Updates generated artifact ignore paths for the flat structure. |
| README.md | Updates assignment layout and local grader instructions. |
| requirements.txt | Adds Python dependencies. |
| .env.example | Adds environment variable guidance. |
| .devcontainer/devcontainer.json | Updates devcontainer dependency installation path. |
| AZURE_LOGIN.md | Provides Azure login guidance. |
| .github/workflows/grade-assignment.yml | Provides the grading workflow entry point. |
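
As a rough illustration of the fetch/retry pattern that ingest_api.py asks students to implement, here is a minimal sketch using `time.sleep` with exponential backoff. The function name, parameters, and the choice of retryable exceptions are assumptions, not the starter's actual signatures.

```python
import time


def fetch_with_retry(
    fetch,
    max_attempts=3,
    base_delay=1.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call `fetch()` until it succeeds or `max_attempts` is exhausted.

    Waits base_delay, then 2x, 4x, ... between attempts. `sleep` is
    injectable so tests can avoid real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing the HTTP call in as `fetch` keeps the retry logic independent of any particular client library.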
Comments suppressed due to low confidence (2)

models.py:13

  • With Pydantic v2, a @field_validator without mode="before" runs after the min_length=1 constraint. If students implement the TODO as “strip and title-case,” a whitespace-only station like " " can pass the length check first and then be stored as an empty string after stripping.

pipeline.py:24

  • This step tells students to validate “all records” together, but validate_records accepts a single source value for every error it returns. If API and CSV rows are combined before validation, the error report cannot accurately identify whether each failed row came from api or csv; the instructions should make it explicit to validate each source separately or attach source per record.
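
One way to address both suppressed comments is sketched below in plain Python: normalize before checking emptiness, and validate each source in its own call so every error carries the right label. The `validate_records(records, source)` shape follows the comment's description, but the body, field names, and error format are assumptions rather than the starter code.

```python
def validate_records(records, source):
    """Split raw dicts into valid rows and error entries tagged with `source`."""
    valid, errors = [], []
    for index, record in enumerate(records):
        # Strip first, then check emptiness, so "   " is rejected rather than
        # stored as an empty string.
        station = str(record.get("station", "")).strip()
        if not station:
            errors.append({"source": source, "row": index, "error": "station is empty"})
            continue
        valid.append({**record, "station": station.title()})
    return valid, errors


api_rows = [{"station": "amsterdam"}, {"station": "   "}]
csv_rows = [{"station": ""}, {"station": "utrecht"}]

# Validate per source, then merge, so each error keeps its origin.
api_valid, api_errors = validate_records(api_rows, source="api")
csv_valid, csv_errors = validate_records(csv_rows, source="csv")
valid = api_valid + csv_valid
errors = api_errors + csv_errors
```

In a Pydantic v2 model, the same ordering would come from declaring the stripping validator with mode="before" so it runs ahead of the min_length constraint.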

Comment thread README.md
Comment thread .hyf/test.sh Outdated
Comment thread .hyf/test.sh
Comment thread .hyf/test.sh
Comment thread .hyf/test.sh
Comment thread .hyf/test.sh
Comment thread .hyf/test.sh
…ach run

- Parameterized query check now passes for both inline (execute('...?...')) and
  multi-line/variable-assignment SQL forms: checks for '?' anywhere in database.py
  AND an .execute call, rather than requiring both on the same physical line
- Remove weather.db and output/error_report.json at grader start so local reruns
  cannot inflate the score with stale artifacts from a prior successful run
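
The two grader changes described in this commit can be approximated as follows. This is a Python rendering for illustration only: the actual grader is the shell script .hyf/test.sh, and the function names here are hypothetical.

```python
from pathlib import Path


def uses_parameterized_query(source: str) -> bool:
    """True if the file has a '?' placeholder and an .execute call anywhere.

    The two do not need to share a physical line, so SQL kept in a
    variable or a multi-line string still passes.
    """
    return "?" in source and ".execute" in source


def remove_stale_artifacts(root: Path) -> None:
    """Delete outputs of a previous run so they cannot inflate the score."""
    for relative in ("weather.db", "output/error_report.json"):
        (root / relative).unlink(missing_ok=True)


# A database.py fragment with the SQL assigned to a variable first.
multi_line = '''
sql = """
    INSERT INTO readings (station) VALUES (?)
"""
conn.execute(sql, (station,))
'''
```

A substring check this loose can also match a `?` in a comment or string literal, so it trades precision for tolerance of different coding styles.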
@lassebenni lassebenni merged commit bbb8c33 into main May 18, 2026
@lassebenni lassebenni deleted the feat/scaffold-week3-assignment branch May 18, 2026 15:49
2 participants