Welcome to Session 7

Learning goals

By the end of Session 7, you should be able to:

  • explain what pandas is useful for in data analytics work
  • create and inspect Series and DataFrame objects
  • load tabular data and select rows and columns
  • identify common data quality problems
  • clean missing values, duplicates, inconsistent text, and incorrect data types
  • fill missing numeric values using interpolation
  • use NumPy arrays for simple numeric work
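As a rough preview of how these goals fit together, here is a minimal sketch. The toy table and its column names (name, score) are hypothetical and are not taken from the session datasets; the three parts cover each step properly.

```python
import pandas as pd

# Hypothetical toy data (not from Movies.json or Pokemon.csv).
raw = pd.DataFrame(
    {
        "name": [" Alice", "bob ", "bob ", "Cara", "Dan"],
        "score": ["10", "20", "20", "n/a", "40"],
    }
)

clean = raw.copy()

# Inconsistent text: trim whitespace and normalise capitalisation.
clean["name"] = clean["name"].str.strip().str.title()

# Duplicates: drop rows that are identical in every column.
clean = clean.drop_duplicates().reset_index(drop=True)

# Incorrect data type: turn text into numbers; unparseable values become NaN.
clean["score"] = pd.to_numeric(clean["score"], errors="coerce").astype("float64")

# Missing numeric values: fill gaps by linear interpolation.
clean["score"] = clean["score"].interpolate()

# NumPy: take the column as an array for simple numeric work.
scores = clean["score"].to_numpy()
mean_score = scores.mean()

print(clean)
print(mean_score)
```

In this sketch the missing score sits between 20 and 40, so linear interpolation fills it with 30.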

Recommended order

  1. Part 1: Introduction to pandas
  2. Part 2: Introduction to data cleaning
  3. Part 3: Introduction to NumPy
  4. Homework
  5. Practice with quizzes when ready.
  6. Write your own work in solutions/.
  7. Review reference solutions only after attempting tasks yourself.

Datasets

The session datasets are in the datasets/ folder:

  • Movies.json
  • Pokemon.csv

Load them directly from your Python files with:

import pandas as pd

movies = pd.read_json("datasets/Movies.json")
pokemon = pd.read_csv("datasets/Pokemon.csv", encoding="cp1252")

Run your Python files from the session7 folder so these paths work as written.
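The relative paths above are resolved against the folder you run Python from, not the folder the script lives in. If you want a clearer error when you start from the wrong place, a small check like the following can help. This is only a sketch: dataset_path is a hypothetical helper name, not part of the session material.

```python
from pathlib import Path
from typing import Optional


def dataset_path(name: str, base: Optional[Path] = None) -> Path:
    """Return the path to a file inside datasets/, or raise a clear error.

    base defaults to the current working directory, which should be
    the session7 folder when you follow the instructions above.
    """
    base = Path.cwd() if base is None else Path(base)
    path = base / "datasets" / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Run your script from the session7 folder."
        )
    return path
```

You could then pass the result to the same loaders, for example pd.read_csv(dataset_path("Pokemon.csv"), encoding="cp1252").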

Notes

  • Tutorial and warm-up material is included directly in each part's Markdown file.
  • Keep your own solutions in separate files inside solutions/.
  • Use exercise-style names in solutions/ (for example exercise-07-01.py, exercise-07-02.py, exercise-07-03.py, exercise-07-homework.py).
  • Reference answers are in session_solutions/.
  • Class datasets for this session are inside datasets/.
  • Do not commit .venv/ to GitHub.
  • Install the requirements with:
    • pip install -r requirements.txt