By the end of Session 7, you should be able to:
- explain what pandas is useful for in data analytics work
- create and inspect `Series` and `DataFrame` objects
- load tabular data and select rows and columns
- identify common data quality problems
- clean missing values, duplicates, inconsistent text, and incorrect data types
- fill missing numeric values using interpolation
- use NumPy arrays for simple numeric work
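
As a rough preview of these objectives, the sketch below walks a small made-up table through inspection, cleaning, interpolation, selection, and a NumPy calculation. Every name and value in it is hypothetical and is not taken from the session datasets. It is one possible order of steps under those assumptions, not the method the parts prescribe.

```python
import numpy as np
import pandas as pd

# A Series is a single labelled column of values.
heights = pd.Series([1.62, 1.75, 1.80], name="height_m")

# Hypothetical messy table, invented for illustration only.
raw = pd.DataFrame(
    {
        "name": [" Ada", "grace ", "Linus", "linus", "Edsger"],
        "age": ["36", "45", "28", "28", "72"],       # numbers stored as text
        "score": [70.0, np.nan, 90.0, 90.0, 100.0],  # one missing value
    }
)

# Inspect: table size, column types, and missing values per column.
print(raw.shape)
print(raw.dtypes)
print(raw.isna().sum())

clean = raw.copy()
clean["name"] = clean["name"].str.strip().str.title()   # normalise text
clean["age"] = clean["age"].astype(int)                  # fix the data type
clean = clean.drop_duplicates().reset_index(drop=True)   # drop repeated rows
clean["score"] = clean["score"].interpolate()            # linear fill of the gap

# Select rows by condition and columns by name.
high = clean.loc[clean["score"] > 75, ["name", "score"]]

# Hand the numeric column to NumPy for simple array arithmetic.
scores = clean["score"].to_numpy()
centred = scores - scores.mean()
print(clean)
print(centred)
```

The text is normalised before `drop_duplicates()` so that `"Linus"` and `"linus"` are recognised as the same row. Deduplicating first would leave both in the table.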

The session is split into:
- Part 1: Introduction to pandas
- Part 2: Introduction to data cleaning
- Part 3: Introduction to NumPy
- Homework

Suggested workflow:
- Practice with the quizzes when ready.
- Write your own work in `solutions/`.
- Review the reference solutions only after attempting the tasks yourself.

The session datasets are in `datasets/`:
- `Movies.json`
- `Pokemon.csv`

Load them directly from your Python files with:

    import pandas as pd

    movies = pd.read_json("datasets/Movies.json")
    pokemon = pd.read_csv("datasets/Pokemon.csv", encoding="cp1252")

Run your Python files from the `session7` folder so these paths work as written.

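
The `encoding="cp1252"` argument matters because `read_csv` assumes UTF-8 by default. The sketch below illustrates this with a tiny temporary file. The file name and its contents are hypothetical stand-ins, not the real `Pokemon.csv`, whose contents are not shown here.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file written in the cp1252 encoding.
    sample_path = Path(tmp) / "sample.csv"
    sample_path.write_text(
        "Name,Type\nFlabébé,Fairy\nPikachu,Electric\n", encoding="cp1252"
    )

    # Reading with the default UTF-8 encoding fails on the accented characters.
    try:
        pd.read_csv(sample_path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Naming the actual encoding reads the file correctly.
    sample = pd.read_csv(sample_path, encoding="cp1252")

print(utf8_failed)
print(sample.head())   # first rows
print(sample.dtypes)   # column types
```

Calling `head()` and checking `dtypes` right after loading is a quick way to confirm that a file was read as intended before starting any analysis.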
- Tutorial and warm-up material is included directly inside each part markdown file.
- Keep your own solutions in separate files inside `solutions/`.
- Use exercise-style names in `solutions/` (for example `exercise-07-01.py`, `exercise-07-02.py`, `exercise-07-03.py`, `exercise-07-homework.py`).
- Reference answers are in `session_solutions/`.
- Class datasets for this session are inside `datasets/`.
- Do not commit `.venv/` to GitHub.
- Install the requirements with `pip install -r requirements.txt`.