From 016052295fbc07566b2c9897fe7808d163cc06a2 Mon Sep 17 00:00:00 2001 From: estefaniabarrosa Date: Tue, 19 May 2026 11:36:42 +0100 Subject: [PATCH 1/2] solved lab --- ...lab-sql-python-connection-checkpoint.ipynb | 616 ++++++++++++++++++ lab-sql-python-connection.ipynb | 616 ++++++++++++++++++ 2 files changed, 1232 insertions(+) create mode 100644 .ipynb_checkpoints/lab-sql-python-connection-checkpoint.ipynb create mode 100644 lab-sql-python-connection.ipynb diff --git a/.ipynb_checkpoints/lab-sql-python-connection-checkpoint.ipynb b/.ipynb_checkpoints/lab-sql-python-connection-checkpoint.ipynb new file mode 100644 index 0000000..35bdb67 --- /dev/null +++ b/.ipynb_checkpoints/lab-sql-python-connection-checkpoint.ipynb @@ -0,0 +1,616 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "eccf68ee", + "metadata": {}, + "source": [ + "# lab-sql-python-connection" + ] + }, + { + "cell_type": "markdown", + "id": "59e00018", + "metadata": {}, + "source": [ + "## 1. Import Libraries\n", + "\n", + "We will use:\n", + "\n", + "- `pandas` to work with DataFrames.\n", + "- `sqlalchemy` to create the connection engine.\n", + "- `text` from SQLAlchemy to safely write SQL queries with parameters." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "5ad93640", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from sqlalchemy import create_engine, text" + ] + }, + { + "cell_type": "markdown", + "id": "4c0b813b", + "metadata": {}, + "source": [ + "## 2. Create the Database Connection\n", + "\n", + "Here we create the connection between Python and the Sakila database.\n", + "\n", + "> Replace `your_password` with your own MySQL password." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96da5463", + "metadata": {}, + "outputs": [], + "source": [ + "# Database connection settings\n", + "password = \"your_password\" # Replace with your MySQL password\n", + "db = \"sakila\"\n", + "\n", + "# Create the connection string\n", + "connection_string = f\"mysql+pymysql://root:{password}@localhost/{db}\"\n", + "\n", + "# Create the engine\n", + "engine = create_engine(connection_string)" + ] + }, + { + "cell_type": "markdown", + "id": "81d390f6", + "metadata": {}, + "source": [ + "### Test the connection\n", + "\n", + "Before moving forward, it is useful to test if the connection works correctly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ab16ebf", + "metadata": {}, + "outputs": [], + "source": [ + "# Test connection\n", + "query = text(\"SELECT * FROM rental LIMIT 5;\")\n", + "\n", + "sample_rentals = pd.read_sql(query, engine)\n", + "sample_rentals" + ] + }, + { + "cell_type": "markdown", + "id": "b02b4c0a", + "metadata": {}, + "source": [ + "## 3. Function 1: Retrieve Rentals by Month\n", + "\n", + "The function `rentals_month()` retrieves all rental records for a specific month and year.\n", + "\n", + "It receives three parameters:\n", + "\n", + "- `engine`: the database connection engine.\n", + "- `month`: the month we want to analyze.\n", + "- `year`: the year we want to analyze.\n", + "\n", + "The function returns a Pandas DataFrame." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4232b247", + "metadata": {}, + "outputs": [], + "source": [ + "def rentals_month(engine, month, year):\n", + " \"\"\"\n", + " Retrieves rental data for a specific month and year from the Sakila database.\n", + " \n", + " Parameters:\n", + " engine: SQLAlchemy engine used to connect to the database.\n", + " month: Integer representing the month.\n", + " year: Integer representing the year.\n", + " \n", + " Returns:\n", + " A pandas DataFrame with rental data for the selected month and year.\n", + " \"\"\"\n", + " \n", + " query = text(\"\"\"\n", + " SELECT *\n", + " FROM rental\n", + " WHERE MONTH(rental_date) = :month\n", + " AND YEAR(rental_date) = :year;\n", + " \"\"\")\n", + " \n", + " df = pd.read_sql(query, engine, params={\"month\": month, \"year\": year})\n", + " \n", + " return df" + ] + }, + { + "cell_type": "markdown", + "id": "30bc8a27", + "metadata": {}, + "source": [ + "## 4. Retrieve May and June 2005 Rentals\n", + "\n", + "According to the challenge, we need to analyze customers who were active in both May and June." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6328647f", + "metadata": {}, + "outputs": [], + "source": [ + "# Retrieve rental data for May and June 2005\n", + "rentals_may = rentals_month(engine, 5, 2005)\n", + "rentals_june = rentals_month(engine, 6, 2005)\n", + "\n", + "print(\"May rentals shape:\", rentals_may.shape)\n", + "print(\"June rentals shape:\", rentals_june.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "31b3194e", + "metadata": {}, + "source": [ + "### Preview May rentals" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e46dba7", + "metadata": {}, + "outputs": [], + "source": [ + "rentals_may.head()" + ] + }, + { + "cell_type": "markdown", + "id": "98a67688", + "metadata": {}, + "source": [ + "### Preview June rentals" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3f09fd2", + "metadata": {}, + "outputs": [], + "source": [ + "rentals_june.head()" + ] + }, + { + "cell_type": "markdown", + "id": "f210e167", + "metadata": {}, + "source": [ + "## 5. Basic Exploration\n", + "\n", + "Before comparing customers, we can quickly check how many rental transactions happened in each month and how many unique customers were active." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "301d370d", + "metadata": {}, + "outputs": [], + "source": [ + "may_total_rentals = rentals_may.shape[0]\n", + "june_total_rentals = rentals_june.shape[0]\n", + "\n", + "may_unique_customers = rentals_may[\"customer_id\"].nunique()\n", + "june_unique_customers = rentals_june[\"customer_id\"].nunique()\n", + "\n", + "summary = pd.DataFrame({\n", + " \"month\": [\"May 2005\", \"June 2005\"],\n", + " \"total_rentals\": [may_total_rentals, june_total_rentals],\n", + " \"unique_customers\": [may_unique_customers, june_unique_customers]\n", + "})\n", + "\n", + "summary" + ] + }, + { + "cell_type": "markdown", + "id": "a3327298", + "metadata": {}, + "source": [ + "### Initial Insight\n", + "\n", + "This summary helps us understand the overall activity in each month before going into the customer-level comparison.\n", + "\n", + "If June has more rentals or more active customers than May, it may suggest an increase in customer activity." + ] + }, + { + "cell_type": "markdown", + "id": "3cc96811", + "metadata": {}, + "source": [ + "## 6. 
Function 2: Count Rentals by Customer and Month\n", + "\n", + "The function `rental_count_month()` receives the rental DataFrame for one month and counts how many rentals each customer made.\n", + "\n", + "The new column name is created dynamically using the month and year.\n", + "\n", + "For example:\n", + "\n", + "- May 2005 → `rentals_05_2005`\n", + "- June 2005 → `rentals_06_2005`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7aaa8e7c", + "metadata": {}, + "outputs": [], + "source": [ + "def rental_count_month(df, month, year):\n", + " \"\"\"\n", + " Counts the number of rentals made by each customer during a selected month and year.\n", + " \n", + " Parameters:\n", + " df: DataFrame returned by rentals_month().\n", + " month: Integer representing the month.\n", + " year: Integer representing the year.\n", + " \n", + " Returns:\n", + " A DataFrame with customer_id and the number of rentals for that month.\n", + " \"\"\"\n", + " \n", + " column_name = f\"rentals_{month:02d}_{year}\"\n", + " \n", + " rental_count = (\n", + " df.groupby(\"customer_id\")\n", + " .size()\n", + " .reset_index(name=column_name)\n", + " )\n", + " \n", + " return rental_count" + ] + }, + { + "cell_type": "markdown", + "id": "b266b031", + "metadata": {}, + "source": [ + "## 7. Count Rentals for May and June" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ee18607e", + "metadata": {}, + "outputs": [], + "source": [ + "rentals_may_count = rental_count_month(rentals_may, 5, 2005)\n", + "rentals_june_count = rental_count_month(rentals_june, 6, 2005)\n", + "\n", + "rentals_may_count.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b1baf9c", + "metadata": {}, + "outputs": [], + "source": [ + "rentals_june_count.head()" + ] + }, + { + "cell_type": "markdown", + "id": "e46ea476", + "metadata": {}, + "source": [ + "## 8. 
Function 3: Compare Rentals Between Two Months\n", + "\n", + "The function `compare_rentals()` combines both monthly rental count DataFrames.\n", + "\n", + "We use an `inner` merge because the challenge asks for customers who were active in both months.\n", + "\n", + "Then we calculate:\n", + "\n", + "```text\n", + "difference = rentals_06_2005 - rentals_05_2005\n", + "```\n", + "\n", + "This means:\n", + "\n", + "- Positive difference → the customer rented more in June.\n", + "- Negative difference → the customer rented more in May.\n", + "- Zero → the customer had the same number of rentals in both months." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c14d1813", + "metadata": {}, + "outputs": [], + "source": [ + "def compare_rentals(df1, df2):\n", + " \"\"\"\n", + " Combines two monthly rental count DataFrames and calculates the difference\n", + " between the number of rentals in the second month and the first month.\n", + " \n", + " Parameters:\n", + " df1: Rental count DataFrame for the first month.\n", + " df2: Rental count DataFrame for the second month.\n", + " \n", + " Returns:\n", + " A combined DataFrame with a difference column.\n", + " \"\"\"\n", + " \n", + " comparison = pd.merge(df1, df2, on=\"customer_id\", how=\"inner\")\n", + " \n", + " first_month_col = comparison.columns[1]\n", + " second_month_col = comparison.columns[2]\n", + " \n", + " comparison[\"difference\"] = comparison[second_month_col] - comparison[first_month_col]\n", + " \n", + " return comparison" + ] + }, + { + "cell_type": "markdown", + "id": "744192fb", + "metadata": {}, + "source": [ + "## 9. 
Compare May vs June Activity" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b46bde7", + "metadata": {}, + "outputs": [], + "source": [ + "comparison_df = compare_rentals(rentals_may_count, rentals_june_count)\n", + "\n", + "comparison_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "5b796c63", + "metadata": {}, + "source": [ + "## 10. Analyze the Results\n", + "\n", + "Now that we have the comparison DataFrame, we can extract useful insights." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c118436", + "metadata": {}, + "outputs": [], + "source": [ + "# Number of customers active in both months\n", + "active_both_months = comparison_df[\"customer_id\"].nunique()\n", + "\n", + "# Customers who increased, decreased, or maintained activity\n", + "increased_activity = comparison_df[comparison_df[\"difference\"] > 0].shape[0]\n", + "decreased_activity = comparison_df[comparison_df[\"difference\"] < 0].shape[0]\n", + "same_activity = comparison_df[comparison_df[\"difference\"] == 0].shape[0]\n", + "\n", + "activity_summary = pd.DataFrame({\n", + " \"activity_change\": [\"Increased in June\", \"Decreased in June\", \"Same activity\"],\n", + " \"number_of_customers\": [increased_activity, decreased_activity, same_activity]\n", + "})\n", + "\n", + "activity_summary" + ] + }, + { + "cell_type": "markdown", + "id": "36a1523b", + "metadata": {}, + "source": [ + "### Insight\n", + "\n", + "This table shows how customer behavior changed between May and June.\n", + "\n", + "It helps us identify whether customer engagement increased, decreased, or stayed stable among customers who were active in both months." + ] + }, + { + "cell_type": "markdown", + "id": "3113f318", + "metadata": {}, + "source": [ + "## 11. Top Customers with the Biggest Increase\n", + "\n", + "These are the customers whose rental activity increased the most from May to June." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11179e6e", + "metadata": {}, + "outputs": [], + "source": [ + "top_increase = comparison_df.sort_values(by=\"difference\", ascending=False).head(10)\n", + "top_increase" + ] + }, + { + "cell_type": "markdown", + "id": "6ef1bede", + "metadata": {}, + "source": [ + "## 12. Customers with the Biggest Decrease\n", + "\n", + "These are the customers whose rental activity decreased the most from May to June." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25a22125", + "metadata": {}, + "outputs": [], + "source": [ + "top_decrease = comparison_df.sort_values(by=\"difference\", ascending=True).head(10)\n", + "top_decrease" + ] + }, + { + "cell_type": "markdown", + "id": "16008747", + "metadata": {}, + "source": [ + "## 13. Average Rental Activity\n", + "\n", + "We can also compare the average number of rentals per customer in both months." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b9f2531f", + "metadata": {}, + "outputs": [], + "source": [ + "may_col = \"rentals_05_2005\"\n", + "june_col = \"rentals_06_2005\"\n", + "\n", + "average_activity = pd.DataFrame({\n", + " \"month\": [\"May 2005\", \"June 2005\"],\n", + " \"average_rentals_per_customer\": [comparison_df[may_col].mean(), comparison_df[june_col].mean()]\n", + "})\n", + "\n", + "average_activity" + ] + }, + { + "cell_type": "markdown", + "id": "192b59a9", + "metadata": {}, + "source": [ + "### Insight\n", + "\n", + "This helps us understand whether the same group of customers became more or less active on average in June compared to May." + ] + }, + { + "cell_type": "markdown", + "id": "0ea4d7e8", + "metadata": {}, + "source": [ + "## 14. Optional Visualization\n", + "\n", + "A simple bar chart can help visualize how many customers increased, decreased, or maintained their rental activity." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6aa720bb", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "plt.figure(figsize=(8, 5))\n", + "plt.bar(activity_summary[\"activity_change\"], activity_summary[\"number_of_customers\"])\n", + "plt.title(\"Customer Rental Activity Change: May vs June 2005\")\n", + "plt.xlabel(\"Activity Change\")\n", + "plt.ylabel(\"Number of Customers\")\n", + "plt.xticks(rotation=20)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "60e8bb9f", + "metadata": {}, + "source": [ + "## 15. Final Conclusions\n", + "\n", + "Based on the comparison between May and June 2005:\n", + "\n", + "- We identified customers who were active in both months using an inner merge.\n", + "- We calculated how many rentals each customer made in May and June.\n", + "- We created a `difference` column to measure how customer activity changed.\n", + "- Customers with a positive difference were more active in June.\n", + "- Customers with a negative difference were more active in May.\n", + "- Customers with a difference of zero maintained the same rental behavior.\n", + "\n", + "## Business Interpretation\n", + "\n", + "This type of analysis is useful because it helps a company understand customer engagement over time.\n", + "\n", + "For a movie rental business like Sakila, this could help identify:\n", + "\n", + "- Customers becoming more engaged.\n", + "- Customers whose activity is declining.\n", + "- Opportunities for retention campaigns.\n", + "- Customers who may respond well to loyalty programs or personalized recommendations." + ] + }, + { + "cell_type": "markdown", + "id": "871c0369", + "metadata": {}, + "source": [ + "## 16. 
Key Takeaway\n", + "\n", + "Connecting Python to SQL allows us to combine the strengths of both tools:\n", + "\n", + "- SQL is useful for retrieving structured data from the database.\n", + "- Python and Pandas are useful for transforming, analyzing, and visualizing that data.\n", + "\n", + "This workflow is very common in data analytics projects." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.14.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lab-sql-python-connection.ipynb b/lab-sql-python-connection.ipynb new file mode 100644 index 0000000..35bdb67 --- /dev/null +++ b/lab-sql-python-connection.ipynb @@ -0,0 +1,616 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "eccf68ee", + "metadata": {}, + "source": [ + "# lab-sql-python-connection" + ] + }, + { + "cell_type": "markdown", + "id": "59e00018", + "metadata": {}, + "source": [ + "## 1. Import Libraries\n", + "\n", + "We will use:\n", + "\n", + "- `pandas` to work with DataFrames.\n", + "- `sqlalchemy` to create the connection engine.\n", + "- `text` from SQLAlchemy to safely write SQL queries with parameters." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "5ad93640", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from sqlalchemy import create_engine, text" + ] + }, + { + "cell_type": "markdown", + "id": "4c0b813b", + "metadata": {}, + "source": [ + "## 2. Create the Database Connection\n", + "\n", + "Here we create the connection between Python and the Sakila database.\n", + "\n", + "> Replace `your_password` with your own MySQL password." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96da5463", + "metadata": {}, + "outputs": [], + "source": [ + "# Database connection settings\n", + "password = \"your_password\" # Replace with your MySQL password\n", + "db = \"sakila\"\n", + "\n", + "# Create the connection string\n", + "connection_string = f\"mysql+pymysql://root:{password}@localhost/{db}\"\n", + "\n", + "# Create the engine\n", + "engine = create_engine(connection_string)" + ] + }, + { + "cell_type": "markdown", + "id": "81d390f6", + "metadata": {}, + "source": [ + "### Test the connection\n", + "\n", + "Before moving forward, it is useful to test if the connection works correctly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ab16ebf", + "metadata": {}, + "outputs": [], + "source": [ + "# Test connection\n", + "query = text(\"SELECT * FROM rental LIMIT 5;\")\n", + "\n", + "sample_rentals = pd.read_sql(query, engine)\n", + "sample_rentals" + ] + }, + { + "cell_type": "markdown", + "id": "b02b4c0a", + "metadata": {}, + "source": [ + "## 3. Function 1: Retrieve Rentals by Month\n", + "\n", + "The function `rentals_month()` retrieves all rental records for a specific month and year.\n", + "\n", + "It receives three parameters:\n", + "\n", + "- `engine`: the database connection engine.\n", + "- `month`: the month we want to analyze.\n", + "- `year`: the year we want to analyze.\n", + "\n", + "The function returns a Pandas DataFrame." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4232b247", + "metadata": {}, + "outputs": [], + "source": [ + "def rentals_month(engine, month, year):\n", + " \"\"\"\n", + " Retrieves rental data for a specific month and year from the Sakila database.\n", + " \n", + " Parameters:\n", + " engine: SQLAlchemy engine used to connect to the database.\n", + " month: Integer representing the month.\n", + " year: Integer representing the year.\n", + " \n", + " Returns:\n", + " A pandas DataFrame with rental data for the selected month and year.\n", + " \"\"\"\n", + " \n", + " query = text(\"\"\"\n", + " SELECT *\n", + " FROM rental\n", + " WHERE MONTH(rental_date) = :month\n", + " AND YEAR(rental_date) = :year;\n", + " \"\"\")\n", + " \n", + " df = pd.read_sql(query, engine, params={\"month\": month, \"year\": year})\n", + " \n", + " return df" + ] + }, + { + "cell_type": "markdown", + "id": "30bc8a27", + "metadata": {}, + "source": [ + "## 4. Retrieve May and June 2005 Rentals\n", + "\n", + "According to the challenge, we need to analyze customers who were active in both May and June." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6328647f", + "metadata": {}, + "outputs": [], + "source": [ + "# Retrieve rental data for May and June 2005\n", + "rentals_may = rentals_month(engine, 5, 2005)\n", + "rentals_june = rentals_month(engine, 6, 2005)\n", + "\n", + "print(\"May rentals shape:\", rentals_may.shape)\n", + "print(\"June rentals shape:\", rentals_june.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "31b3194e", + "metadata": {}, + "source": [ + "### Preview May rentals" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e46dba7", + "metadata": {}, + "outputs": [], + "source": [ + "rentals_may.head()" + ] + }, + { + "cell_type": "markdown", + "id": "98a67688", + "metadata": {}, + "source": [ + "### Preview June rentals" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3f09fd2", + "metadata": {}, + "outputs": [], + "source": [ + "rentals_june.head()" + ] + }, + { + "cell_type": "markdown", + "id": "f210e167", + "metadata": {}, + "source": [ + "## 5. Basic Exploration\n", + "\n", + "Before comparing customers, we can quickly check how many rental transactions happened in each month and how many unique customers were active." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "301d370d", + "metadata": {}, + "outputs": [], + "source": [ + "may_total_rentals = rentals_may.shape[0]\n", + "june_total_rentals = rentals_june.shape[0]\n", + "\n", + "may_unique_customers = rentals_may[\"customer_id\"].nunique()\n", + "june_unique_customers = rentals_june[\"customer_id\"].nunique()\n", + "\n", + "summary = pd.DataFrame({\n", + " \"month\": [\"May 2005\", \"June 2005\"],\n", + " \"total_rentals\": [may_total_rentals, june_total_rentals],\n", + " \"unique_customers\": [may_unique_customers, june_unique_customers]\n", + "})\n", + "\n", + "summary" + ] + }, + { + "cell_type": "markdown", + "id": "a3327298", + "metadata": {}, + "source": [ + "### Initial Insight\n", + "\n", + "This summary helps us understand the overall activity in each month before going into the customer-level comparison.\n", + "\n", + "If June has more rentals or more active customers than May, it may suggest an increase in customer activity." + ] + }, + { + "cell_type": "markdown", + "id": "3cc96811", + "metadata": {}, + "source": [ + "## 6. 
Function 2: Count Rentals by Customer and Month\n", + "\n", + "The function `rental_count_month()` receives the rental DataFrame for one month and counts how many rentals each customer made.\n", + "\n", + "The new column name is created dynamically using the month and year.\n", + "\n", + "For example:\n", + "\n", + "- May 2005 → `rentals_05_2005`\n", + "- June 2005 → `rentals_06_2005`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7aaa8e7c", + "metadata": {}, + "outputs": [], + "source": [ + "def rental_count_month(df, month, year):\n", + " \"\"\"\n", + " Counts the number of rentals made by each customer during a selected month and year.\n", + " \n", + " Parameters:\n", + " df: DataFrame returned by rentals_month().\n", + " month: Integer representing the month.\n", + " year: Integer representing the year.\n", + " \n", + " Returns:\n", + " A DataFrame with customer_id and the number of rentals for that month.\n", + " \"\"\"\n", + " \n", + " column_name = f\"rentals_{month:02d}_{year}\"\n", + " \n", + " rental_count = (\n", + " df.groupby(\"customer_id\")\n", + " .size()\n", + " .reset_index(name=column_name)\n", + " )\n", + " \n", + " return rental_count" + ] + }, + { + "cell_type": "markdown", + "id": "b266b031", + "metadata": {}, + "source": [ + "## 7. Count Rentals for May and June" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ee18607e", + "metadata": {}, + "outputs": [], + "source": [ + "rentals_may_count = rental_count_month(rentals_may, 5, 2005)\n", + "rentals_june_count = rental_count_month(rentals_june, 6, 2005)\n", + "\n", + "rentals_may_count.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b1baf9c", + "metadata": {}, + "outputs": [], + "source": [ + "rentals_june_count.head()" + ] + }, + { + "cell_type": "markdown", + "id": "e46ea476", + "metadata": {}, + "source": [ + "## 8. 
Function 3: Compare Rentals Between Two Months\n", + "\n", + "The function `compare_rentals()` combines both monthly rental count DataFrames.\n", + "\n", + "We use an `inner` merge because the challenge asks for customers who were active in both months.\n", + "\n", + "Then we calculate:\n", + "\n", + "```text\n", + "difference = rentals_06_2005 - rentals_05_2005\n", + "```\n", + "\n", + "This means:\n", + "\n", + "- Positive difference → the customer rented more in June.\n", + "- Negative difference → the customer rented more in May.\n", + "- Zero → the customer had the same number of rentals in both months." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c14d1813", + "metadata": {}, + "outputs": [], + "source": [ + "def compare_rentals(df1, df2):\n", + " \"\"\"\n", + " Combines two monthly rental count DataFrames and calculates the difference\n", + " between the number of rentals in the second month and the first month.\n", + " \n", + " Parameters:\n", + " df1: Rental count DataFrame for the first month.\n", + " df2: Rental count DataFrame for the second month.\n", + " \n", + " Returns:\n", + " A combined DataFrame with a difference column.\n", + " \"\"\"\n", + " \n", + " comparison = pd.merge(df1, df2, on=\"customer_id\", how=\"inner\")\n", + " \n", + " first_month_col = comparison.columns[1]\n", + " second_month_col = comparison.columns[2]\n", + " \n", + " comparison[\"difference\"] = comparison[second_month_col] - comparison[first_month_col]\n", + " \n", + " return comparison" + ] + }, + { + "cell_type": "markdown", + "id": "744192fb", + "metadata": {}, + "source": [ + "## 9. 
Compare May vs June Activity" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b46bde7", + "metadata": {}, + "outputs": [], + "source": [ + "comparison_df = compare_rentals(rentals_may_count, rentals_june_count)\n", + "\n", + "comparison_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "5b796c63", + "metadata": {}, + "source": [ + "## 10. Analyze the Results\n", + "\n", + "Now that we have the comparison DataFrame, we can extract useful insights." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c118436", + "metadata": {}, + "outputs": [], + "source": [ + "# Number of customers active in both months\n", + "active_both_months = comparison_df[\"customer_id\"].nunique()\n", + "\n", + "# Customers who increased, decreased, or maintained activity\n", + "increased_activity = comparison_df[comparison_df[\"difference\"] > 0].shape[0]\n", + "decreased_activity = comparison_df[comparison_df[\"difference\"] < 0].shape[0]\n", + "same_activity = comparison_df[comparison_df[\"difference\"] == 0].shape[0]\n", + "\n", + "activity_summary = pd.DataFrame({\n", + " \"activity_change\": [\"Increased in June\", \"Decreased in June\", \"Same activity\"],\n", + " \"number_of_customers\": [increased_activity, decreased_activity, same_activity]\n", + "})\n", + "\n", + "activity_summary" + ] + }, + { + "cell_type": "markdown", + "id": "36a1523b", + "metadata": {}, + "source": [ + "### Insight\n", + "\n", + "This table shows how customer behavior changed between May and June.\n", + "\n", + "It helps us identify whether customer engagement increased, decreased, or stayed stable among customers who were active in both months." + ] + }, + { + "cell_type": "markdown", + "id": "3113f318", + "metadata": {}, + "source": [ + "## 11. Top Customers with the Biggest Increase\n", + "\n", + "These are the customers whose rental activity increased the most from May to June." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11179e6e", + "metadata": {}, + "outputs": [], + "source": [ + "top_increase = comparison_df.sort_values(by=\"difference\", ascending=False).head(10)\n", + "top_increase" + ] + }, + { + "cell_type": "markdown", + "id": "6ef1bede", + "metadata": {}, + "source": [ + "## 12. Customers with the Biggest Decrease\n", + "\n", + "These are the customers whose rental activity decreased the most from May to June." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25a22125", + "metadata": {}, + "outputs": [], + "source": [ + "top_decrease = comparison_df.sort_values(by=\"difference\", ascending=True).head(10)\n", + "top_decrease" + ] + }, + { + "cell_type": "markdown", + "id": "16008747", + "metadata": {}, + "source": [ + "## 13. Average Rental Activity\n", + "\n", + "We can also compare the average number of rentals per customer in both months." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b9f2531f", + "metadata": {}, + "outputs": [], + "source": [ + "may_col = \"rentals_05_2005\"\n", + "june_col = \"rentals_06_2005\"\n", + "\n", + "average_activity = pd.DataFrame({\n", + " \"month\": [\"May 2005\", \"June 2005\"],\n", + " \"average_rentals_per_customer\": [comparison_df[may_col].mean(), comparison_df[june_col].mean()]\n", + "})\n", + "\n", + "average_activity" + ] + }, + { + "cell_type": "markdown", + "id": "192b59a9", + "metadata": {}, + "source": [ + "### Insight\n", + "\n", + "This helps us understand whether the same group of customers became more or less active on average in June compared to May." + ] + }, + { + "cell_type": "markdown", + "id": "0ea4d7e8", + "metadata": {}, + "source": [ + "## 14. Optional Visualization\n", + "\n", + "A simple bar chart can help visualize how many customers increased, decreased, or maintained their rental activity." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6aa720bb", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "plt.figure(figsize=(8, 5))\n", + "plt.bar(activity_summary[\"activity_change\"], activity_summary[\"number_of_customers\"])\n", + "plt.title(\"Customer Rental Activity Change: May vs June 2005\")\n", + "plt.xlabel(\"Activity Change\")\n", + "plt.ylabel(\"Number of Customers\")\n", + "plt.xticks(rotation=20)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "60e8bb9f", + "metadata": {}, + "source": [ + "## 15. Final Conclusions\n", + "\n", + "Based on the comparison between May and June 2005:\n", + "\n", + "- We identified customers who were active in both months using an inner merge.\n", + "- We calculated how many rentals each customer made in May and June.\n", + "- We created a `difference` column to measure how customer activity changed.\n", + "- Customers with a positive difference were more active in June.\n", + "- Customers with a negative difference were more active in May.\n", + "- Customers with a difference of zero maintained the same rental behavior.\n", + "\n", + "## Business Interpretation\n", + "\n", + "This type of analysis is useful because it helps a company understand customer engagement over time.\n", + "\n", + "For a movie rental business like Sakila, this could help identify:\n", + "\n", + "- Customers becoming more engaged.\n", + "- Customers whose activity is declining.\n", + "- Opportunities for retention campaigns.\n", + "- Customers who may respond well to loyalty programs or personalized recommendations." + ] + }, + { + "cell_type": "markdown", + "id": "871c0369", + "metadata": {}, + "source": [ + "## 16. 
Key Takeaway\n", + "\n", + "Connecting Python to SQL allows us to combine the strengths of both tools:\n", + "\n", + "- SQL is useful for retrieving structured data from the database.\n", + "- Python and Pandas are useful for transforming, analyzing, and visualizing that data.\n", + "\n", + "This workflow is very common in data analytics projects." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.14.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From e20ce37f471f943b1e7667d48376975ae2872edd Mon Sep 17 00:00:00 2001 From: estefaniabarrosa Date: Tue, 19 May 2026 11:41:45 +0100 Subject: [PATCH 2/2] remove notebook checkpoints --- .gitignore | 1 + ...lab-sql-python-connection-checkpoint.ipynb | 616 ------------------ 2 files changed, 1 insertion(+), 616 deletions(-) create mode 100644 .gitignore delete mode 100644 .ipynb_checkpoints/lab-sql-python-connection-checkpoint.ipynb diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..87620ac --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.ipynb_checkpoints/ diff --git a/.ipynb_checkpoints/lab-sql-python-connection-checkpoint.ipynb b/.ipynb_checkpoints/lab-sql-python-connection-checkpoint.ipynb deleted file mode 100644 index 35bdb67..0000000 --- a/.ipynb_checkpoints/lab-sql-python-connection-checkpoint.ipynb +++ /dev/null @@ -1,616 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "eccf68ee", - "metadata": {}, - "source": [ - "# lab-sql-python-connection" - ] - }, - { - "cell_type": "markdown", - "id": "59e00018", - "metadata": {}, - "source": [ - "## 1. 
Import Libraries\n", - "\n", - "We will use:\n", - "\n", - "- `pandas` to work with DataFrames.\n", - "- `sqlalchemy` to create the connection engine.\n", - "- `text` from SQLAlchemy to safely write SQL queries with parameters." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "5ad93640", - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "from sqlalchemy import create_engine, text" - ] - }, - { - "cell_type": "markdown", - "id": "4c0b813b", - "metadata": {}, - "source": [ - "## 2. Create the Database Connection\n", - "\n", - "Here we create the connection between Python and the Sakila database.\n", - "\n", - "> Replace `your_password` with your own MySQL password." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "96da5463", - "metadata": {}, - "outputs": [], - "source": [ - "# Database connection settings\n", - "password = \"your_password\" # Replace with your MySQL password\n", - "db = \"sakila\"\n", - "\n", - "# Create the connection string\n", - "connection_string = f\"mysql+pymysql://root:{password}@localhost/{db}\"\n", - "\n", - "# Create the engine\n", - "engine = create_engine(connection_string)" - ] - }, - { - "cell_type": "markdown", - "id": "81d390f6", - "metadata": {}, - "source": [ - "### Test the connection\n", - "\n", - "Before moving forward, it is useful to test if the connection works correctly." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5ab16ebf", - "metadata": {}, - "outputs": [], - "source": [ - "# Test connection\n", - "query = text(\"SELECT * FROM rental LIMIT 5;\")\n", - "\n", - "sample_rentals = pd.read_sql(query, engine)\n", - "sample_rentals" - ] - }, - { - "cell_type": "markdown", - "id": "b02b4c0a", - "metadata": {}, - "source": [ - "## 3. 
Function 1: Retrieve Rentals by Month\n", - "\n", - "The function `rentals_month()` retrieves all rental records for a specific month and year.\n", - "\n", - "It receives three parameters:\n", - "\n", - "- `engine`: the database connection engine.\n", - "- `month`: the month we want to analyze.\n", - "- `year`: the year we want to analyze.\n", - "\n", - "The function returns a Pandas DataFrame." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4232b247", - "metadata": {}, - "outputs": [], - "source": [ - "def rentals_month(engine, month, year):\n", - " \"\"\"\n", - " Retrieves rental data for a specific month and year from the Sakila database.\n", - " \n", - " Parameters:\n", - " engine: SQLAlchemy engine used to connect to the database.\n", - " month: Integer representing the month.\n", - " year: Integer representing the year.\n", - " \n", - " Returns:\n", - " A pandas DataFrame with rental data for the selected month and year.\n", - " \"\"\"\n", - " \n", - " query = text(\"\"\"\n", - " SELECT *\n", - " FROM rental\n", - " WHERE MONTH(rental_date) = :month\n", - " AND YEAR(rental_date) = :year;\n", - " \"\"\")\n", - " \n", - " df = pd.read_sql(query, engine, params={\"month\": month, \"year\": year})\n", - " \n", - " return df" - ] - }, - { - "cell_type": "markdown", - "id": "30bc8a27", - "metadata": {}, - "source": [ - "## 4. Retrieve May and June 2005 Rentals\n", - "\n", - "According to the challenge, we need to analyze customers who were active in both May and June." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6328647f", - "metadata": {}, - "outputs": [], - "source": [ - "# Retrieve rental data for May and June 2005\n", - "rentals_may = rentals_month(engine, 5, 2005)\n", - "rentals_june = rentals_month(engine, 6, 2005)\n", - "\n", - "print(\"May rentals shape:\", rentals_may.shape)\n", - "print(\"June rentals shape:\", rentals_june.shape)" - ] - }, - { - "cell_type": "markdown", - "id": "31b3194e", - "metadata": {}, - "source": [ - "### Preview May rentals" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2e46dba7", - "metadata": {}, - "outputs": [], - "source": [ - "rentals_may.head()" - ] - }, - { - "cell_type": "markdown", - "id": "98a67688", - "metadata": {}, - "source": [ - "### Preview June rentals" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c3f09fd2", - "metadata": {}, - "outputs": [], - "source": [ - "rentals_june.head()" - ] - }, - { - "cell_type": "markdown", - "id": "f210e167", - "metadata": {}, - "source": [ - "## 5. Basic Exploration\n", - "\n", - "Before comparing customers, we can quickly check how many rental transactions happened in each month and how many unique customers were active." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "301d370d", - "metadata": {}, - "outputs": [], - "source": [ - "may_total_rentals = rentals_may.shape[0]\n", - "june_total_rentals = rentals_june.shape[0]\n", - "\n", - "may_unique_customers = rentals_may[\"customer_id\"].nunique()\n", - "june_unique_customers = rentals_june[\"customer_id\"].nunique()\n", - "\n", - "summary = pd.DataFrame({\n", - " \"month\": [\"May 2005\", \"June 2005\"],\n", - " \"total_rentals\": [may_total_rentals, june_total_rentals],\n", - " \"unique_customers\": [may_unique_customers, june_unique_customers]\n", - "})\n", - "\n", - "summary" - ] - }, - { - "cell_type": "markdown", - "id": "a3327298", - "metadata": {}, - "source": [ - "### Initial Insight\n", - "\n", - "This summary helps us understand the overall activity in each month before going into the customer-level comparison.\n", - "\n", - "If June has more rentals or more active customers than May, it may suggest an increase in customer activity." - ] - }, - { - "cell_type": "markdown", - "id": "3cc96811", - "metadata": {}, - "source": [ - "## 6. 
Function 2: Count Rentals by Customer and Month\n", - "\n", - "The function `rental_count_month()` receives the rental DataFrame for one month and counts how many rentals each customer made.\n", - "\n", - "The new column name is created dynamically using the month and year.\n", - "\n", - "For example:\n", - "\n", - "- May 2005 → `rentals_05_2005`\n", - "- June 2005 → `rentals_06_2005`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7aaa8e7c", - "metadata": {}, - "outputs": [], - "source": [ - "def rental_count_month(df, month, year):\n", - " \"\"\"\n", - " Counts the number of rentals made by each customer during a selected month and year.\n", - " \n", - " Parameters:\n", - " df: DataFrame returned by rentals_month().\n", - " month: Integer representing the month.\n", - " year: Integer representing the year.\n", - " \n", - " Returns:\n", - " A DataFrame with customer_id and the number of rentals for that month.\n", - " \"\"\"\n", - " \n", - " column_name = f\"rentals_{month:02d}_{year}\"\n", - " \n", - " rental_count = (\n", - " df.groupby(\"customer_id\")\n", - " .size()\n", - " .reset_index(name=column_name)\n", - " )\n", - " \n", - " return rental_count" - ] - }, - { - "cell_type": "markdown", - "id": "b266b031", - "metadata": {}, - "source": [ - "## 7. Count Rentals for May and June" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ee18607e", - "metadata": {}, - "outputs": [], - "source": [ - "rentals_may_count = rental_count_month(rentals_may, 5, 2005)\n", - "rentals_june_count = rental_count_month(rentals_june, 6, 2005)\n", - "\n", - "rentals_may_count.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0b1baf9c", - "metadata": {}, - "outputs": [], - "source": [ - "rentals_june_count.head()" - ] - }, - { - "cell_type": "markdown", - "id": "e46ea476", - "metadata": {}, - "source": [ - "## 8. 
Function 3: Compare Rentals Between Two Months\n", - "\n", - "The function `compare_rentals()` combines both monthly rental count DataFrames.\n", - "\n", - "We use an `inner` merge because the challenge asks for customers who were active in both months.\n", - "\n", - "Then we calculate:\n", - "\n", - "```text\n", - "difference = rentals_06_2005 - rentals_05_2005\n", - "```\n", - "\n", - "This means:\n", - "\n", - "- Positive difference → the customer rented more in June.\n", - "- Negative difference → the customer rented more in May.\n", - "- Zero → the customer had the same number of rentals in both months." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c14d1813", - "metadata": {}, - "outputs": [], - "source": [ - "def compare_rentals(df1, df2):\n", - " \"\"\"\n", - " Combines two monthly rental count DataFrames and calculates the difference\n", - " between the number of rentals in the second month and the first month.\n", - " \n", - " Parameters:\n", - " df1: Rental count DataFrame for the first month.\n", - " df2: Rental count DataFrame for the second month.\n", - " \n", - " Returns:\n", - " A combined DataFrame with a difference column.\n", - " \"\"\"\n", - " \n", - " comparison = pd.merge(df1, df2, on=\"customer_id\", how=\"inner\")\n", - " \n", - " first_month_col = comparison.columns[1]\n", - " second_month_col = comparison.columns[2]\n", - " \n", - " comparison[\"difference\"] = comparison[second_month_col] - comparison[first_month_col]\n", - " \n", - " return comparison" - ] - }, - { - "cell_type": "markdown", - "id": "744192fb", - "metadata": {}, - "source": [ - "## 9. 
Compare May vs June Activity" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6b46bde7", - "metadata": {}, - "outputs": [], - "source": [ - "comparison_df = compare_rentals(rentals_may_count, rentals_june_count)\n", - "\n", - "comparison_df.head()" - ] - }, - { - "cell_type": "markdown", - "id": "5b796c63", - "metadata": {}, - "source": [ - "## 10. Analyze the Results\n", - "\n", - "Now that we have the comparison DataFrame, we can extract useful insights." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8c118436", - "metadata": {}, - "outputs": [], - "source": [ - "# Number of customers active in both months\n", - "active_both_months = comparison_df[\"customer_id\"].nunique()\n", - "\n", - "# Customers who increased, decreased, or maintained activity\n", - "increased_activity = comparison_df[comparison_df[\"difference\"] > 0].shape[0]\n", - "decreased_activity = comparison_df[comparison_df[\"difference\"] < 0].shape[0]\n", - "same_activity = comparison_df[comparison_df[\"difference\"] == 0].shape[0]\n", - "\n", - "activity_summary = pd.DataFrame({\n", - " \"activity_change\": [\"Increased in June\", \"Decreased in June\", \"Same activity\"],\n", - " \"number_of_customers\": [increased_activity, decreased_activity, same_activity]\n", - "})\n", - "\n", - "activity_summary" - ] - }, - { - "cell_type": "markdown", - "id": "36a1523b", - "metadata": {}, - "source": [ - "### Insight\n", - "\n", - "This table shows how customer behavior changed between May and June.\n", - "\n", - "It helps us identify whether customer engagement increased, decreased, or stayed stable among customers who were active in both months." - ] - }, - { - "cell_type": "markdown", - "id": "3113f318", - "metadata": {}, - "source": [ - "## 11. Top Customers with the Biggest Increase\n", - "\n", - "These are the customers whose rental activity increased the most from May to June." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "11179e6e", - "metadata": {}, - "outputs": [], - "source": [ - "top_increase = comparison_df.sort_values(by=\"difference\", ascending=False).head(10)\n", - "top_increase" - ] - }, - { - "cell_type": "markdown", - "id": "6ef1bede", - "metadata": {}, - "source": [ - "## 12. Customers with the Biggest Decrease\n", - "\n", - "These are the customers whose rental activity decreased the most from May to June." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "25a22125", - "metadata": {}, - "outputs": [], - "source": [ - "top_decrease = comparison_df.sort_values(by=\"difference\", ascending=True).head(10)\n", - "top_decrease" - ] - }, - { - "cell_type": "markdown", - "id": "16008747", - "metadata": {}, - "source": [ - "## 13. Average Rental Activity\n", - "\n", - "We can also compare the average number of rentals per customer in both months." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b9f2531f", - "metadata": {}, - "outputs": [], - "source": [ - "may_col = \"rentals_05_2005\"\n", - "june_col = \"rentals_06_2005\"\n", - "\n", - "average_activity = pd.DataFrame({\n", - " \"month\": [\"May 2005\", \"June 2005\"],\n", - " \"average_rentals_per_customer\": [comparison_df[may_col].mean(), comparison_df[june_col].mean()]\n", - "})\n", - "\n", - "average_activity" - ] - }, - { - "cell_type": "markdown", - "id": "192b59a9", - "metadata": {}, - "source": [ - "### Insight\n", - "\n", - "This helps us understand whether the same group of customers became more or less active on average in June compared to May." - ] - }, - { - "cell_type": "markdown", - "id": "0ea4d7e8", - "metadata": {}, - "source": [ - "## 14. Optional Visualization\n", - "\n", - "A simple bar chart can help visualize how many customers increased, decreased, or maintained their rental activity." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6aa720bb", - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "\n", - "plt.figure(figsize=(8, 5))\n", - "plt.bar(activity_summary[\"activity_change\"], activity_summary[\"number_of_customers\"])\n", - "plt.title(\"Customer Rental Activity Change: May vs June 2005\")\n", - "plt.xlabel(\"Activity Change\")\n", - "plt.ylabel(\"Number of Customers\")\n", - "plt.xticks(rotation=20)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "60e8bb9f", - "metadata": {}, - "source": [ - "## 15. Final Conclusions\n", - "\n", - "Based on the comparison between May and June 2005:\n", - "\n", - "- We identified customers who were active in both months using an inner merge.\n", - "- We calculated how many rentals each customer made in May and June.\n", - "- We created a `difference` column to measure how customer activity changed.\n", - "- Customers with a positive difference were more active in June.\n", - "- Customers with a negative difference were more active in May.\n", - "- Customers with a difference of zero maintained the same rental behavior.\n", - "\n", - "## Business Interpretation\n", - "\n", - "This type of analysis is useful because it helps a company understand customer engagement over time.\n", - "\n", - "For a movie rental business like Sakila, this could help identify:\n", - "\n", - "- Customers becoming more engaged.\n", - "- Customers whose activity is declining.\n", - "- Opportunities for retention campaigns.\n", - "- Customers who may respond well to loyalty programs or personalized recommendations." - ] - }, - { - "cell_type": "markdown", - "id": "871c0369", - "metadata": {}, - "source": [ - "## 16. 
Key Takeaway\n", - "\n", - "Connecting Python to SQL allows us to combine the strengths of both tools:\n", - "\n", - "- SQL is useful for retrieving structured data from the database.\n", - "- Python and Pandas are useful for transforming, analyzing, and visualizing that data.\n", - "\n", - "This workflow is very common in data analytics projects." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.14.3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}
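The count-and-compare steps in the notebook (`rental_count_month()` followed by `compare_rentals()`) can be exercised without a MySQL server. The following is a minimal sketch under stated assumptions. It uses a small hypothetical `customer_id` sample in place of the `rentals_month()` query results. It also swaps in a hypothetical `compare_rentals_named()` that takes the two count columns by name instead of reading them positionally from `comparison.columns[1]` and `comparison.columns[2]`:

```python
import pandas as pd


def rental_count_month(df: pd.DataFrame, month: int, year: int) -> pd.DataFrame:
    """Count rentals per customer; the column is named like rentals_05_2005."""
    column_name = f"rentals_{month:02d}_{year}"
    return df.groupby("customer_id").size().reset_index(name=column_name)


def compare_rentals_named(df1: pd.DataFrame, col1: str,
                          df2: pd.DataFrame, col2: str) -> pd.DataFrame:
    """Hypothetical variant of compare_rentals() with explicit column names.

    Keeps only customers present in both months (inner merge) and adds
    difference = second-month count - first-month count.
    """
    comparison = pd.merge(df1, df2, on="customer_id", how="inner")
    comparison["difference"] = comparison[col2] - comparison[col1]
    return comparison


# Hypothetical sample data standing in for rentals_month() results:
# one row per rental, only the customer_id column is needed here.
rentals_may = pd.DataFrame({"customer_id": [1, 1, 2, 3, 3, 3]})
rentals_june = pd.DataFrame({"customer_id": [1, 2, 2, 2, 3, 4]})

may_count = rental_count_month(rentals_may, 5, 2005)
june_count = rental_count_month(rentals_june, 6, 2005)

# Customer 4 rented only in June, so the inner merge drops them.
comparison_df = compare_rentals_named(may_count, "rentals_05_2005",
                                      june_count, "rentals_06_2005")
print(comparison_df)
```

Passing the column names explicitly means the result no longer depends on the column order that `pd.merge` happens to produce, which the positional lookup in the notebook relies on.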