JaivPatel07/AutoML

Smart Data Cleaner

Smart Data Cleaner is an intelligent dataset preprocessing and cleaning system built with FastAPI and Pandas.

The project automates repetitive data cleaning tasks such as missing value handling, duplicate removal, outlier detection, column analysis, and memory optimization while generating explainable cleaning reports.

It is designed to simplify preprocessing workflows for machine learning and data analytics pipelines.


Project Gallery

Screenshots, with image captions in parentheses:

  1. Data Upload (Dashboard)
  2. Smart Analysis (Analysis)
  3. Cleaning Config (Config)
  4. Processed Data (Processing)
  5. Visual Statistics (Statistics)
  6. Detailed Report (Final Report)

Features

Smart Dataset Analysis

Automatically detects:

  • Constant columns
  • High-null columns
  • ID-like columns
  • High-cardinality columns
  • Statistical outliers
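
The column checks listed above can be approximated with a few Pandas heuristics. This is a minimal sketch, not the project's actual detection logic: the function name `profile_columns` and both thresholds are assumptions chosen for illustration.

```python
import pandas as pd


def profile_columns(
    df: pd.DataFrame,
    null_threshold: float = 0.5,         # assumed cutoff for "high-null"
    cardinality_threshold: float = 0.5,  # assumed cutoff for "high-cardinality"
) -> dict[str, list[str]]:
    """Flag columns using simple heuristics (hypothetical helper)."""
    flags = {"constant": [], "high_null": [], "id_like": [], "high_cardinality": []}
    n_rows = len(df)
    for name in df.columns:
        col = df[name]
        n_unique = col.dropna().nunique()
        # Constant: at most one distinct non-null value.
        if n_unique <= 1:
            flags["constant"].append(name)
        # High-null: the share of missing values exceeds the threshold.
        if n_rows and col.isna().mean() > null_threshold:
            flags["high_null"].append(name)
        # ID-like: every row holds a distinct, non-missing value.
        if n_rows > 1 and n_unique == n_rows:
            flags["id_like"].append(name)
        # High-cardinality: a non-numeric column with many distinct values.
        elif (
            n_rows
            and not pd.api.types.is_numeric_dtype(col)
            and n_unique / n_rows > cardinality_threshold
        ):
            flags["high_cardinality"].append(name)
    return flags


sample = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "const": ["a", "a", "a", "a", "a"],
        "mostly_null": [None, None, None, 1.0, 2.0],
        "city": ["x", "y", "z", "w", "x"],
        "value": [1.0, 2.0, 2.0, 3.0, 3.0],
    }
)
flags = profile_columns(sample)
```

The ID-like rule is deliberately crude: a continuous numeric column with no repeated values would also be flagged, so a real implementation would likely need extra checks.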

Advanced Data Cleaning

  • Missing value standardization

  • Duplicate row removal

  • Junk value detection

  • Numeric imputation:

    • Mean
    • Median
    • Constant
  • Categorical imputation:

    • Mode
    • Constant
  • IQR-based outlier removal
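
As a rough illustration of missing value standardization followed by imputation, the sketch below uses Pandas only. The token list, the function names, and the default constants are assumptions; the README does not show which junk values the project recognizes or how its strategies are configured.

```python
import numpy as np
import pandas as pd

# Assumed placeholder tokens; the project's real junk-value list is not documented here.
MISSING_TOKENS = ["", "na", "n/a", "null", "none", "?", "-"]


def standardize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace common textual placeholders with NaN in non-numeric columns."""
    out = df.copy()
    for name in out.columns:
        if pd.api.types.is_numeric_dtype(out[name]):
            continue
        text = out[name].astype("string").str.strip().str.lower()
        out[name] = out[name].mask(text.isin(MISSING_TOKENS), np.nan)
    return out


def impute(
    df: pd.DataFrame,
    numeric_strategy: str = "median",     # "mean", "median", or "constant"
    categorical_strategy: str = "mode",   # "mode" or "constant"
    numeric_constant: float = 0.0,        # assumed default
    categorical_constant: str = "Unknown",  # assumed default
) -> pd.DataFrame:
    """Fill missing values column by column using the chosen strategies."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not col.isna().any():
            continue
        if pd.api.types.is_numeric_dtype(col):
            if numeric_strategy == "mean":
                value = col.mean()
            elif numeric_strategy == "median":
                value = col.median()
            else:
                value = numeric_constant
        else:
            modes = col.mode(dropna=True)
            if categorical_strategy == "mode" and not modes.empty:
                value = modes.iloc[0]
            else:
                value = categorical_constant
        out[name] = col.fillna(value)
    return out


raw = pd.DataFrame(
    {"age": [20.0, np.nan, 40.0], "city": ["Pune", "N/A", "Pune"]}
)
cleaned = impute(standardize_missing(raw))
```

In this example the "N/A" string is first converted to a missing value and then replaced by the column mode, while the missing age is filled with the median of the remaining values.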

Explainable Cleaning Reports

Generates detailed reports including:

  • Removed columns with reasons
  • Missing value replacements
  • Outlier statistics
  • Duplicate row information
  • Dataset retention summary
  • Memory optimization details
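
To make the report contents more concrete, here is a small sketch of how a retention summary could be derived from the original and cleaned DataFrames. The function name and the dictionary keys are hypothetical; the actual report format returned by the project is not shown in this README.

```python
import pandas as pd


def retention_summary(original: pd.DataFrame, cleaned: pd.DataFrame) -> dict:
    """Summarize how much of the dataset survived cleaning (hypothetical helper)."""
    rows_before = len(original)
    rows_after = len(cleaned)
    return {
        "rows_before": rows_before,
        "rows_after": rows_after,
        "rows_removed": rows_before - rows_after,
        # Rows that exactly repeat an earlier row in the original data.
        "duplicate_rows": int(original.duplicated().sum()),
        "row_retention_pct": (
            round(100 * rows_after / rows_before, 2) if rows_before else 0.0
        ),
        "removed_columns": [c for c in original.columns if c not in cleaned.columns],
    }


original = pd.DataFrame({"a": [1, 1, 2, 3], "b": [5, 5, 6, 7], "const": [0, 0, 0, 0]})
cleaned = original.drop_duplicates().drop(columns=["const"])
summary = retention_summary(original, cleaned)
```

A fuller report would also attach a reason to each removed column, as the list above describes; that part is omitted here because the reasons depend on the project's own detection rules.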

File Support

  • CSV (.csv)
  • Excel (.xlsx, .xls)
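
One plausible way to support these formats is to dispatch on the file extension. The helper below is a sketch under that assumption; `load_dataset` is a hypothetical name, not a function confirmed to exist in the project.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Load a CSV or Excel file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # pandas needs an Excel engine installed for this call
        # (for example openpyxl for .xlsx files).
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Rejecting unknown extensions early keeps the error close to the upload step instead of surfacing later as a parsing failure.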

Optimization

  • Automatic datatype downcasting
  • Reduced memory footprint
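
Datatype downcasting can be sketched with `pandas.to_numeric` and its `downcast` argument, which picks the smallest numeric type that can hold a column's values. The function name `downcast_numeric` is an assumption for illustration, and the project may apply different rules.

```python
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Shrink integer and float columns to the smallest dtype that fits."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_integer_dtype(col):
            out[name] = pd.to_numeric(col, downcast="integer")
        elif pd.api.types.is_float_dtype(col):
            out[name] = pd.to_numeric(col, downcast="float")
    return out


frame = pd.DataFrame({"count": [1, 2, 3], "price": [1.5, 2.5, 3.5]})
small = downcast_numeric(frame)

before_bytes = int(frame.memory_usage(deep=True).sum())
after_bytes = int(small.memory_usage(deep=True).sum())
```

Comparing `before_bytes` with `after_bytes` gives the kind of figure a memory optimization report could include. Note that downcasting floats to 32-bit reduces precision, which may matter for some datasets.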

API Endpoints

Method  Endpoint        Description
POST    /preview        Generates dataset preview and auto-detection suggestions
POST    /clean          Cleans dataset using selected parameters
GET     /view/original  Returns original dataset
GET     /view/cleaned   Returns cleaned dataset
GET     /view/removed   Returns removed rows with reasons
GET     /report         Returns detailed cleaning report
GET     /download       Downloads cleaned CSV
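
The GET endpoints can be called from a small client script. The sketch below uses only the Python standard library and assumes the server is running at the default address with a dataset already cleaned. It also assumes that /report returns JSON, which the README does not state. The POST endpoints are left out because their request fields are not documented here.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from the Run Project section


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Build the full URL for an endpoint path such as "/report"."""
    return urljoin(base_url + "/", path.lstrip("/"))


def fetch_report(base_url: str = BASE_URL) -> dict:
    """GET /report; assumes the response body is JSON."""
    with urlopen(endpoint_url("/report", base_url)) as response:
        return json.load(response)


def download_cleaned(destination: str, base_url: str = BASE_URL) -> None:
    """GET /download and save the returned CSV bytes to a local file."""
    with urlopen(endpoint_url("/download", base_url)) as response:
        with open(destination, "wb") as handle:
            handle.write(response.read())
```

With the server running, `fetch_report()` followed by `download_cleaned("cleaned.csv")` would cover the last two steps of the workflow.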

Tech Stack

Backend

  • FastAPI
  • Pandas
  • NumPy
  • Python

Data Processing

  • Statistical preprocessing
  • IQR outlier detection
  • Memory optimization
  • Dataset profiling
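
For reference, IQR outlier detection is commonly done by flagging values that fall more than 1.5 times the interquartile range below the first quartile or above the third. The sketch below follows that common rule. The function names and the multiplier default are assumptions, since the README does not specify the project's exact parameters.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(
    df: pd.DataFrame, column: str, k: float = 1.5
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows into (kept, removed) based on the IQR fences of one column."""
    lower, upper = iqr_bounds(df[column].dropna(), k)
    # Missing values are kept here; they are a separate cleaning concern.
    inside = df[column].between(lower, upper) | df[column].isna()
    return df[inside], df[~inside]


data = pd.DataFrame({"value": [10, 12, 11, 13, 12, 100]})
kept, removed = split_outliers(data, "value")
```

Returning the removed rows separately, rather than discarding them, makes it straightforward to show which rows were dropped and why.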

Installation

Clone Repository

git clone <repository-url>
cd AutoML

Create Virtual Environment

python -m venv venv

Windows

venv\Scripts\activate

Linux / Mac

source venv/bin/activate

Install Dependencies

pip install -r requirements.txt

Run Project

python run.py

Server starts at:

http://localhost:8000

Workflow

  1. Upload dataset
  2. Preview auto-detected issues
  3. Configure cleaning parameters
  4. Run cleaning process
  5. Review reports
  6. Download cleaned dataset

Project Structure

AutoML/
├── app/
│   ├── cleaner/
│   │   └── cleaner.py
│   ├── routes/
│   │   └── clean_routes.py
│   ├── uploads/
│   ├── reports/
│   └── main.py
│
├── assets/
│   └── images/
│
├── static/
│   ├── index.html
│   ├── styles.css
│   └── script.js
│
├── run.py
├── requirements.txt
└── README.md

Why I Built This

Data preprocessing is one of the most repetitive stages in machine learning workflows.

I built Smart Data Cleaner to automate common cleaning operations while keeping the process transparent through explainable reports and structured preprocessing summaries.


Upcoming Features

  • Prediction pipeline integration
  • ML-based cleaning recommendations
  • Exportable PDF reports
  • Advanced dataset profiling
  • Automated preprocessing workflows

Author

Jaiv Patel
