epsilon003/DriveFlow

DriveFlow

Serverless Workflow Automation Engine using Google Drive

A production-grade, event-driven workflow orchestration system that uses Google Drive folder transitions as an event bus, eliminating traditional backend infrastructure while maintaining enterprise features like audit logging, RBAC, and failure handling.


Project Highlights

Architecture Innovation:

  • Serverless event-driven pipeline using Google Drive as a state machine
  • Zero backend infrastructure (no databases, no servers, no message queues)
  • Folder transitions trigger workflow automation
  • Built-in audit logging via Google Sheets

Technical Sophistication:

  • Workflow orchestration (similar to Airflow/n8n concepts)
  • OAuth 2.0 authentication with token refresh
  • Idempotent file operations with metadata tracking
  • Graceful error handling and retry logic
  • Modular, testable architecture
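The retry behaviour mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual code: `with_retries`, its parameters, and the exponential backoff schedule are all assumptions made for illustration.

```python
import time


def with_retries(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**attempt, then retry.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of attempts: re-raise so the caller can route the
                # file to the errors/ folder.
                raise
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A production version would likely catch only transient API errors rather than every `Exception`.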

Real-World Value:

  • Cost: Nearly free for most use cases (Google Drive free tier)
  • Scalability: Can process thousands of files
  • Compliance: Full audit trail of all operations
  • Extensibility: Easy to add custom processing stages

System Architecture

┌─────────────────┐
│  1_incoming/    │  ← Files uploaded here
└────────┬────────┘
         │ Validation (size, type, naming)
         ↓
┌─────────────────┐
│  2_validated/   │  ← Valid files
└────────┬────────┘
         │ Processing (extraction, transformation)
         ↓
┌─────────────────┐
│  3_processed/   │  ← Processed files
└────────┬────────┘
         │ Approval (automated rules)
         ↓
┌─────────────────┐
│  4_approved/    │  ← Approved files
└────────┬────────┘
         │ Archival
         ↓
┌─────────────────┐
│  5_archived/    │  ← Final storage
└─────────────────┘

   ┌──────────┐
   │ errors/  │  ← Failed files (any stage)
   └──────────┘

Event Bus: Google Drive folder structure
State Machine: File location = workflow state
Audit Log: Google Sheets with full transaction history
Orchestrator: Python-based workflow engine


Tech Stack

| Component      | Technology                                |
| -------------- | ----------------------------------------- |
| Language       | Python 3.8+                               |
| APIs           | Google Drive API v3, Google Sheets API v4 |
| Authentication | OAuth 2.0 with refresh tokens             |
| Storage        | Google Drive (file storage + metadata)    |
| Logging        | Google Sheets (audit trail)               |
| Scheduling     | Cron / Cloud Scheduler / Task Scheduler   |
| Libraries      | google-api-python-client, google-auth     |

Key Concepts Demonstrated

1. Event-Driven Architecture

Files moving between folders act as events that trigger workflow transitions.
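One way to turn folder contents into events is to poll a folder and compare each listing with the IDs seen on the previous poll. The sketch below assumes the listing is a list of dicts with an `id` key, such as the `files` entries returned by a Drive API v3 `files.list` call filtered with `'<folder_id>' in parents and trashed = false`. `detect_arrivals` is a hypothetical name, not a function from this repository.

```python
def detect_arrivals(seen_ids, listing):
    """Return the files in `listing` that were not present on the previous poll.

    `seen_ids` is a set that is updated in place, so the same object can be
    passed on every poll. Both names are illustrative assumptions.
    """
    arrivals = [f for f in listing if f["id"] not in seen_ids]
    seen_ids.update(f["id"] for f in listing)
    return arrivals
```

Each returned file can then be treated as an "arrived in this stage" event and handed to the matching stage handler.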

2. State Machine Pattern

File location represents current workflow state; transitions move between states.
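The folder-as-state idea can be expressed as a small transition function. The stage names below follow the architecture diagram. `STAGES`, `ERROR_STAGE`, and `next_stage` are names assumed for this sketch and may not match the engine's internals.

```python
# Stage order taken from the architecture diagram; names are illustrative.
STAGES = ["incoming", "validated", "processed", "approved", "archived"]
ERROR_STAGE = "errors"


def next_stage(current, succeeded=True):
    """Return the folder a file should move to after leaving `current`."""
    if current not in STAGES:
        raise ValueError(f"unknown stage: {current}")
    if not succeeded:
        return ERROR_STAGE  # any stage can fail into errors/
    index = STAGES.index(current)
    if index == len(STAGES) - 1:
        return current  # archived is terminal
    return STAGES[index + 1]
```

Keeping the transitions in one table-like structure makes it straightforward to add a custom stage by inserting a name into the list.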

3. Serverless Computing

No traditional backend - leverages managed APIs for all operations.

4. Audit Logging & Compliance

Every operation logged with timestamp, actor, and result.
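An audit entry with those fields might be assembled as below before being appended to the sheet, for example through the Sheets API v4 `spreadsheets.values.append` method. The column layout and the `build_audit_row` helper are assumptions for illustration. The real `audit_logger.py` may use different columns.

```python
from datetime import datetime, timezone

# Assumed column order for the audit sheet (not confirmed by the source).
AUDIT_COLUMNS = ["timestamp", "actor", "file_id", "action", "result"]


def build_audit_row(actor, file_id, action, result, now=None):
    """Build one audit row as a flat list matching AUDIT_COLUMNS."""
    now = now or datetime.now(timezone.utc)
    # ISO 8601 in UTC keeps rows sortable and unambiguous across time zones.
    return [now.isoformat(), actor, file_id, action, result]
```

Accepting `now` as an optional argument makes the row deterministic in tests while defaulting to the current UTC time in normal use.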

5. Idempotency

Operations can be safely retried: running one again produces the same result as running it once, with no duplicated side effects.
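A common way to get this property is to record a completion marker in the file's metadata and skip work when the marker is already present. The sketch below uses an in-memory dict as a stand-in for Drive file properties. `process_once` and the `done_<stage>` marker format are hypothetical.

```python
def process_once(file_id, stage, properties, handler):
    """Run handler(file_id) unless `stage` is already recorded as done.

    `properties` maps file IDs to property dicts and stands in for Drive
    file metadata in this sketch.
    """
    marker = f"done_{stage}"
    file_props = properties.setdefault(file_id, {})
    if file_props.get(marker) == "true":
        return False  # already handled, so a retry becomes a no-op
    handler(file_id)
    # Record completion only after the handler succeeds, so a failed
    # attempt is retried on the next run.
    file_props[marker] = "true"
    return True
```

If the process crashes between the handler and the marker write, the handler runs again on retry, so the handler itself should also tolerate repetition.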

6. RBAC (Role-Based Access Control)

Google Drive permissions determine who can trigger transitions.


Quick Start

# 1. Install dependencies
pip install google-auth google-auth-oauthlib google-api-python-client python-dotenv

# 2. Set up Google Cloud credentials
# (See SETUP.md for detailed instructions)

# 3. Configure environment
cp .env.template .env
# Edit .env with your folder IDs

# 4. Authenticate and run the pipeline once
python main.py run

# 5. Check status
python main.py status

See SETUP.md for complete installation guide.


Usage

Run Pipeline Once

python main.py run

Run Continuously (Every 5 Minutes)

python main.py run --continuous

Check Pipeline Status

python main.py status

Output:

════════════════════════════════════════════════════════════
 DriveFlow Pipeline Status
════════════════════════════════════════════════════════════

 INCOMING: 3 files
   - report_2024_Q4.pdf
   - data_export.csv
   - image_001.png

 VALIDATED: 1 files
   - contract_draft.docx

 PROCESSED: 0 files

 APPROVED: 2 files
   - final_report.pdf
   - signed_contract.pdf

 ARCHIVED: 145 files
   - ...

 ERRORS: 1 files
   - invalid_file.exe

View Audit Logs

python main.py logs -n 20

Customization

Add Custom Validation Rules

Edit config.py:

VALIDATION_RULES = {
    'max_size_mb': 100,  # Increase size limit
    'allowed_extensions': ['.pdf', '.docx', '.xlsx'],
    'required_naming_pattern': r'^[A-Z]{3}-\d{4}'  # ABC-1234
}
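To show how rules of this shape could be applied, here is a minimal sketch of a checker. `validate_file` is a hypothetical function, not the validator shipped in the repository, and the violation messages are invented for the example.

```python
import os
import re

VALIDATION_RULES = {
    'max_size_mb': 100,
    'allowed_extensions': ['.pdf', '.docx', '.xlsx'],
    'required_naming_pattern': r'^[A-Z]{3}-\d{4}',  # ABC-1234
}


def validate_file(name, size_bytes, rules=VALIDATION_RULES):
    """Return a list of rule violations; an empty list means the file passes."""
    problems = []
    if size_bytes > rules['max_size_mb'] * 1024 * 1024:
        problems.append('too large')
    extension = os.path.splitext(name)[1].lower()
    if extension not in rules['allowed_extensions']:
        problems.append('extension not allowed')
    # re.match anchors at the start of the name, matching the ^ in the pattern.
    if not re.match(rules['required_naming_pattern'], name):
        problems.append('name does not match pattern')
    return problems
```

Returning every violation, rather than stopping at the first, gives the audit log a complete reason when a file is moved to errors/.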

Add Custom Processing Logic

Edit workflow_engine.py → _process_validated():

def _process_validated(self, file):
    # Example: download the raw file bytes
    content = self.file_ops.download_file_content(file['id'])

    # Example: check for sensitive data. Binary formats such as PDF are not
    # valid UTF-8, so undecodable bytes are ignored instead of raising.
    text = content.decode('utf-8', errors='ignore')
    if 'CONFIDENTIAL' in text:
        self.file_ops.set_file_property(file['id'], 'classification', 'confidential')

    # Continue to next stage
    self._move_to_next_stage(file, 'validated', 'processed')

Add Email Notifications

import os
import smtplib
from email.mime.text import MIMEText

def send_notification(file_name, stage):
    msg = MIMEText(f"File {file_name} reached {stage}")
    msg['Subject'] = 'DriveFlow: File Processed'
    msg['From'] = 'driveflow@example.com'
    msg['To'] = 'admin@example.com'

    # Read SMTP credentials from the environment instead of hardcoding them.
    # SMTP_USER and SMTP_PASSWORD are placeholder variable names.
    with smtplib.SMTP('smtp.gmail.com', 587) as server:
        server.starttls()
        server.login(os.environ['SMTP_USER'], os.environ['SMTP_PASSWORD'])
        server.send_message(msg)

Security & Compliance

  • Authentication: OAuth 2.0 with automatic token refresh
  • Authorization: Leverages Google Drive's built-in RBAC
  • Audit Trail: Every operation logged with timestamp and actor
  • Data Privacy: Files never leave Google's infrastructure
  • Encryption: At rest (Drive) and in transit (HTTPS)

Production Deployment

Cloud Run (Fully Managed)

gcloud run deploy driveflow \
  --source . \
  --region us-central1 \
  --no-allow-unauthenticated

Cron Job (Self-Hosted)

# /etc/cron.d/driveflow
*/5 * * * * user cd /path/to/driveflow && python main.py run

Docker

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "main.py", "run", "--continuous"]

Testing

# Run unit tests
pytest tests/

# Test authentication
python auth.py

# Test file operations
python file_operations.py

# Test workflow engine
python workflow_engine.py

Project Structure

driveflow/
├── auth.py                 # OAuth 2.0 authentication
├── config.py               # Configuration management
├── audit_logger.py         # Audit logging to Sheets
├── file_operations.py      # Drive file operations
├── workflow_engine.py      # Core workflow orchestration
├── main.py                 # CLI interface
├── credentials.json        # OAuth credentials (gitignore)
├── token.pickle            # Access token (gitignore)
├── .env                    # Configuration (gitignore)
├── .env.template           # Config template
├── requirements.txt        # Python dependencies
├── README.md               # This file
└── SETUP.md                # Detailed setup guide

Contributing

This is a portfolio project, but ideas are welcome. Areas for improvement:

  • Add webhook support for real-time processing
  • Implement distributed locking for multi-worker setup
  • Create web-based dashboard
  • Add Slack/email notifications
  • Support workflow definition via YAML
  • Add comprehensive test suite
  • Create Docker compose setup
  • Add metrics and monitoring

License

MIT License - feel free to use for learning or production!


Acknowledgments

Inspired by modern workflow orchestration tools like Airflow, Prefect, and n8n, but reimagined for serverless execution using readily-available APIs.

