Serverless Workflow Automation Engine using Google Drive
A production-grade, event-driven workflow orchestration system that uses Google Drive folder transitions as an event bus, eliminating traditional backend infrastructure while maintaining enterprise features like audit logging, RBAC, and failure handling.
Architecture Innovation:
- Serverless event-driven pipeline using Google Drive as state machine
- Zero backend infrastructure (no databases, no servers, no message queues)
- Folder transitions trigger workflow automation
- Built-in audit logging via Google Sheets
Technical Sophistication:
- Workflow orchestration (similar to Airflow/n8n concepts)
- OAuth 2.0 authentication with token refresh
- Idempotent file operations with metadata tracking
- Graceful error handling and retry logic
- Modular, testable architecture
Real-World Value:
- Cost: Nearly free for most use cases (Google Drive free tier)
- Scalability: Can process thousands of files
- Compliance: Full audit trail of all operations
- Extensibility: Easy to add custom processing stages
┌─────────────────┐
│  1_incoming/    │ ← Files uploaded here
└────────┬────────┘
         │ Validation (size, type, naming)
         ▼
┌─────────────────┐
│  2_validated/   │ ← Valid files
└────────┬────────┘
         │ Processing (extraction, transformation)
         ▼
┌─────────────────┐
│  3_processed/   │ ← Processed files
└────────┬────────┘
         │ Approval (automated rules)
         ▼
┌─────────────────┐
│  4_approved/    │ ← Approved files
└────────┬────────┘
         │ Archival
         ▼
┌─────────────────┐
│  5_archived/    │ ← Final storage
└─────────────────┘

┌──────────┐
│ errors/  │ ← Failed files (any stage)
└──────────┘
Event Bus: Google Drive folder structure
State Machine: File location = workflow state
Audit Log: Google Sheets with full transaction history
Orchestrator: Python-based workflow engine
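
The "file location = workflow state" idea can be sketched as a small transition table. This is only an illustration under stated assumptions: the stage names follow the folder diagram, while `STAGES`, `TRANSITIONS`, `ERROR_STAGE`, and `next_stage` are hypothetical names that are not taken from the project's code.

```python
from typing import Optional

# Ordered workflow stages, named after the Drive folders (numeric prefixes dropped).
STAGES = ["incoming", "validated", "processed", "approved", "archived"]
ERROR_STAGE = "errors"  # hypothetical name for the errors/ folder

# Each stage maps to the stage that follows it; "archived" has no successor.
TRANSITIONS = {src: dst for src, dst in zip(STAGES, STAGES[1:])}


def next_stage(current: str, succeeded: bool = True) -> Optional[str]:
    """Return the stage a file should move to, or None if it stays where it is."""
    if current not in STAGES:
        raise ValueError(f"unknown stage: {current!r}")
    if not succeeded:
        return ERROR_STAGE  # a failure at any stage routes the file to errors/
    return TRANSITIONS.get(current)  # None for the terminal stage
```

Keeping the transitions in plain data like this would make the state machine testable without touching the Drive API; the orchestrator would only need to translate stage names into folder IDs.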
| Component | Technology |
|---|---|
| Language | Python 3.8+ |
| APIs | Google Drive API v3, Google Sheets API v4 |
| Authentication | OAuth 2.0 with refresh tokens |
| Storage | Google Drive (file storage + metadata) |
| Logging | Google Sheets (audit trail) |
| Scheduling | Cron / Cloud Scheduler / Task Scheduler |
| Libraries | google-api-python-client, google-auth, google-auth-oauthlib, python-dotenv |
Files moving between folders act as events that trigger workflow transitions.
File location represents current workflow state; transitions move between states.
No traditional backend - leverages managed APIs for all operations.
Every operation logged with timestamp, actor, and result.
Operations can be safely retried without side effects.
Google Drive permissions determine who can trigger transitions.
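
A minimal sketch of how a folder transition could be made safe to retry, assuming `service` is a Drive API v3 client built with `googleapiclient.discovery.build("drive", "v3", ...)`. The function name `move_file` and its parameters are hypothetical; the project's own `file_operations.py` may work differently.

```python
def move_file(service, file_id: str, src_folder_id: str, dst_folder_id: str) -> bool:
    """Move a file between Drive folders; return True only if a move was performed."""
    # Look up the file's current parent folders before changing anything.
    meta = service.files().get(fileId=file_id, fields="parents").execute()
    parents = meta.get("parents", [])

    # If an earlier run already completed this transition, do nothing.
    if dst_folder_id in parents and src_folder_id not in parents:
        return False

    # In Drive API v3, a move is an update that swaps the parent folder.
    service.files().update(
        fileId=file_id,
        addParents=dst_folder_id,
        removeParents=src_folder_id,
        fields="id, parents",
    ).execute()
    return True
```

Because the function checks the current parent first, calling it twice for the same transition leaves the file in the destination folder and performs only one update.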
# 1. Install dependencies
pip install google-auth google-auth-oauthlib google-api-python-client python-dotenv
# 2. Set up Google Cloud credentials
# (See SETUP.md for detailed instructions)
# 3. Configure environment
cp .env.template .env
# Edit .env with your folder IDs
# 4. Authenticate
python main.py run
# 5. Check status
python main.py status

See SETUP.md for the complete installation guide.
# Run the pipeline once
python main.py run

# Run continuously
python main.py run --continuous

# Check pipeline status
python main.py status

Output:
════════════════════════════════════════════════════════════
DriveFlow Pipeline Status
════════════════════════════════════════════════════════════
INCOMING: 3 files
- report_2024_Q4.pdf
- data_export.csv
- image_001.png
VALIDATED: 1 files
- contract_draft.docx
PROCESSED: 0 files
APPROVED: 2 files
- final_report.pdf
- signed_contract.pdf
ARCHIVED: 145 files
- ...
ERRORS: 1 files
- invalid_file.exe
python main.py logs -n 20

Edit config.py:
VALIDATION_RULES = {
    'max_size_mb': 100,  # Increase size limit
    'allowed_extensions': ['.pdf', '.docx', '.xlsx'],
    'required_naming_pattern': r'^[A-Z]{3}-\d{4}',  # e.g. ABC-1234
}

Edit workflow_engine.py → _process_validated():
def _process_validated(self, file):
    # Example: download the raw file bytes (real PDF text extraction would need a parser)
    content = self.file_ops.download_file_content(file['id'])

    # Example: check for sensitive data, ignoring bytes that are not valid UTF-8
    if 'CONFIDENTIAL' in content.decode('utf-8', errors='ignore'):
        self.file_ops.set_file_property(file['id'], 'classification', 'confidential')

    # Continue to next stage
    self._move_to_next_stage(file, 'validated', 'processed')

import smtplib
from email.mime.text import MIMEText

def send_notification(file_name, stage):
    msg = MIMEText(f"File {file_name} reached {stage}")
    msg['Subject'] = 'DriveFlow: File Processed'
    msg['From'] = 'driveflow@example.com'
    msg['To'] = 'admin@example.com'

    with smtplib.SMTP('smtp.gmail.com', 587) as server:
        server.starttls()
        # Placeholder credentials: load real values from the environment, never from source
        server.login('your-email', 'your-password')
        server.send_message(msg)

- Authentication: OAuth 2.0 with automatic token refresh
- Authorization: Leverages Google Drive's built-in RBAC
- Audit Trail: Every operation logged with timestamp and actor
- Data Privacy: Files are stored only in Google Drive; a processing stage that downloads content handles it wherever the orchestrator runs
- Encryption: At rest (Drive) and in transit (HTTPS)
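
As an illustration of the audit trail, the sketch below builds one log row and appends it with the Sheets API v4 `spreadsheets().values().append` call. The column order, the default range `Sheet1!A:F`, and the names `build_audit_row` and `append_audit_row` are assumptions made for this example; the project's `audit_logger.py` may record different fields.

```python
from datetime import datetime, timezone


def build_audit_row(file_name, from_stage, to_stage, actor, result):
    """Return one audit row: timestamp, actor, file, source stage, target stage, result."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return [timestamp, actor, file_name, from_stage, to_stage, result]


def append_audit_row(sheets_service, spreadsheet_id, row, sheet_range="Sheet1!A:F"):
    """Append a single row to the audit sheet (range and layout are assumed)."""
    return sheets_service.spreadsheets().values().append(
        spreadsheetId=spreadsheet_id,
        range=sheet_range,
        valueInputOption="RAW",          # store values exactly as given
        insertDataOption="INSERT_ROWS",  # add a new row instead of overwriting
        body={"values": [row]},
    ).execute()
```

Separating row construction from the API call keeps the log format testable without network access or credentials.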
gcloud run deploy driveflow \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

# /etc/cron.d/driveflow
*/5 * * * * user cd /path/to/driveflow && python main.py run

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "main.py", "run", "--continuous"]

# Run unit tests
pytest tests/
# Test authentication
python auth.py
# Test file operations
python file_operations.py
# Test workflow engine
python workflow_engine.py

driveflow/
├── auth.py              # OAuth 2.0 authentication
├── config.py            # Configuration management
├── audit_logger.py      # Audit logging to Sheets
├── file_operations.py   # Drive file operations
├── workflow_engine.py   # Core workflow orchestration
├── main.py              # CLI interface
├── credentials.json     # OAuth credentials (gitignore)
├── token.pickle         # Access token (gitignore)
├── .env                 # Configuration (gitignore)
├── .env.template        # Config template
├── requirements.txt     # Python dependencies
├── README.md            # This file
└── SETUP.md             # Detailed setup guide

This is a portfolio project, but ideas welcome! Areas for improvement:
- Add webhook support for real-time processing
- Implement distributed locking for multi-worker setup
- Create web-based dashboard
- Add Slack/email notifications
- Support workflow definition via YAML
- Add comprehensive test suite
- Create a Docker Compose setup
- Add metrics and monitoring
MIT License - feel free to use for learning or production!
Inspired by modern workflow orchestration tools like Airflow, Prefect, and n8n, but reimagined for serverless execution using readily-available APIs.