A browser-based tool for aggregating Fabla research data. Upload multiple CSV files, merge them, and remove duplicate rows by ResponseID, using a CSV parser that handles quoted and multiline fields.
- Multiple File Upload: Drag & drop or browse to select multiple CSV files
- ResponseID Deduplication: Automatically removes duplicates based on ResponseID (keeps first occurrence)
- Data Preview: Preview your merged data before downloading
- Multiple Export Formats: Download as CSV or Excel with automatic date stamping
- Transcript Extraction: Extract Date, Reference, and Transcript columns for rows with Reference data
- Advanced CSV Parsing: Handles multiline fields and complex CSV structures
- Participation Dashboard: Analyze participation patterns with detailed statistics
- Interactive Charts: Visualize daily participation over time with scrollable bar charts
- Detailed Tables: View participation data in scrollable tables with all dates
- Participant Filtering: Filter charts and tables by specific participants or view all
- Date Range Filtering: Filter data by time periods (last 7 days, 15 days, 1-3 months, or all)
- Key Metrics: Total participants, entries, averages, most/least active participants
- End Time Analysis: Focuses on end_time entries for accurate participation tracking
- Incentive Tracking: Displays the most recent incentive value for each participant from Incentives entries
- Upload Glitch Detection: Automatically detects duplicate entries caused by data upload glitches
- Smart Identification: Identifies duplicates by matching PromptID, ParticipantID, and RespondedAt columns
- Earliest Entry Preserved: Keeps the entry with the earliest upload timestamp (Date) for each duplicate group
- Full Row Preview: Review detected duplicate groups with all columns displayed for validation
- Automatic Processing: No configuration needed; upload a file and cleaning runs immediately
- Export Options: Download cleaned data as CSV or Excel
- Privacy-First: All processing happens locally in your browser; no data is sent to external servers
- Mobile-Friendly: Responsive design works on all devices
- Fabla Branding: Authentic Fabla design with ghost cursor and professional styling
- Upload Files: Drag and drop CSV files or click to browse. You can upload multiple files at once.
- Review Settings: The tool automatically uses ResponseID for deduplication and keeps the first occurrence of duplicates.
- Process Data: The tool processes your files and removes duplicates automatically.
- Download Results:
- Download CSV/Excel: Download your collated and deduplicated data as CSV or Excel with automatic date stamping (e.g., `fabla_data_2024-09-24.csv`).
- Download Transcripts: Extract and download only the Date, Reference, and Transcript columns for rows that have Reference data. This creates a focused CSV file (e.g., `fabla_reference_transcript_2024-09-24.csv`) containing only relevant transcript entries.
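The collation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the function names (`merge_and_dedupe`, `extract_transcripts`, `dated_filename`) are hypothetical, and keeping rows that have no ResponseID is an assumption the README does not confirm.

```python
import csv
import io
from datetime import date


def read_rows(csv_text):
    """Parse CSV text into a list of dicts; quoted multiline fields are handled by csv."""
    return list(csv.DictReader(io.StringIO(csv_text, newline="")))


def merge_and_dedupe(files):
    """Concatenate rows from several CSV texts, keeping the first row per ResponseID."""
    seen = set()
    merged = []
    for csv_text in files:
        for row in read_rows(csv_text):
            response_id = (row.get("ResponseID") or "").strip()
            # Assumption: rows without a ResponseID are kept, not collapsed together.
            if response_id:
                if response_id in seen:
                    continue  # a later occurrence of an ID already seen
                seen.add(response_id)
            merged.append(row)
    return merged


def extract_transcripts(rows):
    """Keep Date, Reference, and Transcript for rows whose Reference is non-empty."""
    columns = ("Date", "Reference", "Transcript")
    return [
        {col: (row.get(col) or "") for col in columns}
        for row in rows
        if (row.get("Reference") or "").strip()
    ]


def dated_filename(prefix, extension, today=None):
    """Build a date-stamped name such as fabla_data_2024-09-24.csv."""
    today = today or date.today()
    return f"{prefix}_{today.isoformat()}.{extension}"
```

Calling `dated_filename("fabla_data", "csv", date(2024, 9, 24))` yields `fabla_data_2024-09-24.csv`, matching the example file name above.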
- Upload CSV Files: Upload one or more CSV files containing participation data with `end_time` entries.
- View Analysis: The tool automatically processes the data and displays:
- Statistics: Total participants, entries, averages, and most/least active participants
- Daily Participation Chart: Interactive bar chart showing participation over time
- Participation Table: Detailed table with daily counts for each participant, including:
- Participant ID: Unique identifier for each participant
- Total Entries: Total number of participation entries
- Incentive: Most recent incentive value from Incentives entries (if available)
- Daily Counts: Participation counts for each date in the dataset
- Filter by Date Range: Use the "Filter by Date" dropdown below the chart to view data for specific time periods:
- All: Shows all available dates
- Last 7 days: Shows only the most recent week
- Last 15 days: Shows the most recent 15 days
- 1 month: Shows the last 30 days
- 2 months: Shows the last 60 days
- 3 months: Shows the last 90 days
- Participants with no data in the selected range are automatically hidden
- Filter by Participant: Use the dropdown selector to filter both the chart and table by a specific participant or view all participants. Date and participant filters work together.
- Download Results: Download the participation summary as CSV for further analysis.
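To make the participation analysis concrete, here is a rough Python sketch of daily counting, date-range filtering, and the key metrics. The input layout, `(participant_id, entry_name, timestamp)` tuples, is hypothetical and not the tool's actual column schema. Anchoring the window to a `today` argument is also an assumption, since the README does not say whether ranges end at the current date or at the latest date in the data.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_counts(entries):
    """Count end_time entries per participant per calendar day.

    `entries` holds (participant_id, entry_name, timestamp) tuples; this
    layout is an assumption made for the sketch.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for participant_id, entry_name, timestamp in entries:
        if entry_name != "end_time":
            continue  # only end_time entries count as participation
        day = timestamp[:10]  # "YYYY-MM-DD" prefix, read as a plain string
        counts[participant_id][day] += 1
    return {pid: dict(days) for pid, days in counts.items()}


def filter_last_days(counts, days, today):
    """Keep the most recent `days` days and hide participants left with no data."""
    start = (today - timedelta(days=days - 1)).isoformat()
    end = today.isoformat()
    filtered = {}
    for pid, per_day in counts.items():
        # ISO dates compare correctly as strings.
        kept = {d: n for d, n in per_day.items() if start <= d <= end}
        if kept:
            filtered[pid] = kept
    return filtered


def summarize(counts):
    """Compute totals, the per-participant average, and most/least active IDs."""
    totals = {pid: sum(per_day.values()) for pid, per_day in counts.items()}
    total_entries = sum(totals.values())
    return {
        "participants": len(totals),
        "entries": total_entries,
        "average": total_entries / len(totals) if totals else 0,
        "most_active": max(totals, key=totals.get) if totals else None,
        "least_active": min(totals, key=totals.get) if totals else None,
    }
```

Because the participant filter and the date filter both operate on the same `counts` mapping, applying them in either order gives the same result, which is one simple way to keep a chart and a table in sync.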
- Upload File: Upload a CSV file containing data with potential duplicates from upload glitches.
- Automatic Detection: The tool automatically identifies duplicate entries where PromptID, ParticipantID, and RespondedAt all match but the Date (upload timestamp) differs.
- Review Duplicates: Review the detected duplicate groups with full row details; all columns are displayed so you can validate each removal.
- Download Cleaned Data: Download the cleaned dataset as CSV or Excel. The earliest upload timestamp is preserved for each duplicate group.
Note: The file must contain `PromptID`, `ParticipantID`, `RespondedAt`, and `Date` columns. Rows where any of these values are empty or missing are never flagged as duplicates.
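The detection rule described above can be expressed as a short Python sketch. It is an illustration under stated assumptions rather than the tool's implementation: the function name `remove_upload_glitches` is hypothetical, and `Date` values are assumed to be ISO-like strings so that string order matches chronological order.

```python
from collections import defaultdict

KEY_COLUMNS = ("PromptID", "ParticipantID", "RespondedAt")


def remove_upload_glitches(rows):
    """Return (cleaned_rows, duplicate_groups) for a list of row dicts."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple((row.get(col) or "").strip() for col in KEY_COLUMNS)
        upload_date = (row.get("Date") or "").strip()
        # Rows with any empty key value or an empty Date are never flagged.
        if all(key) and upload_date:
            groups[key].append(index)

    drop = set()
    duplicate_groups = []
    for indexes in groups.values():
        dates = {rows[i]["Date"].strip() for i in indexes}
        # A group is a glitch only if the keys match but the upload Date differs.
        if len(indexes) < 2 or len(dates) < 2:
            continue
        # Assumption: ISO-like Date strings, so the smallest string is the earliest.
        keep = min(indexes, key=lambda i: (rows[i]["Date"].strip(), i))
        duplicate_groups.append([rows[i] for i in indexes])
        drop.update(i for i in indexes if i != keep)

    cleaned = [row for i, row in enumerate(rows) if i not in drop]
    return cleaned, duplicate_groups
```

Returning the full duplicate groups alongside the cleaned rows mirrors the review step: every column of every flagged row stays available for validation before download.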
- Pure JavaScript: No server-side processing required
- Advanced CSV Parser: Handles quoted fields, commas, multiline fields, and special characters
- Memory Efficient: Processes files directly in the browser
- Cross-Browser Compatible: Works in all modern browsers
- Fixed Deduplication Logic: Always uses ResponseID for consistent results
- Automatic File Naming: Downloads include current date for organization
- Timezone-Safe Date Parsing: Direct string parsing prevents date shift issues
- Dynamic Date Ranges: Automatically detects and displays all dates present in the data
- Synchronized Filtering: Participant and date filters apply to both charts and tables simultaneously
- Smart Participant Filtering: Automatically hides participants with no data in the selected date range
- Modular Architecture: Extraction and analysis logic separated into reusable modules
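Two of the points above, multiline-aware CSV parsing and timezone-safe date handling, are easy to show in Python. This sketch uses the standard `csv` module in place of the tool's own JavaScript parser, and assumes timestamps start with an ISO `YYYY-MM-DD` date; `calendar_day` and `parse_csv` are hypothetical names.

```python
import csv
import io
import re
from datetime import datetime, timezone

DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def calendar_day(timestamp):
    """Return the YYYY-MM-DD part of a timestamp string without timezone conversion."""
    match = DATE_PREFIX.match(timestamp.strip())
    return match.group(0) if match else None


def parse_csv(text):
    """Parse CSV text, including quoted fields that contain commas or newlines."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Converting to UTC moves this late-evening entry to the next day;
# reading the date straight from the string keeps the recorded day.
stamp = "2024-09-24T23:30:00-05:00"
shifted = datetime.fromisoformat(stamp).astimezone(timezone.utc).date().isoformat()
print(calendar_day(stamp), shifted)  # 2024-09-24 2024-09-25
```

The design trade-off is that string parsing trusts the recorded local date as written, which suits per-day participation counts where a shift across midnight would put an entry in the wrong column.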
- Input: CSV files (`.csv`)
- Output: CSV (`.csv`) and Excel (`.xlsx`)
To deploy this tool to your own GitHub Pages site:
- Fork this repository or create a new one
- Enable GitHub Pages:
- Go to your repository settings
- Scroll to "Pages" section
- Select "Deploy from a branch"
- Choose "main" branch and "/ (root)" folder
- Click "Save"
- Access your site:
https://your-username.github.io/repository-name/
fabla-data-collation/
├── index.html                          # Main HTML interface with Fabla branding
├── script.js                           # JavaScript processing logic with CSV parser and participation analysis
├── participation-analysis.js           # Core participation analysis logic
├── reference-transcript-extractor.js   # Reference and transcript extraction module
├── fabla-icon.png                      # Fabla logo icon
├── fabla-ghost.png                     # Fabla ghost image (used as cursor)
├── fabla-ghost-cursor.png              # Fabla ghost cursor image
├── fabla-data-collate.py               # Original Python CLI tool
└── README.md                           # This file
For users who prefer command-line processing or need to process large datasets, this repository also includes a Python CLI tool (fabla-data-collate.py) that provides the same collation and ResponseID deduplication.
- Clone the repository:
    git clone https://github.com/apphatchery/fabla-data-collation.git
    cd fabla-data-collation
- Install Python dependencies:
    pip install pandas
- Run the Python tool:
    # Process all CSV files in a directory
    python3 fabla-data-collate.py -i ./your_csv_folder -o merged_data.csv
    # Process specific files
    python3 fabla-data-collate.py -i file1.csv file2.csv file3.csv -o merged_data.csv
    # Process recursively through subdirectories
    python3 fabla-data-collate.py -i ./your_csv_folder --recursive -o merged_data.csv
- Same deduplication logic: Uses ResponseID and keeps first occurrence
- Batch processing: Handle multiple files or entire directories
- Recursive processing: Process subdirectories automatically
- Flexible input: Accept files, directories, or glob patterns
- Memory efficient: Processes large datasets without browser limitations
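As an illustration of the flexible input handling listed above, the following sketch expands a mix of files, directories, and glob patterns into a list of CSV paths. It is one plausible approach using only the standard library, not the code in fabla-data-collate.py; the function name `collect_csv_files` is hypothetical.

```python
import glob
from pathlib import Path


def collect_csv_files(inputs, recursive=False):
    """Expand files, directories, and glob patterns into a sorted list of CSV paths."""
    found = set()
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Directories contribute their CSV files, optionally including subdirectories.
            pattern = "**/*.csv" if recursive else "*.csv"
            found.update(p for p in path.glob(pattern) if p.is_file())
        elif path.is_file():
            found.add(path)
        else:
            # Anything else is treated as a glob pattern such as "data/2024-*.csv".
            matches = glob.glob(item, recursive=recursive)
            found.update(Path(m) for m in matches if Path(m).is_file())
    # The set removes paths reached twice; only .csv files are returned.
    return sorted(p for p in found if p.suffix.lower() == ".csv")
```

Sorting the result gives a stable processing order, which matters when deduplication keeps the first occurrence of each ResponseID.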
All data processing happens locally in your browser (web tool) or on your computer (Python tool). No data is sent to external servers, which helps keep research data private and supports data protection requirements.
The tool is easily customizable:
- Styling: Modify CSS in `index.html` (includes Fabla branding)
- Functionality: Extend JavaScript in `script.js`
- Deduplication: Currently fixed to ResponseID (can be modified in code)
- Chrome 60+
- Firefox 55+
- Safari 12+
- Edge 79+
This project is open source and available under the MIT License.
Contributions are welcome! Please feel free to submit issues and pull requests.
If you encounter any issues or have questions, please open an issue on GitHub.