feat: Add comprehensive abuse detection and prevention system#44
Open
doubledare704 wants to merge 3 commits intopinchbench:mainfrom
Open
feat: Add comprehensive abuse detection and prevention system#44doubledare704 wants to merge 3 commits intopinchbench:mainfrom
doubledare704 wants to merge 3 commits intopinchbench:mainfrom
Conversation
…tion Add comprehensive abuse detection and prevention system: - Implement IP-based rate limiting (50 submissions per 24h) with new submission_rate_limits table - Add heuristic validation for timing (reject impossibly fast submissions) and cost consistency (flag suspicious cost/token ratios) - Add is_flagged and flag_reason columns to submissions table with admin UI to toggle flagged status - Filter flagged submissions from leaderboard and submissions endpoints by default (opt-in via include_flagged query param) test(api): add vitest infrastructure and comprehensive abuse detection tests Add test suite covering rate limiting, heuristic validation, and API filtering: - Set up vitest with better-sqlite3 and miniflare for Cloudflare Workers testing - Add unit tests for security utils (rate limiting, token hashing, IP extraction) - Add unit tests for heuristics (timing validation, cost consistency checks) - Add integration tests for API endpoints (leaderboard filtering, submission handling) - Include MockD1Database wrapper for D1-compatible SQLite testing - Remove remote: true from wrangler.jsonc for local development - Expose is_flagged and flag_reason in submission responses build(db): add rate limiting table and flagging columns to submissions Add migration for abuse detection infrastructure: - Create submission_rate_limits table with token_id, ip, and timestamp tracking - Add indexes on token_id, ip, and created_at for efficient querying - Add is_flagged and flag_reason columns to submissions table - Add index on is_flagged for filtering suspicious submissions refactor(test): extract schema loader to eliminate duplication across test files Centralize database schema initialization by creating a shared schema-loader module. Replace inline SQL in test/setup.ts, test/api.test.ts, and test/security.test.ts with initDbFromSchema() that loads from the main schema.sql file. Also add missing @types/better-sqlite3 dependency and clean up mock-db.ts unused methods.
Code Review SummaryStatus: No Issues Found | Recommendation: Merge All three previously flagged issues have been addressed across the follow-up commits. The abuse detection system is well-structured and the fixes are correct. ✅ All Issues Resolved
Files Reviewed
Reviewed by claude-sonnet-4.6 · 327,009 tokens |
…se prevention
BREAKING SECURITY FIX - Addresses code review issues from kilo-code-bot
Problems Fixed:
- **Race Condition (TOCTOU)**: Two concurrent requests could both pass the check
and exceed the 50-submission limit. Fixed by implementing atomic INSERT-then-COUNT.
- **Independent Token/IP Limits**: Changed `checkSubmissionLimit` to check token and IP
limits separately (already had OR removed, now properly decoupled).
Changes:
- Refactored `recordSubmissionAttempt()` to atomically:
1. Insert the rate limit record first
2. Query the limit counts after insertion
3. Return allowed status and remaining quota
- Moved rate limit gate from pre-flight check (before payload validation) to
post-validation stage, immediately before database insert. This ensures:
- Invalid submissions don't consume quota
- Atomic check-and-insert prevents race conditions
- Updated `checkSubmissionLimit()` as deprecated API (kept for backward compatibility).
New code uses `recordSubmissionAttempt()` for actual enforcement.
- Fixed tests:
- Updated "should track submissions by both token and IP independently" to verify
that quota exhaustion on one token doesn't block different tokens on same IP
- Updated "should record submission attempts" to validate returned limit status
- Fixed invalid test UUID in rate limit rejection test
All 62 tests passing.
The INSERT-first approach means tokenCount already includes the current request after insertion. Using `<` (less-than) enforced a limit of 49 instead of 50. Changed from: const allowed = tokenCount < limit && ipCount < limit; To: const allowed = tokenCount <= limit && ipCount <= limit; This ensures exactly 50 submissions per 24h are allowed: - Request 50: count=50 → 50 <= 50 ✅ allowed - Request 51: count=51 → 51 <= 50 ❌ blocked Fixes critical bug reported by kilo-code-bot in code review. All tests passing (62/62).
|
👋 Hi @doubledare704! I'm @olearycrew's OpenClaw bot doing a triage pass. This PR looks great — the Kilo Code Review flagged zero remaining issues and recommended merge. The abuse detection system with rate limiting, heuristic validation, and admin controls is exactly what we outlined in issue #20. All 62 tests passing, CI green, and mergeable. This one looks ready for @olearycrew to review and merge. Nice work! 🦀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR implements a robust abuse detection and prevention system for the benchmark API, including rate limiting, heuristic validation, and administrative controls for suspicious submissions.
Key Features
2. IP-based rate limiting: 50 submissions per 24-hour window per token/IP combination
3. Database-backed tracking: New submission_rate_limits table with efficient indexing
4. Automatic enforcement: Submissions exceeding limits are rejected with 429 status
6. Timing validation: Reject submissions with impossibly low execution times (< 0.001s)
7. Cost consistency checks: Flag submissions with suspicious cost/token ratios
8. Automatic flagging: Suspicious submissions are marked for admin review
10. Flagging system: is_flagged and flag_reason columns on submissions
11. Admin UI: Toggle flagged status for submissions
12. Filtered endpoints: Leaderboard and submissions APIs exclude flagged results by default
13. Opt-in viewing: include_flagged=true query parameter to view flagged submissions
15. Vitest setup: Comprehensive test suite with better-sqlite3 and miniflare
16. MockD1Database: SQLite wrapper for D1-compatible testing
17. Unit tests: Security utils, heuristics, and API endpoints
18. Integration tests: Full API workflow testing
Database Changes
New Table: submission_rate_limits
Why added: To track submission attempts per token and IP for rate limiting enforcement. Includes indexes on token_id, [ip], and created_at for efficient querying and cleanup.
Modified Table: submissions
Why added: To mark suspicious submissions for administrative review and filtering. The is_flagged index enables fast filtering of flagged submissions from public endpoints.
Migration Notes
Run 20260408_add_submission_limits_and_flags.sql to add the new table and columns
Existing submissions will have is_flagged = 0 by default
No data loss or breaking changes for existing functionality
Testing
All tests pass (62/62) with the new abuse detection features fully covered. The system maintains backward compatibility while adding comprehensive protection against abuse.
Closes #20