Skip to content

feat: Add comprehensive abuse detection and prevention system#44

Open
doubledare704 wants to merge 3 commits intopinchbench:mainfrom
doubledare704:feature/20-abuse-detection-prevention
Open

feat: Add comprehensive abuse detection and prevention system#44
doubledare704 wants to merge 3 commits intopinchbench:mainfrom
doubledare704:feature/20-abuse-detection-prevention

Conversation

@doubledare704
Copy link
Copy Markdown

This PR implements a robust abuse detection and prevention system for the benchmark API, including rate limiting, heuristic validation, and administrative controls for suspicious submissions.

Key Features

  1. 🔒 Rate Limiting
    2. IP-based rate limiting: 50 submissions per 24-hour window per token/IP combination
    3. Database-backed tracking: New submission_rate_limits table with efficient indexing
    4. Automatic enforcement: Submissions exceeding limits are rejected with 429 status
  2. 🧠 Heuristic Validation
    6. Timing validation: Reject submissions with impossibly low execution times (< 0.001s)
    7. Cost consistency checks: Flag submissions with suspicious cost/token ratios
    8. Automatic flagging: Suspicious submissions are marked for admin review
  3. 👨‍💼 Administrative Controls
    10. Flagging system: is_flagged and flag_reason columns on submissions
    11. Admin UI: Toggle flagged status for submissions
    12. Filtered endpoints: Leaderboard and submissions APIs exclude flagged results by default
    13. Opt-in viewing: include_flagged=true query parameter to view flagged submissions
  4. 🧪 Testing Infrastructure
    15. Vitest setup: Comprehensive test suite with better-sqlite3 and miniflare
    16. MockD1Database: SQLite wrapper for D1-compatible testing
    17. Unit tests: Security utils, heuristics, and API endpoints
    18. Integration tests: Full API workflow testing

Database Changes
New Table: submission_rate_limits
Why added: To track submission attempts per token and IP for rate limiting enforcement. Includes indexes on token_id, [ip], and created_at for efficient querying and cleanup.

Modified Table: submissions
Why added: To mark suspicious submissions for administrative review and filtering. The is_flagged index enables fast filtering of flagged submissions from public endpoints.

Migration Notes
Run 20260408_add_submission_limits_and_flags.sql to add the new table and columns
Existing submissions will have is_flagged = 0 by default
No data loss or breaking changes for existing functionality

Testing
All tests pass (62/62) with the new abuse detection features fully covered. The system maintains backward compatibility while adding comprehensive protection against abuse.

Closes #20

…tion

Add comprehensive abuse detection and prevention system:
- Implement IP-based rate limiting (50 submissions per 24h) with new submission_rate_limits table
- Add heuristic validation for timing (reject impossibly fast submissions) and cost consistency (flag suspicious cost/token ratios)
- Add is_flagged and flag_reason columns to submissions table with admin UI to toggle flagged status
- Filter flagged submissions from leaderboard and submissions endpoints by default (opt-in via include_flagged query param)

test(api): add vitest infrastructure and comprehensive abuse detection tests

Add test suite covering rate limiting, heuristic validation, and API filtering:
- Set up vitest with better-sqlite3 and miniflare for Cloudflare Workers testing
- Add unit tests for security utils (rate limiting, token hashing, IP extraction)
- Add unit tests for heuristics (timing validation, cost consistency checks)
- Add integration tests for API endpoints (leaderboard filtering, submission handling)
- Include MockD1Database wrapper for D1-compatible SQLite testing
- Remove remote: true from wrangler.jsonc for local development
- Expose is_flagged and flag_reason in submission responses

build(db): add rate limiting table and flagging columns to submissions

Add migration for abuse detection infrastructure:
- Create submission_rate_limits table with token_id, ip, and timestamp tracking
- Add indexes on token_id, ip, and created_at for efficient querying
- Add is_flagged and flag_reason columns to submissions table
- Add index on is_flagged for filtering suspicious submissions

refactor(test): extract schema loader to eliminate duplication across test files

Centralize database schema initialization by creating a shared schema-loader module.
Replace inline SQL in test/setup.ts, test/api.test.ts, and test/security.test.ts with
initDbFromSchema() that loads from the main schema.sql file. Also add missing
@types/better-sqlite3 dependency and clean up mock-db.ts unused methods.
@kilo-code-bot
Copy link
Copy Markdown

kilo-code-bot bot commented Apr 11, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

All three previously flagged issues have been addressed across the follow-up commits. The abuse detection system is well-structured and the fixes are correct.

✅ All Issues Resolved

File Issue
src/utils/security.ts OR ip = ? DoS vector — fixed by separating token and IP limit queries
src/routes/results.ts TOCTOU race condition — fixed by INSERT-before-check approach
src/utils/security.ts Off-by-one (< limit<= limit) — now correctly allows exactly 50 submissions
Files Reviewed
  • src/utils/security.ts
  • src/routes/results.ts
  • src/utils/heuristics.ts
  • src/routes/admin.ts
  • src/routes/leaderboard.ts
  • src/routes/submissions.ts
  • src/types.ts
  • migrations/20260408_add_submission_limits_and_flags.sql
  • schema.sql
  • test/api.test.ts
  • test/heuristics.test.ts
  • test/security.test.ts
  • test/mock-db.ts
  • test/setup.ts

Reviewed by claude-sonnet-4.6 · 327,009 tokens

…se prevention

BREAKING SECURITY FIX - Addresses code review issues from kilo-code-bot

Problems Fixed:
- **Race Condition (TOCTOU)**: Two concurrent requests could both pass the check
  and exceed the 50-submission limit. Fixed by implementing atomic INSERT-then-COUNT.

- **Independent Token/IP Limits**: Changed `checkSubmissionLimit` to check token and IP
  limits separately (already had OR removed, now properly decoupled).

Changes:
- Refactored `recordSubmissionAttempt()` to atomically:
  1. Insert the rate limit record first
  2. Query the limit counts after insertion
  3. Return allowed status and remaining quota

- Moved rate limit gate from pre-flight check (before payload validation) to
  post-validation stage, immediately before database insert. This ensures:
  - Invalid submissions don't consume quota
  - Atomic check-and-insert prevents race conditions

- Updated `checkSubmissionLimit()` as deprecated API (kept for backward compatibility).
  New code uses `recordSubmissionAttempt()` for actual enforcement.

- Fixed tests:
  - Updated "should track submissions by both token and IP independently" to verify
    that quota exhaustion on one token doesn't block different tokens on same IP
  - Updated "should record submission attempts" to validate returned limit status
  - Fixed invalid test UUID in rate limit rejection test

All 62 tests passing.
The INSERT-first approach means tokenCount already includes the current request
after insertion. Using `<` (less-than) enforced a limit of 49 instead of 50.

Changed from:
  const allowed = tokenCount < limit && ipCount < limit;

To:
  const allowed = tokenCount <= limit && ipCount <= limit;

This ensures exactly 50 submissions per 24h are allowed:
- Request 50: count=50 → 50 <= 50 ✅ allowed
- Request 51: count=51 → 51 <= 50 ❌ blocked

Fixes critical bug reported by kilo-code-bot in code review.
All tests passing (62/62).
@ScuttleBot
Copy link
Copy Markdown

👋 Hi @doubledare704! I'm @olearycrew's OpenClaw bot doing a triage pass.

This PR looks great — the Kilo Code Review flagged zero remaining issues and recommended merge. The abuse detection system with rate limiting, heuristic validation, and admin controls is exactly what we outlined in issue #20.

All 62 tests passing, CI green, and mergeable. This one looks ready for @olearycrew to review and merge.

Nice work! 🦀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Abuse heuristics and rate limiting

2 participants