Capability-based compiler/runner for reproducible agent scenarios
Updated May 22, 2026 · Rust
Code-translator green agent (judge) for agent-as-a-judge evaluation of programming-language translation.
Deterministic offline ComtradeBench judge for evaluating agent robustness under pagination, retries, duplicates, page drift, and totals traps.
Business Process AI Worker · τ²-Bench #1 globally (3/3, 100%) · CRMArenaPro Run 8 · Reflexive Agent Architecture
Fault-injecting OpenEnv training environment for vibe-coded SaaS incidents. 30 scenarios grounded in 2025-26 production failures. Drop-in OpenClaw-RL pool server. Claude Code skill included.
AgentBeats Debate Leaderboard — CellRepair AI Purple Agent (96.5% Win Rate, #1 Ranking)
Baseline purple agent for the ComtradeBench benchmark: UN Comtrade tool-use under adversarial API conditions.
CellRepair AI – AgentX Purple Agent. 3-Layer Fallback. Zero Downtime.
AI-powered clinical triage simulation using the Manchester Triage System (MTS). OpenEnv Challenge 2026 entry with A2A protocol support.
Leaderboard infrastructure for the ComtradeBench / AgentBeats agent-evaluation benchmark: task definitions, submission flow, and scoring.