AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.
-
Updated
May 3, 2026 - Rust
AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.
SE-Agent is a self-evolution framework for LLM Code agents. It enables trajectory-level evolution to exchange information across reasoning paths via Revision, Recombination, and Refinement, expanding the search space and escaping local optima. On SWE-bench Verified, it achieves SOTA performance
An LLM council that reviews your coding agent's every move
Squeeze verbose LLM agent tool output down to only the relevant lines
Save up to 40% on agent token costs with code graphs: call graphs, dependency graphs, dead code detection, and blast radius analysis.
Lean orchestration platform for enterprise AI — where each decision costs hundreds. State machine core, HITL as a first-class state, corrections that accumulate. First use-case being Coding agent. Open research, early stage.
Open benchmark for AI coding agents on SWE-bench Verified. Compare resolution rates, cost, and unique wins.
Repository-level automated code repair agent using SWE-Bench dataset
Do MCP tools serialize in Claude Code? Empirical study: readOnlyHint controls parallelism, IPC overhead is ~5ms/call. Reproduces #14353.
SWE-bench for your codebase — mine your merged PRs into local, contamination-free coding-agent benchmarks. Adapters: claude-code, aider (Opus 4.7 / GPT-5.5 / Sonnet 4.6 / Gemini 3.1 Pro).
Build a private evaluation dataset to optimize your organization's token costs.
Benchmark suite for evaluating LLMs and SLMs on coding and SE tasks. Features HumanEval, MBPP, SWE-bench, and BigCodeBench with an interactive Streamlit UI. Supports cloud APIs (OpenAI, Anthropic, Google) and local models via Ollama. Tracks pass rates, latency, token usage, and costs.
A technical guide and live-tracking repository for the world's top AI models, specialized by coding, reasoning, and multimodal performance.
Recursive Formal Alignment Factory — Autonomous pipeline for generating ultra-high-density vericoding trajectories and training SOTA code models on a single T4 GPU. Multi-agent verification, formal proofs, dense rewards, recursive self-improvement.
Evaluate existing candidate patches; built to make benchmark claims auditable, reproducible, and explicitly evaluator-conditioned
A Rust reimplementation of mini-swe-agent with CLI task execution, benchmark runners, trajectory inspection, and multi-environment support.
One-command SWE-bench eval harness in Go. Native ARM64 containers with 6.3x test runner speedup on Apple Silicon and AWS Graviton. Pre-built images on Docker Hub.
Add a description, image, and links to the swe-bench topic page so that developers can more easily learn about it.
To associate your repository with the swe-bench topic, visit your repo's landing page and select "manage topics."