SWE-bench
A software engineering benchmark that measures whether a model can fix real GitHub issues
Tags: SWE-bench, SWE-bench Verified, SWE-Bench Pro, coding benchmark
What is SWE-bench?
SWE-bench evaluates whether a model can resolve real GitHub issues from open-source repositories. Instead of abstract coding quizzes, it tests repository understanding, patch generation, and whether the resulting code actually passes the project's tests.
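To make the task concrete, here is an illustrative sketch of what one task instance contains; the field names and values are simplified for this article, and the published dataset defines the exact schema.

```python
# An illustrative SWE-bench-style task instance. Field names and values
# are simplified for illustration; the published dataset defines the
# exact schema.
task = {
    "repo": "astropy/astropy",    # a real open-source repository
    "base_commit": "abc123",      # the snapshot the model must work from
    "problem_statement": "FITS reader crashes on an empty header ...",
    # Tests that fail before the fix and must pass after it:
    "fail_to_pass": ["astropy/io/fits/tests/test_header.py::test_empty"],
}
```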
How is it measured?
A model receives the issue context and the repository at a fixed commit, generates a patch, and is scored on whether the patch applies and the associated tests pass. This makes SWE-bench closer to practical software maintenance than syntax-only evaluation.
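Below is a minimal sketch of that scoring step, assuming a checked-out repository and a candidate diff. The function name, paths, and test command are illustrative, not the official SWE-bench harness, which pins commits and runs curated fail-to-pass and pass-to-pass tests in isolated environments.

```python
import subprocess

def evaluate_patch(repo_dir: str, patch_file: str, test_cmd: list[str]) -> bool:
    """Apply a model-generated patch and report whether the tests pass.

    A simplified stand-in for SWE-bench-style scoring: a submission only
    counts as resolved if the patch applies cleanly AND the associated
    tests succeed.
    """
    # Step 1: apply the candidate patch to the repository checkout.
    applied = subprocess.run(
        ["git", "apply", patch_file], cwd=repo_dir, capture_output=True
    )
    if applied.returncode != 0:
        return False  # The patch does not even apply; automatic failure.

    # Step 2: run the associated tests; exit code 0 means they pass.
    tests = subprocess.run(test_cmd, cwd=repo_dir, capture_output=True)
    return tests.returncode == 0

# Illustrative usage (paths and test selection are hypothetical):
# resolved = evaluate_patch(
#     repo_dir="/tmp/astropy",
#     patch_file="model_patch.diff",
#     test_cmd=["pytest", "astropy/io/fits/tests/test_header.py"],
# )
```

The key property this sketch preserves is binary, execution-based grading: there is no partial credit for a plausible-looking diff, which is what separates SWE-bench from text-similarity metrics.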
Why does it matter?
In production coding workflows, "looks correct" is not enough. Teams need fixes that actually run and pass tests. SWE-bench helps compare that capability.
Related terms
AGI (Artificial General Intelligence)
A hypothetical AI system capable of performing any intellectual task a human can
AI Agent
An autonomous AI system that can plan, use tools, and take actions to achieve goals
Attention
A mechanism that allows AI models to focus on the most relevant parts of the input when producing output
BigLaw Bench
A benchmark for legal-task performance, focusing on document interpretation and reasoning consistency
Chain-of-Thought Elicitation
A prompting method that asks a model to reveal intermediate reasoning steps before the final answer
Chunk
A text segment created by splitting long documents into meaningful units for retrieval and generation