Natural Language Processing

SWE-bench

A software engineering benchmark that measures whether a model can fix real GitHub issues

#SWE-bench #SWE-bench Verified #SWE-Bench Pro #coding benchmark

What is SWE-bench?

SWE-bench evaluates whether a model can resolve real issues from open-source repositories. Instead of abstract coding quizzes, it tests repository understanding, patch generation, and whether the resulting fix actually passes the project's tests.

How is it measured?

A model receives the issue description and repository context, generates a patch, and is scored on whether that patch passes the tests associated with the issue. This makes SWE-bench closer to practical software maintenance than evaluations that only check isolated snippets for syntactic or functional correctness.
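The pass/fail scoring described above can be sketched as a small check. This is a minimal illustration, not the official harness: the `FAIL_TO_PASS` / `PASS_TO_PASS` naming follows the public SWE-bench dataset fields, but the `is_resolved` helper and the sample test results here are hypothetical.

```python
# Sketch of SWE-bench-style scoring. Assumption: the harness records a
# per-test outcome after applying the model's patch, and distinguishes
# FAIL_TO_PASS tests (which the patch must fix) from PASS_TO_PASS tests
# (which must not regress).

def is_resolved(test_results, fail_to_pass, pass_to_pass):
    """Return True only if every target test now passes and no
    previously passing test regresses."""
    fixed = all(test_results.get(t) == "PASSED" for t in fail_to_pass)
    no_regression = all(test_results.get(t) == "PASSED" for t in pass_to_pass)
    return fixed and no_regression

# Hypothetical outcomes after applying a patch and rerunning the suite:
results = {
    "test_issue_repro": "PASSED",   # the behavior the issue reported
    "test_existing_api": "PASSED",  # unrelated behavior that must not break
}
print(is_resolved(results, ["test_issue_repro"], ["test_existing_api"]))  # True
```

The all-or-nothing rule matters: a patch that fixes the reported bug but breaks an existing test scores zero for that instance.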

Why does it matter?

In production coding workflows, "looks correct" is not enough. Teams need fixes that actually run and pass tests. SWE-bench helps compare that capability.
