CursorBench
A coding-model benchmark Cursor runs on its own operational data
What is CursorBench?
CursorBench is an internal coding benchmark run by Cursor, the AI coding IDE. It compares models across multi-file edits, refactoring, and debugging scenarios drawn from real user workflows.
How is it measured?
The benchmark is built from work patterns collected inside Cursor, and it scores whether each model's edits match the intended change. Cursor-specific features, such as Composer and inline editing, also factor into the evaluation.
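Cursor has not published its rubric, so the exact scoring is unknown. As a rough illustration only, a minimal "does the edit match the intent?" scorer could look like the sketch below; the function names, the exact-match criterion, and the pass-rate aggregation are all assumptions, not Cursor's actual method.

```python
# Hypothetical sketch of an edit-match scorer. Exact file-content
# comparison and pass-rate aggregation are assumptions for
# illustration; Cursor's real rubric is private.

def score_edit(proposed: dict[str, str], intended: dict[str, str]) -> bool:
    """True only if every intended file's content matches exactly."""
    return all(proposed.get(path) == text for path, text in intended.items())

def pass_rate(results: list[bool]) -> float:
    """Fraction of tasks where the model's edit matched the intent."""
    return sum(results) / len(results) if results else 0.0

# Two toy tasks: one matching edit, one mismatched edit.
tasks = [
    ({"a.py": "x = 1\n"}, {"a.py": "x = 1\n"}),  # match
    ({"b.py": "y = 2\n"}, {"b.py": "y = 3\n"}),  # mismatch
]
results = [score_edit(proposed, intended) for proposed, intended in tasks]
print(pass_rate(results))  # → 0.5
```

A real rubric would likely be far softer than exact match (e.g. tolerating formatting differences or judging behavior rather than text), which is one reason private benchmarks like this are hard to compare across tools.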
Why does it matter?
Unlike synthetic lab benchmarks, CursorBench draws from real IDE usage, which gives it stronger signal for the practical question "how well will this model perform in my actual tool?" The dataset and rubric are private, so it is best read as a relative comparison between models rather than an absolute score.