GDPval
An OpenAI benchmark that measures model performance on economically valuable knowledge work
#GDPval, #OpenAI benchmark, #knowledge work evaluation, #economic-value benchmark
What is GDPval?
GDPval is an OpenAI benchmark that scores models on economically valuable knowledge-work tasks. Unlike standard coding or QA benchmarks, it targets work that resembles what people actually deliver in real jobs.
What does it measure?
It covers multi-step scenarios such as document creation, data analysis, software operation, and tool-switching workflows where the output translates directly into business value.
Why was it introduced?
Where prior benchmarks asked "can the model solve this problem?", GDPval is meant to answer "can the model finish this real piece of work?" The 84.9% score highlighted at the GPT-5.5 launch positions GDPval as a new axis for evaluating agentic workloads.
Related terms
Natural Language Processing
AGI (Artificial General Intelligence)
A hypothetical AI system capable of performing any intellectual task a human can
AI Agent
An autonomous AI system that can plan, use tools, and take actions to achieve goals
Attention
A mechanism that allows AI models to focus on the most relevant parts of the input when producing output
BigLaw Bench
A benchmark for legal-task performance, focusing on document interpretation and reasoning consistency
Chain-of-Thought Elicitation
A prompting method that asks a model to reveal intermediate reasoning steps before the final answer
Chunk
A text segment created by splitting long documents into meaningful units for retrieval and generation