GRPO (Group Relative Policy Optimization)
A reasoning-focused RL method that updates the policy by scoring multiple candidate trajectories relative to one another
#GRPO#Group Relative Policy Optimization#reasoning RL#relative policy optimization
What is GRPO?
GRPO stands for Group Relative Policy Optimization. It improves a policy by sampling a group of candidate reasoning trajectories for the same task and comparing them against each other.
Where is it used?
It is often discussed for long-chain reasoning tasks, including coding and math, where intermediate-step quality matters.
Key idea
Instead of relying on a single absolute score, GRPO asks how each candidate compares with the other candidates sampled for the same task, and favors those that score above their peers.
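The relative comparison can be illustrated with a minimal sketch, assuming each candidate in a group already has a scalar reward. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the reward values are illustrative assumptions, not a reference implementation. Each reward is normalized by the mean and spread of its own group, so a candidate gets a positive signal only if it beats its peers.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each candidate relative to its group: (reward - group mean) / group std.

    Hypothetical helper for illustration; eps avoids division by zero
    when every candidate in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Made-up rewards for four candidate answers to the same task
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Candidates above the group mean receive positive values and those below receive negative values. If all candidates in a group score the same, every value is zero and the group provides no preference signal.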
Related Terms
Natural Language Processing
AGI (Artificial General Intelligence)
A hypothetical AI system capable of performing any intellectual task a human can
Natural Language Processing
AI Agent
An autonomous AI system that can plan, use tools, and take actions to achieve goals
Natural Language Processing
Attention
A mechanism that allows AI models to focus on the most relevant parts of the input when producing output
Natural Language Processing
BigLaw Bench
A benchmark for legal-task performance, focusing on document interpretation and reasoning consistency
Natural Language Processing
Chain-of-Thought Elicitation
A prompting method that asks a model to reveal intermediate reasoning steps before the final answer
Natural Language Processing
Chunk
A text segment created by splitting long documents into meaningful units for retrieval and generation