Natural Language Processing

GRPO (Group Relative Policy Optimization)

A reinforcement learning method for reasoning tasks that updates the policy by scoring multiple candidate trajectories for the same prompt relative to one another

Tags: GRPO, Group Relative Policy Optimization, reasoning RL, relative policy optimization

What is GRPO?

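GRPO stands for Group Relative Policy Optimization. For each task, it samples a group of candidate reasoning trajectories from the current policy and improves the policy by favoring the candidates that score better than the rest of the group.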

Where is it used?

It is often discussed for long-chain reasoning tasks such as coding and math, where the quality of intermediate steps matters.

Key idea

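Instead of relying on a single absolute score, GRPO judges each candidate by how its reward compares with the rewards of its peers sampled for the same task. Candidates that beat the group average are reinforced, and those that fall below it are discouraged.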
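
The relative comparison can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the commonly described formulation in which each candidate's reward is centered on the group mean and scaled by the group standard deviation. The function name `group_relative_advantages`, the `eps` parameter, the use of the population standard deviation, and the example rewards are all hypothetical choices made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each candidate relative to its peers for the same task.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    `eps` only guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is another option
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidates answering the same prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f}  advantage={advantage:+.3f}")
```

Candidates above the group mean receive a positive advantage and those below it a negative one. If every candidate gets the same reward, all advantages are zero and the group contributes no learning signal under this formulation.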
