[Notes] PPO, GRPO, and GSPO

Cover image generated by Nano Banana 2

Introduction

This blog post provides an overview of the core concepts of the Proximal Policy Optimization (PPO) algorithm and two of its variants: Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO). It assumes basic knowledge of reinforcement learning fundamentals, so it does not explain terms such as policy, on-policy/off-policy learning, and state value function in detail. ...

April 11, 2026 · Ceshine Lee