Lifan Yuan
2025-01-02 17:18:13
How to unlock advanced reasoning via scalable RL? 🚀 Introducing PRIME (Process Reinforcement through Implicit Rewards) and Eurus-2, trained from a base model to surpass Qwen2.5-Math-Instruct using only
#PRIME #Eurus-2 #ReinforcementLearning