Lifan Yuan, 9 months ago

How to unlock advanced reasoning via scalable RL? 🚀 Introducing PRIME (Process Reinforcement through Implicit Rewards) and Eurus-2, trained from a base model to surpass Qwen2.5-Math-Instruct using only

#PRIME #Eurus-2 #ReinforcementLearning #Qwen2.5-Math-Instruct #AdvancedReasoning