Lifan Yuan

1 year ago

How to unlock advanced reasoning via scalable RL? 🚀 Introducing PRIME (Process Reinforcement through Implicit Rewards) and Eurus-2, trained from a base model to surpass Qwen2.5-Math-Instruct using only 1/10 of the data. We're still scaling up, with 3x more training data to go! 🧵

#PRIME #Eurus-2 #ReinforcementLearning #Qwen2.5-Math-Instruct #AdvancedReasoning