#Eurus-2
Lifan Yuan
9 months ago
How to unlock advanced reasoning via scalable RL? 🚀 Introducing PRIME (Process Reinforcement through Implicit Rewards) and Eurus-2, trained from a base model to surpass Qwen2.5-Math-Instruct using only 1/10 of the data. We're still scaling up, with 3x more training data to go! 🧵
#PRIME
#Eurus-2
#ReinforcementLearning
#Qwen2.5-Math-Instruct
#AdvancedReasoning