2025-04-15 18:14:09
Haha, DeepSeek R1 is using a modified BoN-RL, replacing BoN with a group-mean advantage. And Kimi is taking the formulation of BoN itself. Amazing to see those models come to life.
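
For anyone who wants the difference spelled out, here is a rough sketch, assuming N sampled completions for one prompt, each with a scalar reward. The function names are made up for illustration, and the std normalization (population std plus a small `eps`) is an assumption based on the published GRPO description, not R1's actual training code.

```python
from statistics import mean, pstdev


def best_of_n_index(rewards):
    """Best-of-N: return the index of the single highest-reward sample."""
    return max(range(len(rewards)), key=lambda i: rewards[i])


def group_mean_advantages(rewards, eps=1e-8):
    """Group-relative advantage: reward minus the group mean, scaled by group std.

    Every sample gets a signal, not only the best one.
    eps (hypothetical default) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    rewards = [0.0, 1.0, 0.0, 1.0]  # toy rewards for 4 samples of one prompt
    print("BoN pick:", best_of_n_index(rewards))
    print("Group-mean advantages:", group_mean_advantages(rewards))
```

BoN keeps one winner per group, while the group-mean version pushes above-average samples up and below-average ones down.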
2025-04-15 11:32:53
2025-04-15 08:20:29
2025-04-14 21:14:45