2025-05-31 11:18:44
Haha, DeepSeek R1 is using a modified BoN-RL, replacing BoN with a group-mean advantage. And Kimi is taking the BoN formulation itself. Amazing to see those models come to life.
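A rough sketch of the contrast, in case it helps. This assumes "group-mean advantage" means each sample's reward minus the mean reward of its group, and that BoN just keeps the top-scoring sample; the function names and reward values are made up for illustration and are not taken from either model's actual training code.

```python
from statistics import mean


def best_of_n(rewards):
    """Best-of-N: return the index of the single highest-reward sample."""
    return max(range(len(rewards)), key=lambda i: rewards[i])


def group_mean_advantages(rewards):
    """Group-mean baseline: each sample's reward minus the group average.

    Assumption: no further scaling (for example by the group's standard
    deviation) is applied here.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Hypothetical rewards for three sampled answers to the same prompt.
rewards = [1.0, 2.0, 3.0]
print(best_of_n(rewards))
print(group_mean_advantages(rewards))
```

The difference that matters: BoN passes a learning signal only through the winner, while the group-mean version gives every sample a signed signal, positive if it beat the group average and negative if it did not.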