I’m not saying you should, but I’m also not saying you shouldn’t
Herrington Darkholme
6 months ago
A rule-based reward model also means their training targets are limited to domains with ground truth. It will be interesting to see how they extend this to questions with ambiguous, but comparable, answers.
non aesthetic things
7 months ago
Wife Her Up!
The Figen
7 months ago
It's all about perspective.
NO CONTEXT HUMANS
7 months ago
That's the circle of life
NO CONTEXT HUMANS
7 months ago
Is the violence really necessary?