A rule-based reward model also means their training targets are limited to domains with ground truth. It would be interesting to see how they extend this to questions with ambiguous, but comparable, answers.
NO CONTEXT HUMANS
7 months ago
I’m not saying you should, but I’m also not saying you shouldn’t
NO CONTEXT HUMANS
7 months ago
Me too, machine, me too.
NO CONTEXT HUMANS
7 months ago
AI is wild