Herrington Darkholme, 8 months ago: A rule-based reward model also means their training target is limited to domains with ground truth. It would be interesting to see how they can extend it to questions with ambiguous, but comparable, answers. #RuleBasedAI #RewardModel #MachineLearning #ambiguity #GroundTruth
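
To make the distinction in the comment concrete, here is a minimal sketch, not anyone's actual training code. It assumes a rule-based reward is an exact-match check against a known reference, and that "comparable" answers could be scored by how often each one beats the others. The names `normalize`, `rule_based_reward`, `win_rate_reward`, and the judge `prefer` are all hypothetical, introduced only for illustration.

```python
from typing import Callable, List, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def rule_based_reward(answer: str, ground_truth: str) -> float:
    """Return 1.0 when the answer matches the known reference, else 0.0.

    This only works when a ground-truth reference exists (assumed exact-match rule).
    """
    return 1.0 if normalize(answer) == normalize(ground_truth) else 0.0


def win_rate_reward(
    candidates: Sequence[str],
    prefer: Callable[[str, str], bool],
) -> List[float]:
    """Score each candidate by the fraction of the other candidates it beats.

    `prefer(a, b)` is a hypothetical judge returning True when `a` is better than `b`.
    No ground truth is needed, only the ability to compare two answers.
    """
    n = len(candidates)
    if n < 2:
        return [0.0] * n
    rewards = []
    for i, a in enumerate(candidates):
        wins = sum(prefer(a, b) for j, b in enumerate(candidates) if j != i)
        rewards.append(wins / (n - 1))
    return rewards


if __name__ == "__main__":
    # Domain with ground truth: the rule can check the answer directly.
    print(rule_based_reward("  42 ", "42"))

    # No single right answer: rank candidates against each other instead.
    # The length-based judge is a toy stand-in, not a real quality measure.
    print(win_rate_reward(["a", "abc", "ab"], lambda a, b: len(a) > len(b)))
```

The first function needs a reference answer for every question, which is the limitation the comment points at. The second only needs pairwise comparisons, at the cost of depending on whatever judge supplies them.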