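「RLVR, Reasoning」 Spurious Rewards: Rethinking Training Signals in RLVR When arbitrary reward signals can still substantially improve model performance, we have to rethink: is RL actually learning, or is it amplifying some "prior" behavior? "RLVR must somehow be surfacing useful reasoning representations learned d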

#RLVR #SpuriousRewards #DeepLearning #reasoning #TrainingSignals #MachineLearning #ModelPerformance

Related news

𝙩𝙮≃𝙛{𝕩}^A𝕀²·ℙarad𝕚g𝕞

1 week ago

Reasoning about reasoning: reasoning patterns. More and more research is focusing on LLMs' meta-capabilities, that is, their second-order capabilities.

𝙩𝙮≃𝙛{𝕩}^A𝕀²·ℙarad𝕚g𝕞

2 weeks ago

Is reasoning as a core capability roughly equivalent to a cognitive core? jakub: we're focusing less on version numbers now. GPT-5 introduces reasoning as a core capability, and we're decoupling product releases from resea

Tom Huang

5 months ago

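The most authoritative MCP course is here 💥 The "national team" itself steps in to teach you how to build context-rich AI applications ⚡️ The official course from Anthropic in partnership with Andrew Ng's DeepLearning has been released! Learn how to use MCP to integrate data sources such as Google Drive and Notion and answer questions across them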

马东锡 NLP 🇸🇪

6 months ago

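「DeepSeek, Reasoning」 Paper: DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition It uses "sorry" as a placeholder; sorry, but there is nothing to say except that it is hardcore. In its pursuit of reasoning, this DeepSeek paper has reached a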

马东锡 NLP 🇸🇪

6 months ago

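「Agent, RAG, Reasoning」 Paper: ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning ReSearch carries the clear imprint of ReAct. It teaches the model "when to turn to the world for help"; its limitation is that ReSearch can rely on only one tool. The authors propose a novel framework, named ReSearch, which aims to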