马东锡 NLP 🇸🇪


5 months ago

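「RLVR, Reasoning」 Spurious Rewards: Rethinking Training Signals in RLVR. When arbitrary reward signals can still substantially improve model performance, we have to ask again: is RL actually learning, or is it amplifying some kind of "prior" behavior? "RLVR must somehow be surfacing useful reasoning representations learned d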

#RLVR #SpuriousRewards #DeepLearning #reasoning #TrainingSignals #MachineLearning #ModelPerformance
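To make "arbitrary reward signal" concrete, here is a minimal Python sketch under our own assumptions, not the paper's exact setup. `correctness_reward` is a verifiable reward that checks the answer against a reference, while `random_reward` and `format_reward` ignore whether the answer is right. All function names and the toy data are hypothetical. `group_advantages` follows one common GRPO-style normalization; it shows that even coin-flip rewards produce nonzero advantages, so the policy still receives an update signal that does not depend on correctness, which is one plausible route by which RL could amplify existing behavior rather than learn from the reward.

```python
import random

def correctness_reward(answer: str, reference: str) -> float:
    """Verifiable reward: 1.0 only if the extracted answer matches the reference."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def random_reward(rng: random.Random, p: float = 0.5) -> float:
    """Spurious reward: a Bernoulli(p) coin flip that ignores the response entirely."""
    return 1.0 if rng.random() < p else 0.0

def format_reward(response: str) -> float:
    """Spurious reward: checks only that a \\boxed{...} answer is present, not whether it is right."""
    return 1.0 if "\\boxed{" in response else 0.0

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage (one common variant): (r - group mean) / (group std + eps).

    A group whose rewards are all equal gets zero advantage, i.e. no update signal.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

if __name__ == "__main__":
    # Four sampled responses to one prompt whose reference answer is "4" (toy data).
    responses = ["... \\boxed{4}", "... the answer is 5", "... \\boxed{5}", "... 4"]
    answers = ["4", "5", "5", "4"]
    rng = random.Random(0)

    verifiable = [correctness_reward(a, "4") for a in answers]  # [1.0, 0.0, 0.0, 1.0]
    formatted = [format_reward(r) for r in responses]           # [1.0, 0.0, 1.0, 0.0]
    coin_flips = [random_reward(rng) for _ in responses]

    for name, rewards in [("verifiable", verifiable), ("format", formatted), ("random", coin_flips)]:
        print(name, rewards, [round(a, 3) for a in group_advantages(rewards)])
```

Note that `format_reward` scores the wrong answer `\boxed{5}` as 1.0: it rewards a surface pattern, not correctness.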

Related news


𝙩𝙮≃𝙛{𝕩}^A𝕀²·ℙarad𝕚g𝕞

1 week ago

The reasoning behind reasoning: reasoning patterns. More and more research is turning to LLMs' meta-abilities, that is, their second-order capabilities.


𝙩𝙮≃𝙛{𝕩}^A𝕀²·ℙarad𝕚g𝕞

2 weeks ago

Is "reasoning as a core capability" roughly the same as a "cognitive core"? jakub: we're focusing less on version numbers now. GPT-5 introduces reasoning as a core capability, and we're decoupling product releases from resea


Tom Huang

5 months ago

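The most authoritative MCP course is here💥 The heavyweights step in to teach you how to build context-rich AI applications⚡️ Anthropic and Andrew Ng's DeepLearning.AI officially release a joint course! Learn how to use MCP to integrate data sources such as Google Drive and Notion and answer questions comprehensively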


马东锡 NLP 🇸🇪

6 months ago

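「DeepSeek, Reasoning」Paper: DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition. It uses "sorry" as a placeholder. Sorry, but there is no word for it except hardcore. In its pursuit of reasoning, this DeepSeek paper reaches a point that makes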
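For readers new to Lean: `sorry` is Lean's placeholder term, which lets a proof compile (with a "declaration uses 'sorry'" warning) while a step is still unproven. The hypothetical Lean 4 sketch below, not taken from the paper, shows the shape of a subgoal-decomposed proof: each intermediate step is stated with `have` and left as `sorry`, and the top-level goal is then closed from those subgoals. The theorem name and statement are illustrative only.

```lean
-- Hypothetical example (not from the paper): a proof skeleton whose
-- subgoals are stated but left as `sorry` placeholders.
theorem add_sq_example (a b : Nat) :
    (a + b) * (a + b) = a * a + 2 * a * b + b * b := by
  -- Subgoal 1: distribute the left factor (left unproven).
  have h1 : (a + b) * (a + b) = a * (a + b) + b * (a + b) := by sorry
  -- Subgoal 2: expand and collect terms (left unproven).
  have h2 : a * (a + b) + b * (a + b) = a * a + 2 * a * b + b * b := by sorry
  -- The main goal follows by chaining the two subgoals.
  exact h1.trans h2
```

In a subgoal-decomposition setting like the one the paper's title describes, each `sorry` marks a smaller lemma that can be attacked separately.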


马东锡 NLP 🇸🇪

6 months ago

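「Agent, RAG, Reasoning」Paper: ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning. ReSearch carries strong echoes of ReAct. It teaches the model "when to turn to the world for help"; its limitation, however, is that ReSearch can rely on only one tool. The authors propose a novel framework called ReSearch, aimed at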

© 2025 news.news. All rights reserved.