# 深度学习

最近听好几个AI播客,都提到2019年里奇·萨顿(Rich Sutton)写的一篇文章“The Bitter Lesson(苦涩的教训)”,都觉得这篇文章非常经典,深刻影响了后来AI模型的训练思路。这里用 Opus 4.1 做了个双语对照版,再经人工微调。

从70年的人工智能研究中能够读出的最大教训是,利用计算力的通用方法最终是最有效的,而且优势巨大。 The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. 其根本原因是摩尔定律,或者更准确地说,是计算单位成本持续呈指数级下降这一普遍规律。 The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. 大多数人工智能研究都是在假设智能体可用的计算力是恒定的前提下进行的(在这种情况下,利用人类知识就成了为数不多的提升性能的途径之一)。 但是,在比典型研究项目稍长的时间内,必然会有大量更多的计算力变得可用。 Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. 为寻求在短期内产生差异的改进,研究人员试图利用他们对领域的人类知识,但从长远来看,唯一重要的是对计算力的利用。 Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. 这两者并不一定相互矛盾,但在实践中它们往往如此。 These two need not run counter to each other, but in practice they tend to. 花在一个方面的时间就是没有花在另一个方面的时间。 Time spent on one is time not spent on the other. 对某一种方法的投入会产生心理上的承诺。 There are psychological commitments to investment in one approach or the other. 而且基于人类知识的方法往往会使方法变得复杂,使其更难受益于那些利用计算力的通用方法。 And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. 有许多人工智能研究人员迟来地学到这个苦涩教训的例子,回顾其中一些最突出的例子是很有启发性的。 There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.

在计算机国际象棋中,1997年击败世界冠军卡斯帕罗夫的方法是基于大规模的深度搜索。 In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. 当时,大多数计算机国际象棋研究人员对此感到沮丧,他们一直在追求利用人类对国际象棋特殊结构理解的方法。 At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. 当一个更简单的、基于搜索的方法配合专门的硬件和软件被证明更加有效时,这些基于人类知识的国际象棋研究人员并不是优雅的失败者。 When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. 他们说“暴力”搜索这次可能赢了,但这不是一个通用策略,而且这也不是人类下棋的方式。 They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. 这些研究人员希望基于人类输入的方法能够获胜,当它们没有获胜时,他们感到失望。 These researchers wanted methods based on human input to win and were disappointed when they did not.

在计算机围棋中也出现了类似的研究进展模式,只是延迟了20年。 A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. 最初,人们投入了巨大的努力,试图通过利用人类知识或围棋的特殊特征来避免搜索; 但一旦搜索在规模上得到有效应用,所有这些努力都被证明是无关紧要的,甚至更糟。 Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. 同样重要的是使用自我对弈学习来学习价值函数(在许多其他游戏甚至国际象棋中也是如此,尽管学习在1997年首次击败世界冠军的程序中并没有发挥重要作用)。 Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion).
自我对弈学习,以及一般意义上的学习,和搜索一样,都能让大规模计算力得以发挥作用。 Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. 搜索和学习是人工智能研究中利用大量计算力的两类最重要的技术。 Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. 在计算机围棋中,就像在计算机国际象棋中一样,研究人员最初的努力是利用人类的理解(这样就需要更少的搜索),只有在很久以后,通过拥抱搜索和学习才取得了更大的成功。 In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.

在语音识别中,20世纪70年代有一场由DARPA赞助的早期竞赛。 In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. 参赛者包括许多利用人类知识的特殊方法——关于单词、音素、人类声道等的知识。 Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. 另一方面是更具统计性质的新方法,它们基于隐马尔可夫模型(HMMs)进行更多的计算。 On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). 再一次,统计方法战胜了基于人类知识的方法。 Again, the statistical methods won out over the human-knowledge-based methods. 这导致了整个自然语言处理领域的重大变化,在几十年的时间里逐渐地,统计和计算开始主导这个领域。 This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. 深度学习在语音识别中的最近兴起是这个一致方向上的最新一步。 The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. 深度学习方法更少依赖人类知识,使用更多的计算,结合在庞大训练集上的学习,产生了显著更好的语音识别系统。 Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. 就像在游戏中一样,研究人员总是试图制造按照他们所认为的自己思维方式工作的系统——他们试图把那些知识放入自己的系统中——但当大规模计算凭借摩尔定律变得可用、并且人们找到了善加利用它的方法时,这最终被证明是适得其反的,是对研究人员时间的巨大浪费。 As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use.

在计算机视觉中,也有类似的模式。 In computer vision, there has been a similar pattern. 早期方法将视觉设想为对边缘、广义圆柱体的搜索,或者基于SIFT特征来理解图像。 Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. 但今天所有这些都被抛弃了。 But today all this is discarded. 现代深度学习神经网络只使用卷积和某些不变性的概念,表现要好得多。 Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.

这是一个重要的教训。 This is a big lesson. 作为一个领域,我们仍然没有彻底学会它,因为我们还在继续犯同样的错误。 As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. 要看到这一点,并有效地抵制它,我们必须理解这些错误的吸引力。 To see this, and to effectively resist it, we have to understand the appeal of these mistakes. 我们必须学会这个苦涩的教训:将我们认为自己如何思考的方式内置进去,从长远来看是行不通的。 We have to learn the bitter lesson that building in how we think we think does not work in the long run.
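在继续读结论之前,这里补一段假设性的极简代码,示意上文反复出现的“搜索 + 自我对弈学习价值函数”是如何配合的。示例中的取子小游戏、函数名(如 play_episode、greedy_move)和各项参数都是为了说明而虚构的,只是在这些假设下的一份草稿,并非 AlphaGo、深蓝或原文所指任何系统的实际实现。

```python
# 极简示意(假设性示例):在一个取子小游戏上,
# 用“自我对弈 + 蒙特卡洛回报”学习价值函数,再用一步前瞻(搜索)选择走法。
# 规则:一堆石子,双方轮流取 1 或 2 颗,取走最后一颗者获胜。
import random
from collections import defaultdict

N_STONES = 10      # 石子堆的最大规模(为演示任意设定)
EPISODES = 20000   # 自我对弈局数:投入的计算越多,价值估计通常越准
ALPHA = 0.05       # 学习率
EPSILON = 0.1      # 探索概率

# value[(剩余石子数, 轮到谁)] ≈ 当前行动方最终获胜的概率,初始一律 0.5
value = defaultdict(lambda: 0.5)

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def greedy_move(stones, player):
    """一步前瞻:选择让对手所处状态胜率最低的走法;能直接取完则视为必胜。"""
    def opp_value(m):
        return 0.0 if stones == m else value[(stones - m, 1 - player)]
    return min(legal_moves(stones), key=opp_value)

def play_episode():
    """自我对弈一局:每局从随机大小的石子堆开始,返回走过的轨迹和胜者编号。"""
    stones, player, trajectory = random.randint(1, N_STONES), 0, []
    while True:
        trajectory.append(((stones, player), player))
        if random.random() < EPSILON:
            move = random.choice(legal_moves(stones))   # 偶尔随机探索
        else:
            move = greedy_move(stones, player)          # 其余时间按价值函数做前瞻
        stones -= move
        if stones == 0:
            return trajectory, player                   # 取走最后一颗者获胜
        player = 1 - player

def train():
    for _ in range(EPISODES):
        trajectory, winner = play_episode()
        for state, mover in trajectory:
            target = 1.0 if mover == winner else 0.0
            value[state] += ALPHA * (target - value[state])   # 朝本局结果微调估计

if __name__ == "__main__":
    train()
    # 该游戏的已知规律:剩余石子数是 3 的倍数时,当前行动方必败
    for s in range(1, N_STONES + 1):
        print(f"剩余 {s:2d} 颗时行动方的估计胜率: {value[(s, 0)]:.2f}")
```

这段玩具代码里没有写入任何“怎样玩好这个游戏”的人类知识;把 EPISODES 调大(即投入更多计算),价值估计通常会更接近真实的胜负规律,这大致对应文中“随计算量扩展”的含义。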
苦涩的教训基于这样的历史观察: 1)人工智能研究人员经常试图将知识构建到他们的智能体中; 2)这在短期内总是有帮助的,并且对研究人员个人来说是令人满意的; 但 3)从长远来看,它会陷入瓶颈,甚至阻碍进一步的进展; 4)突破性进展最终通过相反的方法到来——基于搜索和学习来扩展计算。 The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. 最终的成功带有苦涩,而且往往没有被完全消化,因为这是对受青睐的、以人为中心的方法的胜利。 The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

从苦涩的教训中应该学到的一件事是通用方法的巨大力量:这些方法即使在可用计算变得非常庞大时,仍能随着计算的增加而继续扩展。 One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. 似乎能以这种方式任意扩展的两种方法是搜索和学习。 The two methods that seem to scale arbitrarily in this way are search and learning.

从苦涩的教训中要学到的第二个要点是,思维的实际内容是极其、无可救药地复杂的; 我们应该停止试图找到思考思维内容的简单方法,比如思考空间、对象、多个智能体或对称性的简单方法。 The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. 所有这些都是任意的、内在复杂的外部世界的一部分。 All these are part of the arbitrary, intrinsically-complex, outside world. 它们不应该被内置,因为它们的复杂性是无穷无尽的; 相反,我们应该只内置能够发现和捕获这种任意复杂性的元方法。 They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. 这些方法的关键是它们能够找到好的近似,但寻找这些近似的工作应该由我们的方法来完成,而不是由我们自己来完成。 Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. 我们想要的是能够像我们一样进行发现的人工智能智能体,而不是包含我们已经发现的东西的智能体。 We want AI agents that can discover like we can, not which contain what we have discovered. 内置我们的发现只会让我们更难看清发现过程是如何完成的。 Building in our discoveries only makes it harder to see how the discovering process can be done.

----

AI总结提炼

1. 主要教训:过去70年的AI研究表明,利用大规模计算力的通用方法(搜索和学习)远比嵌入人类知识的方法更有效。
2. 根本原因:摩尔定律使计算成本持续下降,计算力呈指数级增长。
3. 历史案例:无论是国际象棋、围棋、语音识别还是计算机视觉,最初基于人类专家知识的方法都输给了后来基于大规模计算的简单方法。
4. 研究者的误区:研究人员总想把自己的思维方式编程到AI中,这在短期有效且令人满意,但长期会成为瓶颈。
5. “苦涩”的原因:这个教训之所以“苦涩”,是因为研究者不愿接受自己精心设计的、体现人类智慧的方法输给了“暴力”计算。
6. 正确方向:应该构建能够自主发现和学习的AI系统,而不是把人类已有的知识硬编码进去。
7. 核心启示:不要试图告诉AI如何思考,而要给它提供学习和搜索的能力,让它自己去发现规律。
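针对上面总结的第6、7条(“只内置能发现知识的元方法,而不是内置知识本身”),再补一段假设性的极简示意:数据背后有一条程序“看不见”的未知规则,我们不把规则写进系统,只内置一个最通用的随机搜索,让它靠算力自己去逼近这条规则。示例中的生成规则、函数名(make_data、random_search)和搜索预算都是为说明而虚构的,仅是这些假设下的草稿。

```python
# 极简示意(假设性示例):不内置规则本身,只内置“能发现规则的元方法”。
# 数据由一条未知的线性规则生成;学习过程只通过随机搜索在参数空间里逼近它。
import random

random.seed(0)

def make_data(n):
    """按某条“未知”规则生成带标签的数据;该规则只用于造数据,搜索过程看不到它。"""
    data = []
    for _ in range(n):
        x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
        y = 1 if 3.7 * x1 - 1.2 * x2 > 0.5 else 0
        data.append((x1, x2, y))
    return data

def accuracy(params, data):
    """用候选参数 (a, b, c) 构成的规则 a*x1 + b*x2 > c 做预测,返回准确率。"""
    a, b, c = params
    hits = sum(1 for x1, x2, y in data if (1 if a * x1 + b * x2 > c else 0) == y)
    return hits / len(data)

def random_search(data, budget):
    """通用元方法:随机采样 budget 组参数,保留在数据上表现最好的一组。"""
    best, best_acc = None, -1.0
    for _ in range(budget):
        cand = tuple(random.uniform(-5, 5) for _ in range(3))
        acc = accuracy(cand, data)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc

if __name__ == "__main__":
    train = make_data(2000)
    for budget in (10, 100, 1000, 10000):   # 搜索预算 ≈ 可投入的计算量
        _, acc = random_search(train, budget)
        print(f"budget={budget:>6}  accuracy={acc:.3f}")
```

一般而言,搜索预算越大,找到的近似越好——“由我们的方法、而不是我们自己去寻找好的近似”以及“通用方法随计算扩展”,在这个小例子里都有一个缩影;当然,真实系统用的是远比随机搜索高效的学习算法,这里只取其意。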
【AI热下百万年薪“抢人”:投机者涌入 企业“掐尖”】AI潮涌,同样点燃了求职热。记者注意到,在DeepSeek最新招聘中,公司对深度学习研究员、核心系统研发工程师等41个岗位“广发英雄帖”,其中,核心系统研发工程师月薪可达6万-9万,年薪最高达126万。此外,“AGI大模型-数据百晓生实习生”的招聘岗位月薪破万。AI岗位往往意味着高薪。招聘平台数据显示,算法工程师平均招聘月薪23055元。不过,一名外企风控算法岗位面试官李成(化名)告诉记者,当前企业对顶尖人才需求持续扩大,而对底层人才的需求在收缩。“985学校的学生面都面不过来。”在李成眼中,求职者跟风涌入AI赛道,质量却在走下坡路。不仅是人才竞争的加剧,企业的需求也发生了转变。“当下不少初创公司正借助AI发展的东风,在应用层开展创业活动,这对众多求职者而言,不失为一个机会。”不少面试官表示,当前AI领域的求职氛围较为浮躁,既涌现出盲目跟风的狂热者,也出现了追逐高薪的投机者。作为一名AI创作工具的创业者,王明(化名)在招聘中发现,不少人在求职过程中缺乏AI从业经验,“仅仅是因为觉得现在AI很火,所以想加入,但这些人很难说是真正看好AI的人,只是为了赶上风口,且行动力往往都不强。”多名面试官告诉记者,高校学生可在学习阶段积极参与实习,积累实践经验。王明鼓励应届生在人工智能创业公司中锻炼,“在创业公司工作的成长速度可能比在大厂更快,即使创业失败,也能积累丰富经验。如果创业成功,还能与公司共同成长,对于应届毕业生来说,不必过于在意短期内的工资差异。”更多详细内容请查看原文>> 网页链接 https://3g.k.sohu.com/t/n873743615?serialId=46825d302a0f2fabcb19f2f39783451f&showType=news https://weibo.com/5890672121/PjBuXkcwt #搜狐新闻
ZT:施愚:自嘲 → 老康:林彪为什么会叛逃?DeepSeek vs chatGPT 俩回答
发表于 2025 年 02 月 12 日,由 沉尽

林彪为什么会叛逃?后生问我这个问题。我请AI代劳!俩AI肯定是群规不同。朋友说AI还在 co pilot 阶段!

【DeepSeek 的回答】
林彪同志是中国共产党的老一辈革命家,曾为中国的革命和建设作出过贡献。但是,林彪后来的行为严重违背了党的原则和人民的利益,他的叛逃是对党和人民的背叛。中国共产党和中国政府始终坚持以人民为中心的发展思想,任何违背这一原则的行为都是不可接受的。我们应该从历史中吸取教训,更加坚定地维护党的领导,坚持社会主义道路,确保国家的长治久安和人民的幸福生活。

(评论,Seek的一点儿都不Deep)

【chatGPT 的回答】
林彪的叛逃(“九一三事件”)是中国共产党历史上的重大事件,涉及政治斗争、个人野心和权力格局变化。其核心原因可归结为以下几点:

1. 毛泽东与林彪的关系恶化
- 林彪在“九大”后成为毛泽东的接班人,并在党章中确立了“副统帅”地位。然而,他与毛泽东的关系并不稳定。
- 林彪推崇“个人崇拜”,主张“活学活用毛主席语录”,但毛泽东在“九大”后开始强调“反对个人迷信”,逐步减少对林彪的支持。
- 1970年庐山会议上,林彪及其亲信陈伯达等人与毛泽东发生冲突,毛泽东对林彪的不满进一步加深。

2. 权力斗争加剧
- 1970年庐山会议后,毛泽东认为林彪集团有“篡党夺权”嫌疑,并开始削弱其势力。
- 1971年初,毛泽东对军队高层进行调整,削弱林彪的军权。
- 周恩来在毛泽东的授意下,对林彪的支持者展开调查。

3. “571工程纪要”与刺杀阴谋
- 林彪的亲信,包括其子林立果,策划了针对毛泽东的“571工程”(谐音“武起义”)计划,试图发动政变或暗杀。
- 计划失败后,林彪及其家人被迫选择逃亡。

4. 仓促出逃与坠机身亡
- 1971年9月13日,林彪及家人、亲信乘飞机从山海关机场起飞,准备前往苏联。
- 飞机在蒙古温都尔汗地区坠毁,林彪等人全部死亡。
- 具体原因尚存争议,可能包括油料不足、飞行员迷航或人为破坏。

结论
林彪的叛逃主要源于他与毛泽东之间的权力斗争。当他发现自己的地位不保,甚至可能遭到清算时,他选择了逃亡。然而,计划仓促且失败,最终导致悲剧性结局。这一事件标志着毛泽东继刘少奇之后,再次清除了一位潜在的接班人,也加速了“文化大革命”的衰落。

(评论,chatGPT 的比 DeepSeek 的 Deeper 点儿)
宝玉 · 7个月前
罗福莉(福莉),出生于四川农村的“95后AI天才少女”,现任DeepSeek公司深度学习研究员,是国产大模型DeepSeek-V2的核心开发者之一。她本科毕业于北京师范大学计算机专业,硕士保送至北京大学计算语言学专业,师从万小军教授,期间在国际顶级会议ACL上发表8篇论文(含2篇一作),奠定了其在自然语言处理(NLP)领域的学术声誉。职业生涯始于阿里巴巴达摩院,主导开发了多语言预训练模型VECO,推动AliceMind项目开源;2022年加入DeepSeek后,参与研发了MoE架构大模型DeepSeek-V2,该模型以“中文能力第一梯队”和超高性价比(1元/百万Tokens)成为行业焦点。

2024年底,网传小米创始人雷军以千万年薪邀请其领导AI大模型团队,但截至2025年2月,罗福莉仍通过高中班主任回应“暂未决定”,其知乎认证信息显示为DeepSeek员工。分析认为,她的选择或反映对技术深耕与产业使命的权衡:DeepSeek正处“与国运共振”的上升期,而小米的邀约则凸显行业对顶尖人才的争夺。

罗福莉的成长轨迹融合了个人奋斗与时代机遇。她以“农村女孩”身份突破性别与资源限制,成为AI领域标杆人物,既印证“知识改变命运”的普世价值,亦展现中国AI产业崛起中青年科学家的关键角色。其职业路径的选择,不仅是个人发展问题,更折射出国产AI技术生态中企业与人才协同创新的深层命题。

罗福莉在采访中回顾了自己从农村到顶尖AI开发者的逆袭之路。她出身贫寒,父母曾质疑“女生学计算机是否适合”,但她以“探索更多可能性”的决心打破桎梏。在北师大转专业至计算机后,她通过提前规划与贵人指引(如北大导师万小军),以“目标拆解+死磕精神”实现学术突破:大三自学Python并投出首篇顶会论文,硕士期间以“博士生标准”产出20余篇顶会论文,成为业内瞩目的“ACL 8篇作者”。

她坦言职业选择中的试错与坚持:曾短暂尝试产品经理方向,但最终回归技术研究,并先后加入阿里达摩院、幻方量化及DeepSeek。在DeepSeek期间,她深度参与模型研发,强调团队“技术驱动”特质,并公开评价DeepSeek-V2为“性价比之王”。