中国人权-Human Rights in China
1 week ago
The new Chinese AI model DeepSeek rivals platforms from OpenAI, Google, and Anthropic at a significantly lower cost. Yet censorship and misinformation are built into the model, limiting its usefulness and raising free speech concerns.
Prakash (Ate-a-Pi)
1 week ago
DeepSeek is not a "side project." At the same time, employees are not lying when they say it is. The story they are telling is myth-making in the same vein as Silicon Valley's "we want to make the world a better place" while making billions of dollars.

The team obviously:
- had access to more than ~10k GPUs (~50k, according to Scale AI's CEO)
- was hiring only from the top 3 universities in China, meaning it was competitive with Alibaba and Tencent

These two facts alone mean they were clearly commercially successful and well known enough to get access to both of those resources.

DeepSeek feels more to me like a skunkworks, perhaps a necessary one as the core quant business became less feasible regulatorily. It's like Lockheed setting up a separate small team to compete with SpaceX because the main United Launch Alliance was not going to work out.

It's also very hard to track costs in China, because the regional governments absorb so many of the costs.
- Early Bitcoin miners had free power because governments built power plants to nowhere, and miners were willing to site next to them.
- Alibaba was able to get regional governments to absorb warehouse construction costs on their balance sheets rather than pay for them directly, so it looked extremely asset-light and software-like when it went public.

It is perfectly possible for most of the costs to be parked on a balance sheet outside the core business, perhaps as some form of tech data-center construction incentive. It is also possible that no one except the founder knows all the financial arrangements. Some of these can be absolutely insane handshake deals that get resolved by reputation alone, so 🤷♀️

This much is clear:
- The model is really, really good, on par with OpenAI's release from 2 months ago.
- Having said that, unreleased models from OpenAI and Anthropic are (probably) better.
- The research agenda is still being set by the US firms; this model was a fast follow on the o1 release.
- They are working very fast, catching up sooner than expected.
- They are not copying or cheating; this isn't industrial espionage. At most it is reverse engineering.
- They are largely developing their own talent, not relying on US-trained PhDs.
- They are less constrained than the American firms by IP licensing, privacy, safety, and political concerns around wrongly ingesting data from people who don't want to be trained on. Fewer lawsuits, fewer lawyers, and less caution.
- They also seem to be over the "Tiananmen Square" issue. The model can say it, even if the DeepSeek website doesn't.

Of these, the most significant upgrade is that they are able to develop talent internally without relying on US-trained PhDs. That expands the pool significantly. What happens next?
Chubby♨️
1 week ago
Billionaire and Scale AI CEO Alexandr Wang: DeepSeek has about 50,000 NVIDIA H100s that they can't talk about because of the US export controls that are in place.
阑夕
2 weeks ago
Kimi and DeepSeek released new models within days of each other, another round of bewildering leaps forward, and Silicon Valley's reaction is interesting too. It is no longer astonishment at "how did they do it," but "how can they be this fast" — nearly completing the three-stage arc of doubting, understanding, becoming.

First, some background. Large-model work can be roughly divided into two parts, training and inference. Before September of last year, training quality was treated as the top priority: stack compute, build clusters of ten thousand or even a hundred thousand GPUs, and let the model learn as much human text as possible, to solve the evolution of intelligence.

Why was last September a turning point? Because OpenAI released o1, which used Chain-of-Thought reasoning to dramatically improve model capability.

Before that, the whole industry had been waiting for GPT-5, assuming the long-rumored Q* was GPT-5, and was badly underprepared for the o1 approach. That is not to say o1 can't compete; its strength is on another level. If training makes AI smarter, then inference makes AI more useful.

From o1 to o3, OpenAI's direction has been clear: try every route to AGI, and if one approach fails, switch to another — there is always a countermove. People mock and criticize OpenAI a lot, but all of that rests on high expectations. Don't assume OpenAI has run out of steam; in fact, it is still the one driving frontier innovation each time, treading out a path before others dare to follow confidently.

The big AI labs have been reluctant to admit that training has hit a wall, because that touches on whether the Scaling Law — as long as there is more data and compute, large models keep improving — has stopped working. The trainable data on the open web has long since been scraped clean; with no new increment of knowledge, model intelligence faces the predicament of a river with no source.

So the shift of emphasis from training to inference became the newest industry consensus over roughly the past half year. Inference relies on reinforcement learning (RL), teaching the model to evaluate its own predictions and keep improving. This is not new — AlphaGo and GPT-4 both benefited from reinforcement learning — but o1's chain of thought pushed the effect of RL a big step forward, achieving a proportional leap: trading inference time for inference quality.

The more thinking time you give the AI, the more rigorous its answers. Doesn't that look like a new scaling law — except the scaling happens at inference rather than training time?

Only with that background can you understand the value of what Kimi and DeepSeek are doing.

DeepSeek has always played the part of the hidden tiger in pig's clothing: it not only started the price war, but its benchmark result of training a GPT-4o-class model for $6 million made it famous overnight. Kimi is the opposite. Its product is strong, it has users, and it even supplied the industry with plenty of fundraising gossip, but on the research side — beyond everyone knowing that Yang Zhilin is brilliant — it has not really been seen.

This time is different. DeepSeek is no longer the lone standout; Kimi has flexed its muscles right in everyone's face. The full-strength Kimi k1.5 went head-to-head with o1 on six mainstream benchmarks and came away with 3 wins, 1 tie, and 2 losses — fully entitled to call itself an equal. (1/2)
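[Editor's illustration] To make the "inference-time scaling" idea above concrete, here is a minimal, self-contained Python sketch of one well-known way to trade extra inference compute for answer quality: self-consistency sampling over chains of thought (sample several reasoned completions, then majority-vote on the final answer). The `generate` function below is a toy stand-in for a real LLM API call, and nothing here is Kimi's or DeepSeek's actual implementation — it only demonstrates why more samples ("more thinking") yields more reliable answers.

```python
import random
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Toy stand-in for an LLM call; replace with a real model API.

    Simulates a sampled chain of thought whose final answer is
    correct 60% of the time.
    """
    answer = "42" if random.random() < 0.6 else random.choice(["41", "43"])
    return f"...reasoning steps...\nAnswer: {answer}"

def extract_answer(completion: str) -> str:
    # Assumes the prompt instructs the model to end with "Answer: <x>".
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(question: str, n_samples: int) -> str:
    """More samples = more 'thinking' compute; accuracy rises with n."""
    prompt = f"{question}\nThink step by step, then end with 'Answer: <x>'."
    votes = Counter(
        extract_answer(generate(prompt)) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # With 1 sample the answer is right ~60% of the time; with 16
    # samples the majority vote is right far more often -- extra
    # inference compute buys answer quality.
    for n in (1, 4, 16):
        wins = sum(self_consistency("What is 6 * 7?", n) == "42"
                   for _ in range(200))
        print(f"n_samples={n:2d}: correct {wins}/200")
```

The design point is the same one the post makes: nothing about the model's weights changes between runs; only the inference budget grows, and the answer distribution sharpens accordingly.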