Grok 4 looks impressive. Elon even said it may perform above PhD level in every subject, with no exceptions. Still, I haven't decided whether to subscribe. What about you?

Launch event summary:
- Post-training RL spend == pretraining spend
- Pricing: $3/M input tokens, $15/M output tokens; 256k context; price doubles beyond 128k
- #1 on Humanity's Last Exam (general hard problems): 44.4%; #2 is 26.9%
- #1 on GPQA (hard graduate problems): 88.9%; #2 is 86.4%
- #1 on AIME 2025 (math): 100%; #2 is 98.4%
- #1 on Harvard-MIT Math (HMMT): 96.7%; #2 is 82.5%
- #1 on USAMO 2025 (math): 61.9%; #2 is 49.4%
- #1 on ARC-AGI-2 (easy for humans, hard for AI): 15.9%; #2 is 8.6%
- #1 on LiveCodeBench (Jan-May): 79.4%; #2 is 75.8%
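To get a feel for what the quoted API prices mean per request, here is a rough cost estimator. It is only a sketch: it assumes the "2x beyond 128k" rule doubles both the input and output rates for the whole request once the prompt exceeds 128k tokens, and that the 256k window covers input plus output. Both points are my reading of the summary, not confirmed billing rules. The function name `estimate_cost` is hypothetical.

```python
# Rough, hypothetical cost estimator for the Grok 4 API prices quoted above.
# Assumptions (not confirmed by the summary): the 2x rate applies to the whole
# request, input and output alike, once the prompt exceeds 128k tokens, and
# the 256k context window counts input plus output tokens.

BASE_INPUT_PER_M = 3.00         # USD per million input tokens
BASE_OUTPUT_PER_M = 15.00       # USD per million output tokens
LONG_CONTEXT_THRESHOLD = 128_000
CONTEXT_LIMIT = 256_000
LONG_CONTEXT_MULTIPLIER = 2.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_LIMIT:
        raise ValueError("request exceeds the 256k context window")
    # Long prompts are billed at the doubled rate (assumed to cover both sides).
    multiplier = (
        LONG_CONTEXT_MULTIPLIER if input_tokens > LONG_CONTEXT_THRESHOLD else 1.0
    )
    base = (
        input_tokens * BASE_INPUT_PER_M + output_tokens * BASE_OUTPUT_PER_M
    ) / 1_000_000
    return base * multiplier


if __name__ == "__main__":
    print(f"${estimate_cost(10_000, 2_000):.4f}")   # short prompt
    print(f"${estimate_cost(200_000, 2_000):.4f}")  # long prompt, 2x rate
```

Under these assumptions, a 10k-in/2k-out request costs about $0.06, while a 200k-in/2k-out request costs about $1.26, so long-context use gets expensive quickly.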