Episode Summary
From Weights & Biases: the craziest week of AI. R1 beats o1 but with an MIT license, a $500B investment into AI with SoftBank, OpenAI Operator agents, a White House AI Executive Order, and more AI news. This episode covers Major AI Investments and Updates, ByteDance's UI-TARS and Other Open Source News, Open Source AI: DeepSeek R1, Introducing Operator: AI Agents in Action, and Humanity's Last Exam Benchmark.
In This Episode
Hosts & Guests
By The Numbers
Breaking During The Show
Major AI Investments and Updates
**Alex Volkov:** Like, a small, tiny announcement of a half-trillion-dollar investment upcoming in AI from OpenAI, Masayoshi-san from Vision Fund, and Larry Ellison from Oracle.
ByteDance's UI-TARS and Other Open Source News
**Alex Volkov:** Also in open source LLMs, kind of LLMs, ByteDance dropped UI-TARS. UI-TARS is ByteDance's computer use model, at 7 billion and 72 billion parameters, that they claim controls your Mac or PC. They have an app for both, and they beat GPT-4.
Open Source AI: DeepSeek R1
**Alex Volkov:** All right, folks. Open source AI has never been as hot as this week.
Introducing Operator: AI Agents in Action
**Sam Altman:** AI agents are AI systems that can do work for you. You give them a task and they go off and do it.
Humanity's Last Exam Benchmark
**Alex Volkov:** So, shout out. We're not gonna use the breaking news button because it's gonna happen before the show, but it's okay. It's called Humanity's Last Exam, and this is a very unsaturated benchmark. As you guys know, we talk about benchmarks all the time, MMLU, MATH, all those things, and they're always getting close to saturated. Like, MATH is at 98% saturated, I believe; at 99 percent, MMLU is saturated. And we talked about FrontierMath, which is an attempt to have very, very hard math problems.
Open Source LLMs
Big CO LLMs + APIs
SoftBank, Oracle, OpenAI Stargate Project - $500B AI infrastructure (OpenAI Blog)
Google Gemini Flash Thinking 01-21 - 1M context, Code execution, Better Evals (X)
OpenAI Operator - Agentic browser in ChatGPT Pro (operator.chatgpt.com)
Anthropic launches citations in API (blog)
Perplexity SonarPRO Search API and an Android AI assistant (X)
This week's Buzz
W&B broke SOTA SWE-bench verified (W&B Blog)
Vision & Video
HuggingFace SmolVLM - Tiny VLMs - runs even on WebGPU (HF)
AI Art & Diffusion & 3D
Hunyuan 3D 2.0 - SOTA open-source 3D (HF)
Tools
ByteDance Trae - Cursor competitor (Trae AI: https://trae.ai/)
Show Notes: