Episode Summary

From Weights & Biases: the craziest week of AI. DeepSeek R1 beats o1 under an MIT license, a $500B AI investment with SoftBank, OpenAI's Operator agents, a White House AI Executive Order, and more AI news. This episode covers Major AI Investments and Updates, ByteDance's UI-TARS and Other Open Source News, Open Source AI: DeepSeek R1, Introducing Operator: AI Agents in Action, and Humanity's Last Exam Benchmark.

By The Numbers

Episode Length
110 min
Runtime captured from the cached podcast RSS metadata.
Show-note Links
19
Curated links preserved from the cached Substack post.
Featured Speakers
5
Known host, co-hosts, and guests surfaced on the episode page.
Chapter Highlights
5
Major sections summarized from the exported Descript markdown.

🔥 Breaking During The Show

Major AI Investments and Updates
**Alex Volkov:** Like, as a small, tiny announcement: half a trillion dollars of investment upcoming in AI from OpenAI, Masayoshi-san from Vision Fund, and Larry Ellison from Oracle.
ByteDance's UI-TARS and Other Open Source News
**Alex Volkov:** Also in open source LLMs, kind of LLMs, ByteDance dropped UI-TARS. UI-TARS is ByteDance's computer use model. They claim it comes in 7 billion and 72 billion parameter sizes, controls your Mac or PC, and beats GPT-4, and they have an app for both.
Open Source AI: DeepSeek R1
**Alex Volkov:** All right, folks. Open source AI has never been as hot as this week.

๐Ÿ“ฐ Major AI Investments and Updates

**Alex Volkov:** Like, as a small, tiny announcement: half a trillion dollars of investment upcoming in AI from OpenAI, Masayoshi-san from Vision Fund, and Larry Ellison from Oracle.


🔓 ByteDance's UI-TARS and Other Open Source News

**Alex Volkov:** Also in open source LLMs, kind of LLMs, ByteDance dropped UI-TARS. UI-TARS is ByteDance's computer use model. They claim it comes in 7 billion and 72 billion parameter sizes, controls your Mac or PC, and beats GPT-4, and they have an app for both.


🔓 Open Source AI: DeepSeek R1

**Alex Volkov:** All right, folks. Open source AI has never been as hot as this week.


๐Ÿ› ๏ธ Introducing Operator: AI Agents in Action

**Sam Altman:** AI agents are AI systems that can do work for you. You give them a task and they go off and do it.

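Sam Altman's definition ("you give them a task and they go off and do it") describes a loop: pick an action, run it, look at the result, repeat until done. The sketch below shows that generic loop under loud assumptions. The `policy` function is a hypothetical stand-in for a model, the `search` tool is made up for illustration, and nothing here reflects how OpenAI actually implements Operator, which the show notes only describe as an agentic browser in ChatGPT Pro.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    action: str    # name of the tool to call, or "finish" to stop
    argument: str  # input for the tool, or the final answer when finishing


# Hypothetical stand-in for the model: sees the task plus (step, observation) history.
Policy = Callable[[str, list[tuple[Step, str]]], Step]


def run_agent(task: str, policy: Policy,
              tools: dict[str, Callable[[str], str]], max_steps: int = 10) -> str:
    """Generic act/observe loop; returns the agent's final answer."""
    history: list[tuple[Step, str]] = []
    for _ in range(max_steps):
        step = policy(task, history)
        if step.action == "finish":
            return step.argument
        tool = tools.get(step.action)
        # Feed errors back as observations instead of crashing the loop.
        observation = tool(step.argument) if tool else f"unknown tool: {step.action}"
        history.append((step, observation))
    return "stopped: step limit reached"


# Usage with a scripted (hypothetical) policy: search once, then report the result.
def scripted_policy(task: str, history: list[tuple[Step, str]]) -> Step:
    if not history:
        return Step("search", task)
    return Step("finish", history[-1][1])


tools = {"search": lambda q: f"results for {q!r}"}
print(run_agent("cheap flights to Tokyo", scripted_policy, tools))
```

The step limit is a deliberate design choice in this sketch: an agent that "goes off and does it" needs some bound so a confused policy cannot loop forever.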

📊 Humanity's Last Exam Benchmark

**Alex Volkov:** So shout out. We're not gonna use the breaking news button because it happened before the show, but it's okay. It's called Humanity's Last Exam, and this is a very unsaturated benchmark. As you guys know, we talk about benchmarks all the time: MMLU, MATH, all those things, and they're always getting close to saturated. Like, MATH is at 98% saturated, and I believe MMLU is at 99 percent saturated. And we talked about FrontierMath, which is an attempt to have very, very hard math problems.

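"Saturated" here just means top scores are so close to 100% that little headroom is left to tell models apart. A minimal sketch of that arithmetic, using the approximate scores quoted on the show; the 95% cutoff is a hypothetical threshold chosen for illustration, not an official definition used by any benchmark.

```python
# Hypothetical cutoff for calling a benchmark "saturated"; not an official standard.
SATURATION_THRESHOLD = 0.95


def headroom(accuracy: float) -> float:
    """Fraction of questions the best model still gets wrong."""
    return 1.0 - accuracy


def is_saturated(accuracy: float, threshold: float = SATURATION_THRESHOLD) -> bool:
    """True when the top score leaves less room than the threshold allows."""
    return accuracy >= threshold


# Approximate top scores as quoted on the show.
scores = {"MATH": 0.98, "MMLU": 0.99}
for name, acc in scores.items():
    print(f"{name}: headroom {headroom(acc):.0%}, saturated={is_saturated(acc)}")
```

With only 1 to 2 percentage points left, small score differences between models mostly reflect noise, which is why a hard, unsaturated benchmark like Humanity's Last Exam is useful.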
TL;DR and show notes
  • Open Source LLMs

    • DeepSeek R1 - MIT-licensed SOTA open source reasoning model (HF, X)

    • ByteDance UI-TARS - PC control models (HF, GitHub)

    • HLE - Humanity's Last Exam benchmark (Website)

  • Big CO LLMs + APIs

    • SoftBank, Oracle, OpenAI Stargate Project - $500B AI infrastructure (OpenAI Blog)

    • Google Gemini Flash Thinking 01-21 - 1M context, code execution, better evals (X)

    • OpenAI Operator - Agentic browser in ChatGPT Pro (operator.chatgpt.com)

    • Anthropic launches citations in API (blog)

    • Perplexity SonarPRO Search API and an Android AI assistant (X)

  • This week's Buzz 🐝

    • W&B set a new SOTA on SWE-bench Verified (W&B Blog)

  • Vision & Video

    • HuggingFace SmolVLM - Tiny VLMs - runs even on WebGPU (HF)

  • AI Art & Diffusion & 3D

    • Hunyuan 3D 2.0 - SOTA open-source 3D (HF)

  • Tools

  • Show Notes:

    • Pietro Schirano RAT - Retrieval Augmented Thinking (X)

    • Run DeepSeek with more “thinking” script (Gist)