Episode Summary
Alex here, celebrating an absolutely crazy (to me) milestone: 100 episodes of ThursdAI! 🎉 100 episodes in a year and a half (since I started publishing), 100 episodes that documented INCREDIBLE AI progress. As we mention on the show today, we used to be excited by context windows jumping from 4K to 16K!
In This Episode
- Open Source AI & LLMs: Llama 4 Takes Center Stage (Amidst Some Drama)
- The messy release - Big Oof from Big Zuck
- Too big for its own good (and us?)
- My Take
- Together AI & Agentica (UC Berkeley) finetuned DeepCoder-14B with reasoning (X, Blog)
- NVIDIA Nemotron ULTRA is finally here, 253B pruned Llama 3-405B (HF)
- Vision & Video: Kimi Drops Tiny But Mighty VLMs
Hosts & Guests
By The Numbers
Open Source AI & LLMs: Llama 4 Takes Center Stage (Amidst Some Drama)
This was by far the biggest news of this last week, and it dropped... on a Saturday? (I was on the mountain ⛷️!)
Meta dropped the long awaited Llama 4 models, huge ones this time:
- Llama 4 Scout: 17B active parameters out of ~109B total (16 experts).
- Llama 4 Maverick: 17B active parameters out of a whopping ~400B total (128 experts).
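Quick aside on what those numbers mean, since the active-vs-total split trips people up: in a mixture-of-experts model only ~17B parameters fire per token, but the full parameter count still has to sit in memory. Here's a minimal back-of-the-envelope sketch (my own illustrative arithmetic, assuming 16-bit weights and ignoring KV cache and activations):

```python
# Rough MoE memory math: "active" params set the compute per token,
# but the *total* params must be resident in memory to run the model.

def weight_memory_gb(total_params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight-only memory in GB, assuming 16-bit (2-byte) weights."""
    return total_params_billions * bytes_per_param  # 1B params * 2 bytes = 2 GB

for name, active_b, total_b in [
    ("Llama 4 Scout", 17, 109),     # 16 experts
    ("Llama 4 Maverick", 17, 400),  # 128 experts
]:
    print(f"{name}: ~{active_b}B active per token, "
          f"~{weight_memory_gb(total_b):.0f} GB just to hold the weights")
```

That ~218 GB just for Scout's weights is exactly why the "too big for its own good" complaint below exists: even the "small" one is far beyond a single consumer GPU without heavy quantization.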
The messy release - Big Oof from Big Zuck
Not only did Meta release on a Saturday, messing up people's weekends, they also announced a high LMArena score, but the model they provided to LMArena was... not the model they released!? This caused LMArena to release the 2000 chats dataset, and truly, some examples are quite damning and show just how unreliable LMArena can be as a vibe eval.
We've chatted on the show about how this may be due to some vLLM issues, and speculated about other potential reasons.
Too big for its own good (and us?)
One of the main criticisms the OSS community had about these releases is that, for many of us, the reason for celebrating open source AI is the ability to run models without a network, privately, on our own devices. Llama 3 was released in 8B and 70B versions, and that was incredible for us local AI enthusiasts!
- Why didn't Meta release those sizes?
- Was it due to an inability to beat Qwen/DeepSeek convincingly enough?
My Take
Despite the absolutely chaotic rollout, this is still a monumental effort from Meta. They spent _millions_ on compute and salaries to give this to the community. Yes, no papers yet, the LMArena thing was weird, and the inference wasn't ready.
Together AI & Agentica (UC Berkeley) finetuned DeepCoder-14B with reasoning (X, Blog)
Amidst the Llama noise, we got another stellar open-source release! We were thrilled to have Michael Luo from Agentica/UC Berkeley join us to talk about DeepCoder-14B-Preview, which beats DeepSeek R1 and even o3-mini on several coding benchmarks.
The stated purpose of the project is to democratize RL, and they have open sourced the model (HF), the dataset (HF), the Weights & Biases logs, and even the eval logs.
NVIDIA Nemotron ULTRA is finally here, 253B pruned Llama 3-405B (HF)
While Llama 4 was wrapped in mystery, NVIDIA dropped their pruned and distilled finetune of the previous Llama chonker, the 405B model, coming in at just about half the parameters. They were even able to include the Llama 4 benchmarks in their release, showing that the older Llama, finetuned, can absolutely beat the new ones at AIME, GPQA and more.
Nemotron Ultra supports 128K context and fits on a single 8xH100 node for inference.
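That single-node claim checks out on paper. Here's the quick arithmetic (my own sketch, assuming 16-bit weights on 80GB H100s, not NVIDIA's published numbers):

```python
# Back-of-the-envelope check on the "single 8xH100 node" claim.
params_billions = 253     # Nemotron Ultra parameter count
bytes_per_param = 2       # assuming bf16/fp16 weights
node_hbm_gb = 8 * 80      # 8x H100 80GB

weights_gb = params_billions * bytes_per_param  # 506 GB of weights
headroom_gb = node_hbm_gb - weights_gb          # ~134 GB for KV cache etc.
print(f"weights: {weights_gb} GB / node HBM: {node_hbm_gb} GB "
      f"-> {headroom_gb} GB headroom")
```

So the fit is real, if snug: at the full 128K context, the KV cache will eat into that remaining headroom quickly.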
Vision & Video: Kimi Drops Tiny But Mighty VLMs
The most impressive long-form AI video paper dropped, showing that it's possible to create a 1-minute-long video with incredible character and scene consistency.
TK: Video comparison of Tom & Jerry Scene
This paper adds test-time training (TTT) layers to a pre-trained transformer, allowing it to one-shot generate these incredibly consistent long scenes.
Hosts and Guests
Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
Co-hosts - @WolframRvnwlf @yampeleg @nisten @ldjconfirmed
Michael Luo @michaelzluo - CS PhD @ UC Berkeley; AI & Systems
Liad Yosef (@liadyosef), Ido Salomon (@idosal1) - GitMCP creators
Open Source LLMs
Meta drops Llama 4 (Scout 109B/17BA & Maverick 400B/17BA) - (Blog, HF, Try It)
Together AI and Agentica (UC Berkeley) announce DeepCoder-14B (X, Blog)
NVIDIA Nemotron Ultra is here! 253B pruned Llama 3-405B (X, HF)
Jina Reranker M0 - SOTA multimodal reranker model (Blog, HF)
DeepCogito - SOTA models 3-70B - beating DeepSeek 70B - (Blog, HF)
ByteDance new release - Seed-Thinking-v1.5
Big CO LLMs + APIs
Google announces TONS of new things (Blog)
Google launches Firebase Studio (website)
Google is announcing official support for MCP (X)
Google announces A2A protocol - agent 2 agent communication (Blog, Spec, W&B Blog)
Cloudflare - new Agents SDK (Website)
Anthropic MAX - $200/mo with more quota
Grok 3 finally launches API tier (API)
OpenAI adds enhanced memory to ChatGPT - can remember all your chats (X)
This week's Buzz - MCP and A2A
W&B launches the observable.tools initiative & invites people to comment on the MCP RFC
W&B is the launch partner for Google's A2A (Blog)
Vision & Video
Voice & Audio
Amazon - Nova Sonic - speech2speech foundational model (Blog)
AI Art & Diffusion & 3D
HiDream-I1-Dev - 17B, MIT license, new leading open weights image gen, surpasses Flux1.1 [pro]! (HF)
Tools
GitMCP - turn any github repo into an MCP server (try it)