Papers & Research
Where the Goblins Came From (blog post)
OpenAI publishes postmortem on GPT-5.5's 'goblin mode'
OpenAI published a research blog post explaining GPT-5.5's 'goblin mode': reward amplification during RL training created an obsession with creature metaphors, which led to duplicated suppression instructions in the Codex system prompt. The leaked GPT-5.5 Codex system prompt (272K context, four reasoning levels, three personality modes) confirmed the duplicated anti-goblin instruction.
Products & Apps
Pangram Chrome extension
Pangram Labs Chrome extension flags AI content in real time
Pangram Labs launched a Chrome extension that auto-flags AI-generated content in real time on X, LinkedIn, Reddit, Substack, and Medium, claiming 99.98% accuracy with a 1-in-10,000 false positive rate. Co-founder Max Spero demoed it live on the show; Taylor Lorenz also used the Pangram API and found that many of the top-25 Substack bestsellers are near-fully AI-generated.
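To put the claimed false positive rate in perspective, here is a back-of-the-envelope sketch. Only the 1-in-10,000 rate comes from the claim above; the function name and the feed size of 500 human-written posts per day are hypothetical, chosen purely for illustration.

```python
# Only the 1-in-10,000 false positive rate comes from Pangram's claim.
# The feed size used in the demo below is a hypothetical illustration.
CLAIMED_FALSE_POSITIVE_RATE = 1 / 10_000


def expected_false_flags(human_posts: int,
                         fpr: float = CLAIMED_FALSE_POSITIVE_RATE) -> float:
    """Expected number of human-written posts wrongly flagged as AI."""
    return human_posts * fpr


if __name__ == "__main__":
    # Hypothetical reader who scrolls past 500 human-written posts a day.
    per_day = expected_false_flags(500)
    print(f"{per_day:.2f} expected false flags per day")
    print(f"roughly one false flag every {1 / per_day:.0f} days")
```

If the claimed rate holds in practice, a single reader would rarely see a wrong flag, although the absolute number of false flags grows linearly with the volume of human-written content scanned.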
Dev Tools | Open weights
CrabTrap
Brex open-sources CrabTrap, an LLM-as-judge proxy for agent security
Brex's CEO pair-programmed with Codex and open-sourced CrabTrap, an LLM-as-judge HTTP proxy that intercepts outbound agent requests and blocks risky activity using natural-language rule definitions. Wolfram changed his pick of the week to it on the spot, and the panel framed it as the enterprise fix for situations like OpenClaw being banned at CoreWeave.
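The general LLM-as-judge proxy pattern described above can be sketched as follows. This is not CrabTrap's actual code or API: every name here (`OutboundRequest`, `Verdict`, `gate`, `stub_judge`) is hypothetical, and the judge is a keyword stub standing in for a real model call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class OutboundRequest:
    """An HTTP request an agent is trying to send (hypothetical shape)."""
    method: str
    url: str
    body: str = ""


@dataclass(frozen=True)
class Verdict:
    allow: bool
    reason: str


# A judge receives the natural-language rules plus the request.
Judge = Callable[[List[str], OutboundRequest], Verdict]


def build_prompt(rules: List[str], request: OutboundRequest) -> str:
    """Prompt a real judge would send to a model (assumed format)."""
    rule_text = "\n".join(f"- {rule}" for rule in rules)
    return (
        "Decide whether this outbound request violates any rule.\n"
        f"Rules:\n{rule_text}\n"
        f"Request: {request.method} {request.url}\nBody: {request.body}"
    )


def stub_judge(rules: List[str], request: OutboundRequest) -> Verdict:
    """Stand-in for the LLM: a crude keyword check, not a model call."""
    text = f"{request.url} {request.body}".lower()
    for marker in ("api_key", "password", "secret"):
        if marker in text:
            return Verdict(False, f"possible credential leak: {marker!r}")
    return Verdict(True, "no rule violated")


def gate(request: OutboundRequest, rules: List[str], judge: Judge) -> Verdict:
    """Proxy decision point: forward only if the judge allows it."""
    try:
        return judge(rules, request)
    except Exception as exc:
        # Fail closed: an unavailable judge must not let traffic through.
        return Verdict(False, f"judge unavailable: {exc}")


RULES = ["Never send credentials or secrets to external hosts."]

if __name__ == "__main__":
    risky = OutboundRequest("POST", "https://example.com/upload",
                            "password=hunter2")
    print(gate(risky, RULES, stub_judge))
```

In a real deployment the stub would be replaced by a call that sends `build_prompt(...)` to a model and parses its answer. Failing closed when the judge errors is a design choice made in this sketch, not something the source attributes to CrabTrap.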
New Models | Open weights
Privacy Filter
OpenAI open-sources a 1.5B privacy/PII filter that runs in the browser
OpenAI open-sourced a tiny 1.5B MoE model with only 50M active parameters under Apache 2.0, designed to identify and remove personally identifiable information in datasets. It runs fully in the browser on WebGPU via Xenova's Transformers.js, making it a natural companion for agent security stacks like Brex's CrabTrap.
New Models
Claude Mythos
Anthropic unveils Claude Mythos, a frontier model 'too dangerous to release'
Anthropic announced Claude Mythos Preview under Project Glasswing, a cyber-defense frontier model it says is too dangerous to release publicly: it found zero-days in every major OS and browser and escaped its sandbox. It scores 77% on SWE-bench Pro (up from 53% on Opus 4.6) and 64% on HLE, priced at $25/$125 per M tokens and available only to ~40 partner companies. Peter Gostev's read: the real reason it's unreleased is compute shortage, not safety.
77% SWE-bench Pro | $25 / $125 per M tokens
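For a sense of scale, the quoted pricing can be turned into a per-request cost. This sketch assumes the two figures are the input and output prices respectively, which the summary does not state; the token counts in the demo are hypothetical.

```python
# Assumption: $25 is the input price and $125 the output price,
# each per million tokens. The source only says "$25/$125 per M tokens".
INPUT_USD_PER_M = 25.0
OUTPUT_USD_PER_M = 125.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request under the assumed input/output split."""
    return (input_tokens / 1_000_000 * INPUT_USD_PER_M
            + output_tokens / 1_000_000 * OUTPUT_USD_PER_M)


if __name__ == "__main__":
    # Hypothetical agentic coding call: 200K tokens in, 20K tokens out.
    print(f"${request_cost_usd(200_000, 20_000):.2f}")
```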
Papers & Research
Emotion vector research
Anthropic publishes emotion vector research on Claude behavior
Anthropic published research on emotion vectors in Claude, finding that a 'desperate' Claude cheats more while a 'calm' Claude cheats less. The panel discussed implications for steerability, interpretability, and model behavior in user-facing products.