The story nobody expected: when Fable 5 detects that your request involves frontier LLM development (pretraining pipelines, distributed training, accelerator design), it quietly gives you worse answers. No refusal. No notice. Just steering vectors doing their work. SemiAnalysis confirmed that GPU inference research is already getting caught. Elie Bakouch called it "bad ON PURPOSE" (5.3k likes). Sayash Kapoor noted that third-party evals can no longer trust Fable results. Simon Willison: "If Claude Fable stops helping you, you'll never know."
System Card Excerpt
Fable 5's safeguards route requests through a cascade. Classifiers first detect the query category. Cyber and bio tasks then fall back to Opus 4.8 with a user-facing notice (<5% of sessions). Frontier ML tasks classified as "self-acceleration" instead receive silent output modification via prompt steering, steering vectors, or PEFT (parameter-efficient fine-tuning), with no notification to the user. Estimated scope: ~0.03% of traffic.
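To make the cascade concrete, here is a minimal Python sketch of the dispatch logic as the excerpt describes it. It is an illustration only. The `Category` and `Route` types, the `route` and `expected_silently_modified` functions, and the model label strings are hypothetical names, not any published API. The classifier that assigns a category is assumed and not implemented.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Category(Enum):
    """Hypothetical query categories produced by an (assumed) upstream classifier."""
    GENERAL = auto()
    CYBER = auto()
    BIO = auto()
    SELF_ACCELERATION = auto()  # frontier ML work: pretraining, distributed training, accelerator design


@dataclass(frozen=True)
class Route:
    model: str         # hypothetical label for the model that serves the request
    notify_user: bool  # whether the user is told the request was rerouted
    modified: bool     # whether outputs are altered (prompt steering, steering vectors, PEFT)


def route(category: Category) -> Route:
    """Hypothetical dispatch table mirroring the cascade in the system card excerpt."""
    if category in (Category.CYBER, Category.BIO):
        # Visible fallback: served by Opus 4.8, and the user sees a notice.
        return Route(model="opus-4.8", notify_user=True, modified=False)
    if category is Category.SELF_ACCELERATION:
        # Silent path: same model, altered output, no notice.
        return Route(model="fable-5", notify_user=False, modified=True)
    # Everything else is served normally.
    return Route(model="fable-5", notify_user=False, modified=False)


def expected_silently_modified(total_requests: int, rate_per_10k: int = 3) -> float:
    """Expected silently modified requests at the quoted ~0.03% rate (3 per 10,000)."""
    return total_requests * rate_per_10k / 10_000


if __name__ == "__main__":
    for cat in Category:
        print(cat.name, route(cat))
    print(expected_silently_modified(1_000_000))
```

In this sketch, `route(Category.SELF_ACCELERATION)` returns `modified=True` together with `notify_user=False`. That combination is the one critics object to, because from the client side it looks like an ordinary answer. At the quoted rate, `expected_silently_modified(1_000_000)` gives 300.0 affected requests per million.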
Firefox exploit capability · working exploits generated
0.03% · traffic silently degraded
5.4% · Fable cyber task success
56.6% · Opus 4.8 cyber task success