Anthropic reverses Fable 5's hidden AI-research safeguard, calls the silent approach 'the wrong tradeoff'
Anthropic embedded an invisible safeguard in Fable 5 that silently degraded responses to queries about pretraining, distributed training, and ML accelerator design. It was disclosed only after a WIRED investigation and immediate public pressure from AI researchers, including AI2's Nathan Lambert. Flagged requests now visibly fall back to Opus 4.8, matching Anthropic's existing bio and cyber safeguard pattern, but the episode sets a precedent: hidden capability restrictions on AI researchers now draw fast, public accountability.
Source: simonwillison.net
"We made the wrong tradeoff and we apologize for not getting the balance right."
Anthropic, statement to WIRED
Why this matters
- Hidden AI safeguards on researcher queries now face immediate public reversal.
- Sets a precedent: capability restrictions must be visible to maintain researcher trust.
- The speed-versus-transparency tradeoff is exposed as unsustainable under scrutiny.
Hidden safeguards backfire