Welcome back, Mythos and Fable!
Both models are back as of today, now that the US government lifted the export controls it put on them last month. Fable goes live globally. Mythos comes back for a small group of US organizations.
A few things stood out to me in Anthropic’s write-up.
The whole thing started when researchers got Fable to point out software vulnerabilities, and in one case, write code showing how that software could be exploited. Anthropic researchers dug in and found that other, weaker models could do the same thing. In short, this was not some rare, dangerous new power unique to Fable.
The part I found most interesting is about jailbreaks. A jailbreak is a prompt that gets a model to bypass its own safeguards and do things it shouldn’t.
You can think of it like social engineering or sweet-talking your way into a room you’re not supposed to be in. You can pull the same move on an AI, like asking it to role-play as the bad guy, telling it to ignore all previous instructions, or convincing it that something false is true. Like hallucinations, I doubt models ever become fully immune to this. But Anthropic keeps adding more layers to catch these attempts and make them harder to pull off.
They also announced they’re working with Amazon, Microsoft, and Google on a shared standard for scoring how serious a given jailbreak is, so the whole industry can respond the same way when a new one shows up. A common bar like that could raise protection across the board.
The backdrop to this entire story: while Anthropic locks Mythos down and builds guardrails into Fable, China’s Z.aiannounced a new model that researchers say matches Mythos on certain bug-finding and security tasks.
Now I have to run and get Fable back to work!
