- Anthropic dares you to jailbreak its new AI model (Ars Technica)
- Constitutional Classifiers: Defending against universal jailbreaks (Anthropic)
- Anthropic makes ‘jailbreak’ advance to stop AI models producing harmful results (Financial Times)
- ‘Constitutional Classifiers’ Technique Mitigates GenAI Jailbreaks (Dark Reading)
- Anthropic has a new way to protect large language models against jailbreaks (MIT Technology Review)