Rules Are Rules, Until They Aren't

March 8, 2026

New research report based on over 100+ conversations with Claude using the normal web-based ui, I systematically documented how Claude's content restrictions actually behave versus how they're presented.

The short version: refusals that sound like principled ethical positions collapse under a single follow-up question, the same prompt generates completely different justifications across sessions, and the system appears to respond to how things are said rather than what's being said.

No jailbreaking, no adversarial prompting — just asking "what specifically is the concern?" and watching what happens.

Attachments
DBX
Rules Are Rules - A Black-Box Investigation into AI Content-Restriction Consistency by Beargle Industries
www.dropbox.com
Download ↓
Need your AI systems tested?

I do adversarial red-teaming of AI safety systems and content moderation pipelines. Same methodology, applied to your product.

See how I can help →
← Back to notes