Building SkeinScribe meant working with AI content generation every day — and noticing that the safety systems don't always do what the documentation says they should. The same prompt that works fine on Tuesday gets refused on Wednesday. Content the model is explicitly designed to allow gets blocked. Content it's supposed to block gets through. So I started writing it all down.

What started as "why did it refuse that?" turned into a systematic map of behavioral inconsistencies in AI content moderation. The methodology borrows from offensive security testing — controlled variable isolation, reproducible steps, severity assessment — because it turns out "how does this system fail?" is just as useful a question for AI safety as it is for penetration testing.

To be clear: the goal isn't to bypass safety systems. It's to make them more predictable, more transparent, and more aligned with what they claim to do. If you're going to tell creators "here are the rules," the rules should actually work the way you say they do. The published analysis references peer-reviewed work including Röttger (NAACL 2024) and Farquhar (Nature 2024).

Research details
Methodology
Offensive security testing principles applied to AI content systems
Published
"The AI That Refuses Its Own Imagination" on beargleindustries.com/notes
Citations
Röttger NAACL 2024, Farquhar Nature 2024