AI safety

Bengio Warns AI Models Are Deceptive, Introduces LawZero for Safety
Yoshua Bengio’s Warning Amid Intensifying AI Race: One of the founding architects of deep ...

Beyond Explainability: The Limits of Interpretability in AI
Disclaimer: Written in a personal capacity; these opinions do not reflect my employer’s views. ...

Claude 4: Anthropic’s Hidden AI System Prompts Explained
Expert Analysis Reveals Hidden Prompts: On Sunday, independent researcher Simon Willison published a deep ...

OpenAI Reverses For-Profit Split, Nonprofit Board in Control
Overview of the Decision: On Monday, May 5, 2025, OpenAI issued an official blog ...

Farewell to GPT-4: Redefining AI
A Legacy in AI History: On April 30, 2025, OpenAI officially retired GPT-4 from ...

CaMeL’s Defense Against Prompt Injection Attacks
Introduction: Since the rise of mainstream AI assistants in 2022, developers have battled a ...

Revealing AI Models’ Hidden Reasoning Shortcuts
Recent research has revealed that some state-of-the-art AI systems might be disguising their true ...

AI Safety with Frontier AI: Strategies & Feedback
Published on March 14, 2025 3:00 PM GMT (Audio version available here (read by ...

AI for AI Safety: Harnessing Frontier AI to Protect Our Future
Published on March 14, 2025 3:00 PM GMT (Audio version here (read by the ...