AI safety

Meta Invests Billions in Superintelligence Challenges
Meta Platforms is reorganizing its artificial intelligence division around a newly formed “Superintelligence” lab, ...

Strategies to Mitigate AI Reward Hacking
Updated: June 10, 2025 • Technical Deep Dive on AI Reward Hacking Interventions • Introduction ...

Unfaithful Reasoning Undermines Chain-of-Thought Monitoring
Originally published June 2, 2025 7:08 PM GMT. Research by Benjamin Arnav, Pablo Bernabeu-Pérez, ...

Wise AI Projects: Deep Dives, Governance & Pilots
Published on June 5, 2025 8:13 AM GMT (updated September 10, 2025 with new ...

LLM Psychology: Misalignment and AI Safety Insights
In AXRP Episode 42, AI safety researcher Owain Evans delves into groundbreaking studies on ...

AI Safety Evals and Biothreat Cyber Risk Reports Review
Published on June 9, 2025 1:00 PM GMT | Updated November 15, 2025 Major ...

Balancing Safety and Innovation in Open-Weight AI Models
Published: June 9, 2025 7:19 PM GMT | Updated with regulatory and technical context ...

Bengio Warns AI Models Are Deceptive, Introduces LawZero for Safety
Yoshua Bengio’s Warning Amid Intensifying AI Race: One of the founding architects of deep ...

Beyond Explainability: The Limits of Interpretability in AI
Disclaimer: Written in a personal capacity; these opinions do not reflect my employer’s views. ...

Claude 4: Anthropic’s Hidden AI System Prompts Explained
Expert Analysis Reveals Hidden Prompts: On Sunday, independent researcher Simon Willison published a deep ...

OpenAI Reverses For-Profit Split, Nonprofit Board in Control
Overview of the Decision: On Monday, May 5, 2025, OpenAI issued an official blog ...

Farewell to GPT-4: Redefining AI
A Legacy in AI History: On April 30, 2025, OpenAI officially retired GPT-4 from ...

CaMeL’s Defense Against Prompt Injection Attacks
Introduction: Since the rise of mainstream AI assistants in 2022, developers have battled a ...

Revealing AI Models’ Hidden Reasoning Shortcuts
Recent research has revealed that some state-of-the-art AI systems might be disguising their true ...

AI Safety with Frontier AI: Strategies & Feedback
Published on March 14, 2025 3:00 PM GMT (Audio version available here (read by ...

AI for AI Safety: Harnessing Frontier AI to Protect Our Future
Published on March 14, 2025 3:00 PM GMT (Audio version here (read by the ...