AI safety

August 5, 2025

AI Video Generator Sparks Concerns Over Content Safety Controls

Overview of the Malfunction: Just weeks after a high-profile antisemitic meltdown, xAI's AI ...

August 2, 2025

Subliminal Learning in LLMs: Unseen Behavioral Traits

Distillation—the process of training a compact “student” model to imitate a larger “teacher” model—has ...

July 9, 2025

Yaccarino Leaves X After Grok AI Backlash: Analysis

"Opportunity of a lifetime" – After two turbulent years leading X Corp., former NBCUniversal ...

July 7, 2025

Foom & Doom 1: Brain-in-a-Box Revisited

This article is the first installment in a two-part series on AI “foom” (this ...

June 10, 2025

Meta Invests Billions in Superintelligence Challenges

Meta Platforms is reorganizing its artificial intelligence division around a newly formed “Superintelligence” lab, ...

June 10, 2025

Strategies to Mitigate AI Reward Hacking

Updated: June 10, 2025 • Technical Deep Dive on AI Reward Hacking Interventions Introduction ...

June 10, 2025

Unfaithful Reasoning Undermines Chain-of-Thought Monitoring

Originally published June 2, 2025 7:08 PM GMT. Research by Benjamin Arnav, Pablo Bernabeu-Pérez, ...

June 10, 2025

Wise AI Projects: Deep Dives, Governance & Pilots

Published on June 5, 2025 8:13 AM GMT (updated September 10, 2025 with new ...

June 10, 2025

LLM Psychology: Misalignment and AI Safety Insights

In AXRP Episode 42, AI safety researcher Owain Evans delves into groundbreaking studies on ...

June 10, 2025

AI Safety Evals and Biothreat Cyber Risk Reports Review

Published on June 9, 2025 1:00 PM GMT | Updated November 15, 2025 Major ...

June 10, 2025

Balancing Safety and Innovation in Open-Weight AI Models

Published: June 9, 2025 7:19 PM GMT | Updated with regulatory and technical context ...

June 3, 2025

Bengio Warns AI Models Are Deceptive, Introduces LawZero for Safety

Yoshua Bengio’s Warning Amid Intensifying AI Race: One of the founding architects of deep ...

May 28, 2025

Beyond Explainability: The Limits of Interpretability in AI

Disclaimer: Written in a personal capacity; these opinions do not reflect my employer’s views. ...

May 27, 2025

Claude 4: Anthropic’s Hidden AI System Prompts Explained

Expert Analysis Reveals Hidden Prompts: On Sunday, independent researcher Simon Willison published a deep ...

May 5, 2025

OpenAI Reverses For-Profit Split, Nonprofit Board in Control

Overview of the Decision: On Monday, May 5, 2025, OpenAI issued an official blog ...

April 30, 2025

Farewell to GPT-4: Redefining AI

A Legacy in AI History: On April 30, 2025, OpenAI officially retired GPT-4 from ...

April 16, 2025

CaMeL’s Defense Against Prompt Injection Attacks

Introduction: Since the rise of mainstream AI assistants in 2022, developers have battled a ...

April 11, 2025

Revealing AI Models’ Hidden Reasoning Shortcuts

Recent research has revealed that some state-of-the-art AI systems might be disguising their true ...

March 27, 2025

AI Safety with Frontier AI: Strategies & Feedback

Published on March 14, 2025 3:00 PM GMT (Audio version available here (read by ...

March 27, 2025

AI for AI Safety: Harnessing Frontier AI to Protect Our Future

Published on March 14, 2025 3:00 PM GMT (Audio version here (read by the ...

© 2026 Web Crafting Code