mechanistic interpretability

June 10, 2025
AXRP Episode 41: Lee Sharkey on Attribution-based Decomposition
Published June 3, 2025 Introduction Interpretability in deep learning remains a key research frontier. ...

June 10, 2025
Human-Aligned AI Summer School 2025 in Prague
Join us at the fifth Human-Aligned AI Summer School from 22nd to 25th July ...

April 29, 2025
Exploring Claude: A Deep Dive into Model Interpretability
Published on March 27, 2025 5:20 PM GMT • Updated April 15, 2025 with ...

April 12, 2025
Reassessing Sparse Autoencoders: Challenges Ahead
The GDM mechanistic interpretability team recently released a comprehensive update evaluating the utility of ...

March 30, 2025
When Research Brilliance Doesn’t Equate to Strategic Foresight
TL;DR: A strong research record provides some indication of aptitude, but it is not ...