mechanistic interpretability

June 10, 2025
AXRP Episode 41: Lee Sharkey on Attribution-based Decomposition
Published June 3, 2025. Introduction: Interpretability in deep learning remains a key research frontier. ...

June 10, 2025
Human-Aligned AI Summer School 2025 in Prague
Join us at the fifth Human-Aligned AI Summer School from 22nd to 25th July ...

April 29, 2025
Exploring Claude: A Deep Dive into Model Interpretability
Published on March 27, 2025 5:20 PM GMT • Updated April 15, 2025 with ...

April 12, 2025
Reassessing Sparse Autoencoders: Challenges Ahead
The GDM mechanistic interpretability team recently released a comprehensive update evaluating the utility of ...

March 30, 2025
When Research Brilliance Doesn’t Equate to Strategic Foresight
TL;DR: A strong research record provides some indication of aptitude, but it is not ...