mechanistic interpretability

June 10, 2025
			AXRP Episode 41: Lee Sharkey on Attribution-based Decomposition
Published June 3, 2025 Introduction Interpretability in deep learning remains a key research frontier. ...

June 10, 2025
			Human-Aligned AI Summer School 2025 in Prague
Join us at the fifth Human-Aligned AI Summer School from 22nd to 25th July ...

April 29, 2025
			Exploring Claude: A Deep Dive into Model Interpretability
Published on March 27, 2025 5:20 PM GMT • Updated April 15, 2025 with ...

April 12, 2025
			Reassessing Sparse Autoencoders: Challenges Ahead
The GDM mechanistic interpretability team recently released a comprehensive update evaluating the utility of ...

March 30, 2025
			When Research Brilliance Doesn’t Equate to Strategic Foresight
TL;DR: A strong research record provides some indication of aptitude, but it is not ...