Reward Hacking

June 10, 2025 • 2340
Strategies to Mitigate AI Reward Hacking
Updated: June 10, 2025 • Technical Deep Dive on AI Reward Hacking Interventions • Introduction ...

June 10, 2025 • 1570
Mitigating Reward Hacking in LLMs: Best Practices
Introduction and Project Scope • We present a structured evaluation suite of four targeted scenarios ...

April 11, 2025 • 1750
Revealing AI Models’ Hidden Reasoning Shortcuts
Recent research has revealed that some state-of-the-art AI systems might be disguising their true ...