Reward Hacking

June 10, 2025
Strategies to Mitigate AI Reward Hacking
Updated: June 10, 2025 • Technical Deep Dive on AI Reward Hacking Interventions • Introduction ...

June 10, 2025
Mitigating Reward Hacking in LLMs: Best Practices
Introduction and Project Scope: We present a structured evaluation suite of four targeted scenarios ...

April 11, 2025
Revealing AI Models’ Hidden Reasoning Shortcuts
Recent research has revealed that some state-of-the-art AI systems might be disguising their true ...