LLM Safety

8 hours ago · 240
Avoiding Anthropomorphic Pitfalls in AI Identity
Or: How anthropomorphic assumptions about AI identity might create confusion and suffering at scale ...

June 10, 2025 · 940
Mitigating Reward Hacking in LLMs: Best Practices
Introduction and Project Scope: We present a structured evaluation suite of four targeted scenarios ...

May 22, 2025 · 1450
Judge Lets Lawsuit on Google’s Chatbot Role Move Forward
In a landmark decision on May 22, 2025, U.S. District Judge Anne Conway declined ...