model evaluation

June 11, 2025
Apple Study Fuels Debate on AI Reasoning Skills
In early June 2025, Apple’s AI research division published a landmark paper that rigorously ...

June 10, 2025
AI Safety Evals and Biothreat Cyber Risk Reports Review
Published on June 9, 2025 1:00 PM GMT | Updated November 15, 2025 Major ...

May 1, 2025
LM Arena Study: Benchmarking Bias and Testing Advantages Uncovered
The race to build ever-more capable large language models (LLMs) has spawned a plethora ...