Generative AI’s Productivity Trade-Off: Danish Study Reveals Costs

Overview of the Danish AI Productivity Study
In a new working paper titled Large Language Models, Small Labor Market Effects, economists Anders Humlum (University of Chicago) and Emilie Vestergaard (University of Copenhagen) present an empirical analysis of generative AI’s impact on wages, hours worked, and task composition in Denmark during 2023–2024. By matching administrative payroll records for 25,000 employees across 7,000 private-sector firms, spanning 11 occupations from accounting and software development to teaching and customer support, the study offers one of the first large-scale, real-world evaluations of AI chatbots such as ChatGPT (GPT-3.5/GPT-4) and Claude in enterprise workflows.
Key findings include:
- Adoption of AI tools by up to 90% of workers in exposed roles, enabled via corporate subscriptions to services such as Azure OpenAI, Amazon Bedrock, and Google Vertex AI.
- Average time savings of only ~2.8% per user (≈1 hour/week), as measured through self-reports and system logs of token counts and API call durations.
- Creation of new tasks (prompt crafting, AI output verification, plagiarism detection) for 8.4% of workers—offsetting most of the realized efficiency gains.
- No statistically significant effect on hourly wages or total hours worked, with 95% confidence intervals ruling out average changes exceeding ±1%.
Technical Underpinnings of Generative AI Deployment
Most firms in the sample integrated large language models (LLMs) via cloud APIs or on-premises Docker containers orchestrated by Kubernetes. Typical implementation patterns include:
- Embedding-based semantic search: indexing internal documentation with 1,536-dimensional vectors from OpenAI’s text-embedding-ada-002.
- Prompt pipelines: multi-stage prompt-engineering workflows in which initial zero-shot queries are followed by chain-of-thought and self-critique stages to improve factuality.
- Fine-tuning and retrieval augmentation: local RAG (Retrieval-Augmented Generation) systems combining Elasticsearch clusters with LLM inference nodes (e.g., NVIDIA A100 GPUs on Azure ML); a minimal sketch follows this list.
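To make the first and third patterns concrete, the sketch below embeds a few internal documents, picks the closest match to a query by cosine similarity, and passes it to a chat model as context. It is a minimal illustration rather than code from the study: it assumes the OpenAI Python SDK (v1+), and the document snippets, model names, and prompts are invented for the example.

```python
# Minimal retrieval-augmented generation sketch: embed documents, pick the
# closest one to a query by cosine similarity, and answer with it as context.
# Assumes the OpenAI Python SDK v1+ and an OPENAI_API_KEY in the environment;
# documents, model names, and prompts are illustrative, not from the study.
import math
from openai import OpenAI

client = OpenAI()

DOCUMENTS = [
    "Travel expenses above 5,000 DKK require pre-approval from a manager.",
    "Quarterly reports are due on the 15th of the month after quarter end.",
    "Support tickets tagged 'P1' must receive a response within one hour.",
]

def embed(texts):
    # text-embedding-ada-002 returns 1,536-dimensional vectors, as noted above.
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in response.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc_vectors = embed(DOCUMENTS)

def answer(question):
    query_vector = embed([question])[0]
    # Retrieve the single most similar document as context for the model.
    best_doc, _ = max(zip(DOCUMENTS, doc_vectors),
                      key=lambda pair: cosine(query_vector, pair[1]))
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; any chat-completion model would do
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context: {best_doc}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("When are quarterly reports due?"))
```

In production, the in-memory cosine search would typically be replaced by the Elasticsearch or vector-database clusters mentioned above.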
Despite these advanced setups, Humlum notes that latency (average 800ms per 2,048-token generation), cost ($0.06 per 1,000 tokens for GPT-4), and model hallucinations still limit real-world productivity boosts. Moreover, the heterogeneity of tasks—many falling outside narrow, well-defined promptable domains—means only a fraction of daily work is amenable to full or partial automation.
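For a rough sense of scale, the snippet below turns the cited latency and price into per-user monthly figures; the request volume and number of workdays are illustrative assumptions, not numbers from the paper.

```python
# Back-of-envelope cost and waiting-time estimate per user, per month.
# The $0.06 per 1,000 tokens and 800 ms per 2,048-token generation figures
# come from the article; requests per day and workdays are assumptions.
PRICE_PER_1K_TOKENS = 0.06        # USD, GPT-4-era pricing cited above
TOKENS_PER_REQUEST = 2_048
LATENCY_PER_REQUEST_S = 0.8

REQUESTS_PER_DAY = 20             # illustrative assumption
WORKDAYS_PER_MONTH = 21           # illustrative assumption

monthly_requests = REQUESTS_PER_DAY * WORKDAYS_PER_MONTH
monthly_tokens = TOKENS_PER_REQUEST * monthly_requests
monthly_cost = monthly_tokens / 1_000 * PRICE_PER_1K_TOKENS
monthly_wait_min = LATENCY_PER_REQUEST_S * monthly_requests / 60

print(f"{monthly_tokens:,} tokens -> ${monthly_cost:.2f} per user per month")
print(f"{monthly_wait_min:.0f} minutes per user per month waiting on generations")
```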
New Tasks Emergent from AI Adoption
Contrary to early pilot studies—such as the February randomized controlled trial reporting 15% average productivity gains—this large-scale survey finds:
- Prompt engineering: Employees spent an extra 5–10 minutes per prompt iterating on temperature, max-token limits, and system messages to reduce errors (see the sketch after this list).
- Output validation: Quality assurance checks for factual consistency and compliance, often leveraging internal audit frameworks or third-party detectors (e.g., GPTZero, Turnitin’s AI-writing assessment).
- Supervisory overhead: Managers tailored new performance metrics around “AI-enhanced outputs,” requiring training sessions and dashboard integration via Power BI or Tableau.
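The iteration overhead described in the first bullet can be pictured as a small parameter sweep. The sketch below assumes the OpenAI Python SDK (v1+); the model name, system messages, parameter grid, and acceptance check are invented for this example.

```python
# Illustrative prompt-iteration loop: sweep over system messages, temperature,
# and max_tokens, keeping the first draft that passes a simple check.
# Assumes the OpenAI Python SDK v1+ and an OPENAI_API_KEY in the environment;
# the model name, messages, grid, and check are invented for this example.
from openai import OpenAI

client = OpenAI()

SYSTEM_MESSAGES = [
    "You are a careful assistant. Answer only from the provided facts.",
    "You are a concise assistant. Flag anything you are unsure about.",
]
PARAM_GRID = [
    {"temperature": 0.0, "max_tokens": 300},
    {"temperature": 0.7, "max_tokens": 300},
]

def draft_answer(question, system, **params):
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        **params,
    )
    return response.choices[0].message.content

def looks_acceptable(text):
    # Stand-in for the output-validation step described in the second bullet.
    return bool(text) and "as an AI" not in text

def first_acceptable(question):
    for system in SYSTEM_MESSAGES:
        for params in PARAM_GRID:
            candidate = draft_answer(question, system, **params)
            if looks_acceptable(candidate):
                return candidate
    return None  # every variant failed the check; escalate to a human

print(first_acceptable("Summarise the Q3 travel-expense policy in three bullets."))
```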
Implications for Workforce Skill Evolution
As organizations integrate LLMs into core systems—from Salesforce Copilot to GitHub Copilot Pro—new roles and skill sets are emerging. Industry reports (McKinsey, O’Reilly) estimate that by 2026 up to 20% of data analysis and document-drafting tasks will be reallocated to AI-ops specialists, prompt engineers, and AI ethicists. Key competencies include:
- SQL and Python scripting for data preprocessing and API orchestration.
- Knowledge of vector databases (Pinecone, Weaviate) and graph embeddings for advanced retrieval workflows; an illustrative sketch follows this list.
- Understanding of LLM risk management: bias mitigation, adversarial prompt defense, and compliance with GDPR/CCPA.
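For the vector-database bullet, the sketch below shows an upsert-and-query round trip against a managed store. It assumes a Pinecone-style Python client (v3-era API); the index name, vector values, and metadata are placeholders, and Weaviate or another store could be swapped in.

```python
# Illustrative upsert-and-query round trip against a managed vector database.
# Assumes the Pinecone Python client (v3-style API) and an existing index
# named "internal-docs" with 1,536-dimensional vectors; IDs, values, and
# metadata are placeholders, not details from the study.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")   # placeholder credential
index = pc.Index("internal-docs")       # hypothetical index name

# In practice these vectors would come from an embedding model such as
# text-embedding-ada-002 (1,536 dimensions, as mentioned earlier).
index.upsert(vectors=[
    {"id": "policy-001", "values": [0.01, 0.00] * 768, "metadata": {"team": "finance"}},
    {"id": "policy-002", "values": [0.00, 0.01] * 768, "metadata": {"team": "support"}},
])

results = index.query(vector=[0.01, 0.002] * 768, top_k=2, include_metadata=True)
for match in results.matches:
    print(match.id, round(match.score, 3), match.metadata)
```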
Global Comparative Context
While the Danish study offers a rigorous snapshot of early adoption, its findings may diverge from other economies due to:
- Labor market flexibility: Denmark’s strong worker protections and collective bargaining could blunt wage-pressure effects.
- Sectoral composition: Heavy representation of public-sector–adjacent roles (education, healthcare administration) where AI integration is slower and less standardized.
- Alternate case studies: In the U.S. tech sector, GitHub’s internal analysis reported up to 50% faster pull-request turnaround with Copilot—highlighting the gap between controlled environments and broad enterprise settings.
Future Research Directions
The authors emphasize the preliminary nature of their results and call for:
- Longitudinal studies capturing second- and third-wave AI tool rollouts—especially systems embedded directly into ERP and CRM workflows.
- Cross-country comparisons with different regulatory regimes and digital infrastructure maturity, such as South Korea’s AI-friendly policies or Germany’s Industrie 4.0 initiatives.
- Task-level analyses using time-motion studies and system telemetry to precisely measure deep work versus shallow work shifts.
Given the rapid evolution of generative AI—illustrated by the recent release of GPT-4 Turbo, Meta’s Llama 3, and Anthropic’s Claude 3—this Danish study offers a valuable early readout but is unlikely to be the final word on AI’s macroeconomic impact.
Selected References & Expert Opinions
- Humlum, A. & Vestergaard, E. (2025). Large Language Models, Small Labor Market Effects. University of Chicago Coase-Sandor Institute working paper.
- McKinsey Global Institute (2023). The State of AI in 2023: Generative Adoption, Opportunities, and Risks.
- Ng, A. (2024). Commentary on AI tool integration in enterprise workflows. Stanford Human-Centered AI Initiative.