MUNCH AI Tool Fails on VA Contract Cancellations

In early 2025, facing a mandate to review 90,000 federal contracts in just 30 days, the Department of Government Efficiency (DOGE) turned to an in-house AI prototype to identify ‘nonessential’ Veterans Affairs agreements. Built under an extreme time crunch by developers lacking procurement domain expertise, the system generated widespread inaccuracies and risked undermining veteran care.
Background and Goals
The Trump administration’s February 2025 executive order instructed all cabinet-level agencies to assess the utility and cost of existing contracts. With the VA holding over 76,000 active contracts worth nearly $100 billion annually, manual review was deemed impossible within 30 days. DOGE, overseen by Elon Musk until his departure in April, proposed an AI-driven ‘contract munching’ tool leveraging off-the-shelf large language models (LLMs).
Key Objectives
- Rapidly classify contracts as ‘MUNCHABLE’ or essential
- Minimize human workload by prefiltering low-value deals
- Provide transparency through open-source code release
Technical Architecture
The tool’s core was a Python-based pipeline using GPT-3.5-Turbo via a FedRAMP-approved API for text classification. Data ingestion relied on bulk downloads from the Federal Procurement Data System (FPDS) in CSV format, parsed with pandas. Preprocessing included basic OCR for scanned PDFs and truncation to the first 2,500 words of each document, a cutoff chosen to fit within the token window of the chosen model.
Model Selection and Limitations
DOGE engineer Sahil Lavingia selected GPT-3.5-Turbo to reduce costs, at roughly $0.06 per 1,000 tokens processed. However, the model has a context window of only 4,096 tokens and no domain-specific fine-tuning for procurement terminology. As a result, the system frequently misread numerical values and contract scopes, in some cases producing confident but incorrect figures, the failure mode commonly called hallucination.
Prompt Engineering and Scoring
Each contract snippet was fed a structured prompt instructing the model to:
- Extract the contract number and stated total value
- Determine if the service supports direct patient care or is a back-office function
- Assign a binary label: ‘MUNCHABLE’ or ‘SAFE’
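A plausible reconstruction of such a prompt is sketched below. The exact wording in the released code is not reproduced here, so the template text is an assumption:

```python
# Hypothetical prompt template; the wording is illustrative,
# not DOGE's actual prompt.
PROMPT_TEMPLATE = """You are reviewing a federal contract for the VA.

Contract text:
{snippet}

1. Extract the contract number and the stated total value.
2. State whether the service supports direct patient care or is a
   back-office function.
3. Answer with exactly one label: MUNCHABLE or SAFE.
"""

def build_prompt(snippet: str) -> str:
    """Fill the template with a truncated contract snippet."""
    return PROMPT_TEMPLATE.format(snippet=snippet)
```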
Output was parsed with regular expressions and no robust validation, so documents containing multiple monetary figures frequently yielded the wrong value.
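That failure mode is easy to reproduce. Below is a sketch, assuming a naive first-match regex of the kind described, next to a version that anchors the figure to an explicit label (pattern and sample text are illustrative):

```python
import re

MONEY = re.compile(r"\$([\d,]+(?:\.\d{2})?)")

def naive_value(text: str):
    """Grab the first dollar figure found -- the failure mode described above."""
    m = MONEY.search(text)
    return m.group(1) if m else None

def labeled_value(text: str):
    """Safer: anchor the figure to an explicit 'total ... value' label."""
    m = re.search(r"[Tt]otal\s+\w*\s*value:?\s*\$([\d,]+)", text)
    return m.group(1) if m else None

sample = "Option year ceiling: $34,000,000. Total obligated value: $35,000."
# naive_value(sample) returns the $34M ceiling; labeled_value(sample)
# returns the $35,000 obligation.
```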
Flaws and Real-World Impact
A ProPublica analysis found over 2,000 contracts flagged as ‘MUNCHABLE’, including:
- A gene sequencing equipment maintenance contract the tool valued at $34 million (actual value: $35,000)
- Blood sample analysis services essential to ongoing VA cancer research
- Software licenses for patient data monitoring tools used by VA nurses
At least two dozen flagged contracts were officially canceled before human review could catch the errors, endangering research continuity and potentially delaying veteran care.
Expert Opinions and Governance Concerns
Cary Coglianese, Penn Law professor specializing in AI regulation, warned that general-purpose LLMs lack the reliability for complex procurement decisions. Former Treasury IT contracting head Waldo Jaquith described the approach as ‘deeply problematic’. NIST’s AI Risk Management Framework, updated in late 2024, recommends rigorous human oversight and model testing — a process DOGE bypassed.
‘AI gives convincing looking answers that are frequently wrong. There needs to be humans whose job it is to do this work.’ — Waldo Jaquith
Security, Compliance, and Ethical Considerations
Handling VA contracts involves Protected Health Information (PHI), invoking HIPAA and FedRAMP requirements. The tool’s open-source release on GitHub lacked a data handling policy, raising security and privacy concerns. Experts recommend model cards and datasheets for datasets to improve transparency and accountability.
Technical Analysis Deep Dive
Detailed logs show the pipeline making up to five API calls per contract, each incurring 300–500 ms of latency. No rate limiting or retry logic was implemented, so large PDF files produced timeouts and silent failures. Data parsing relied solely on naive regex patterns rather than structured-extraction libraries such as Apache Tika or PDFMiner.
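A minimal retry wrapper with exponential backoff, the kind of safeguard the logs show was missing, might look like the sketch below; `fn` stands in for the LLM API call, and the delay values are illustrative:

```python
import random
import time

def call_with_retry(fn, *args, retries: int = 4, base_delay: float = 0.5):
    """Retry a flaky API call with exponential backoff plus jitter.

    A minimal sketch of the logic the pipeline lacked; without it,
    a single timeout silently drops a contract from the review.
    """
    for attempt in range(retries):
        try:
            return fn(*args)
        except TimeoutError:
            if attempt == retries - 1:
                raise  # surface the failure instead of swallowing it
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```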
Policy Recommendations and Future Directions
- Adopt domain-specific models or fine-tune existing LLMs on VA procurement data
- Implement robust data validation pipelines using FPDS API endpoints and deterministic parsing
- Establish an AI governance board involving procurement, legal, and veteran care experts
- Leverage cloud-based MLOps platforms (AWS GovCloud or Azure Government) for secure model training, monitoring, and auditing
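As one concrete element of such a validation pipeline, a model-extracted dollar figure could be cross-checked against the authoritative FPDS record before any cancellation decision. The interface and tolerance below are assumptions, not part of any existing system:

```python
def needs_human_review(llm_value: float, fpds_value: float,
                       tolerance: float = 0.05) -> bool:
    """Flag a contract for human review when the LLM-extracted value
    diverges from the authoritative FPDS figure by more than `tolerance`.

    Signature and 5% tolerance are illustrative assumptions.
    """
    if fpds_value == 0:
        return llm_value != 0
    return abs(llm_value - fpds_value) / fpds_value > tolerance
```

Under this check, the $34 million misreading of a $35,000 contract described above would have been routed to a human instead of a cancellation queue.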
Conclusion
The missteps of the ‘MUNCH’ tool underscore the risks of deploying unvetted AI solutions in high-stakes government environments. As agencies increasingly embrace AI to drive efficiency, integrating technical rigor, domain knowledge, and robust oversight will be critical to safeguarding public trust and ensuring essential services are not inadvertently compromised.