FDA’s Elsa LLM Rollout Reveals Security and Accuracy Gaps

Introduction: A High-Profile, Accelerated Deployment
In early June 2025, the U.S. Food and Drug Administration (FDA) unveiled Elsa, a custom large language model (LLM) designed to serve every center—from drug review teams to device safety investigators. Launched weeks ahead of schedule under significant cost pressures, Elsa promised to expedite clinical protocol reviews, automate adverse-event summarization, and even generate code for database development. Yet within days, staff reported erroneous outputs, integration failures, and insufficient governance, raising broader questions about AI readiness in high-stakes regulatory contexts.
Background: From CDER-GPT to Agency-Wide Elsa
Originally, the FDA’s Center for Drug Evaluation and Research (CDER) piloted CDER-GPT, a center-specific AI assistant built atop Anthropic’s Claude 2. Since 2020, consulting giant Deloitte has ingested some 1.2 billion tokens of internal FDA protocols, labeling, and inspection reports under a $13.8 million contract, work that ultimately fed the model’s training corpus. In April 2025, a follow-on $14.7 million award expanded development, rebranded CDER-GPT as Elsa, and migrated the service into Amazon Web Services GovCloud (FIPS 140-2 / FedRAMP High) to meet federal security mandates.
Rushed Deployment and Performance Issues
- Inaccurate Summaries: NBC News reported that Elsa mischaracterized approved drug indications and adverse events in internal tests.
- Integration Failures: The Center for Devices and Radiological Health noted that Elsa struggled to ingest PDFs and could not connect to FDA’s secure data lake, forcing staff to upload documents manually.
- Overhyped Capabilities: Staff told Stat News that leadership, including FDA Commissioner Marty Makary and officials from the Department of Government Efficiency (DOGE), overpromised Elsa’s scientific-review capabilities.
“Makary and DOGE think AI can replace staff and cut review times, but it decidedly cannot,” a senior reviewer said. “We lack proper guardrails, and policy work is trailing the rollout.”
Technical Architecture and Integration Challenges
- Model Base: Elsa is derived from Anthropic’s Claude 2, fine-tuned on FDA’s proprietary corpus using Deloitte’s custom pipelines.
- Infrastructure: Deployed on AWS GovCloud (Region us-gov-west-1) under a VPC with segmented subnets, encrypted at rest (AES-256) and in transit (TLS 1.3).
- Access Controls: Role-based IAM policies govern data access, but staff report permission inconsistencies and delays in provisioning new users.
- Audit & Logging: CloudTrail and GuardDuty monitor usage, yet no automated red-flag alerts exist for hallucinations or off-policy queries.
These technical gaps highlight the tension between agile deployment and enterprise-grade stability in regulated environments.
Security and Compliance Considerations
Elsa’s hosting within GovCloud ostensibly meets FedRAMP High requirements, but NIST SP 800-53 Rev. 5 controls call for continuous monitoring, and emerging AI-specific guidance extends that to adversarial resilience and data provenance audits. In July 2025, the Office of Management and Budget (OMB) issued Memo M-25-10, requiring agencies to maintain a model supply-chain bill of materials (SBOM) for all AI deployments, a mandate the FDA has yet to fully adopt for Elsa’s dependencies.
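The memo’s exact SBOM format is not spelled out here, but CycloneDX 1.5’s machine-learning component type is one plausible vehicle. The sketch below shows what a minimal model SBOM entry for Elsa might look like; the structure and property names are illustrative assumptions, with values drawn from the reporting above:

```python
# Illustrative sketch: a minimal model SBOM entry in the spirit of
# CycloneDX 1.5's machine-learning-model component type. This is an
# assumed structure, not the format OMB M-25-10 actually mandates.
import json

model_sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "Elsa",
            "description": "FDA-wide assistant fine-tuned from Claude 2",
            "properties": [
                {"name": "base-model", "value": "Anthropic Claude 2"},
                {"name": "fine-tuning-vendor", "value": "Deloitte"},
                {"name": "training-corpus-tokens", "value": "1.2e9"},
                {"name": "hosting", "value": "AWS GovCloud us-gov-west-1"},
            ],
        }
    ],
}

print(json.dumps(model_sbom, indent=2))
```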
Stakeholder Response and Policy Implications
AI governance experts warn that Elsa’s premature release could erode public trust just as the White House ramps up scrutiny of “high-risk” AI under Executive Order 14110. Congressional oversight committees have already scheduled hearings to examine whether Elsa displaced critical scientific labor or introduced unacceptable safety risks.
Expert Opinions
- Dr. Alicia Reed, AI Risk Consultant: “Without a comprehensive risk-management framework, hallucinations in medical contexts can trigger false positives or conceal adverse signals.”
- Jonathan Park, Federal Cloud Architect: “GovCloud is secure, but agencies must invest in real-time monitoring and incident response to handle AI-specific threats.”
Looking Ahead: Maturing AI in Regulatory Workflows
To regain momentum, the FDA is now:
- Revisiting Elsa’s training data curation with updated medical ontologies (MeSH, SNOMED CT).
- Implementing a human-in-the-loop (HITL) review for all high-risk outputs (see the sketch after this list).
- Collaborating with NIST on applying the AI Risk Management Framework (AI RMF 1.0).
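The core mechanic of the HITL step is simple: outputs touching high-risk topics are withheld and queued for human sign-off rather than returned directly. The sketch below illustrates that gate; the topic list, queue, and risk rule are hypothetical, not the FDA’s actual policy:

```python
# Hypothetical sketch of human-in-the-loop (HITL) gating: any draft
# output touching a high-risk topic is queued for a human reviewer
# instead of being returned to the workflow. Topic list, queue, and
# risk rule are illustrative assumptions.
from dataclasses import dataclass
from queue import Queue

HIGH_RISK_TOPICS = ("adverse event", "contraindication", "dosing", "recall")

@dataclass
class ReviewItem:
    prompt: str
    draft_output: str

review_queue: Queue[ReviewItem] = Queue()

def release_or_escalate(prompt: str, draft_output: str) -> str | None:
    """Return the draft if low-risk; otherwise enqueue it for a human
    reviewer and return None so the caller withholds the response."""
    text = (prompt + " " + draft_output).lower()
    if any(topic in text for topic in HIGH_RISK_TOPICS):
        review_queue.put(ReviewItem(prompt, draft_output))
        return None  # a reviewer signs off before anything is released
    return draft_output
```

In practice the risk rule would be a classifier rather than a keyword match, but the design choice is the same: the model never has the last word on a high-risk answer.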
As federal AI adoption accelerates, Elsa’s rocky launch underscores the importance of balanced, security-first integration and robust policy guardrails.