AI-Generated Code: A Threat to the Software Supply Chain

As organizations increasingly rely on large language models (LLMs) to accelerate development, a new study cautions that AI-generated code may introduce hidden vulnerabilities into the software supply chain. Researchers from the University of Texas at San Antonio analyzed 576,000 code samples generated by 16 of the most widely used LLMs and discovered that nearly 20% of the external package dependencies they declared were “hallucinated” — references to non-existent libraries. This gap between AI outputs and reality creates an opening for dependency confusion attacks, which can compromise everything from enterprise applications to critical infrastructure.
Package Hallucination and Dependency Confusion Explained
Every modern application depends on third-party libraries to speed development and avoid reinventing the wheel. But when an LLM invents a package name, say fastdata-utils@^2.1.5, developers may inadvertently trust and install it. Malicious actors can register that same name on public package registries like npm or PyPI, publish a poisoned version, and exploit implicit version resolution rules to deliver payloads that exfiltrate data, install backdoors, or pivot deeper into corporate networks.
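To make the failure mode concrete, the minimal sketch below screens AI-suggested dependency names against PyPI's public JSON API before anything is installed. The package names are hypothetical examples, and the check has an important limit: a name that does exist is not automatically safe, because an attacker may already have claimed a commonly hallucinated name.

```python
# Minimal sketch: screen AI-suggested dependencies against PyPI before installing.
# Package names are hypothetical; an existing name is not automatically safe,
# since an attacker may already have registered a commonly hallucinated name.
import urllib.error
import urllib.request

def exists_on_pypi(name: str) -> bool:
    """Return True if PyPI's JSON API knows the package (HTTP 200), False on 404."""
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

suggested = ["requests", "fastdata-utils"]  # the second name echoes the made-up example above
for pkg in suggested:
    verdict = "registered" if exists_on_pypi(pkg) else "NOT on PyPI -- likely hallucinated"
    print(f"{pkg}: {verdict}")
```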
- Dependency confusion (or “package confusion”) was first demonstrated in 2021, when researchers compromised internal build pipelines at Apple, Microsoft, and Tesla by registering the names of private, internal packages on public registries.
- LLM-induced hallucinations amplify the risk: in the UTSA study, 440,445 hallucinated package references were identified across Python and JavaScript samples.
- 43% of those hallucinations recurred more than 10 times, making them persistent targets for attackers to seize and weaponize.
Key Research Findings
- Open-source LLMs such as CodeLlama and StarCoder had the highest hallucination rates, averaging nearly 22% of all dependencies.
- Commercial models (ChatGPT, GPT-4, Claude) averaged just over 5%, a gap the authors attribute to parameter counts more than ten times larger and more extensive safety fine-tuning.
- JavaScript samples hallucinated at a rate of 21.3%, compared to 15.8% for Python, likely due to npm’s sprawling ecosystem of over 2.5 million packages vs. PyPI’s 350,000.
Expanding Attack Surface: Latest Incidents and Trends
Since the UTSA paper’s publication at USENIX Security 2025, security teams have observed real-world cases where AI-suggested imports led to unvetted package installs. In July 2025, Snyk reported that 24% of projects using AI code assistants introduced at least one missing dependency within their first week of adoption. In one high-profile case, an internal banking app inadvertently pulled in a backdoored npm module named secure-auth-token, allowing attackers to capture OAuth credentials in transit.
Industry experts warn the problem is accelerating. Katie Moussouris, CEO of Luta Security, notes: “We’re entering an era where AI is our pair programmer. But without strict validation, those hallucinations become the weakest link in the chain. Malicious actors will always exploit trust blind spots.” Meanwhile, a joint advisory from CISA and the UK’s NCSC in September 2025 urged organizations to treat all AI-originated dependencies as untrusted until verified.
Technical Mitigations and Best Practices
- Adopt Software Bill of Materials (SBOM) standards such as CycloneDX or SPDX to catalog every dependency, including transitive ones.
- Implement the SLSA (Supply-chain Levels for Software Artifacts) framework: require at least Build Level 2 for signed build provenance from a hosted build platform, and work toward the higher levels, which add hardened and hermetic builds.
- Use Sigstore (cosign plus the Rekor transparency log) for keyless signing and automated verification of artifact signatures at install time.
- Integrate static analysis and dependency-scanning tools (GitHub Dependabot, CodeQL, Snyk) into CI/CD pipelines to flag unregistered or newly published dependencies before merge; a minimal sketch of such a gate follows this list.
- Employ in-IDE plugins that validate package existence via registry APIs in real time. Microsoft’s Visual Studio 2026 preview now includes a “Trusted Dependency” badge that cross-checks npm, PyPI, Maven Central, and internal feeds.
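As an illustration of the CI gate mentioned above, the sketch below reads component names from a CycloneDX JSON SBOM and flags anything that is missing from PyPI or was first published only recently, a common signal for a freshly weaponized hallucination. The bom.json path, the 30-day window, and the helper names are illustrative assumptions rather than features of any particular tool.

```python
# Sketch of a pre-merge dependency gate: flag SBOM components that are missing
# from PyPI or whose first upload is suspiciously recent. File name, threshold,
# and helper names are illustrative assumptions.
import json
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone

RECENT = timedelta(days=30)  # assumed "too new to trust blindly" window

def first_upload_time(name: str):
    """Earliest upload time across all releases, or None if the package is unknown."""
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError:
        return None
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data["releases"].values()
        for f in files
    ]
    return min(times) if times else None

def audit_sbom(path: str = "bom.json"):  # assumed CycloneDX JSON SBOM location
    with open(path) as fh:
        components = json.load(fh).get("components", [])
    for comp in components:
        name = comp["name"]
        first_seen = first_upload_time(name)
        if first_seen is None:
            print(f"FAIL {name}: not on PyPI (possible hallucinated dependency)")
        elif datetime.now(timezone.utc) - first_seen < RECENT:
            print(f"WARN {name}: first published {first_seen:%Y-%m-%d}, review before merging")
        else:
            print(f"OK   {name}")

if __name__ == "__main__":
    audit_sbom()
```

A check like this belongs alongside, not instead of, the scanners above: registry metadata tells you a package is real and established, while tools like Dependabot and Snyk tell you whether its contents are known to be vulnerable or malicious.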
Future Outlook: Securing LLM-Assisted Development
LLM providers are racing to close this gap. OpenAI’s upcoming “Code Integrity” API will integrate registry lookups directly into model inference, preventing hallucinated imports. Anthropic is experimenting with retrieval-augmented generation (RAG) that anchors code completions to verified documentation. Meanwhile, research teams at MIT and Stanford are exploring formal verification layers atop LLM outputs to guarantee that suggested imports exist and meet semantic version policies.
For now, organizations must update their DevOps policies: mandate manual vetting of AI-suggested dependencies, enforce dependency pinning, and continuously monitor for new malicious registrations that match internal naming conventions (a minimal watch-job sketch follows below). As Microsoft CTO Kevin Scott recently predicted, “95% of code will be AI-assisted within five years.” It’s imperative to engineer the safeguards today so that tomorrow’s AI-driven pipelines don’t become pipelines for compromise.
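That monitoring step can be as simple as a scheduled job that watches the public registry for names matching internal conventions. The sketch below assumes a hypothetical acme-internal- prefix; for packages that should only ever live on a private feed, merely appearing on public PyPI is itself the red flag.

```python
# Sketch of a dependency-confusion watch: alert when names that follow an
# internal naming convention show up on the public registry. The prefix and
# package names are hypothetical examples.
import urllib.error
import urllib.request

INTERNAL_PACKAGES = [
    "acme-internal-auth",      # hypothetical internal-only packages
    "acme-internal-payments",
]

def is_public_on_pypi(name: str) -> bool:
    """True if the name resolves on PyPI's JSON API, False on 404."""
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

for pkg in INTERNAL_PACKAGES:
    if is_public_on_pypi(pkg):
        print(f"ALERT: {pkg} is registered on public PyPI -- possible dependency confusion")
    else:
        print(f"clear: {pkg} not found on the public registry")
```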