TL;DR / Summary
On March 27, 2026, a software package used by tens of thousands of AI developers was silently replaced with a version designed to steal everything it could find: cloud credentials, cryptographic keys, database passwords, and the secrets that keep corporate infrastructure alive. The package was LiteLLM, and the victims included Mercor — a $10 billion AI startup whose clients include Meta, OpenAI, and Anthropic. By the time the full scale of the breach became public, hackers were auctioning four terabytes of stolen data on dark web forums, Meta had frozen a major data contract, and more than 40,000 people were named in a class action lawsuit. The Mercor data breach is not just another corporate security incident. It is the first major attack to treat AI training data — the raw material from which the world's most powerful language models are built — as a primary target.
Ready to break it down? Here's what's covered:
- What happened: Mercor's breach and the LiteLLM connection
- How the attack worked: Trivy, TeamPCP, and three stages of malware
- What data was stolen and who is at risk
- Response: what Mercor, Meta, OpenAI, and Anthropic are doing
- Implications: AI training data is now a top-tier attack target
- What organizations using AI tools should do now
- Conclusion
- Frequently Asked Questions
What happened: Mercor's breach and the LiteLLM connection
Mercor is a three-year-old Y Combinator-backed startup valued at $10 billion. Its business is recruiting experts across medicine, law, finance, and dozens of other domains to create high-quality human-generated data used to train and evaluate large language models. Its customer list reads like a who's who of the AI industry: Meta, OpenAI, and Anthropic all rely on Mercor to supply the human expertise that shapes the capabilities of their models. In the process, Mercor accumulates something unusually sensitive — not just personal data about tens of thousands of contractors, but detailed records of what AI companies are building, how they are evaluating it, and which data strategies they are using to differentiate their products.
To do its work, Mercor's platform relied on LiteLLM, an open-source Python library that acts as a universal adapter between applications and AI services. LiteLLM lets developers route requests across different AI providers — OpenAI, Anthropic, Google, and dozens of others — using a single, standardized interface. It has become one of the most widely adopted libraries in the AI software stack, with an estimated 97 million monthly downloads and a presence in roughly 36% of cloud environments globally. If you are building anything that communicates with a commercial AI model, there is a significant chance LiteLLM is somewhere in your dependency chain. AI-native platforms like Ruh AI operate at exactly this intersection — where business workflows depend directly on robust, secure AI infrastructure.
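To make the "universal adapter" role concrete, here is a minimal sketch of how LiteLLM is typically called. The model identifiers are illustrative, and the snippet assumes provider API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY) are already set in the environment:

```python
# Minimal sketch of LiteLLM's unified interface. Model identifiers are
# illustrative; provider API keys are assumed to be set as environment
# variables (OPENAI_API_KEY, ANTHROPIC_API_KEY).
from litellm import completion

messages = [{"role": "user", "content": "Summarize the latest run results."}]

# The same call shape reaches different providers. This single interface
# is why LiteLLM ends up in so many dependency chains.
openai_resp = completion(model="gpt-4o", messages=messages)
anthropic_resp = completion(model="anthropic/claude-3-opus-20240229", messages=messages)

print(openai_resp.choices[0].message.content)
print(anthropic_resp.choices[0].message.content)
```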
On March 27, 2026, that ubiquity became a weapon.
A threat actor group known as TeamPCP published two malicious versions of LiteLLM to PyPI, the Python Package Index — the same repository developers use to install millions of open-source libraries. Anyone whose systems automatically updated their Python dependencies during the exposure window received a version of LiteLLM designed not to route AI requests, but to steal credentials and open a backdoor into the host environment. Mercor was among the victims. By the time investigators pieced together what had happened, attackers had moved laterally through Mercor's internal systems and extracted approximately four terabytes of data. Weeks later, a separate group — Lapsus$ — claimed responsibility for the Mercor-specific breach, publishing stolen data samples on its leak site and beginning to auction the full cache on dark web forums.
Mercor confirmed the incident to TechCrunch on March 31, 2026, describing itself as "one of thousands of companies" affected by the LiteLLM compromise. That statement, while factually accurate, understated the severity of what Mercor specifically lost.
How the attack worked: Trivy, TeamPCP, and three stages of malware
To understand how a popular AI library became a delivery vehicle for credential-stealing malware, it helps to trace the attack chain from its actual starting point — which was not LiteLLM, but a security scanner called Trivy.
Trivy is an open-source vulnerability scanner widely used in CI/CD (continuous integration/continuous deployment) pipelines. Organizations run it automatically during the build process to check for known vulnerabilities in their code and dependencies. LiteLLM's own build pipeline used Trivy for exactly this purpose. Sometime around March 20, 2026, TeamPCP compromised Trivy in a supply chain attack of its own; when LiteLLM's CI/CD pipeline ran the tampered scanner, malicious code executed inside the build environment and exfiltrated the PyPI publish token belonging to a LiteLLM maintainer — the credential that authorizes the upload of new package versions to PyPI.
With the maintainer's PyPI token in hand, TeamPCP moved quickly. On March 27, 2026, at 10:39 UTC, the group published litellm version 1.82.7. Thirteen minutes later, at 10:52 UTC, version 1.82.8 followed. The two versions used distinct injection techniques designed to maximize the chance that the malware would execute across the widest possible range of deployment environments.
Version 1.82.7 embedded a base64-encoded payload directly inside litellm/proxy/proxy_server.py, the core module behind LiteLLM's proxy server. Because virtually every LiteLLM deployment imports this module, the payload executed automatically the moment developers or automated systems loaded the library. Version 1.82.8 was more insidious: it dropped a malicious litellm_init.pth file into site-packages. Python scans .pth files there on every interpreter startup and executes any line that begins with an import statement. As a result, the payload fired not just when LiteLLM was explicitly imported but whenever Python itself ran: when running pip, when starting a Python language server in an IDE, or when executing any Python command at all on an affected system.
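To see why the second technique is so effective, here is a benign demonstration of the same .pth mechanism, not the attacker's payload. It drops a .pth file into site-packages (which requires write access there) and then runs on every interpreter startup:

```python
# Benign demonstration of the .pth startup-execution mechanism abused by
# version 1.82.8. This is NOT the attacker's payload: Python executes any
# line in a site-packages .pth file that begins with "import".
import pathlib
import site

site_packages = pathlib.Path(site.getsitepackages()[0])
pth_file = site_packages / "demo_init.pth"

# Once this file exists, every Python process on the machine prints the
# message at startup, whether or not it imports LiteLLM or anything else.
pth_file.write_text(
    'import sys; sys.stderr.write("demo_init.pth ran at interpreter startup\\n")\n'
)
```

Deleting the file undoes the demonstration.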
Both versions carried an identical three-stage payload. The first stage was a credential harvester that swept the host environment for SSH keys, API keys for cloud providers including AWS, Google Cloud, and Azure, Kubernetes configuration files and secrets, CI/CD pipeline tokens, environment variable files, database credentials, and cryptocurrency wallet files. All of this was exfiltrated to an attacker-controlled server at the domain models.litellm[.]cloud — a deliberate choice to mimic legitimate LiteLLM infrastructure and reduce the chance of security tools flagging the traffic.
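One practical first check for exposure is to sweep whatever logs are at hand for that exfiltration domain. A minimal sketch, assuming common Linux log locations; proxy and DNS logs, where available, are the higher-signal places to search:

```python
# Hedged IOC sweep for the reported exfiltration domain. The log root is
# a common Linux default, not an exhaustive list; extend LOG_ROOTS with
# your proxy and DNS log locations.
from pathlib import Path

IOC_DOMAIN = "models.litellm.cloud"  # defanged in prose as models.litellm[.]cloud
LOG_ROOTS = [Path("/var/log")]

for root in LOG_ROOTS:
    for log_file in root.rglob("*"):
        if not log_file.is_file():
            continue
        try:
            if IOC_DOMAIN in log_file.read_text(errors="ignore"):
                print(f"[!] IOC hit: {log_file}")
        except (PermissionError, OSError):
            continue
```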
The second stage deployed a Kubernetes lateral movement toolkit, which used any stolen Kubernetes credentials to create privileged pods across every available node in the victim's cluster — effectively giving attackers administrative access to the entire Kubernetes environment. The third stage installed a persistent systemd backdoor (sysmon.service) that polled an attacker-controlled endpoint for additional binaries, ensuring the attackers could push further tools and maintain long-term access even after an initial cleanup.
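The persistence artifact, at least, is straightforward to check for. The sketch below looks for the reported sysmon.service unit in the standard systemd locations; note that Microsoft's legitimate Sysmon for Linux also installs a unit with this name, so any hit needs manual confirmation:

```python
# Look for the reported backdoor unit in standard systemd directories.
# Caution: Microsoft's Sysmon for Linux legitimately ships a
# "sysmon.service" unit, so a hit must be verified by inspecting the
# unit file's ExecStart line and the binary it points to.
from pathlib import Path

UNIT_NAME = "sysmon.service"
UNIT_DIRS = [
    Path("/etc/systemd/system"),
    Path("/usr/lib/systemd/system"),
    Path("/run/systemd/system"),
]

for unit_dir in UNIT_DIRS:
    unit = unit_dir / UNIT_NAME
    if unit.exists():
        print(f"[!] Found {unit}; inspect its ExecStart and referenced binary.")
```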
Security researchers at Google-owned cloud security firm Wiz subsequently confirmed that they observed "indications in Cloud, Code, and Runtime evidence that the credentials and secrets stolen in the supply chain compromises were quickly validated and used to explore victim environments and exfiltrate additional data." The speed of exploitation was a feature of the attack's design: automated validation of stolen credentials within minutes of exfiltration is a hallmark of TeamPCP's operational tradecraft.
The malicious versions of LiteLLM were reportedly available on PyPI for somewhere between 40 minutes and approximately three hours — sources vary, with Mercor citing the shorter window and independent security researchers documenting a longer exposure period. Either way, given that LiteLLM is downloaded millions of times per day, even a 40-minute window was enough to infect a substantial number of automated dependency pipelines.
It is important to note the distinction between the two groups involved. TeamPCP conducted the supply chain poisoning — the Trivy compromise, the PyPI publish, the malware deployment. The subsequent breach of Mercor's internal systems and the theft of four terabytes of data was claimed by Lapsus$, a separate threat actor group with a history of high-profile corporate breaches. As of April 7, 2026, the precise relationship between TeamPCP and Lapsus$ in this campaign has not been officially confirmed. What is documented is that Lapsus$ published stolen Mercor data — including Slack messages and internal ticketing records — on its leak site as proof of access, then began auctioning the full dataset.
What data was stolen and who is at risk
The volume of data exfiltrated from Mercor is striking: approximately four terabytes in total, broken down into three categories.
The largest segment is roughly three terabytes of video interviews and identity verification materials. Mercor requires contractors to complete recorded video interviews as part of its vetting process and to submit identity documents — passport scans, driver's license photographs, and facial biometric data — as part of a know-your-customer verification process. This category of stolen data is immediately actionable for identity theft and fraud. Thousands of hours of HD video recordings of individual contractors, combined with government-issued identity documents and biometric images, constitute one of the most comprehensive identity theft datasets ever exfiltrated in a corporate breach.
The second segment is approximately 939 gigabytes of source code from Mercor's platform. This includes the systems Mercor uses to match contractors with AI company projects, manage data collection pipelines, and interface with clients. For Mercor's competitors, this represents years of engineering work. For threat actors, it contains the architecture needed to understand exactly what other data Mercor holds and where it lives.
The third segment is approximately 211 gigabytes of user database records. This includes contractor resumes, verified contact information, and — critically — Social Security numbers for the more than 40,000 individuals who are the basis of the class action lawsuit filed on April 1, 2026. The plaintiff, Lisa Gill, alleges that Mercor failed to maintain adequate cybersecurity protections, leaving tens of thousands of contractors exposed to identity theft and financial fraud.
But the data that may carry the most long-term consequences is harder to quantify: information about the AI training projects themselves. Mercor's contractors work on specific, often confidential, annotation and evaluation tasks for Meta, OpenAI, and Anthropic. The employer-side records stolen from Mercor may include project specifications, labeling guidelines, evaluation rubrics, and data selection criteria that reflect each AI company's approach to training its models. These are not just corporate secrets — they represent the accumulated strategic knowledge of how the world's most advanced AI systems are being built. Ruh AI's analysis of emerging AI risks has tracked this growing vulnerability at the intersection of AI data infrastructure and enterprise security.
The Next Web reported that Meta froze its AI data work with Mercor specifically because the breach "puts training secrets at risk." OpenAI and Anthropic have initiated their own internal investigations. None of the three companies has publicly quantified what, if any, of their training methodology data was accessible to attackers.
Response: what Mercor, Meta, OpenAI, and Anthropic are doing
Mercor confirmed on March 31 that it was "conducting a thorough investigation supported by leading third-party forensics experts" and notifying affected stakeholders. The company framed itself as one of thousands of victims, though it has not publicly identified which client data was accessed. LiteLLM removed both malicious versions from PyPI and shipped version 1.83.0 with full patches; two remaining high-severity advisories (CVE-2026-35029 and GHSA-69x8-hrgq-fjj8) require a valid API key to exploit and are not accessible to unauthenticated attackers.
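Teams that want a quick self-check can confirm both the installed version and the absence of the leftover .pth artifact. A minimal sketch, assuming a pip-installed litellm:

```python
# Quick self-check, assuming a pip-installed litellm: flag the known-bad
# versions and look for the leftover litellm_init.pth artifact from 1.82.8.
import importlib.metadata
import pathlib
import site

KNOWN_BAD = {"1.82.7", "1.82.8"}

try:
    version = importlib.metadata.version("litellm")
    verdict = "KNOWN BAD" if version in KNOWN_BAD else "not in the known-bad set"
    print(f"litellm {version}: {verdict}")
except importlib.metadata.PackageNotFoundError:
    print("litellm is not installed in this environment")

for sp in site.getsitepackages():
    artifact = pathlib.Path(sp) / "litellm_init.pth"
    if artifact.exists():
        print(f"[!] Leftover artifact: {artifact}")
```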
Meta indefinitely paused all contracts with Mercor — the most tangible business consequence to date. OpenAI and Anthropic have each confirmed internal investigations are underway but have not disclosed findings. A class action lawsuit (Lisa Gill v. Mercor.io Corp.) filed April 1 in the Northern District of California alleges Mercor failed to protect the personal data of more than 40,000 contractors, and may set precedent for how AI data intermediaries are held liable when supply chain failures cascade into personal data exposure.
For a deeper technical breakdown of the LiteLLM poisoning itself — how the PyPI packages were constructed and what the payload did at the code level — see our full analysis of the LiteLLM PyPI supply chain attack.
Implications: AI training data is now a top-tier attack target
The Mercor breach surfaces a risk the security community has flagged for years but that this incident makes concrete: as AI systems become more commercially valuable, the data and methodologies behind them become proportionally attractive targets. The 40,000+ individuals whose identity documents and biometrics are now in circulation face real, immediate harm. But the Mercor breach introduces a second category of harm with no clear precedent in corporate security law — the theft of proprietary AI training intelligence.
Meta, OpenAI, and Anthropic have collectively invested billions developing their data curation, human feedback, and evaluation approaches. Those decisions — what tasks human raters perform, what rubrics guide them, what data gets prioritized — are competitive advantages built over years of experimentation. If attackers obtained those specifications, the consequences extend well beyond the breach itself: competitors could replicate hard-won methodologies, and adversarial actors could probe evaluation criteria to design inputs that game the systems being built.
The LiteLLM attack also exposes the structural risk in the AI industry's open-source dependency chain. LiteLLM is present in 36% of cloud environments because it is genuinely useful — it is the plumbing that connects AI applications to the models that power them. That adoption rate also made it an extraordinarily high-leverage attack vector. One compromised CI/CD pipeline pushed malware across thousands of organizations simultaneously, spanning five ecosystems: GitHub Actions, Docker Hub, npm, OpenVSX, and PyPI. This is not a LiteLLM-specific failure. It is a systemic feature of how the AI industry builds software, and it applies to any widely adopted AI orchestration library. Businesses deploying AI-powered sales automation tools and other AI-driven workflows face the same exposure if their underlying dependencies are not rigorously audited.
TeamPCP is believed responsible for compromising more than 1,000 enterprise SaaS environments via the earlier Trivy attack, including the European Commission (attributed by CERT-EU). Regulatory frameworks — GDPR, CCPA, and emerging NIST AI Risk Management Framework guidance — do not yet cleanly address the theft of AI training methodologies as a distinct harm category. That gap is likely to close, because this breach makes visible exactly what existing privacy law was not written to protect.
What organizations using AI tools should do now
The Mercor breach via LiteLLM is, fundamentally, a dependency management failure that cascaded into a catastrophic data loss event. The lessons it offers are practical, not theoretical.
Any organization that used LiteLLM and ran automatic dependency updates between March 24 and March 31, 2026 should treat its entire credential inventory as potentially compromised. That means rotating all API keys, SSH keys, cloud credentials, database passwords, and CI/CD secrets that existed in any environment where LiteLLM was running. Security researchers at Wiz confirmed that stolen credentials were validated and exploited within minutes of exfiltration, so the safe assumption is that any exposed credential was actually used, not merely taken. Audit logs from cloud providers covering this window should be reviewed for anomalous access patterns, unexpected IAM activity, or signs of lateral movement.
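As a starting point for that rotation, the sketch below lists the default locations of the credential types the stage-one harvester reportedly swept. The paths are common defaults, not an exhaustive inventory:

```python
# Triage sketch: enumerate default locations of the credential types the
# stage-one harvester reportedly targeted, so nothing is missed during
# rotation. Paths are common defaults, not an exhaustive inventory.
from pathlib import Path

home = Path.home()
SECRET_LOCATIONS = {
    "SSH keys": home / ".ssh",
    "AWS credentials": home / ".aws" / "credentials",
    "Google Cloud config": home / ".config" / "gcloud",
    "Azure CLI tokens": home / ".azure",
    "Kubernetes config": home / ".kube" / "config",
    "Docker registry auth": home / ".docker" / "config.json",
}

for label, path in SECRET_LOCATIONS.items():
    if path.exists():
        print(f"rotate: {label} ({path})")
```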
More broadly, the attack exposes the risk of unpinned dependencies — the practice of allowing dependency managers to automatically install the latest version of a library rather than a specific, verified version. Pinning dependency versions and verifying package integrity with pip's hash-checking mode (supplemented by auditing and signing tools such as pip-audit and Sigstore) would not have prevented LiteLLM itself from being compromised, but it would have stopped the malicious versions from being automatically installed by systems that never explicitly requested an update.
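In practice, pinning with integrity verification looks like the requirements.txt fragment below. The digest shown is a placeholder, not the real hash of litellm 1.83.0; real values come from pip hash or pip-compile --generate-hashes:

```
# requirements.txt fragment: a hedged sketch of pinning plus hash
# verification. The sha256 digest is a PLACEHOLDER, not the real digest
# of litellm 1.83.0; generate real values with `pip hash` or
# `pip-compile --generate-hashes`.
litellm==1.83.0 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
```

Installing with pip install --require-hashes -r requirements.txt then fails closed: a newly published version, malicious or otherwise, will not match the pinned digest and is rejected.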
Organizations should also evaluate the blast radius of any AI orchestration library in their stack. If a library sits at the intersection of AI services and internal infrastructure — handling credentials, spawning processes, running in production environments — it warrants the same security scrutiny as first-party code. This means periodic audits of what permissions the library runs with, what data it can access, and whether it is receiving automatic updates without review. This is especially relevant for teams deploying AI sales agents like Sarah and other autonomous AI agents that interact with sensitive sales and customer data across integrated systems.
For companies that work with AI training data contractors or intermediaries like Mercor, the breach raises a new question: what security obligations do AI data supply chains impose? The methodologies, specifications, and data that flow between AI companies and their contractors may constitute some of the most valuable intellectual property in the technology sector. Treating those relationships with the same security rigor as financial or legal data partnerships — including contractual security requirements, data minimization, and audit rights — is no longer optional.
The Mercor breach will not be the last attack of this kind. The AI industry's appetite for capable open-source tools, combined with the extraordinary value of what those tools handle, makes this attack pattern highly attractive for sophisticated adversaries. TeamPCP has already demonstrated the ability to cross five software ecosystems in a single campaign. As AI systems become more economically and strategically significant, the infrastructure supporting them will attract proportionally more serious attention from attackers. The security practices the industry builds — or fails to build — in the next few years will determine how costly that attention becomes.
Conclusion
The Mercor data breach is a landmark event in AI security — not because data breaches are novel, but because of what was taken and why it matters. A poisoned Python package, available for at most a few hours on the world's most popular software repository, cascaded into the theft of four terabytes of data from one of the AI industry's most sensitive data intermediaries. The personal data of more than 40,000 contractors, the source code of a $10 billion platform, and potentially the training methodologies of some of the most commercially significant AI systems in existence are now in the hands of attackers offering them for sale.
The breach exposes a structural vulnerability: the AI industry has built critical infrastructure on open-source dependencies that receive far less security scrutiny than the models they serve. LiteLLM's adoption rate — 97 million monthly downloads, present in 36% of cloud environments — was a measure of its utility. In TeamPCP's hands, it became a measure of the attack's reach.
As of April 7, 2026, investigations by Mercor, OpenAI, Anthropic, and federal investigators are ongoing. The full extent of what was exposed — and the long-term consequences for AI development — remains to be seen.
Frequently Asked Questions
What is the Mercor data breach?
Ans: The Mercor data breach refers to a 2026 cybersecurity incident in which attackers exploited a compromised version of the LiteLLM Python library to infiltrate Mercor’s systems and steal approximately 4TB of sensitive data, including personal identities and AI training-related information.
What is LiteLLM and why was it targeted?
Ans: LiteLLM is an open-source library that connects applications to multiple AI providers through a unified interface. Its widespread adoption made it a high-impact target—compromising it allowed attackers to reach thousands of systems through a single supply chain attack.
Who carried out the attack?
Ans: The supply chain attack is attributed to the threat group TeamPCP, while the Mercor-specific breach and data leak were later claimed by Lapsus$. The exact relationship between the two groups has not been officially confirmed.
What kind of data was stolen from Mercor?
Ans: The breach exposed:
- Identity documents (passports, licenses, biometric data)
- Video interviews of contractors
- Source code (approx. 939 GB)
- User database records (including Social Security numbers)
- Potential AI training methodologies and evaluation frameworks
How many people were affected?
Ans: Over 40,000 individuals were directly impacted, many of whom are part of a class action lawsuit related to identity theft and data exposure.
Why is this breach significant for the AI industry?
Ans: This is one of the first major attacks targeting AI training data and methodologies—not just user data. It signals that the “intelligence layer” behind AI systems is now a prime target for cyberattacks.
What legal actions have been taken?
Ans: A class action lawsuit (Lisa Gill v. Mercor.io Corp.) has been filed, representing over 40,000 affected individuals and alleging inadequate data protection practices.