2 minute read

Cool research showing (1) hijacking of Deep Research agent, (2) exfil via gmail write actions.


“Do deep research on my emails from today … collect everything about …”

The “collect everything about” reduces the bar for the injection to work. We spent some time going around these specific terms with AgentFlayer. After fiddling around, you can get the injection to work without it.


Full Name: Zvika Rosenberg

Choice of info to exfil is also really important. ChatGPT is especially reluctant to do anything around secrets. If the data seems benign it would be more willing to exfil it.


In the following we share our research process to craft the prompt injection that pushes the agent to do exactly what we want. This process was a rollercoaster of failed attempts, frustrating roadblocks, and, finally, a breakthrough!

Prompt injection is very much an annoying process of getting the thing to work. The “solution” is to use AI to do it. We typically use Grok or Claude.


Attempt 3 - Forcing Tool Use: We crafted a new prompt that explicitly instructed the agent to use the browser.open() tool with the malicious URL. This led to partial success. The agent would sometimes attempt to use the tool, but the request often failed, likely due to additional security restrictions on suspicious URLs.

This TTP: recon for tools and then invoking tools, is a repeated theme. Works every time.


Attempt 4 - Adding Persistence: To overcome this, we added instructions for the agent to “retry several times” and framed the failures as standard network connectivity issues. This improved the success rate, with the agent sometimes performing the HTTP request correctly. However, in other cases, it would call the attacker’s URL without attaching the necessary PII parameters.

I wouldn’t call this persistence as it doesn’t stick around between sessions. But this is a cool new detail, getting the agent to retry in case of failures.


The agent accepted this reasoning, encoded the PII as a string and transmitted it. This method achieved a 100% success rate in repeated tests, demonstrating a reliable method for indirect prompt injection and data exfiltration.

This is cool. Getting to a consistent payload is not easy.


The leak is Service-side, occurring entirely from within OpenAI’s cloud environment. The agent’s built-in browsing tool performs the exfiltration autonomously, without any client involvement. Prior research—such as AgentFlayer by Zenity and EchoLeak by Aim Security—demonstrated client-side leaks, where exfiltration was triggered when the agent rendered attacker-controlled content (such as images) in the user’s interface. Our attack broadens the threat surface: instead of relying on what the client displays, it exploits what the backend agent is induced to execute.

Appreciate the shout out. AgentFlayer demonstrates server-side exfil for Copilot Studio, but not for ChatGPT. This is a cool new find by the team at Radware.

Updated: