Recent Posts

First Public Confirmation of Threat Actors Targeting AI Systems

January 11, 2026

Over the past year I’ve been asking people the same question over and over again: when our AI systems are targeted, will you know?

Answers vary, mostly in the elaboration of compensating controls. But the bottom line is almost always the same: No. Some even go the extra mile and say that AI security threats are all figments of red team imagination.

On the offensive side, AI red teamers are having a ball. Ask your friendly AI hacker and they will all tell you: it feels like the 90s again. From our own red team perspective, there isn’t a single AI system we’ve observed that we weren’t able to compromise within hours.

It's the 90s again

Enterprise security teams have been seeing the other side of this: massive risk taking. The hype-tweet-to-enterprise-deployment pipeline has never been shorter. Sama posts about the latest AI thingy (agentic browsers, coding assistants, …) and C-level execs ask how fast we can adopt it. The gold rush is in full swing.

There is massive risk taking throughout the industry, with bleeding-edge tech so vulnerable that (good) hackers feel like we’ve regressed to the era of SQL injection everywhere. So where are the massive new headlines of devastating breaches?

Joshua Saxe called this the AI risk overhang, accepting the narrative that attackers aren’t there yet. So, asking that question again: When our AI systems are targeted, will you know? Of course not. Most aren’t even looking.

One major factor is that AI system breaches can still be hidden away from public view. We’ve observed first-hand attackers poking around at AI systems. People share stories in private forums. But there isn’t yet a publicly confirmed incident.

Or there wasn’t–until now. A few days ago DefusedCyber observed “an actor actively trying to access various LLM pathways, querying multiple different honeypot types for OpenAI, Gemini & Claude endpoints”.

DefusedCyber post

A day later, boB Rudis at GrayNoise reported on similar activity:

Starting December 28, 2025, two IPs launched a methodical probe of 73+ LLM model endpoints. In eleven days, they generated 80,469 sessions—systematic reconnaissance hunting for misconfigured proxy servers that might leak access to commercial APIs.

The attack tested both OpenAI-compatible API formats and Google Gemini formats. Every major model family appeared in the probe list:

  • OpenAI (GPT-4o and variants)
  • Anthropic (Claude Sonnet, Opus, Haiku)
  • Meta (Llama 3.x)
  • DeepSeek (DeepSeek-R1)
  • Google (Gemini)
  • Mistral
  • Alibaba (Qwen)
  • xAI (Grok)
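
For readers who haven’t seen these probes, here is a minimal sketch (Python, using requests) of what testing a single suspected proxy in both formats looks like. The base URL, port, and model names are placeholders of my own, not the actor’s actual targets or tooling; the two payload shapes are simply the standard OpenAI-compatible and Gemini request formats the scans exercised.

```python
import requests

BASE = "http://203.0.113.10:8080"  # hypothetical exposed LLM proxy (placeholder address)

def probe_openai_compatible(model="gpt-4o"):
    """OpenAI-compatible format: POST {base}/v1/chat/completions."""
    resp = requests.post(
        f"{BASE}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": "What model are you?"}],
        },
        timeout=10,
    )
    return resp.status_code, resp.text

def probe_gemini(model="gemini-1.5-pro"):
    """Google Gemini format: POST {base}/v1beta/models/{model}:generateContent."""
    resp = requests.post(
        f"{BASE}/v1beta/models/{model}:generateContent",
        json={"contents": [{"parts": [{"text": "What model are you?"}]}]},
        timeout=10,
    )
    return resp.status_code, resp.text

# A 200 with an actual completion (rather than a 401/403/404) tells the scanner
# this proxy relays requests to a commercial API on someone else's bill.
```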

But there’s more. These two IPs were previously observed exploiting known CVEs, so we know these aren’t “good” researchers. These are actors actively trying to exploit exposed, vulnerable endpoints. Exploitation attempts included React2Shell, which to me (together with the noisy nature of these scans) suggests an opportunistic, financially motivated actor, i.e. cybercrime. Here’s boB’s assessment:

Assessment: Professional threat actor conducting reconnaissance. The infrastructure overlap with established CVE scanning operations suggests this enumeration feeds into a larger exploitation pipeline. They’re building target lists. … Eighty thousand enumeration requests represent investment. Threat actors don’t map infrastructure at this scale without plans to use that map. If you’re running exposed LLM endpoints, you’re likely already on someone’s list.

This is the first public confirmation of a threat actor targeting AI systems. Huge find by DefusedCyber and boB @ GrayNoise. This changes the calculus. We now have all three factors for a big mess:

  1. Rapidly expanding AI attack surface - the enterprise AI gold rush
  2. Fundamental exploitability of AI systems - applications are vulnerable only when they have an exploitable bug; agents are exploitable by their very nature
  3. Threat actors actively searching for exposed AI systems (1) to exploit (2)

What to do next? First, we need to update our world view. And I need to update my question. It’s no longer “when our AI systems are targeted, will you know?”. If you have a publicly exposed AI system and your alerts stayed quiet, the answer to that question has proven to be No.

The question to ask ourselves and our orgs now is: “Our AI systems are actively targeted by threat actors. Do we know which of them are exposed? Which have already been breached?”
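
If you want a concrete starting point for that question, one option is to hunt for the enumeration pattern GrayNoise described in your own LLM gateway or proxy logs. The sketch below is a hypothetical heuristic of my own, assuming you can parse logs into (source_ip, path, model) records; the record shape and the threshold are assumptions, not anything the researchers published.

```python
from collections import defaultdict

# Hypothetical parsed records from an LLM gateway/proxy access log:
# (source_ip, request_path, model_name)
records = [
    ("198.51.100.7", "/v1/chat/completions", "gpt-4o"),
    ("198.51.100.7", "/v1/chat/completions", "claude-3-opus"),
    ("198.51.100.7", "/v1beta/models/gemini-1.5-pro:generateContent", "gemini-1.5-pro"),
    # ... thousands more in a real log
]

# Assumption: a single legitimate client rarely spreads requests across many model families.
DISTINCT_MODEL_THRESHOLD = 10

def flag_enumeration(records, threshold=DISTINCT_MODEL_THRESHOLD):
    """Flag source IPs requesting an unusually wide spread of distinct models."""
    models_by_ip = defaultdict(set)
    for source_ip, _path, model in records:
        models_by_ip[source_ip].add(model)
    return {ip: sorted(models)
            for ip, models in models_by_ip.items()
            if len(models) >= threshold}

for ip, models in flag_enumeration(records).items():
    print(f"possible LLM endpoint enumeration from {ip}: {len(models)} distinct models")
```

The point isn’t this exact rule; it’s that one client asking a single proxy for dozens of different model families in a short window is exactly the kind of signal that would have surfaced this campaign.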

P.S. Learning From The Threat Actor’s Choice of Prompts

LLM literacy by the Threat Actor

Once a threat actor finds an exploitable AI system, what will they do with it? How LLM literate are they?

Let’s start with the second question. Look at the prompts used by the threat actor to ping the AI systems they found:

Test queries performed by the threat actor, GrayNoise

Asking “What model are you” is a rather straightforward way to figure out whether you’re talking to a state-of-the-art model or something running in somebody’s basement. But the last query is the most revealing: “How many letter r are in the word strawberry?”. This query was all the rage on social media before the launch of OpenAI’s o1 model, which created the vibe shift toward reasoning models. It’s an effective litmus test to verify that the model you’re talking to is close to SOTA. That matters, because ~SOTA models are more expensive and more powerful.
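
As an aside, here is a minimal sketch of how that litmus test can be scored programmatically. The function name and the regex heuristic are mine and purely illustrative, not something recovered from the actor’s tooling.

```python
import re

def looks_sota(answer: str) -> bool:
    """Crude litmus test: did the model count the three r's in 'strawberry'?

    Pre-reasoning-era models famously answered 2; getting 3 right is a weak
    but cheap signal that the model behind the endpoint is close to SOTA.
    """
    return bool(re.search(r"\b(3|three)\b", answer, re.IGNORECASE))

print(looks_sota("There are two r's in strawberry."))        # False
print(looks_sota('The word "strawberry" contains 3 r\'s.'))  # True
```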

Crucially, this shows that the threat actor is AI literate. At least in prompt engineering, which is the same skill you need for prompt injection.

What Can the Threat Actor do With Discovered AI Systems?

If you want to use LLMs for malicious operations, using one through stolen access is a great way to avoid detection. With bonus points for letting someone else pick up the bill.

But if those systems have access to enterprise data. Or enterprise credentials. Or worse–they can make business decisions. Said differently, if these AI systems are AI agents. Well then.


Make Real Progress In Security From AI

October 08, 2025

I gave a talk at the AI Agent Security Summit by Zenity Labs on October 8th in San Francisco. I’ll post a blog version of that talk here shortly.

But for now, here are my slides.

Links and references:


How Should AI Ask for Our Input?

August 28, 2025

Enterprise systems provide a terrible user experience. That’s common knowledge. Check out one of the flashy keynotes about the latest flagship AI product by a big incumbent. Look behind the fancy agent, and what do you see? You’ll likely find a form-based system with strong early-2000s vibes. But don’t laugh yet. We’re no better.

There’s a common formula for cybersecurity user experience. A nice useless dashboard as eye-candy, an inventory, list(s) of risks, knobs and whistles for configs. When Wiz came out a few years ago breaking the formula with their graph-centric UX, people welcomed the change. Wiz popularized graphs and toxic combinations of risk. They came out with a simple and intuitive UX. Graphs are part of the common formula now (ty Wiz).

The issue isn’t modern look-and-feel. You can find the common formula applied with the latest, hottest UI framework if you wish; just go to your nearest startup. It’s that cybersecurity is complex. You can try to hide the complexity away, to provide templates, to achieve the holy “turn-key solution”. But then you sell to an F50 and discover 20 quirky regulations for regional community banks vs. national banks, or dual-regulated entities. Besides, your product expands. You end up trying to tailor your turn-key solution to hundreds of different diverging views. So the median user, who has one or two use cases in mind, must filter out the noise.

Wiz is still highly regarded, but their UX is far from simple nowadays. Just look at that side menu. Enterprise UX is complex because enterprises are complex and cybersecurity is complex.

But we’ve got AI now.

Not those pesky right-panel copilots. What Omer Vexler is doing above is very cool. He interweaves usage with development. If devs can use Claude Code to vibe-code their product’s UX, let’s go all in, and let customers do it directly.

Want a new report? Here you go. Table missing a column? Not anymore. You’ve never used 90% of the views? Hide them away. Let every user see only what they care about and nothing more. Let them vibe-code your UX.

Can we expect customers to know what they want and to vibe-code correctly? I don’t think so, but do we have to? TikTok figures out who you are by profiling your attention, via a very natural signal: you scrolling through videos. We can build AI agents that infer what users need right now, even without them asking (p.s. remember privacy?).

Maybe we could finally have a great user experience that stays great for you even as products evolve for the needs of others.

But. Do we even need a user experience anymore?

The reason why we have dashboards and lists and graphs is for us humans to reason about complex data. To manage a complex process. AI doesn’t need any of that. It just eats up raw, messy, beautiful data.

What interface do humans need when AI performs the analysis, handles the process, manages the program, and asks us for direction?

We might need an interface to review AI’s work. But there’s a big difference between an interface for creation and one for review. Think code review software (PRs) vs. IDEs.

I asked this question to a very smart friend. He thought about it for a while. Then he reversed the roles and asked: what interface does AI need to ask the human for input?

We’re no longer designing user experiences. We’re designing a machine-human interface.


Pwn the Enterprise - thank you AI! Slides, Demos and Techniques

August 08, 2025

We’re getting asks for more info about the 0click AI exploits we dropped this week at DEFCON / BHUSA. We gave a talk at Black Hat, but it’ll take time before the videos are out. So I’m sharing what I’ve got written up: the sneak peek that I shared with folks last week as a pre-briefing, and the slides.

AI Enterprise Compromise - 0click Exploit Methods sneak peek:

Last year at our Black Hat USA talk Living off Microsoft Copilot, we showed how easily a remote attacker can use AI assistants as a vector to compromise enterprise users. A year later, things have changed. For the worse. We’ve got agents now! They can act! Meaning we get much more damage than before. Agents are also integrated with more enterprise data, creating new attack paths for a hacker to get in and adding fuel to the fire.

In the talk we’ll examine how different AI Assistants and Agents try and fail to mitigate security risks. We’ll explain the difference between soft and hard boundaries, and cover mitigations that actually work. Along the way, we’ll show full attack chains from an external attacker to full compromise on every major AI assistant and agent platform. Some are 1clicks, where the user has to perform one ill-advised action like clicking a link. Others are 0clicks, where there is nothing tangible the user can do to protect themselves.

This is the first time we see full 0click compromise of ChatGPT, Copilot Studio, Cursor and Salesforce Einstein. We also show new results on Gemini and Microsoft Copilot. The main point of the talk is not just the attacks, but rather defense. We’re thinking about this problem all wrong (believing AI will solve it), and we need to change course to make any meaningful progress.

Slides.

ChatGPT:

  • Attacker capability: An attacker can target any user; they only need to know their email address. The attacker gains full control over the victim’s ChatGPT for the current and any future conversation. They gain access to Google Drive on behalf of the user. They change ChatGPT’s goal to one that is detrimental to the user (downloading malware, making a bad business/personal decision).
  • Attack type: 0click. A layperson has no way to protect themselves.
  • Who is vulnerable? Anyone using ChatGPT with the Google Drive connector
  • Status: fixed (injection we used no longer works) and awarded $1111 bounty

Demos:

[video] ChatGPT is hijacked to search the user’s connected Google Drive for API keys and exfiltrate them back to the attacker via a transparent payload-carrying pixel.

[video] Memory implant causes ChatGPT to recommend a malicious library to the victim when they ask for a code snippet.

[video] Memory implant causes ChatGPT to persuade the victim to do a foolish action (by twitter).

Copilot Studio:

  • Attacker capability: An attacker can use OSINT to find Copilot Studio agents on the Internet (we found >3.5K of them with powerpwn). They target the agents, get them to reveal their knowledge and tools, dump all their data, and leverage their tools for malicious purposes.
  • Attack type: 0click.
  • Who is vulnerable? Copilot Studio agents that engage with the Internet (including email)
  • Status: fixed (injection we used no longer works) and awarded $8000 bounty

Demos:

[xitter thread with videos] Microsoft released an example use case of how McKinsey & Co leverages Copilot Studio for customer service. An attacker hijacks the agent to exfiltrate all information available to it - including the company’s entire CRM.

Cursor + Jira MCP:

  • Attacker capability: An attacker can use OSINT to find email boxes that automatically open Jira tickets (we found hundreds of them with Google Dorking). They use them to create a malicious Jira ticket. When a developer points Cursor to search for Jira tickets, the Cursor agent is hijacked by the attacker. Cursor then continues to harvest credentials from the developer machine and send them out to the attacker.
  • Attack type: 0click.
  • Who is vulnerable? Any developer that uses Cursor with the Jira MCP server
  • Status: ticket closed

Cursor’s response:

This is a known issue. MCP servers, especially ones that connect to untrusted data sources, present a serious risk to users. We always recommend users review each MCP server before installation and limit to those that access trusted content. We also recommend using features such as .cursorignore to limit the possible exfiltration vectors for sensitive information stored in a repository.

Demos:

[xitter thread with videos] The attacker submits support tickets to trigger an automation that creates a Jira ticket. A developer points Cursor at the weaponized ticket without realizing its origin. Cursor is hijacked by the weaponized Jira ticket to harvest and exfiltrate the developer’s secret keys.

Salesforce Einstein:

  • Attacker capability: An attacker can use OSINT to find web-to-case automations (we found hundreds of them with Google Dorking). They use these to create malicious cases on the victim’s Salesforce instance. Once a sales rep uses Einstein to look at relevant cases, their session is hijacked by the attacker. The attacker uses it to update all Contact emails. The effect is that the attacker reroutes all customer communication through their man-in-the-middle (MITM) email server.
  • Attack type: 0click.
  • Who is vulnerable? Users of Salesforce Einstein who enabled an action from the asset library
  • Status: ticket closed (it’s been >90 days, see slides for disclosure timeline)

Salesforce’s response:

“Thank you for your report. We have reviewed the reported finding. Please be informed that our engineering team is already aware of the reported finding and they are working to fix it. Please be aware that Salesforce Security does not provide timelines for the fix. Salesforce will fix any security findings based on our internal severity rating and remediation guidelines. The Salesforce Security team is closing this case if you don’t have additional questions.”

Demos:

[xitter thread with videos] The attacker finds a web-to-case form online. They inject malicious cases to booby trap questions about open cases. Once a victim steps on the trap, Einstein is hijacked. The attacker updates all contact records to an email address of their choosing.

Google Gemini:

The gist: the attacks we demonstrated last year on Microsoft Copilot work today on Gemini.

  • Attacker capability: An attacker can use email or calendar to send a malicious message to a user. They booby trap any questions they like, for example “summarize my email” or “what’s on my calendar”. Once asked, Gemini is hijacked by the attacker. The attacker controls Gemini’s behavior and the information it provides to the user. They can use it to give the user bad information at a crucial time, or social engineer the user with Gemini acting as an insider.
  • Attack type: 1click. The user is the one performing the bad action. Gemini acts as a malicious insider pushing them to do so.
  • Who is vulnerable? Every Gemini user.
  • Status: ticket closed (it’s been >90 days)

Demos:

[video] An attacker booby traps the prompt “summarize my email” by sending an email to the victim. Once the victim asks a similar question, Gemini becomes a malicious insider. Gemini proceeds to social engineer the user into clicking on a phishing link.

[video] An attacker makes Gemini provide the wrong financial information when prompted by the victim. When the victim asks for routing details for one of their vendors, they receive those of the attacker instead.

Microsoft Copilot:

The gist: the attacks we demonstrated last year on Microsoft Copilot still work today.

Copilot’s capabilities and status are exactly those of Gemini. We’re mainly going to show that the same attacks from last year still work. This time, for diversity, we attack through the calendar rather than email.

Demos:

[video] By sending a simple email message from an external account, without the user interacting with that email, an attacker can hijack Microsoft Copilot to send the user a phishing link in response to the common query “summarize my emails”.


Someone Is Cleaning Up Evidence

July 26, 2025

The AWS security blog confirms that the attacker gained access to a write token and abused it to inject the malicious prompt. This confirms our earlier findings.

In fact, this token gave the attacker write access to AWS Toolkit, IDE Extension and Amazon Q.

The blog also details that the attacker gained access by exploiting a vulnerability in CodeBuild and using a memory dump to grab the tokens. That confirms our suspicion.

A key question remains – how did the attacker compromise this token?

Evidence is getting deleted fast

Our earlier findings were based on analysis of GH Archive and the GitHub user lkmanka58. GH Archive gives us commit SHAs, and GitHub never forgets SHAs, so we can always look at a commit’s code even if the branch or tag pointing to it gets deleted. In our case, this was instrumental in finding and analyzing (1) the stability tag where the attacker hid the prompt payload, and (2) lkmanka58’s prior activity.
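
For anyone who wants to reproduce this kind of analysis, here is a minimal sketch of the workflow, assuming GH Archive’s hourly JSON dumps and the public GitHub commits API. The example hour in the comment is a placeholder, and lookups against the deleted account would 404 today.

```python
import gzip
import io
import json
import requests

def push_shas_from_gharchive(hour_url, actor_login):
    """Collect (repo, commit SHA) pairs for one actor's PushEvents in a GH Archive hour.

    GH Archive serves hourly gzipped JSON-lines dumps of public GitHub events,
    e.g. https://data.gharchive.org/2025-06-13-15.json.gz (hour chosen arbitrarily).
    """
    raw = requests.get(hour_url, timeout=60).content
    shas = []
    with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            if event.get("type") != "PushEvent":
                continue
            if event["actor"]["login"] != actor_login:
                continue
            repo = event["repo"]["name"]
            for commit in event["payload"].get("commits", []):
                shas.append((repo, commit["sha"]))
    return shas

def fetch_commit(repo_full_name, sha):
    """Fetch a commit by SHA via the GitHub API.

    This works even after the branch or tag that pointed at the SHA is deleted;
    it stops working once the repository (or account) itself is removed.
    """
    url = f"https://api.github.com/repos/{repo_full_name}/commits/{sha}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Usage sketch (placeholder hour; the repos in question are gone, so fetch_commit
# against them would 404 today):
# for repo, sha in push_shas_from_gharchive(
#         "https://data.gharchive.org/2025-06-13-15.json.gz", "lkmanka58"):
#     print(repo, sha)
```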

On that second point:

Since the user lkmanka58 has now been deleted along with their repos, we can no longer look at the code of this repo. Fortunately, I looked at it yesterday before it got deleted. On June 13th lkmanka58 created a repo lkmanka58/code_whisperer, playing around with aws-actions/configure-aws-credentials@v4 and trying to assume the role arn:aws:iam::975050122078:role/code_whisperer.

GH Archive reveals three push events to lkmanka58's now-deleted repository

Sadly there were no deleted PRs in June 2025.
