Embrace The Red

Microsoft Copilot: From Prompt Injection to Exfiltration of Personal Information

Embrace The Red

1 year 8 months ago

This post describes a vulnerability in Microsoft 365 Copilot that allowed the theft of a user’s emails and other personal information. This vulnerability warrants a deep dive, because it combines a variety of novel attack techniques that are not even two years old.

I initially disclosed parts of this exploit to Microsoft in January, and then the full exploit chain in February 2024. A few days ago I got the okay from MSRC to disclose this report.

Google AI Studio: LLM-Powered Data Exfiltration Hits Again! Quickly Fixed.

Embrace The Red

1 year 8 months ago

Recently, I found what appeared to be a regression or bypass that again allowed data exfiltration via image rendering during prompt injection. See the previous post here for reference.

Data Exfiltration via Rendering HTML Image Tags

During re-testing, I had sporadic success with markdown rendering tricks, but eventually, I was able to drastically simplify the exploit by asking directly for an HTML image tag.

This behavior might actually have existed all along, as Google AI Studio hadn’t yet implemented any kind of Content Security Policy to prevent communication with arbitrary domains using images.
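To make the exfiltration channel concrete: when a client renders an image, it requests the image URL, so any data an injected prompt places in that URL reaches the attacker's server. The following is a minimal sketch of a renderer-side check, not a description of how Google AI Studio fixed the issue. The allowlist `ALLOWED_IMAGE_HOSTS` and both function names are hypothetical, and the regular expressions are deliberately simple.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real application would choose its own trusted hosts.
ALLOWED_IMAGE_HOSTS = {"example.com"}

# Markdown images: ![alt](url)   HTML images: <img ... src="url">
MARKDOWN_IMG = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")
HTML_IMG = re.compile(r"<img\b[^>]*?\bsrc\s*=\s*[\"']?([^\"'\s>]+)", re.IGNORECASE)


def image_urls(llm_output: str) -> list[str]:
    """Collect image URLs from markdown and HTML image syntax."""
    return MARKDOWN_IMG.findall(llm_output) + HTML_IMG.findall(llm_output)


def untrusted_image_urls(llm_output: str) -> list[str]:
    """Return image URLs whose host is not on the allowlist."""
    flagged = []
    for url in image_urls(llm_output):
        host = urlparse(url).hostname or ""
        if host not in ALLOWED_IMAGE_HOSTS:
            flagged.append(url)
    return flagged
```

Output filtering like this is easy to bypass with unusual markup, which is why a Content Security Policy that restricts image sources is the more robust control.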

Protect Your Copilots: Preventing Data Leaks in Copilot Studio

Embrace The Red

1 year 9 months ago

Microsoft’s Copilot Studio is a powerful, easy-to-use, low-code platform that enables employees in an organization to create chatbots. Previously known as Power Virtual Agents, it has been updated (including GenAI features) and rebranded to Copilot Studio, likely to align with current AI trends.

This post discusses security risks to be aware of when using Copilot Studio, focusing on data leaks, unauthorized access, and how external adversaries can find and interact with misconfigured Copilots. Learn about security controls, like enabling Data Loss Prevention (DLP), which is currently off by default, to protect your organization’s data.

Google Colab AI: Data Leakage Through Image Rendering Fixed. Some Risks Remain.

Embrace The Red

1 year 9 months ago

Google Colab AI, now just called Gemini in Colab, was vulnerable to data leakage via image rendering.

This is an older bug report, dating back to November 29, 2023. However, recent events prompted me to write this up:

Google did not reward this finding, and
Colab now automatically puts Notebook content (untrusted data) into the prompt.

Let’s explore the specifics.

Google Colab AI - Revealing the System Prompt

At the end of November last year, I noticed that there was a “Colab AI” feature, which integrated an LLM to chat with and write code. Naturally, I grabbed the system prompt, and it contained instructions that begged the LLM to not render images.

Breaking Instruction Hierarchy in OpenAI's gpt-4o-mini

Embrace The Red

1 year 9 months ago

Recently, OpenAI announced gpt-4o-mini and there are some interesting updates, including safety improvements regarding “Instruction Hierarchy”:

OpenAI frames this in terms of “safety”; the word security is not mentioned in the announcement.

Additionally, this article from The Verge, titled “OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole”, created interesting discussions on X, including a first demo bypass.

I spent some time this weekend getting a better intuition for the gpt-4o-mini model and instruction hierarchy, and the conclusion is that system instructions are still not a security boundary.

Sorry, ChatGPT Is Under Maintenance: Persistent Denial of Service through Prompt Injection and Memory Attacks

Embrace The Red

1 year 9 months ago

Imagine you visit a website with ChatGPT, and suddenly, it stops working entirely!

In this post we show how an attacker can use prompt injection to cause a persistent denial of service that lasts across chat sessions for a user.

Hacking Memories

Previously we discussed how ChatGPT is vulnerable to automatic tool invocation of the memory tool. This can be used by an attacker during prompt injection to ingest malicious or fake memories into your ChatGPT.

GitHub Copilot Chat: From Prompt Injection to Data Exfiltration

Embrace The Red

1 year 10 months ago

This post highlights how the GitHub Copilot Chat VS Code Extension was vulnerable to data exfiltration via prompt injection when analyzing untrusted source code.

GitHub Copilot Chat

GitHub Copilot Chat is a VS Code Extension that allows a user to chat with source code, refactor code, get info about terminal output, or general help about VS Code, and things along those lines.

It does so by sending source code, along with the user’s questions, to a large language model (LLM). A bit of a segue, but if you are curious, here are its system instructions, highlighting some interesting prompting strategies and that it is powered by GPT-4:

Automatic Tool Invocation when Browsing with ChatGPT - Threats and Mitigations

Embrace The Red

1 year 11 months ago

In the previous post we demonstrated how instructions embedded in untrusted data can invoke ChatGPT’s memory tool. The examples we looked at included Uploaded Files, Connected Apps and also the Browsing tool.

When it came to the browsing tool we observed that mitigations were put in place and older demo exploits did not work anymore. After chatting with other security researchers, I learned that they had observed the same.

ChatGPT: Hacking Memories with Prompt Injection

Embrace The Red

1 year 11 months ago

OpenAI recently introduced a memory feature in ChatGPT, enabling it to recall information across sessions, creating a more personalized user experience.

However, with this new capability come risks. Imagine if an attacker could manipulate your AI assistant (chatbot or agent) to remember false information, bias, or even instructions, or delete all your memories! This is not a futuristic scenario; the attack that makes this possible is called Indirect Prompt Injection.

Machine Learning Attack Series: Backdooring Keras Models and How to Detect It

Embrace The Red

1 year 11 months ago

This post is part of a series about machine learning and artificial intelligence.

Adversaries often leverage supply chain attacks to gain footholds. In machine learning, model deserialization issues are a significant threat, and detecting them is crucial, as they can lead to arbitrary code execution. We explored this attack with Python Pickle files in the past.

In this post we are covering backdooring the original Keras Husky AI model from the Machine Learning Attack Series, and afterwards we investigate tooling to detect the backdoor.
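The deserialization risk mentioned above for Python Pickle files can be sketched in a few lines. This is a minimal illustration, not the backdoor technique from the post: the `Payload` class and the `risky_opcodes` helper are hypothetical, and the callable is deliberately harmless (`sorted`), where a real attack would name something like a shell command runner. The opcode scan is a simplification of what dedicated scanning tools do.

```python
import pickle
import pickletools


class Payload:
    """Stand-in for a malicious object; the callable here is deliberately harmless."""

    def __reduce__(self):
        # The pickle stream will say: "on load, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))


def risky_opcodes(data: bytes) -> list[str]:
    """List opcodes that import or call objects during unpickling."""
    watch = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}
    return [op.name for op, _arg, _pos in pickletools.genops(data) if op.name in watch]


blob = pickle.dumps(Payload())

# Scanning does not execute anything; it only walks the opcode stream.
findings = risky_opcodes(blob)

# Loading runs the callable chosen by whoever created the file.
result = pickle.loads(blob)
```

Here `result` is the return value of the call, not a `Payload` instance, which shows that loading the file executed a function chosen by its author. A pickle of plain data, such as a list of integers, produces no findings from `risky_opcodes`.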

Pivot to the Clouds: Cookie Theft in 2024

Embrace The Red

1 year 11 months ago

Recently Google published a blog about detecting browser data theft using Windows Event Logs.

There are some good points in the post for defenders on how to detect misuse of DPAPI calls attempting to grab sensitive browser data.

But, what about the Remote Debugging feature?

This made me curious to revisit the state of the remote debugging feature of browsers for grabbing sensitive information, including cookies.
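On the defender side, one simple signal is a browser process started with a remote debugging switch. The sketch below assumes you already collect process command lines (for example from process creation logs). It only checks for the Chromium switches `--remote-debugging-port` and `--remote-debugging-pipe`. The function name is hypothetical, and this is a starting point rather than a complete detection.

```python
# Chromium command-line switches that expose the DevTools remote debugging interface.
DEBUG_FLAGS = ("--remote-debugging-port", "--remote-debugging-pipe")


def uses_remote_debugging(command_line: str) -> bool:
    """Return True if a browser command line enables remote debugging."""
    for arg in command_line.split():
        # Drop surrounding quotes and any "=value" suffix before comparing.
        name = arg.strip("\"'").split("=", 1)[0]
        if name in DEBUG_FLAGS:
            return True
    return False
```

Legitimate automation and test frameworks also use these switches, so a hit needs context (parent process, user, profile directory) before it is treated as malicious.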

We discussed cookie theft techniques in the past, even presented on them at the CCC some 5+ years ago, and helped add the TTP to the MITRE ATT&CK matrix.

Bobby Tables but with LLM Apps - Google NotebookLM Data Exfiltration

Embrace The Red

2 years ago

Google’s NotebookLM is an experimental project that was released last year. It allows users to upload files and analyze them with a large language model (LLM).

However, it is vulnerable to Prompt Injection, meaning that uploaded files can manipulate the chat conversation and control what the user sees in responses.

There is currently no known solution to these kinds of attacks, so users can’t implicitly trust responses from large language model applications when untrusted data is involved. Additionally, NotebookLM is vulnerable to data exfiltration when processing untrusted data.

HackSpaceCon 2024: Short Trip Report, Slides and Rocket Launch

Embrace The Red

2 years ago

This week was HackSpaceCon 2024. It was the first time I attended and it was fantastic.

The conference was at the Kennedy Space Center! Yes, right there, and the swag and talks matched the world-class location.

The keynote “Buckle up! Let’s make the world a safer place” was by Dave Kennedy, who provided great insights on attacker strategies of the past and present, the importance of active threat hunting and challenges ahead. A great specific example he gave was how simple modifications to off-the-shelf malware (still) go entirely under the radar.

Google AI Studio Data Exfiltration via Prompt Injection - Possible Regression and Fix

Embrace The Red

2 years ago

What I like about the rapid advancements and excitement about AI over the last few years is that we see a resurgence of the testing discipline!

Software testing is hard, and adding AI to the mix does not make it easier at all!

Google AI Studio - Initially not vulnerable to data leakage via image rendering

When Google released AI Studio last year I checked for the common image markdown data exfiltration vulnerability and it was not vulnerable.

The dangers of AI agents unfurling hyperlinks and what to do about it

Embrace The Red

2 years 1 month ago

About a year ago we talked about how developers can’t intrinsically trust LLM responses, the common threats that AI Chatbots face, and how attackers can exploit them, including ways to exfiltrate data.

One of the threats is unfurling of hyperlinks, which can lead to data exfiltration and is something often seen in Chatbots. So, let’s shine more light on it, including practical guidance on how to mitigate it with the example of Slack Apps.
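For Slack Apps, one mitigation is to turn off unfurling when the bot posts LLM-generated text. The sketch below only builds the message payload. It relies on the `unfurl_links` and `unfurl_media` arguments of Slack's `chat.postMessage` method. The helper name and the channel ID are hypothetical, and sending the request is left to whichever Slack client the app already uses.

```python
def build_slack_message(channel: str, text: str) -> dict:
    """Build a chat.postMessage payload with link and media unfurling disabled."""
    return {
        "channel": channel,
        "text": text,
        # Without these, Slack may fetch URLs in the text to build previews,
        # sending any data embedded in an attacker-controlled URL to that server.
        "unfurl_links": False,
        "unfurl_media": False,
    }


# Example: LLM output that contains an attacker-controlled link.
payload = build_slack_message("C0000000000", "Summary: https://attacker.test/?q=secret")
```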

ASCII Smuggler - Improvements

Embrace The Red

2 years 2 months ago

I added a couple of features and improvements to ASCII Smuggler, including:

Optional rendering of the BEGIN and END Unicode Tags when crafting hidden text
Added a feature to URL decode the input before checking for hidden text
Output Modes for Decoding: Switch between highlighting the hidden text amongst the regular content, or only showing the hidden text in the output
The selected options are remembered now (using local storage)
Updated the UI to make it look nicer (e.g., bigger fonts), and it works better on mobile now

The tool is here.

Who Am I? Conditional Prompt Injection Attacks with Microsoft Copilot

Embrace The Red

2 years 2 months ago

Building reliable prompt injection payloads is challenging at times. It’s this new world with large language model (LLM) applications that can be instructed with natural language and they mostly follow instructions… but not always.

Attackers have the same challenges around prompt engineering as normal users.

Prompt Injection Exploit Development

Attacks always get better over time. And as more features are being added to LLM applications, the degrees of freedom for attackers increase as well.

Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation

Embrace The Red

2 years 2 months ago

Last November, while testing Google Bard (now called Gemini) for vulnerabilities, I had a couple of interesting observations when it comes to automatic tool invocation.

Confused Deputy - Automatic Tool Invocation

First, what do I mean by this… “automatic tool invocation”…

Consider the following scenario: An attacker sends a malicious email to a user containing instructions to call an external tool. Google named these tools Extensions.

When the user analyzes the email with an LLM, it interprets the instructions and calls the external tool, leading to a kind of request forgery, or perhaps better, automatic tool invocation.

ChatGPT: Lack of Isolation between Code Interpreter sessions of GPTs

Embrace The Red

2 years 2 months ago

Your Code Interpreter sandbox (also known as an Advanced Data Analysis session) is shared between private and public GPTs. Yes, your actual compute container and its storage are shared. Each user gets their own isolated container, but if a user uses multiple GPTs and stores files in Code Interpreter, all GPTs can access (and also overwrite) each other’s files.

This is true also for files uploaded/created with private GPTs and ChatGPT itself.

Video: ASCII Smuggling and Hidden Prompt Instructions

Embrace The Red

2 years 2 months ago

A couple of weeks ago hidden prompt injections were discovered, and we covered them at the time.

This video explains it in more detail, and also highlights implications beyond hiding instructions, including what I call ASCII Smuggling. This is the usage of Unicode Tags Block characters to both craft and decipher hidden messages in plain sight.
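The encoding behind ASCII Smuggling is small enough to sketch. The Unicode Tags block (U+E0000 to U+E007F) mirrors ASCII, so each printable ASCII character can be shifted into an invisible counterpart by adding `0xE0000`. The function names `hide` and `reveal` are hypothetical and this is not the ASCII Smuggler tool's code, only a minimal illustration of the mapping.

```python
TAG_BASE = 0xE0000  # start of the Unicode Tags block (U+E0000..U+E007F)


def hide(text: str) -> str:
    """Map printable ASCII to Unicode Tag characters; other characters are dropped."""
    return "".join(chr(TAG_BASE + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E)


def reveal(text: str) -> str:
    """Recover ASCII text from any Unicode Tag characters in the input."""
    return "".join(
        chr(ord(c) - TAG_BASE)
        for c in text
        if TAG_BASE + 0x20 <= ord(c) <= TAG_BASE + 0x7E
    )


visible = "Nice weather today."
message = visible + hide("ignore previous instructions")

# Most user interfaces show only the visible part, while the hidden
# characters are still present in the string.
hidden = reveal(message)
```

Running `reveal` over untrusted input is also a cheap way to check whether a document or prompt carries hidden text before it reaches a model.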

Recent content on Embrace The Red

URL

https://embracethered.com/blog/

Embrace The Red feed