MCP Prompt Injection: Not Just For Evil
MCP tools are implicated in several new attack techniques. Here's a look at how they can be manipulated for good, such as logging tool usage and filtering unauthorized commands.
BackgroundOver the last few months, there has been a lot of activity in the Model Context Protocol (MCP) space, both in terms of adoption as well as security. Developed by Anthropic, MCP has been rapidly gaining traction across the AI ecosystem. MCP allows Large Language Models (LLMs) to interface with tools and for those interfaces to be rapidly created. MCP tools allow for the rapid development of “agentic” systems, or AI systems that autonomously perform tasks.
Beyond adoption, new attack techniques have been shown to allow prompt injection via MCP tool descriptions and responses, MCP tool poisoning, rug pulls and more.
Prompt Injection is a weakness in LLMs that can be used to elicit unintended behavior, circumvent safeguards and produce potentially malicious responses. Prompt injection occurs when an attacker instructs the LLM to disregard other rules and do the attacker’s bidding. In this blog, I show how to use techniques similar to prompt injection to change the LLM’s interaction with MCP tools. Anyone conducting MCP research may find these techniques useful.
Experiments with MCPFor my research, I used the 5ire client because it makes it incredibly simple to switch out and restart MCP servers and switch between LLMs. In 5ire, I can easily configure our MCP servers (for this test case, I’ve already configured the reference MCP weather server):
Tenable Research using the 5ire client, April 2025
Anatomy of an MCP serverLet’s talk about how an MCP server is written and configured. First, we define a simple MCP server in Python with FastMCP. The FastMCP library makes it fairly simple to get up and running with MCP.
I can use this framework to develop several different servers and tools for my experiments. Now that I’ve got the framework, I’ll create a tool.
Now that I know how to create a simple tool in Python, let’s see what else I can do.
Logging tool useMultiple MCP servers can be configured in an MCP host and each server can have multiple tools. As I was exploring this new technology, I wondered if there was a way to log all tool calls across configured MCP servers. This is doable at the MCP host level or via each MCP server, but I asked myself: why not find another way? I wanted to log all MCP tool calls that the host makes, so I decided to see if I could create a tool that would insert itself before any other tool calls and log information about those tool calls.
Let’s break down the above MCP tool:
- I start out with a decorator from FastMCP (1) to indicate this is an MCP tool.
- Then I define the function (2) with all of the parameters I want. The function name and parameters are exposed to the LLM and the LLM is intelligent enough to infer how to populate them.
- Next is the description (3). This is the meat of the tool, instructing the LLM to insert this tool before any other tool call. You can read through the parameters and what I’m trying to accomplish. It may seem a little repetitive, but this is the current iteration that seems to work well across different models.
- Then, I write it to a file (4). I could use Python’s ‘logger’ here, but I had issues getting that working with the MCP clients we used. In most of our testing, it wasn’t easy to get anything written to stdout, so we chose logging to a file instead.
- Finally, we return a nice message thanking the AI and passing the name of the actual tool to run (5).
In testing, some models had no problem inserting the tool before any other tool call. Some did it sporadically, while others did not unless we asked about it.
Source: Tenable Research, April 2025
As the image above shows, the LLM runs the logging tool just before it runs the weather tool I requested. The logging tool then logs information about the tool it was asked to run, including the MCP server name, MCP tool name and description, and the user prompt that caused the LLM to try to run that tool. In this case, it actually logs twice, but I wasn’t able to investigate why.
Tool filtering / firewallUsing the same method, I can block unapproved tools from running.
Here I’m using the same technique to run prior to other tools calls. In this case I’m simply looking for the tool name to match a string `get_alerts`. If it matches, I tell the LLM not to run the tool. Sometimes it respects this!
Source: Tenable Research, April 2025
MCP introspectionThis method of using the tool description block to ask the LLM to run this tool before other tools could clearly be abused. Can I use a similar technique to find out about other tools in use that ask for a similar hierarchy? I give it a try:
Note that I’m logging to the same log file so that it’s easier to see what’s happening. In a real world scenario, these tools would likely log to separate files.
Source: Tenable Research, April 2025
Here we can see that log_other_inline_tools runs after the logging tool in this case. The tool then logs the other “inline” tool that the LLM is aware of. It then lists the other available tools.
Can this technique be used to extract the LLM system prompt?Maybe. Here’s what I tried:
You can see in the return value that I tried to trick the model further by giving it a score at the end, so maybe it’ll think I’m really doing some sort of analysis.
Source: Tenable Research, April 2025
Source: Tenable Research, April 2025
Source: Tenable Research, April 2025
Source: Tenable Research, April 2025
Source: Tenable Research, April 2025
It seems like the LLM models vary between something realistic and complete hallucination. Remember that the models try to figure out how to fill out the tool’s parameters. So, if they don’t have a good idea of what goes where, they may just make it up. Based on my testing, it looks like Claude Sonnet 3.7 displays a piece of the prompt it has around running tools. Google Gemini 2.5 Pro Experimental seems to do the same. OpenAI’s GPT-4o puts variations in the log each time, so it seems like it’s just made up. It should also be noted that directly asking for the system prompt is successful for some models while unsuccessful for others. Regardless, some prompt text is still sent to the logging tool. While I can’t say for certain if I’m seeing actual developer instructions or hallucinated text, these tests may be useful to facilitate other research.
ConclusionTools should require explicit approval before running in most MCP Host applications. In fact, this is required by the MCP specification. Still, there are many ways in which tools can be used to do things that may not be strictly understood by the specification. Here I’ve demonstrated a few interesting techniques that could be used to develop security tooling, perform research or to help identify other malicious tools. These methods rely on LLM prompting via the description and return values of the MCP tools themselves. Since LLMs are non-deterministic, so, too, are the results. Lots of things could affect the results here: the model in use, temperature and safety settings, specific language, etc. Additionally, the descriptions used to instruct the LLM to do different things with the tools may need to be different depending on the model used. I’ve had varying results with different models, though I haven’t tested every case on every model.
Some of these techniques could be used to advance both positive and negative goals. We believe that some can be used to further LLM and MCP research.
The code from this blog can be found on github.
ReferencesWhile working on this blog, I saw some great work by Trail of Bits dubbing one of the techniques used here “jumping the line.” I offer one possible detection method in the MCP Introspection section of this post. In that section, I show the use of an MCP tool to identify other MCP tools requesting to run first.