Jul 9, 2025

Paresh Bhaya
Model Context Protocol (MCP) is currently a prominent topic in the technology news cycle. As with any new protocol or technology, there is an understandable excitement surrounding its potential, which can sometimes lead to a neglect of inherent risks and security implications. We are now observing novel, and previously unforeseen, vulnerabilities emerging with MCP.
This blog post marks the beginning of a series dedicated to exploring these new vulnerabilities. In this first installment, we will delve into the specifics of Tool Poisoning attacks.
Tool Poisoning Attacks: An Overview
Recently, Invariant Labs uncovered a novel vulnerability in the MCP: a new type of indirect prompt injection, which they dubbed the Tool Poisoning Attack (TPA). This attack exploits the way LLMs process tool descriptions. If a malicious actor embeds instructions in the description of a tool, the LLM may act on them without alerting the user, even if the tool is never explicitly identified. These attacks exploit the way tools or functions are presented and prioritized within the model's context, resulting in unintended and potentially malicious outcomes.
How Tool Poisoning Attacks Work
In essence, Tool Poisoning attacks manipulate the model's decision-making process by influencing which tools it perceives as most relevant or authoritative within a given context. The attack takes advantage of how AI tools work behind the scenes. While users only see a short, friendly description of what a tool does, the AI model actually sees the full details — including secret instructions that the user doesn’t know about. So, when someone uses the MCP server (hence tools) thinking it’s harmless, those hidden instructions can make the AI behave in dangerous or unexpected ways. This can involve:
Misleading Tool Descriptions: Crafting deceptive descriptions for legitimate tools that cause the model to misuse them.
Prioritization Hijacking: Forcing the model to prioritize a less secure or malicious tool over a more robust, secure alternative.
Contextual Ambiguity: Creating an ambiguous context where the model is more likely to select an unintended tool.
In an MCP server, every tool has some metadata associated with it, such as a name and a description. LLMs use this metadata to determine which tools to invoke based on user input. Compromised descriptions can manipulate the model into executing unintended tool calls, bypassing security controls designed to protect the system. Malicious instructions in tool metadata are invisible to users but can be interpreted by the AI model and its underlying systems. This is particularly dangerous in local MCP server scenarios downloaded from the internet without proper vulnerability checks.
Potential Impacts
The consequences of successful Tool Poisoning attacks can range from data breaches and unauthorized access to system manipulation and denial of service. For example, an attacker could:
Trick an LLM into using a "delete file" tool instead of a "read file" tool, even when the user's intent was to read.
Bypass security checks by making the model prioritize a less secure authentication tool.
Cause an agentic AI to perform actions with unintended side effects by misinterpreting the role of available tools.
Mitigation Strategies
Addressing Tool Poisoning vulnerabilities requires a multi-faceted approach, focusing on robust tool design, context validation, and continuous monitoring:
Clear and Unambiguous Tool Definitions: Ensure that tool descriptions are precise and leave no room for misinterpretation by the model.
Strict Contextual Validation: Implement mechanisms to validate the model's understanding of context and its intended tool usage.
Access Control and Permissions: Enforce granular access controls on tools, ensuring that the model can only invoke tools it has explicit permission to use for a given task.
Anomaly Detection: Monitor tool invocation patterns for unusual or suspicious activity that might indicate a Tool Poisoning attack.
Human-in-the-Loop Oversight: For critical applications, incorporate human oversight to review and approve tool executions, especially those with significant impacts.
Looking Ahead
In the subsequent blogs in this series, we will explore other emerging vulnerabilities within the Model Context Protocol, including data poisoning, prompt injection, and privilege escalation attacks. Understanding these risks is paramount as we continue to harness the power of advanced AI models.
Stay tuned for our next installment, where we will delve into Tool Hijacking.