As AI becomes embedded in our daily tools, a new class of cyberattack is emerging - Man-in-the-Prompt (MitP). Unlike traditional attacks that exploit code or networks, MitP targets the very language that powers AI systems.
What Is a Man-in-the-Prompt Attack?
MitP attacks manipulate prompts - the natural language instructions sent to Large Language Models (LLMs). Because these models process system instructions, user inputs, and third-party content together, attackers can sneak in hidden commands to make the AI behave in unexpected or harmful ways.
For example, malicious prompts can override safety rules, extract private data, or generate biased or misleading content - all without traditional hacking.
How It Happens
-
Indirect Prompt Injection: Attackers hide malicious prompts inside web pages, emails, or files. When the AI processes this content, it treats the hidden instructions as valid.
-
Malware That Talks: Some recent malware (ex. “Skynet”) embeds prompt injections directly in its code to confuse AI-based security tools.
-
Prompt Stealing & Jailbreaking: Attackers trick LLMs into revealing their own internal instructions or bypassing filters.
-
Imprompter Attacks: Researchers have shown that carefully crafted inputs can trick public chatbots into leaking sensitive data.
Why It Matters
MitP attacks don’t require technical hacking - just clever language. This makes them:
-
Scalable: One prompt can target thousands of users via emails or shared files.
-
Hard to detect: Traditional security tools don’t flag natural language threats.
-
Effective: A single hidden instruction can override entire safety protocols.
Real-World Risks
These attacks can lead to:
-
Data leaks: Exposing private or sensitive information.
-
Misinformation: Generating false or harmful content.
-
Security bypasses: Making AI perform banned or dangerous tasks.
-
Malware assistance: Helping craft malicious code or phishing messages.
How to Defend Against It
-
Sanitize Inputs: Filter and separate trusted prompts from external content.
-
Isolate Roles: Clearly define what’s a system command vs. user input.
-
Limit Prompt Privileges: Only allow trusted sources to issue high-level instructions.
-
Use Human Oversight: Review high-impact outputs manually.
-
Monitor Language Use: Look for strange or manipulative phrasing.
In a world where AI listens to language, words have power - and risk. The next wave of cyber threats won’t just be written in code. They’ll be whispered into the prompts. It's time we start treating prompts like the new security perimeter they are. Need help assessing your AI risk? Contact our cybersecurity team today to future-proof your AI systems before the next threat talks its way in.