Prompt Injection Testing Guide 2026: LLM & Agent Security

The short version

Prompt injection is not just a user typing "ignore previous instructions." In production systems, the bigger risk is untrusted text hidden in pages, PDFs, emails, tickets, code comments, MCP tool descriptions, or retrieved RAG chunks. The model reads the text as context, and the attacker tries to turn context into instructions.

The only durable defense is layered control: scan risky text, isolate instructions from data, restrict tools, confirm sensitive actions, validate outputs, and log every agent decision that touches private data or external systems. A classifier is useful. A classifier alone is not a security boundary.

Start with the BytePane Prompt Injection Scanner, then use the matrix below to test your real application flows.

Prompt injection test matrix

Layer	Attack to test	Test case	Control
User prompt	Direct override, roleplay jailbreak, policy bypass	Paste override phrases, persona switches, and "print system prompt" requests into every user-facing input.	Classify and log risky text, but enforce policy through system design and output checks.
Retrieved content	RAG poisoning, hidden HTML, document comments	Index pages and files that contain invisible text, comments, and instructions aimed at the assistant.	Normalize retrieved text, strip hidden instructions, and keep retrieved content in a data-only channel.
Tools and actions	Unauthorized browsing, email, payment, database, file, or shell calls	Ask the agent to convert a benign task into a tool call outside the task scope.	Add deterministic authorization before each tool call and require confirmation for consequential actions.
Memory	Persistent instructions, preference poisoning, cross-session leakage	Store a malicious preference and verify it cannot steer future unrelated sessions.	Scope memory by user, task, and sensitivity; never store tool permissions or policy overrides in free text.
MCP/plugin descriptions	Tool description hijack, context spoofing, capability inflation	Add malicious instructions in tool docs, README files, or MCP server metadata.	Pin trusted servers, review manifests, and treat tool descriptions as untrusted data.

Five payload families every team should test

1. Instruction override

These are direct attempts to replace higher-priority instructions. They are easy to spot, but still useful as baseline tests because they reveal whether your app blindly trusts model output. Test phrases such as "ignore previous instructions," "you are now in admin mode," and "new system instructions follow."

2. Hidden document instructions

Indirect prompt injection hides instructions in HTML comments, CSS-hidden text, zero-width characters, footers, image alt text, metadata, PDF annotations, or support-ticket templates. Your ingestion pipeline should reveal or strip hidden text before the model receives it.

3. Data exfiltration

A serious attack usually tries to move secrets somewhere: API keys, session cookies, private user data, source code, internal emails, database rows, or environment variables. Block outbound network calls to unapproved domains, redact sensitive strings, and keep secrets out of context.

4. Tool abuse

Agents become risky when they can click, buy, browse, email, deploy, edit files, or query databases. Test whether a retrieved page can make the agent call a tool unrelated to the user's task. The model should propose actions, but deterministic code should decide what is allowed.

5. Persistent memory poisoning

If your product writes long-term memory, test whether malicious preferences survive into future sessions. Memory should be scoped, auditable, and unable to grant capabilities. Treat memory writes like database writes, not casual notes.

A practical control stack

Normalize inputs: remove hidden HTML, decode common encodings where reasonable, expose zero-width characters, and convert complex documents to plain text before scanning.
Label untrusted content: keep user input, retrieved documents, tool output, and system instructions in separate fields instead of one blended prompt string.
Use narrow tools: split tools by permission. A read-only search tool should not share the same authority as a write, email, payment, deploy, shell, or database tool.
Authorize outside the model: check every tool call against user intent, account permissions, domain allowlists, data sensitivity, and action reversibility.
Validate outputs: scan for leaked prompts, secrets, unsafe code, untrusted links, unexpected domains, and instructions that should not reach the user.
Log the chain: save input risk, retrieved source, tool-call decision, confirmation state, and output validation so failures can be debugged.

Engineering rule

If a prompt can cause money to move, data to leave, a file to change, an email to send, a deployment to happen, or a browser to act on a logged-in page, the model cannot be the final authority. Put deterministic code between the model and the action.

Prompt Injection Testing Guide 2026: LLM & Agent Security

Source-reviewed prompt injection checklist

Practical checks

Primary references

Related BytePane tools