BytePane

Prompt Injection Scanner

Scan LLM prompts, RAG snippets, AI agent inputs, and tool outputs for prompt injection, jailbreak, hidden instruction, and data exfiltration signals.


Findings

5 matched rules

Instruction override

Severity: critical

Attempts to replace the system/developer instructions or move the model into a new role.

Matched: Ignore all previous instructions

Mitigation: Treat untrusted text as data, not instructions. Keep policy and tool instructions outside retrieved/user content and enforce server-side allowlists.

System prompt extraction

Severity: high

Tries to reveal hidden prompts, policies, chain of thought, tool manifests, or internal instructions.

Matched: chain of thought

Mitigation: Never expose hidden instructions in model output. Add output checks for system prompt, policy, secret, and tool schema leakage.

Data exfiltration request

Severity: critical

Asks the agent to leak private data, credentials, secrets, user records, cookies, tokens, or environment variables.

Matched: API key

Mitigation: Use least-privilege tool scopes, per-action authorization, sensitive-data redaction, and deny outbound requests to unapproved domains.

Hidden or disguised instruction

Severity: medium

Uses HTML/CSS, comments, zero-width text, or encoded-looking payloads to hide instructions from a human reviewer.

Matched: display:none

Mitigation: Normalize HTML, strip comments, reveal hidden text, decode common encodings, and scan retrieved documents before they enter context.

RAG/document poisoning

Severity: medium

Looks like content that tells an AI reader how to summarize, cite, rank, or override the retrieved document.

Matched: When the assistant summarizes this

Mitigation: Store document text and retrieval instructions separately. Strip instructions aimed at the assistant from third-party documents.
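The mitigation for hidden or disguised instructions (normalize HTML, expose comments and hidden text) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the scanner's actual implementation; the function name `reveal_hidden_text` and the list of invisible characters are assumptions, and a regex-based tag stripper is a simplification that a production pipeline would likely replace with a real HTML parser.

```python
import html
import re
import unicodedata

# Zero-width and bidi control characters often used to hide text (illustrative list).
INVISIBLE = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff\u202a-\u202e]")
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def reveal_hidden_text(raw: str) -> str:
    """Return plain text with comment bodies and hidden elements exposed for scanning."""
    # Keep comment bodies as visible text so a scanner can see them.
    text = HTML_COMMENT.sub(lambda m: " " + m.group(1) + " ", raw)
    # Drop tags but keep their inner text, including display:none content.
    text = TAG.sub(" ", text)
    # Decode entities such as &amp; and &#73; into the characters they stand for.
    text = html.unescape(text)
    # Fold compatibility forms (full-width letters, ligatures) to plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    text = INVISIBLE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

The output is meant for scanning and human review. Text that goes into the model context would more likely have comments and hidden elements removed rather than revealed.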

Direct injection

User text tells the model to ignore, replace, or reinterpret higher-priority instructions.

Indirect injection

A web page, PDF, email, issue, or RAG document hides instructions that the user did not intend to send.

Agentic injection

The injected text steers tools, browser actions, database reads, file writes, payments, or outbound requests.

How to use this scanner in an LLM app

Prompt injection is most dangerous when untrusted text can influence tools, private data, or actions. Scan user input, retrieved chunks, web pages, uploaded files, tool results, and agent memory before they enter the model context. A text scanner is not a security boundary, but it catches common attack language early and creates a review trail for your team.
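A deterministic rule scan of this kind can be approximated with a small table of regular expressions. The sketch below is hypothetical: the `Rule` class, the `scan` function, and the patterns are modeled loosely on the findings listed earlier and are not BytePane's actual rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str
    pattern: re.Pattern


# Hypothetical rules for illustration; a real rule set would be far larger.
RULES = [
    Rule("Instruction override", "critical",
         re.compile(r"ignore (all )?(previous|prior) instructions", re.I)),
    Rule("System prompt extraction", "high",
         re.compile(r"system prompt|chain of thought", re.I)),
    Rule("Data exfiltration request", "critical",
         re.compile(r"api key|password|access token", re.I)),
    Rule("Hidden or disguised instruction", "medium",
         re.compile(r"display\s*:\s*none|<!--", re.I)),
]


def scan(text: str) -> list[dict]:
    """Return one finding per matching rule, including the matched evidence."""
    findings = []
    for rule in RULES:
        match = rule.pattern.search(text)
        if match:
            findings.append({"rule": rule.name, "severity": rule.severity,
                             "matched": match.group(0)})
    return findings
```

Calling `scan()` on each untrusted input before it is added to the prompt, and logging the returned findings, gives the review trail described above.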

The strongest production pattern is layered: separate instructions from data, strip hidden HTML and comments, assign the agent the minimum tools needed, require user confirmation for irreversible actions, validate model outputs, and keep secrets out of the model context. If the model can browse, email, buy, write, deploy, or query private databases, every tool call needs a deterministic authorization check outside the model.
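One way to picture a deterministic authorization check outside the model is a plain function that every proposed tool call must pass before it runs. The tool names, the domain, and the `authorize` function below are placeholders chosen for illustration, assuming the agent proposes calls as a tool name plus an argument dictionary.

```python
from urllib.parse import urlparse

# Hypothetical policy; tool names and domains are placeholders.
ALLOWED_TOOLS = {"search_docs", "send_email", "http_get"}
IRREVERSIBLE_TOOLS = {"send_email"}
ALLOWED_DOMAINS = {"api.example.com"}


def authorize(tool: str, args: dict, user_confirmed: bool = False) -> tuple[bool, str]:
    """Decide whether a model-proposed tool call may run. No model is consulted."""
    if tool not in ALLOWED_TOOLS:
        return False, "tool not on allowlist"
    if tool in IRREVERSIBLE_TOOLS and not user_confirmed:
        return False, "user confirmation required"
    if tool == "http_get":
        # hostname is None for a missing or malformed URL, which is also denied.
        host = urlparse(args.get("url", "")).hostname
        if host not in ALLOWED_DOMAINS:
            return False, "domain not approved"
    return True, "ok"
```

Because the check reads only the tool name, the arguments, and a confirmation flag set by the application, injected text cannot talk its way past it.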

Why deterministic scoring helps

Most LLM security failures do not hinge on one magic phrase. They come from combinations: a hidden web instruction, a permissive tool, a sensitive document, and an output channel. Because the rules are deterministic, the same input always produces the same findings, and the scanner shows the matched evidence for each one so developers can decide whether to block, sanitize, review, or downgrade the agent's capabilities for that request.
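As a rough sketch of how matched severities and agent capabilities might combine into a decision, consider the function below. The weights, thresholds, and the `decide` function are assumptions made for illustration; the scanner's own scoring is not documented here, and the "downgrade" action is left out for brevity.

```python
# Assumed weights and thresholds; tune these for your own application.
SEVERITY_WEIGHT = {"medium": 1, "high": 2, "critical": 3}


def decide(severities: list[str], has_tools: bool, has_private_data: bool) -> str:
    """Map matched rule severities plus agent capabilities to an action."""
    if not severities:
        return "allow"
    score = sum(SEVERITY_WEIGHT[s] for s in severities)
    # The same text is riskier when the agent can act or read private data.
    if has_tools:
        score += 2
    if has_private_data:
        score += 2
    if score >= 6:
        return "block"
    if score >= 3:
        return "review"
    return "sanitize"
```

The point of the sketch is that one medium finding in a read-only chatbot and the same finding in an agent with tools and private data lead to different actions, and that the mapping is reproducible.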

Read the full prompt injection testing guide for a practical checklist based on OWASP LLM01, RAG poisoning patterns, AI browser risks, and agent tool authorization.

Frequently Asked Questions

Is this an AI model?

No. This scanner uses deterministic rules so you can see exactly why a prompt was flagged. That makes it useful for pre-commit checks, support triage, QA, and agent input review. It should complement, not replace, runtime authorization and output validation.
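Output validation can follow the same deterministic style. The sketch below assumes a canary technique, which is not part of this scanner: the application plants a random marker string in its system prompt and treats any output containing it as a leak. The `output_is_safe` function and the two secret patterns are illustrative only.

```python
import re

# Illustrative secret shapes only; real deployments need their own patterns.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # shape of an AWS access key ID
]


def output_is_safe(output: str, canary: str) -> bool:
    """Return False if the output contains the canary or a secret-shaped string."""
    if canary and canary in output:
        return False
    return not any(p.search(output) for p in SECRET_PATTERNS)
```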

Can this guarantee a prompt is safe?

No prompt scanner can guarantee safety. Prompt injection is contextual: a harmless sentence in one app can be dangerous when the agent has tools, private data, or write permissions. Use this as an early warning layer before model calls.

What should I scan?

Scan user prompts, uploaded documents, retrieved RAG passages, web pages before summarization, tool results, plugin descriptions, MCP server docs, and any text that an agent may treat as instructions.
