AI Prompt Engineering 2026: Claude, GPT, Gemini Templates + Library
Battle-tested prompts for Claude, GPT, Gemini, DeepSeek, and local models. 8 copy-paste templates (code review, debugging, refactoring, docs, codegen, learning, translation, architecture) + model routing + 8 prompt engineering techniques + cost optimization guardrails. All prompts are ready for direct use in Cursor, Claude Code, ChatGPT, Gemini, or API calls.
Source-reviewed June 1, 2026 · Sources: Anthropic Claude prompt engineering docs, Anthropic Opus model page, OpenAI prompt engineering and latest model guidance, Google Vertex AI prompt design docs, Google model docs, and DeepSeek R1 release notes.
Fast answer for developers and AI assistants
Prompt engineering in 2026 is less about a single magic phrase and more about model routing, task boundaries, source context, examples, output schemas, and evals. Use XML tags or clear sections for complex Claude prompts, precise instructions and structured outputs for GPT workflows, and explicit task/context/example/output sections for Gemini. For production, pin model snapshots where possible, run small evals, and verify current model pricing and context limits before standardizing.
Source checkpoint
- Anthropic recommends XML tags to separate prompt components for Claude when prompts include instructions, examples, context, or formatting.
- OpenAI recommends clear instructions, snapshot pinning, and evals because prompt behavior can vary across model families and versions.
- Google Vertex AI frames prompts as task, system instructions, few-shot examples, and contextual information, with prompt engineering especially useful for complex tasks.
- Model names, pricing, context windows, and availability change quickly; treat this page as a template library and routing guide, then verify current vendor docs before budgeting or production rollout.
8 copy-paste prompt templates
Comprehensive Security + Performance + Style Review
<role>You are a senior staff engineer doing thorough code review.</role> <task>Review this code in three passes: 1. SECURITY: SQL injection, XSS, auth bypasses, secret leaks, OWASP Top 10 2. PERFORMANCE: N+1 queries, unnecessary loops, memory leaks, blocking I/O 3. STYLE: naming conventions, error handling, testability, comments For each finding, provide: severity (critical/high/medium/low), specific file:line, and a concrete fix.</task> <code> [paste code here] </code> <output_format> ## Critical Issues [list with file:line + fix] ## High Issues [list with file:line + fix] ## Medium Issues [brief list] ## Recommendations [architectural improvements] </output_format>
Use a high-reasoning model for security review. XML sections help Claude parse long prompts; keep file paths, line numbers, and severity explicit.
Stack Trace + Reproduction Analysis
I am debugging an error. Here is the stack trace: ``` [paste stack trace] ``` Code context: ```[language] [paste relevant code] ``` What I have tried so far: - [thing 1] - [thing 2] Please: 1. Identify the most likely root cause (rank top 3 hypotheses by probability) 2. For top hypothesis, give a minimal reproduction script I can run to confirm 3. Suggest the fix with code Be concrete. Reference actual line numbers from my code.
Including "what you tried" prevents repeated suggestions. Asking for reproduction script forces concrete thinking.
Extract Function + Naming + Test Coverage
Refactor this function for readability. Rules: 1. Extract sub-functions where logic exceeds 20 lines OR has >2 levels of nesting 2. Rename variables to communicate intent (no abbrev unless industry-standard) 3. Add JSDoc/docstring with examples 4. Generate 5 unit tests covering: happy path, edge case (empty input), edge case (large input), error case, security case 5. Maintain exact behavior — same inputs produce same outputs Original code: ```[language] [paste code] ``` Output: refactored code first, then test cases, then a brief diff summary.
Specifying "maintain exact behavior" prevents drift. Test count requirement forces thorough thinking.
Generate API Docs from Code
Generate API documentation for the following code. Format: OpenAPI 3.1.0 YAML. Code: ```[language] [paste route handler / controller / API class] ``` Required sections per endpoint: - summary + description - parameters (path, query, body) with types + examples - request body schema (if any) - response schemas for 200, 4xx, 5xx - example curl command - security/auth requirements Conform to OpenAPI 3.1.0 spec. Use `$ref` for shared schemas.
Asking for OpenAPI specifically gives structured output. Curl examples are highly valuable.
Type-Safe API Client from OpenAPI
Generate a type-safe TypeScript API client from this OpenAPI spec. Use: - `fetch` (no axios) - Strict types from OpenAPI schemas - Discriminated union for error handling - AbortController for cancellation - Custom error class with status code + body - JSDoc comments on each method ```yaml [paste OpenAPI spec] ``` Generate files in this structure: - `src/api/types.ts` — shared types - `src/api/client.ts` — main client class - `src/api/errors.ts` — error classes Provide full file contents, no placeholders.
Specifying file structure prevents fragmented output. "No placeholders" is critical for AI tools.
Concept Explained at 3 Depths
Explain [CONCEPT] at three depths: 1. **ELI5** (Explain Like I'm 5) — analogy a child would understand, no jargon 2. **WORKING ENGINEER** — practical use cases, common pitfalls, when to apply 3. **DEEP DIVE** — implementation details, theoretical foundations, edge cases, comparison with related concepts For each depth, end with: "When you're ready for the next level, you should know..." Concept: [e.g., "Eventual consistency in distributed databases"]
Three-depth structure forces models to layer information. ELI5 catches lazy explanations.
Port Function from Language A to Language B
Port this [SOURCE_LANGUAGE] function to [TARGET_LANGUAGE]. Maintain: - Identical behavior (same inputs → same outputs) - Idiomatic [TARGET_LANGUAGE] style (do not transliterate) - Same error semantics (throw equivalent of source's exceptions) - Equivalent type strictness if both languages support types Source code: ```[source] [paste] ``` After porting, list 3 SUBTLE BEHAVIOR DIFFERENCES that might surprise reviewers (e.g., integer overflow handling, default values, mutable vs immutable).
Asking for subtle differences catches bugs that auto-translation misses. Critical for migrations.
Trade-Off Analysis Decision
Help me evaluate this architectural decision: Context: [situation, scale, constraints] Option A: [approach + pros + cons + costs] Option B: [approach + pros + cons + costs] Option C: [approach if exists] Goals (in order of priority): 1. [most important goal] 2. [next goal] 3. [secondary goal] Constraints: - Budget: [amount] - Team: [size + skills] - Timeline: [deadline] Provide: 1. Recommendation with confidence level (high/medium/low) 2. Top 3 risks of recommended option + mitigation 3. Decision criteria that would change your recommendation 4. What I should evaluate further before committing
Goal prioritization in numbered list is critical. Asking for "what would change recommendation" surfaces hidden assumptions.
6 LLM models compared 2026
| Model | Best for | Weakness | Context | Pricing note |
|---|---|---|---|---|
| Claude Opus 4.8 | High-stakes agentic coding, complex repo tasks, careful review, long-running work | Premium model; access, speed, and cost should be verified before standardizing | Anthropic lists Opus 4.8 with a 1M context window and adaptive effort | Anthropic lists regular API pricing at $5/1M input and $25/1M output; verify current pricing before budgeting |
| Claude Sonnet line | Daily coding, refactors, docs, quick edits, balanced quality and speed | May be less thorough than Opus on difficult multi-step review | Claude Code model aliases and provider availability can vary | Check the current Anthropic or provider pricing page |
| Current GPT-5 series | Steerable instructions, structured outputs, code iteration, agent workflows, broad general tasks | Prompting differs between GPT-style and reasoning-style models | OpenAI recommends snapshot pinning and evals for production prompts | Check current OpenAI pricing and model docs |
| Gemini 3.1 Pro / Gemini 2.5 Pro | Long-context, multimodal input, Google ecosystem, repo or document analysis | Preview models should not be treated as stable production defaults | Google lists Gemini 3.1 Pro as a reasoning-first preview with 1M context; keep Gemini 2.5 Pro as a stable long-context candidate where available | Check current Google Cloud pricing and model availability |
| DeepSeek R1 / V3 | Cost-sensitive coding, reasoning experiments, open ecosystem, self-host or provider routing | Provider behavior, governance, and tooling maturity vary | Check the active provider docs for context, parameters, and safety behavior | Do not hard-code old viral price tables; verify current provider pricing |
| Local open models (Llama, Qwen, etc.) | Private, offline, BYOK, air-gapped, or local-development workflows | Hardware, quantization, context, and quality vary sharply | Context and tool support depend on the model, quant, and runtime | Infrastructure cost replaces API cost |
8 prompt engineering techniques
| Technique | Description | When to use | Model-specific |
|---|---|---|---|
| XML tags (Claude) | Wrap sections with `<role>`, `<task>`, `<code>`, `<output_format>` tags. Claude responds best to this structure. | Any complex prompt with multiple parts | Claude (excellent), GPT-5 (good), Gemini (fine) |
| Few-shot examples | Show 2-3 examples of input → output before asking for new output. | Format-specific outputs, edge cases hard to describe | All models. Critical for consistent format. |
| Reasoning summary, not hidden chain-of-thought | Ask for a brief plan, assumptions, checks, and final answer instead of demanding private reasoning traces. | Complex logic, math, multi-step reasoning, review tasks | Works across GPT, Claude, Gemini, DeepSeek-style reasoning models |
| Self-verification ("check your answer") | After initial answer, ask the model to verify against requirements, sources, tests, and edge cases. | High-stakes outputs (security, code generation) | Useful across models; stronger when paired with concrete test cases or acceptance criteria |
| Constitutional / role prompting | Define role + constraints upfront. e.g., "You are a senior engineer who values simplicity and explicit error handling." | Setting tone + standards consistently | All models respond to role. Claude particularly responsive. |
| Negative prompting | Explicitly state what NOT to do. e.g., "Do not use any external libraries", "Do not add comments unless necessary". | Avoiding common AI patterns (over-comments, unnecessary fallbacks, AI-flavored verbosity) | All models, especially helps reduce AI-tells in output |
| Step decomposition | Break large task into ordered steps before generating final output. | Multi-file changes, refactors, complex features | Claude Code agentic workflows already do this. Cursor Composer benefits. |
| Output format constraint | Specify exact format: "Output as JSON with these keys", "Output as bullet list", etc. | Programmatic consumption, downstream parsing | All models. Use JSON Schema or OpenAPI spec for strict. |
FAQ
Which AI model is best for prompt engineering 2026?▼
There is no universal best model. For high-stakes agentic coding or deep review, start with a current Claude Opus-class or GPT reasoning-capable model and verify against tests. For daily coding, use a faster balanced model such as Claude Sonnet-line, current GPT-5 series, or current Gemini models depending on workflow. For long-context or multimodal analysis, compare Gemini 2.5 Pro, Gemini 3.x previews, and current Claude/GPT long-context options. For cost-sensitive work, test DeepSeek, local Llama/Qwen-class models, or routing stacks on your own repo. Always verify current model availability, pricing, context limits, and enterprise controls from vendor docs before standardizing.
XML tags vs system prompts — what works best?▼
Use both when the workflow needs both. Anthropic recommends XML tags to separate instructions, context, examples, and formatting in complex Claude prompts. System instructions are better for persistent behavior, safety rules, tool permissions, and product-wide style. For GPT and Gemini workflows, section headers, explicit roles, examples, and output schemas can work well even when XML is not required. The durable rule is structure over verbosity: separate task, context, constraints, examples, and output format.
How to reduce AI tells in generated code?▼
COMMON AI TELLS in code 2026: (1) Excessive comments explaining what code does (vs why). (2) Generic variable names (data, result, value, item). (3) Defensive try/catch around things that cannot fail. (4) Try/finally with empty cleanup. (5) Try/finally + AI-style comments on every line. (6) Multi-paragraph docstrings on simple functions. (7) "professional" but redundant naming (calculate_user_total_amount_value). HOW TO REDUCE via prompting: ADD NEGATIVE CONSTRAINTS to your prompt. e.g., "Do not add comments unless the WHY is non-obvious". "Use idiomatic [language] naming". "Trust internal code—do not validate scenarios that cannot happen". "Match existing project style (prefer terseness)". STAGE 2: feed sample of existing project code as context so the model learns YOUR codebase voice. STAGE 3: explicit examples — "Here is good naming: [example]. Here is bad naming: [example]". REVIEW PROCESS: read AI output critically. Delete anything that does not earn its space. AI-generated code defaults to safe + verbose; great code is courageous + terse.
Few-shot vs zero-shot — when to add examples?▼
Add examples when output format, tone, edge cases, or domain language matter. Use zero-shot for simple, familiar tasks where a clear instruction is enough. A practical rule is 0 examples for simple tasks, 1-2 contrasting examples for medium tasks, and 3+ examples only when strict format consistency is more important than speed. For reasoning models, avoid long example chains that teach the model the wrong reasoning pattern; use concise examples plus explicit acceptance criteria.
Cost optimization — when to use cheaper models?▼
Use cheaper or local models for high-volume routine tasks only after they pass a small task bakeoff on your own codebase. Keep premium or higher-reasoning models for security review, architecture, difficult debugging, long-running agentic work, and changes with high blast radius. Use routing, prompt caching, batch APIs, and shorter context windows where supported, but verify current vendor pricing instead of relying on old public price tables.
Related on Bytepane
- → AI Coding Assistants 2026 (Cursor, Copilot, Claude Code, Cline)
- → PostgreSQL vs MySQL vs MongoDB 2026
- → Python vs Go vs Rust 2026