AI Prompt Engineering 2026 — Claude, GPT, Gemini Templates + Library
Battle-tested prompts for Claude Opus 4.7, GPT-5, Gemini 2.5 Pro, DeepSeek V3, and self-hosted Llama. 8 copy-paste templates (code review, debugging, refactoring, docs, codegen, learning, translation, architecture), a 6-model comparison, 8 prompt engineering techniques, and cost optimization tips. All prompts are ready for direct use in Cursor, Claude Code, ChatGPT, or API calls.
Updated April 2026 · Sources: Anthropic Prompt Engineering docs, OpenAI GPT-5 best practices, Google Gemini 2.5 Pro guide, DeepSeek-R1 paper, Anthropic API Cookbook
8 copy-paste prompt templates
Comprehensive Security + Performance + Style Review
````
<role>You are a senior staff engineer doing a thorough code review.</role>

<task>Review this code in three passes:
1. SECURITY: SQL injection, XSS, auth bypasses, secret leaks, OWASP Top 10
2. PERFORMANCE: N+1 queries, unnecessary loops, memory leaks, blocking I/O
3. STYLE: naming conventions, error handling, testability, comments

For each finding, provide: severity (critical/high/medium/low), specific file:line, and a concrete fix.</task>

<code>
[paste code here]
</code>

<output_format>
## Critical Issues
[list with file:line + fix]
## High Issues
[list with file:line + fix]
## Medium Issues
[brief list]
## Recommendations
[architectural improvements]
</output_format>
````
Claude excels at security analysis; use XML tags for clear sectioning. GPT-5 is also strong, but less rigorous on edge cases.
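If you call the API directly rather than pasting into a chat UI, the same XML sectioning can be assembled programmatically. A minimal Python sketch — the `build_review_prompt` helper is illustrative, not part of any SDK:

```python
def build_review_prompt(code: str, passes: list[str]) -> str:
    """Assemble an XML-tagged review prompt (hypothetical helper).

    Claude responds well to <role>/<task>/<code> sectioning; other
    models accept it too, so one string works across providers.
    """
    task_lines = "\n".join(f"{i}. {p}" for i, p in enumerate(passes, 1))
    return (
        "<role>You are a senior staff engineer doing a thorough code review.</role>\n"
        f"<task>Review this code in these passes:\n{task_lines}\n"
        "For each finding, provide severity, file:line, and a concrete fix.</task>\n"
        f"<code>\n{code}\n</code>"
    )

prompt = build_review_prompt(
    "def handler(q): return db.execute(q)",
    ["SECURITY", "PERFORMANCE", "STYLE"],
)
```

The resulting string goes into the user message of whatever chat API you use; keeping the builder separate makes the pass list easy to vary per request.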
Stack Trace + Reproduction Analysis
````
I am debugging an error. Here is the stack trace:

```
[paste stack trace]
```

Code context:

```[language]
[paste relevant code]
```

What I have tried so far:
- [thing 1]
- [thing 2]

Please:
1. Identify the most likely root cause (rank top 3 hypotheses by probability)
2. For the top hypothesis, give a minimal reproduction script I can run to confirm
3. Suggest the fix with code

Be concrete. Reference actual line numbers from my code.
````
Including "what you tried" prevents repeated suggestions. Asking for a reproduction script forces concrete thinking.
Extract Function + Naming + Test Coverage
````
Refactor this function for readability. Rules:
1. Extract sub-functions where logic exceeds 20 lines OR has >2 levels of nesting
2. Rename variables to communicate intent (no abbreviations unless industry-standard)
3. Add JSDoc/docstring with examples
4. Generate 5 unit tests covering: happy path, edge case (empty input), edge case (large input), error case, security case
5. Maintain exact behavior — same inputs produce same outputs

Original code:

```[language]
[paste code]
```

Output: refactored code first, then test cases, then a brief diff summary.
````
Specifying "maintain exact behavior" prevents drift. Test count requirement forces thorough thinking.
Generate API Docs from Code
````
Generate API documentation for the following code. Format: OpenAPI 3.1.0 YAML.

Code:

```[language]
[paste route handler / controller / API class]
```

Required sections per endpoint:
- summary + description
- parameters (path, query, body) with types + examples
- request body schema (if any)
- response schemas for 200, 4xx, 5xx
- example curl command
- security/auth requirements

Conform to the OpenAPI 3.1.0 spec. Use `$ref` for shared schemas.
````
Asking for OpenAPI specifically gives structured output. Curl examples are highly valuable.
Type-Safe API Client from OpenAPI
````
Generate a type-safe TypeScript API client from this OpenAPI spec. Use:
- `fetch` (no axios)
- Strict types from OpenAPI schemas
- Discriminated union for error handling
- AbortController for cancellation
- Custom error class with status code + body
- JSDoc comments on each method

```yaml
[paste OpenAPI spec]
```

Generate files in this structure:
- `src/api/types.ts` — shared types
- `src/api/client.ts` — main client class
- `src/api/errors.ts` — error classes

Provide full file contents, no placeholders.
````
Specifying file structure prevents fragmented output. "No placeholders" is critical for AI tools.
Concept Explained at 3 Depths
````
Explain [CONCEPT] at three depths:

1. **ELI5** (Explain Like I'm 5) — analogy a child would understand, no jargon
2. **WORKING ENGINEER** — practical use cases, common pitfalls, when to apply
3. **DEEP DIVE** — implementation details, theoretical foundations, edge cases, comparison with related concepts

For each depth, end with: "When you're ready for the next level, you should know..."

Concept: [e.g., "Eventual consistency in distributed databases"]
````
Three-depth structure forces models to layer information. ELI5 catches lazy explanations.
Port Function from Language A to Language B
````
Port this [SOURCE_LANGUAGE] function to [TARGET_LANGUAGE]. Maintain:
- Identical behavior (same inputs → same outputs)
- Idiomatic [TARGET_LANGUAGE] style (do not transliterate)
- Same error semantics (throw the equivalent of the source's exceptions)
- Equivalent type strictness if both languages support types

Source code:

```[source]
[paste]
```

After porting, list 3 SUBTLE BEHAVIOR DIFFERENCES that might surprise reviewers (e.g., integer overflow handling, default values, mutable vs immutable).
````
Asking for subtle differences catches bugs that auto-translation misses. Critical for migrations.
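A concrete example of such a subtle difference is integer division: Python floors toward negative infinity, while C, Java, and JavaScript-style integer division truncates toward zero, so a naive port silently changes results for negative operands. A quick demonstration in Python:

```python
import math

# Python's // floors toward negative infinity:
python_result = -7 // 2              # -4

# C/Java/JS-style integer division truncates toward zero,
# emulated here with math.trunc:
c_style_result = math.trunc(-7 / 2)  # -3

# A port that swaps one semantics for the other silently
# changes results for negative operands:
assert python_result != c_style_result
```

This is exactly the kind of difference the "list 3 subtle behavior differences" instruction is designed to surface before a reviewer has to.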
Architectural Trade-Off Analysis
````
Help me evaluate this architectural decision:

Context: [situation, scale, constraints]

Option A: [approach + pros + cons + costs]
Option B: [approach + pros + cons + costs]
Option C: [approach, if one exists]

Goals (in order of priority):
1. [most important goal]
2. [next goal]
3. [secondary goal]

Constraints:
- Budget: [amount]
- Team: [size + skills]
- Timeline: [deadline]

Provide:
1. Recommendation with confidence level (high/medium/low)
2. Top 3 risks of recommended option + mitigation
3. Decision criteria that would change your recommendation
4. What I should evaluate further before committing
````
Prioritizing goals in a numbered list is critical. Asking what would change the recommendation surfaces hidden assumptions.
6 LLM models compared 2026
| Model | Best for | Weakness | Context | API cost |
|---|---|---|---|---|
| Claude Opus 4.7 | Long-context reasoning (200k+ effective), agentic tasks, careful security analysis | Slower than Sonnet, more expensive | 200k effective + 1M experimental | $15/1M input + $75/1M output |
| Claude Sonnet 4.6 | Daily coding work, fast iterations, balanced cost/quality | Slightly less rigorous than Opus on complex reasoning | 200k effective | $3/1M input + $15/1M output |
| GPT-5 | Quick iteration, broad knowledge cutoff (recent), strong general code | Less consistent at long-context, weaker on novel problem types | 256k | $5/1M input + $15/1M output |
| Gemini 2.5 Pro | Code reasoning + long-context (1M tokens), Google ecosystem integration | Less reliable on agentic multi-step tasks vs Claude | 1M tokens (huge) | $4/1M input + $12/1M output |
| DeepSeek-V3 / R1 | Cost-effective coding, open-weights option, math + reasoning (R1) | Smaller English ecosystem, less mature tooling | 128k | $0.27/1M input + $1.10/1M output (V3) |
| Llama 3.1 405B / Qwen 2.5 72B | Self-hosted, privacy-critical, air-gapped environments | Hardware-intensive (a multi-GPU node, e.g., 8x A100-class for 405B), not as polished as cloud | 128k | Self-host costs only |
8 prompt engineering techniques
| Technique | Description | When to use | Model-specific |
|---|---|---|---|
| XML tags (Claude) | Wrap sections with `<role>`, `<task>`, `<code>`, `<output_format>` tags. Claude responds best to this structure. | Any complex prompt with multiple parts | Claude (excellent), GPT-5 (good), Gemini (fine) |
| Few-shot examples | Show 2-3 examples of input → output before asking for new output. | Format-specific outputs, edge cases hard to describe | All models. Critical for consistent format. |
| Chain-of-thought ("think step by step") | Ask model to reason aloud before answering. | Complex logic, math, multi-step reasoning | GPT-5 (strong), Claude (already does this naturally), Gemini (helpful) |
| Self-verification ("check your answer") | After initial answer, prompt model to verify + correct. | High-stakes outputs (security, code generation) | Claude (excellent self-correction), DeepSeek-R1 (built-in reasoning) |
| Constitutional / role prompting | Define role + constraints upfront. e.g., "You are a senior engineer who values simplicity and explicit error handling." | Setting tone + standards consistently | All models respond to role. Claude particularly responsive. |
| Negative prompting | Explicitly state what NOT to do. e.g., "Do not use any external libraries", "Do not add comments unless necessary". | Avoiding common AI patterns (over-comments, unnecessary fallbacks, AI-flavored verbosity) | All models, especially helps reduce AI-tells in output |
| Step decomposition | Break large task into ordered steps before generating final output. | Multi-file changes, refactors, complex features | Claude Code agentic workflows already do this. Cursor Composer benefits. |
| Output format constraint | Specify the exact format: "Output as JSON with these keys", "Output as a bullet list", etc. | Programmatic consumption, downstream parsing | All models. Use a JSON Schema or OpenAPI spec for strict validation. |
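Several of these techniques compose in a single request. A Python sketch combining role prompting, negative prompting, and an output format constraint into one chat payload — the helper and the constraint wording are illustrative, not a fixed API:

```python
def build_messages(task: str, forbid: list[str], json_keys: list[str]) -> list[dict]:
    """Combine role prompting, negative constraints, and a JSON
    format constraint into a standard chat-message list."""
    system = ("You are a senior engineer who values simplicity "
              "and explicit error handling.")
    constraints = "\n".join(f"- Do not {f}" for f in forbid)
    fmt = f"Output ONLY a JSON object with keys: {', '.join(json_keys)}."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"{task}\n\nConstraints:\n{constraints}\n\n{fmt}"},
    ]

msgs = build_messages(
    "Refactor this function.",
    ["use external libraries", "add comments unless necessary"],
    ["code", "summary"],
)
```

The same list plugs into any chat-completions-style endpoint; only the role names (`system`/`user`) are assumed here, and they match the common convention across providers.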
FAQ
Which AI model is best for prompt engineering in 2026?
Best model by use case (2026):

- **Agentic long-running tasks** — Claude Opus 4.7: best self-correction, careful reasoning, XML-tag responsive.
- **Daily coding** — Claude Sonnet 4.6: best price/performance.
- **Whole-repo analysis** — Gemini 2.5 Pro (1M-token context).
- **Quick iteration + general code** — GPT-5.
- **Budget / self-hosted** — DeepSeek V3 ($0.27/1M input, roughly 50x cheaper than Opus on input pricing).
- **Privacy-critical** — Llama 3.1 405B or Qwen 2.5 72B, self-hosted via Ollama.

No single best model — most senior devs use two or more in combination. A common 2026 stack: Claude Code (autonomous tasks) + Cursor (IDE assistant) + Cline (free OSS, BYOK).
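The "use two or more models" advice is usually implemented as a router. A minimal sketch of a complexity-based routing policy — the model identifier strings are placeholders (not real API model IDs), and the mapping is an illustrative policy, not a rule:

```python
def route_model(task_complexity: str) -> str:
    """Pick a model tier by task complexity.

    Model names follow the comparison above; the thresholds are a
    policy choice you would tune for your own workload.
    """
    routing = {
        "simple":  "deepseek-v3",        # high-volume, routine generation
        "medium":  "claude-sonnet-4.6",  # daily coding, balanced cost/quality
        "complex": "claude-opus-4.7",    # security review, architecture, agents
    }
    return routing[task_complexity]

chosen = route_model("medium")
```

Tools like Aider, Cline, and Continue expose equivalent routing as configuration, so you rarely need to hand-roll this in production.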
XML tags vs system prompts — what works best?
XML tags dominate 2026 Claude prompting: Claude responds dramatically better to structured `<role>`, `<task>`, `<code>`, `<output_format>` sections than to free-form prompts. GPT-5 also accepts XML but is less sensitive to it; Gemini works fine with section headers (`##` or `---`) instead.

System prompts remain useful for: a persistent role across messages, application-specific personality, and a fixed output style.

When to use each:
- **Single-turn task** — XML tags in the user message.
- **Conversational app** — system prompt + XML in user messages.
- **Agentic framework** — system prompt for behavior + tools, user message for the specific task.

Structure beats verbosity: a 200-token well-tagged prompt outperforms a 2,000-token rambling one.
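Since the same sectioned prompt works as XML tags for Claude or as headers for models that prefer them, it is convenient to define sections once and render either style. A small sketch (the `render_prompt` helper is illustrative):

```python
def render_prompt(sections: dict[str, str], style: str = "xml") -> str:
    """Render named prompt sections as XML tags or markdown headers."""
    if style == "xml":
        return "\n".join(f"<{name}>\n{body}\n</{name}>"
                         for name, body in sections.items())
    return "\n".join(f"## {name}\n{body}" for name, body in sections.items())

sections = {"role": "Senior engineer.", "task": "Review the diff."}
claude_prompt = render_prompt(sections, "xml")
header_prompt = render_prompt(sections, "headers")
```

One prompt definition, two renderings; this keeps a multi-model app from maintaining duplicate prompt text per provider.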
How to reduce AI tells in generated code?
Common AI tells in generated code (2026):
1. Excessive comments explaining *what* code does (rather than *why*).
2. Generic variable names (`data`, `result`, `value`, `item`).
3. Defensive try/catch around operations that cannot fail.
4. try/finally blocks with empty cleanup.
5. Narrating comments on every line.
6. Multi-paragraph docstrings on simple functions.
7. "Professional" but redundant naming (`calculate_user_total_amount_value`).

How to reduce them via prompting:
- **Add negative constraints**: "Do not add comments unless the WHY is non-obvious", "Use idiomatic [language] naming", "Trust internal code — do not validate scenarios that cannot happen", "Match existing project style (prefer terseness)".
- **Feed project context**: include a sample of existing project code so the model learns your codebase's voice.
- **Show explicit examples**: "Here is good naming: [example]. Here is bad naming: [example]".
- **Review critically**: read AI output and delete anything that does not earn its space. AI-generated code defaults to safe and verbose; great code is courageous and terse.
Few-shot vs zero-shot — when to add examples?
Few-shot prompting (showing 2-3 input → output examples) almost always helps when:
1. The output format is non-standard or strict (JSON Schema, OpenAPI YAML, a custom DSL).
2. Edge cases matter (empty input, large input, error cases).
3. Tone or style is important (matching an existing codebase).
4. Domain-specific terminology is involved (medical, legal, financial).

Zero-shot works fine when:
1. The task is well-known and unambiguous.
2. The format is standard (a Python function, README markdown).
3. Speed matters more than format consistency.

Rule of thumb: 0 examples for simple tasks, 1-2 for medium tasks, 3+ for strict-format or edge-case-heavy tasks. Pick contrasting examples — a happy path + an edge case + an error case; three similar examples are redundant. Simple examples teach format, complex examples teach reasoning — mix both. Agentic tools (Claude Code, Cursor) often work best with no few-shot at all: they use context (file contents) instead of examples.
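Few-shot examples can also be supplied as alternating user/assistant turns rather than inline text, which most chat APIs treat as prior conversation. A minimal sketch — the helper and the example pairs are placeholders:

```python
def few_shot_messages(examples: list[tuple[str, str]], query: str) -> list[dict]:
    """Build a chat history from (input, output) example pairs,
    followed by the real query as the final user turn.

    Contrasting examples (happy path + edge case + error case)
    teach more than three similar ones.
    """
    msgs: list[dict] = []
    for user_input, expected_output in examples:
        msgs.append({"role": "user", "content": user_input})
        msgs.append({"role": "assistant", "content": expected_output})
    msgs.append({"role": "user", "content": query})
    return msgs

msgs = few_shot_messages(
    [('["a","b"]', "2 items"),          # happy path
     ("[]", "0 items (empty input)")],  # contrasting edge case
    '["x","y","z"]',
)
```

The model sees the example turns as things it already "said", which anchors the output format more strongly than describing the format in prose.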
Cost optimization — when to use cheaper models?
Cost optimization in 2026:

- **Tier 1 ($)** — DeepSeek V3 ($0.27 input + $1.10 output per 1M tokens). Roughly 50x cheaper than Opus on input pricing; quality 70-80% of Sonnet. Use for high-volume routine code generation, summaries, simple refactors.
- **Tier 2 ($$)** — Claude Sonnet 4.6 ($3 input + $15 output) or Gemini Flash ($1.20 input + $5 output). Use for daily coding work at balanced quality.
- **Tier 3 ($$$)** — Claude Opus 4.7 ($15 + $75) or GPT-5 ($5 + $15). Use for complex architecture, security review, high-stakes refactors, agentic long-running tasks.

Routing strategy: classify each request by complexity. Simple → Tier 1, medium → Tier 2, complex → Tier 3. Aider, Cline, and Continue all support model routing.

Caching: prompt caching (Anthropic, OpenAI, Gemini) reduces input cost roughly 10x for repeated context. Critical for agentic workflows: send the full file context once, cache it, and subsequent variations are cheap.

Batching: batch APIs for non-urgent requests typically discount 50%. Good for documentation generation, log analysis, mass refactors.

Self-hosting via Ollama: fixed cost (electricity + hardware amortization). Above roughly 100M tokens/month, self-hosting Llama 405B or Qwen 72B becomes cheaper than cloud.

Example at 1k tasks/month: pure Opus ≈ $1,500/mo; smart routing (Sonnet for 70% + Opus for 30%) ≈ $400/mo; with caching + batching ≈ $250/mo.
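The tier math is easy to sanity-check yourself. A sketch using the per-million-token prices from the comparison table above; the per-task token counts are assumptions for illustration, and the exact monthly figure depends heavily on your token mix, caching, and batching:

```python
PRICES = {  # $ per 1M tokens (input, output), per the comparison table
    "opus-4.7":    (15.00, 75.00),
    "sonnet-4.6":  (3.00,  15.00),
    "deepseek-v3": (0.27,  1.10),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# Assumed workload: 50k input + 10k output tokens per task, 1,000 tasks/month.
pure_opus = 1000 * task_cost("opus-4.7", 50_000, 10_000)   # $1,500/mo
routed = (700 * task_cost("sonnet-4.6", 50_000, 10_000)
          + 300 * task_cost("opus-4.7", 50_000, 10_000))   # ≈ $660/mo before caching
```

Under these assumptions, routing alone gets you partway; reaching the lower figures quoted above additionally relies on prompt caching and batch discounts shrinking the effective input cost.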
Related on Bytepane
- → AI Coding Assistants 2026 (Cursor, Copilot, Claude Code, Cline)
- → PostgreSQL vs MySQL vs MongoDB 2026
- → Python vs Go vs Rust 2026