AI Coding API Pricing 2026: Claude, GPT-5, Gemini, DeepSeek

By Arron R. · 16 min read
AI coding API pricing in May 2026 spans a 100× cost spread, from frontier reasoners at $25–$30/MTok output (Claude Opus 4.7, GPT-5.5) to budget executors at $0.28–$0.50/MTok (DeepSeek V4-Flash, Grok 4.1 Fast).

This is the verified AI coding API pricing chart for every model that matters in 2026 — Claude, GPT-5, Gemini, DeepSeek, Grok, Kimi, Qwen, MiniMax, and Cursor Composer 2 — with input, output, cache-hit, cache-write, and vision-support data pulled straight from each vendor’s official rate card on May 4, 2026. That includes this week’s launches: DeepSeek V4 (April 24), Kimi K2.6 (April 20), and Grok 4.3 (April 30).

The chart sits at the top because that’s the only number most readers actually need. Below it: a per-model play-by-play with the AI coding API pricing context for game-dev workflows specifically — which model earns its slot on which turn. WizardGenie bundles all the relevant ones behind a single picker so you can rotate through them without rewiring your project for each switch.

[Hero image: token rate cards for the major AI coding agents]
Every major AI coding agent that matters for game dev in May 2026, sorted by where it earns its slot. Prices verified from each vendor’s official rate card.

AI coding API pricing: the complete chart

All prices are per million tokens (MTok), in USD, on each vendor’s standard public API tier. “Cache hit” is what you pay when the prompt prefix is already cached on the vendor’s side. “Cache write” is what you pay to put a prefix into cache the first time. Numbers were pulled from each vendor’s published pricing page on May 4, 2026 — before you commit to a long session, double-check the live page for your account, since rates shift quietly.

| Model | Input | Cache hit | Cache write | Output | Vision | Context |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $0.50 | $6.25 (5m) / $10 (1h) | $25.00 | Yes | 1M |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $3.75 (5m) / $6 (1h) | $15.00 | Yes | 200K (1M opt-in) |
| Claude Haiku 4.5 | $1.00 | $0.10 | $1.25 (5m) / $2 (1h) | $5.00 | Yes | 200K |
| GPT-5.5 | $5.00 | $0.50 | auto | $30.00 | Yes | 400K |
| GPT-5.5 Pro | $30.00 | — | — | $180.00 | Yes | 400K |
| GPT-5.4 | $2.50 | $0.25 | auto | $15.00 | Yes | 400K |
| GPT-5.4 mini | $0.75 | $0.075 | auto | $4.50 | Yes | 400K |
| Gemini 3.1 Pro (≤200K prompt) | $2.00 | $0.20 | $4.50/M-hr storage | $12.00 | Yes | 1M |
| Gemini 3.1 Pro (>200K prompt) | $4.00 | $0.40 | $4.50/M-hr storage | $18.00 | Yes | 1M |
| Gemini 3.1 Flash-Lite | $0.25 | $0.025 | $1.00/M-hr storage | $1.50 | Yes | 1M |
| DeepSeek V4-Pro (75% off promo) | $0.435 | $0.003625 | auto on disk | $0.87 | Yes | 1M |
| DeepSeek V4-Pro (standard, post-promo) | $1.74 | $0.0145 | auto on disk | $3.48 | Yes | 1M |
| DeepSeek V4-Flash | $0.14 | $0.014 | auto on disk | $0.28 | Yes | 1M |
| Grok 4.3 | $1.25 | $0.31 | auto | $2.50 | Yes (+ video) | 1M |
| Grok 4.20 | $2.00 | $0.20 | auto | $6.00 | Yes | 2M |
| Grok 4.1 Fast | $0.20 | $0.05 | auto | $0.50 | Yes | 2M |
| Kimi K2.6 | $0.95 | $0.16 | auto | $4.00 | Yes | 256K |
| Kimi K2.5 | $0.60 | $0.10 | auto | $2.50 | Yes | 256K |
| Qwen3 Coder | $0.30 | — | — | $1.50 | Variant | 1M |
| MiniMax M2 | $0.30 | $0.03 | $0.375 | $1.20 | Yes | 256K |
| Cursor Composer 2 | $0.50 | — | — | $2.50 | — | — |
| Cursor Composer 2 Fast | $1.50 | — | — | $7.50 | — | — |

How to read this chart (it isn’t just “cheap = good”)

  • Output dominates the bill. Every vendor on this chart prices output at 3–10× the input rate, and caching discounts most of what remains on the input side. So even though a coding agent reads more tokens than it writes (the diff is shorter than the file the model reads), output token price is the number that actually shapes your invoice.
  • Cache hits are the secret savings lever. Every vendor that lists a “cache hit” price is offering an 80–90% discount on prompt prefixes that haven’t changed since the last call. For agentic loops where the system prompt and project files barely change between turns, that discount applies to most of your input volume (a worked sketch follows this list).
  • Vision matters more than people think. Game dev work is full of “look at this screenshot — the wizard is clipping through the floor”. A model without vision can’t help. Most frontier models support image input; Grok 4.3 added video input on April 30. The plain Qwen3 Coder text variant is text-only; vision-capable Qwen variants are sold separately.
  • Context window only matters above a threshold. Phaser jam games are 2K–10K lines; any 200K-context model holds them whole. Three.js projects with shaders, scene graphs, and post-processing can climb past 100K tokens fast — that’s where 1M / 2M context windows justify their pricing premium.
  • Promotional pricing is real but temporary. DeepSeek V4-Pro is 75% off through May 31, 2026 — that’s the effective rate for the next four weeks. After that you pay the standard $1.74 / $3.48. Plan your benchmarks around this so you don’t end up locked into a model that’s about to quadruple in price.
  • Batch APIs are 50% off. Anthropic, OpenAI, Google, and xAI all offer a batch tier at half price for non-interactive workloads. Most agentic coding sessions can’t use batch — interactive latency matters — but it’s worth knowing for offline prep work like asset descriptions or doc generation.
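
The cache-hit lever from the list above, in code: the blended input rate is just a weighted average of the hit and miss prices. A minimal sketch in Python; the 80% hit fraction is an illustrative assumption, not a measured number:

```python
def effective_input_rate(base: float, cache_hit: float, hit_fraction: float) -> float:
    """Blended $/MTok on input when hit_fraction of prompt tokens hit the cache."""
    return cache_hit * hit_fraction + base * (1 - hit_fraction)

# Claude Sonnet 4.6 ($3.00 base, $0.30 cache hit) with 80% of input tokens cached:
print(effective_input_rate(3.00, 0.30, 0.80))  # 0.84 -> $0.84/MTok, a 72% cut
```
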
[Chart: output token prices, $0.28 to $30 per MTok]
Output token cost per million tokens, sorted. The vertical span is a 100× spread — choose deliberately, not by reflex.

Anthropic: Claude Opus 4.7, Sonnet 4.6, Haiku 4.5

Anthropic’s lineup is the cleanest pricing ladder of any vendor: a tight 5× factor between each tier. Opus 4.7 (released April 16, 2026) is the frontier reasoner — the model to spend on when the agent is making real design decisions about your game (game-feel, scoring rules, win conditions, refactors that touch four files at once). At $5 input / $25 output it’s expensive, but with prompt caching the input side drops to $0.50, and Opus 4.7 ships with the full 1M-token context window at standard pricing — no opt-in, no surcharge. On agentic loops where the same files get re-read on every turn, that combination is doing serious work.

Sonnet 4.6 is the workhorse: roughly one-fifth the cost of Opus, and on most coding tasks it returns about 90% of Opus’s quality. For routine refactors, adding HUD elements, wiring up new entities to existing systems — Sonnet is the right default. The new tokenizer in Opus 4.7 (per the published Claude API pricing page) uses up to 35% more tokens for the same text vs. older Anthropic models, so a “cheaper” output rate doesn’t always mean a cheaper request — keep that in mind when comparing per-MTok numbers across generations.
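
One way to sanity-check cross-generation comparisons is to fold the token inflation into the rate before comparing. A rough sketch; the 10K-token request size is made up, and 35% is the upper bound quoted above:

```python
def tokenizer_adjusted_cost(rate_per_mtok: float, tokens: int, inflation: float = 0.0) -> float:
    """Request cost when a denser tokenizer grows the token count by `inflation`."""
    return rate_per_mtok * tokens * (1 + inflation) / 1_000_000

# The same text that was 10K output tokens on an older model, at Opus 4.7's rate
# with up to 35% more tokens:
print(tokenizer_adjusted_cost(25.00, 10_000, inflation=0.35))  # ~$0.34 vs $0.25 nominal
```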

Haiku 4.5 at $1 / $5 is the smallest member that’s still serious enough to ship with. For purely mechanical work — sprite path edits, constant tweaking, tiny UI patches — Haiku is fine. It’s also a legitimate executor in a Planner+Executor pair if you want to stay inside the Anthropic ecosystem (more on the pattern below).

OpenAI: GPT-5.5, GPT-5.4, GPT-5.4 mini

OpenAI’s GPT-5 family is structured like Anthropic’s: a flagship, a value tier, and a mini. The wrinkle is that GPT-5.5’s output token rate is $30/MTok — the highest in the entire chart for a generally available model. Per OpenAI’s published API pricing, cached input tokens are 90% off across the GPT-5 line, which materially changes the math for any session where the system prompt and project context get reused turn after turn (which is every coding session worth caring about).

One detail that catches people: for prompts above 272K tokens, GPT-5.5 input pricing doubles and output pricing rises 1.5×. If your project is already past 200K tokens of context, plan around the threshold or switch to a model that prices long context flat (Opus 4.7 at 1M flat, Gemini 3.1 Pro on its tier system).
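
A sketch of that threshold math, assuming the higher rates apply to the whole request once the prompt crosses 272K tokens (verify the exact boundary behavior on OpenAI’s live pricing page):

```python
def gpt55_turn_cost(prompt_tokens: int, output_tokens: int) -> float:
    """GPT-5.5 cost per turn: input doubles and output rises 1.5x past 272K prompt tokens."""
    in_rate, out_rate = 5.00, 30.00
    if prompt_tokens > 272_000:
        in_rate, out_rate = in_rate * 2, out_rate * 1.5
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(gpt55_turn_cost(250_000, 4_000))  # $1.37, under the threshold
print(gpt55_turn_cost(300_000, 4_000))  # $3.18, similar turn, past it
```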

GPT-5.5 wins one-shot vibe-code prompts. Drop a paragraph describing a game, and GPT-5.5 commits to architectural choices without three rounds of clarifying questions. That’s exactly what you want when you’re feeling out a prototype in five minutes. It’s the wrong model for a 30-turn session where every turn costs $0.10 of output — that’s where GPT-5.4 (half the price) or 5.4 mini (one-sixth) take over. GPT-5.5 Pro at $30 / $180 sits in a different league entirely; only reach for it on tasks where every other model has actually failed to produce a correct answer, not as a default coding agent.

GPT-5.4 mini at $0.75 / $4.50 is one of the better priced “smart enough to actually use” coding agents in the whole chart. It’s an acceptable executor in a Planner+Executor pair if you’re already paying OpenAI bills and don’t want to add a second vendor.

Google Gemini 3.1 Pro and 3.1 Flash-Lite

Gemini 3.1 Pro Preview is Google’s frontier coding agent and ships with the largest standard-tier context window of any frontier model on the chart — 1 million tokens. Per Google’s published Gemini API pricing, Gemini 3.1 Pro is tiered: $2 input / $12 output for prompts under 200K tokens, $4 input / $18 output above. For Phaser-scale projects you stay in the cheap tier; for serious Three.js scene-graph work you cross into the higher tier and the agent earns its keep by holding the entire project at once.

Caching on Gemini works differently than on the other vendors — it’s a separate “context cache” you create explicitly, billed at $0.20/MTok read plus $4.50/MTok per hour of storage. For long agentic sessions on the same project, those storage hours are worth it; for a one-off prompt, skip the cache entirely and pay the standard input rate.
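
Whether the explicit cache pays off is a break-even question: cached reads plus storage hours versus re-sending the prefix at the standard rate every turn. A rough sketch for the ≤200K tier, assuming a fixed cached prefix and the rates above:

```python
def gemini_cache_saves_money(prefix_mtok: float, turns: int, session_hours: float) -> bool:
    """Explicit context cache vs. re-sending the prefix every turn (<=200K tier rates)."""
    cached = turns * prefix_mtok * 0.20 + session_hours * prefix_mtok * 4.50
    uncached = turns * prefix_mtok * 2.00
    return cached < uncached

# 150K-token prefix (0.15 MTok), one-hour session:
print(gemini_cache_saves_money(0.15, turns=30, session_hours=1.0))  # True: ~$1.58 vs $9.00
print(gemini_cache_saves_money(0.15, turns=2, session_hours=1.0))   # False: one-offs lose
```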

Gemini 3.1 Flash-Lite at $0.25 / $1.50 is the cheapest serious model from a major US lab. It’s not as smart as 3.1 Pro, but it’s fast, multimodal, and 1M-context — meaning it can hold an entire jam-scale project at once for executor-tier prices. Strong pick for the typing side of a Planner+Executor pair when you also want long context.

DeepSeek V4-Pro and V4-Flash (the cheapest serious coding agents)

DeepSeek released V4 on April 24, 2026 — two models that reset the price floor for the entire industry. V4-Flash at $0.14 input cache miss / $0.014 cache hit / $0.28 output is the new price-anchor for budget executors, with a full 1M context window. V4-Pro is the frontier-class member: standard pricing $1.74 / $0.0145 / $3.48, currently running 75% off through May 31, 2026, which puts it at $0.435 / $0.003625 / $0.87 — frontier-grade reasoning at executor-tier prices for the next four weeks.

The cache mechanic is the killer feature. DeepSeek runs context caching on disk automatically — no code changes — and per their published pricing, the cache hit price was reduced to 1/10 of launch price across the board on April 26. For agentic coding loops where the project files repeat turn-over-turn, you’ll see closer to 90% savings on the input side without doing anything special.
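
Put together, a sustained session on V4-Flash is close to free. A back-of-envelope sketch; the turn count, token volumes, and 90% hit rate are illustrative assumptions:

```python
def v4_flash_session_cost(turns: int, input_mtok: float, output_mtok: float,
                          hit_fraction: float = 0.90) -> float:
    """Estimated V4-Flash session cost with automatic disk caching on repeated context."""
    in_rate = 0.014 * hit_fraction + 0.14 * (1 - hit_fraction)  # blended input $/MTok
    return turns * (input_mtok * in_rate + output_mtok * 0.28)

# 40 turns, 80K input tokens re-read per turn, 6K output tokens per turn:
print(round(v4_flash_session_cost(40, 0.08, 0.006), 2))  # ~$0.15 for the whole session
```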

Both V4 models support image understanding (not generation) — you can drop a screenshot of a broken collision and get useful diagnostic output. The 1M context window means even Three.js projects with shaders fit whole. The legacy deepseek-chat and deepseek-reasoner model IDs sunset on July 24, 2026 — migrate to deepseek-v4-flash or deepseek-v4-pro well before then.

xAI Grok 4.3, 4.20, and 4.1 Fast

Grok 4.3 (released April 30, 2026) is xAI’s new flagship and a major price cut from the previous generation. At $1.25 input / $2.50 output with $0.31 cached input, Grok 4.3 is roughly 40% cheaper than Grok 4.20 on input and about 60% cheaper on output. Per xAI’s published model documentation it added video input as a new modality and supports native file generation (PDF, PPTX, XLSX) — for a coding-agent workflow that mixes “watch this gameplay clip” and “look at this design doc” turns, the multimodal upgrade is real.

Grok 4.20 stays in the lineup for one specific reason: a 2 million token context window, which is double Gemini 3.1 Pro’s. For a serious game project that’s grown a sprawling scene graph, post-processing chain, and asset manifest, 2M context means the agent never has to summarize anything to fit. At $2 input / $6 output it’s still a competitive long-context option even after Grok 4.3 came out cheaper.

Grok 4.1 Fast at $0.20 input / $0.50 output is the surprise budget pick. Same 2M context as Grok 4.20, with cache hits at $0.05 — a frontier-context-window model at executor-tier pricing. The “Fast” tag is real: it’s tuned for low-latency responses, which makes it a strong typing-side model in a Planner+Executor pair if you also want a long context.

Moonshot Kimi K2.5 and K2.6

Moonshot launched Kimi K2.6 on April 20, 2026, alongside the existing K2.5 from January. K2.5 ($0.60 / $2.50 with $0.10 cache hit) is the value tier — natively multimodal with around 15 trillion mixed visual+text training tokens, 256K context, strong coding behavior. K2.6 ($0.95 / $4.00 with $0.16 cache hit) is the new frontier-of-the-budget-tier: better long-horizon coding stability, improved instruction following, automatic context caching, and enhanced agent execution. Both run a 256K context window.

For game-dev workflows specifically, K2.6 earns its premium when the task involves following a long convention chain across many files (e.g., “every entity in this folder uses pattern X — apply X to this new file”). For mechanical typing turns, K2.5 at half the output rate is the better pick. Both support vision, so you can paste screenshots of broken physics or layout problems either way.

Alibaba Qwen3 Coder

Qwen3 Coder on Alibaba’s DashScope is $0.30 / $1.50 for the text-coding variant with a 1M context window. It’s purpose-built for code — function-calling and agentic tool use are first-class, and the model’s been instruction-tuned heavily on coding workflows. For game-dev specifically, Qwen3 Coder is competitive with DeepSeek V4-Flash on price with a similar context window, though without DeepSeek’s aggressive automatic cache discount the cumulative bill ends up close.

MiniMax M2 (agent-tuned at executor pricing)

MiniMax M2 ($0.30 / $1.20) is the cheapest model in the chart that’s been explicitly tuned for agentic tool use. Cache reads at $0.03/MTok are the lowest in the whole chart — if your loop is dominated by re-reading a stable system prompt, M2 is the cheapest place to do it. The 256K context window is enough for any individual scene or system file, and M2’s tool-use accuracy is what earns it a slot when the agent is making lots of small API calls per turn (read file, list directory, run lint, etc.) rather than producing one big diff.

Cursor Composer 2 (the IDE-bundled option)

Cursor’s Composer 2 is the only model in this chart that isn’t sold as a raw API — it’s bundled into the Cursor IDE and priced in two variants: Composer 2 standard at $0.50 / $2.50, and Composer 2 Fast at $1.50 / $7.50. On individual Cursor plans there’s a separate monthly usage pool, so you may not see per-token billing at all until you exceed it; team and enterprise plans charge the API rates above directly. Composer 2 is a real model — not a wrapper around someone else’s — so its pricing is independent from the rest of the chart.

[Flowchart: choosing a coding agent by task type and budget]
Pick by task, not by reputation. Long context, frontier reasoning, budget executor, vision input — each branch has a clear winner.

The Planner+Executor pattern: where these prices actually pay off

The single biggest cost-saving move in agentic coding is splitting the loop into two models: an expensive reasoner that thinks, and a cheap fast model that types. We covered the architecture in detail in our flagship coding-model post; the pricing chart above tells you which pairings actually save money.

The math: a frontier-only session at GPT-5.5 prices burns ~$30/MTok on output. A Planner+Executor pair where Opus 4.7 plans (a few hundred output tokens per turn) and DeepSeek V4-Flash types (the bulk of the diff) lands the average effective output price at ~$0.50–$2/MTok, a 93–98% cut from single-frontier cost without losing planning quality.
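
That blended number is just a weighted average over who emits the output tokens. A sketch; the 5% planner share is an assumption about the split, not a benchmark:

```python
def blended_output_rate(planner_rate: float, executor_rate: float, planner_share: float) -> float:
    """Effective $/MTok output when the planner emits planner_share of all output tokens."""
    return planner_rate * planner_share + executor_rate * (1 - planner_share)

# Opus 4.7 plans (~5% of output tokens), DeepSeek V4-Flash types the rest:
print(blended_output_rate(25.00, 0.28, 0.05))  # ~$1.52/MTok vs $30 frontier-only
```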

The hard rule: never put a frontier-priced model on the typing side. Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro are all wrong as executors — their output rates are too close to the frontier to capture the savings the pattern is supposed to deliver. Acceptable executors at May 2026 prices, in rough order of preference for game work:

  • DeepSeek V4-Flash — $0.28/MTok output, 90% cache discount on input, 1M context. The default pick.
  • Grok 4.1 Fast — $0.50/MTok output with a 2M context window. The pick when the executor needs the longest context.
  • Gemini 3.1 Flash-Lite — $1.50/MTok output, multimodal, 1M context. The pick when you’re already on Google.
  • MiniMax M2 — $1.20/MTok output, lowest cache-read rate in the chart. The pick when the loop is tool-call heavy.
  • Kimi K2.5 — $2.50/MTok output, multimodal, 256K context. The pick when the executor needs serious coding chops, not just a typing pass.
  • Claude Haiku 4.5 — $5/MTok output. The pick when you want to stay inside the Anthropic ecosystem and inherit Opus’s cache.

Putting the AI coding API pricing chart into practice in WizardGenie

The reason the AI coding API pricing chart matters more than any single model recommendation is that you’ll switch between vendors constantly during a real session. WizardGenie’s model picker is built around exactly this — pick the model for the next turn, not the next session. A practical rhythm we use (sketched as a routing rule after the list):

  1. Open with Opus 4.7 or DeepSeek V4-Pro (75%-off promo) for the first scaffolding turn. Get the architectural choices right, pay frontier prices for ~5 minutes — and right now V4-Pro gives you near-frontier reasoning at executor pricing, which is too good to skip while the promo lasts.
  2. Switch to DeepSeek V4-Flash or Grok 4.1 Fast for the long tail of small mechanical edits. 90% of session turns are this kind of work, and the cost difference is two orders of magnitude.
  3. Switch back to a frontier model only when something actually breaks in a way that needs careful diagnosis. One Opus turn here saves three Flash turns of guessing.
  4. Use Gemini 3.1 Pro or Grok 4.20 when the project has grown past 200K tokens and the agent needs to hold every file at once. Grok 4.20’s 2M context is the only option for genuinely huge codebases.
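
That rhythm reduces to a small routing rule. A hypothetical sketch only; the pick_model helper, turn labels, and model ID strings are illustrative, not WizardGenie’s actual API:

```python
def pick_model(turn_kind: str, context_tokens: int) -> str:
    """Hypothetical per-turn router following the rhythm above."""
    if context_tokens > 1_000_000:
        return "grok-4.20"           # the only 2M-context option
    if context_tokens > 200_000:
        return "gemini-3.1-pro"      # long-context frontier tier
    if turn_kind in ("scaffold", "diagnose"):
        return "claude-opus-4.7"     # frontier reasoning turns
    return "deepseek-v4-flash"       # the mechanical long tail

print(pick_model("edit", 60_000))      # deepseek-v4-flash
print(pick_model("diagnose", 60_000))  # claude-opus-4.7
```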

Bring your own key for any of these vendors and the pay-per-token math is what you saw above. The free trial inside WizardGenie burns vendor-managed quota; once you bring keys, you pay the rates each vendor publishes — there’s no markup on top.

Frequently Asked Questions

Which AI coding agent is cheapest for game dev right now?

DeepSeek V4-Flash at $0.14 input cache miss / $0.014 cache hit / $0.28 output, with a 1M context window. With its automatic disk-based context caching, sustained agentic loops on a stable project hit closer to $0.05/MTok effective input cost. For frontier-level reasoning at the same price band, DeepSeek V4-Pro at $0.435 / $0.87 (75% off through May 31, 2026) is the limited-time best deal in the entire chart.

Do AI coding agents support vision (image input) for game dev?

Most do. Claude Opus 4.7, Sonnet 4.6, Haiku 4.5, the entire GPT-5 family, Gemini 3.1 Pro, all Grok 4 variants (4.3 added video input in April 2026), MiniMax M2, Kimi K2.5, and Kimi K2.6 all support image input. DeepSeek V4-Pro and V4-Flash both support image understanding (not generation). The plain Qwen3 Coder text variant is text-only; vision-capable Qwen variants are sold separately.

What's the cheapest way to use a frontier coding agent?

Two levers stack: prompt caching and Planner+Executor. Cache hits cost 10% of standard input across most vendors, so for an agentic loop where the project context repeats, your input bill drops 80–90%. Planner+Executor cuts the output bill — a frontier model plans, a cheap model types — typically dropping average output cost to a few percent of single-frontier rates (see the worked math in the Planner+Executor section above).

How does Grok 4.3 compare to Grok 4.20 for game dev?

Grok 4.3 (released April 30, 2026) is roughly 40% cheaper on input and 60% cheaper on output, with a 1M context window and added video input. Grok 4.20 still wins on context length — its 2M window is twice as large — so it stays the right pick for a project that's already crossed 1M tokens. For everything else, Grok 4.3 is the new default xAI model.

Why do output tokens cost so much more than input tokens?

Generating tokens autoregressively costs more compute per token than processing input. Output requires running the full model forward for every token, while input runs in a single batched pass. Every vendor on this chart prices output at 3–10× the input rate, which is why output dominates the bill.

Is Cursor Composer 2 cheaper than the API models?

Composer 2 standard at $0.50 / $2.50 is in the same price band as Kimi K2.5 and Qwen3 Coder. The Fast variant at $1.50 / $7.50 sits between Haiku 4.5 and Sonnet 4.6. Cursor's individual plans bundle a monthly usage pool, so per-token rates may not apply until you exceed it.

How often do AI coding agent prices change?

Roughly monthly right now. Anthropic dropped Opus input from $15/MTok (Opus 4.1 era) to $5/MTok with the 4.5+ generation. DeepSeek launched V4 on April 24 and cut V4 cache-hit prices to a tenth on April 26, 2026. Kimi shipped K2.6 on April 20. Grok 4.3 launched April 30 with a 40–60% cut from 4.20. Always re-check the live pricing page before a long session.

Sources

  1. Large language model (Wikipedia)
  2. Token (computational linguistics) (Wikipedia)
  3. Mixture of experts (Wikipedia)
  4. Phaser 3 — Documentation
  5. Three.js — Documentation
Written by Arron R. · 3,542 words · 16 min read
