How to Make an RPG With AI (Even If You Can’t Draw)

By Arron R. · 16 min read
How to make an RPG with AI in 2026: open WizardGenie, describe the world, classes, and combat in one prompt, and the agent scaffolds a Phaser project with stats

An RPG is the genre with the most moving parts. The mechanics list reads like a wishlist for the entire industry — stats, classes, inventory, equipment, NPC dialog, branching quests, turn order, status effects, party management, save and load — and most beginners burn out somewhere between drawing the first sprite and writing their fortieth dialog branch. AI agents now scaffold a playable Phaser RPG from a single sentence, run it in a browser tab, and let you fix the combat math by chat instead of by editing every system by hand. The art problem evaporates with it.

Figure: WizardGenie pipeline turning a one-line RPG prompt into a playable browser build with stats, world, and a combat menu.
The four-stage browser-native pipeline: prompt, stats, world, playable. No engine install, no SDK, no asset pipeline to learn.

How to make an RPG in five steps

The full workflow lives in one browser tab. No editor download, no engine install, no per-engine quirks to learn. The five steps in order:

  1. Open WizardGenie in your browser. Pick the model the agent will run on (a frontier reasoner like Claude Opus 4.7 or GPT-5.5 is the right pick for the very first prompt).
  2. Describe the RPG in one paragraph. Setting, three or four classes, the combat style (turn-based, action, ATB, hybrid), the size of the world, the win condition. The agent writes a Phaser 4 project with a stat block, basic inventory, an overworld tilemap, an NPC dialog system, and a starter dungeon.
  3. Run the build in the side preview pane. The first run is always rough — placeholder sprites, dialog that sounds like an LLM, balance numbers that came out of nowhere — and that is fine. The point is to confirm the loop exists.
  4. Iterate by chat. “Slow the player walk speed.” “The slime hits too hard — lower its base damage and give it a 10 % chance of a glancing hit instead.” “Add a stamina meter that drains on dash.” Each message is one round-trip with the agent, usually a couple of seconds on a cheap executor model.
  5. Swap in real assets without leaving the tab. Quick Sprites for the player, enemies, and NPCs; Tileset Forge for dungeons and the overworld; AI Image Gen for portraits and item icons; Music Gen for the overworld and battle themes; SFX Gen for hits and UI clicks. The agent drops the Sorceress URLs straight into the project’s asset folder for you.

Total time to a playable first build is usually thirty to sixty minutes. A small starter dungeon with real art, three classes, and a soundtrack lands in roughly an evening. The work that comes after — quest writing, balance tuning, the long arc that makes a story actually land — is the part where you stop typing prompts and start playing the game.

What an RPG actually needs (mechanically)

Before talking to the agent it helps to know what an RPG is, mechanically. Role-playing video games as a genre go back to the late 1970s and the mechanical recipe has barely changed since the early console era: a controllable character (or party) with persistent stats, an inventory of equipment and consumables, a structured world full of NPCs and quests, and a combat system that resolves encounters by interacting with those stats. Underneath that simple description sits a stack of subsystems every modern Phaser RPG wires up:

Figure: Annotated diagram of an RPG dungeon scene showing stat block, inventory, XP bar, turn order, NPC dialog, and tilemap collision around a wizard sprite.
The six subsystems the agent scaffolds for you: stats, leveling, inventory, turn order, dialog, tilemap collision. None of them are individually hard; the work is wiring them up so they don’t fight.

  • Stat block. Every controllable character has a small set of numbers — HP, MP, ATK, DEF, SPD, crit chance, evasion. Phaser doesn’t care what shape this object is; the agent will pick a JSON schema in the first prompt and stick to it. Five to seven primary stats is the sweet spot for a first project.
  • XP and leveling. Combat awards experience points that accumulate until a threshold is crossed and the character levels up. The thresholds usually grow roughly geometrically (100, 250, 500, 1000…), so each level costs more than the last and high levels feel earned. The agent picks a default curve; you tune the multiplier with one chat message.
  • Inventory. A list of items the character owns, with stack counts and equip slots. The shape of the inventory — fixed-grid (Resident Evil), unlimited list (most JRPGs), weight-limited (Elder Scrolls) — changes the entire feel of the game. Pick one in the first prompt.
  • Combat resolution. Turn-based (Final Fantasy I), Active-Time Battle (Final Fantasy IV), action (Zelda), or hybrid (Mario RPG). Each has a different control scheme and a different damage formula. The agent picks one based on your prompt and wires up the menu, the turn manager, and the damage math.
  • NPC dialog and quests. A tree (or graph) of dialog nodes, with branching choices, flag-checks, and quest-state tracking. The agent picks a small JSON dialog format and a quest log; you describe the actual content (“the blacksmith offers a side quest to retrieve his lost hammer from the spider cave”) and the agent writes the nodes.
  • Tilemap world. The overworld and every dungeon are usually a tilemap with collision flags on solid tiles and trigger flags on doors, signs, and chests. The agent generates a JSON tilemap and the loader code; you edit the tile grid by chat (“add a second floor to the dungeon with three locked doors”).
  • Save and load. Persisting the world state to localStorage or a server. Critical for any RPG longer than a single session. The agent wires up JSON.stringify save/load on a single hotkey by default; you can ask for autosave at every screen transition with one message.
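
As a rough illustration of the first and last items in that list, here is a minimal sketch of a stat block and a JSON.stringify save/load round trip. The field names, the "rpg-save" key, and the in-memory storage stand-in are assumptions made for this example, not the schema the agent actually generates; in a browser you would pass window.localStorage instead of the stand-in.

```javascript
// Hypothetical stat block shape; stat names follow the list above, values are illustrative.
function createCharacter(name, stats) {
  return { name, level: 1, xp: 0, stats: { ...stats }, inventory: [] };
}

// Serialize the whole game state to one JSON string under a single key.
function saveGame(storage, state, key = "rpg-save") {
  storage.setItem(key, JSON.stringify(state));
}

// Returns the parsed state, or null when no save exists or the save is corrupt.
function loadGame(storage, key = "rpg-save") {
  const raw = storage.getItem(key);
  if (raw === null || raw === undefined) return null;
  try {
    return JSON.parse(raw);
  } catch {
    return null;
  }
}

// In-memory stand-in with the same setItem/getItem shape as localStorage,
// so the sketch also runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    setItem: (k, v) => data.set(k, String(v)),
    getItem: (k) => (data.has(k) ? data.get(k) : null),
  };
}

const storage = createMemoryStorage();
const wizard = createCharacter("Wizard", { hp: 30, mp: 12, atk: 5, def: 3, spd: 4 });
wizard.inventory.push({ id: "healing-potion", count: 1 });
saveGame(storage, { party: [wizard], flags: { chestOpened: true } });
const restored = loadGame(storage);
```

Keeping the entire state in one plain object is what makes the single-hotkey save cheap: anything that is not JSON-serializable (sprites, scene references) has to stay out of it and be rebuilt on load.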

You don’t have to write any of these by hand. You do have to know they exist, because that’s the vocabulary you use to describe what’s wrong. “Combat feels weird” is fuzzy. “Reduce slime base damage from 6 to 4 and add a 10 % crit chance with 1.5x damage on the player side” is a single chat message that the agent will land on the first try.
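
To make that message concrete, the change it describes might look something like the sketch below. The flat DEF subtraction, the minimum of 1 damage, and the player's base damage of 8 are assumptions for illustration; the slime's base damage of 4 and the 10 % crit chance at 1.5x come from the example message.

```javascript
// rng is any function returning a float in [0, 1); it defaults to Math.random
// and is injectable so a fight can be replayed deterministically.
function rollAttack(attacker, defender, rng = Math.random) {
  const isCrit = rng() < attacker.critChance;
  const raw = attacker.baseDamage * (isCrit ? attacker.critMultiplier : 1);
  // Assumed formula: subtract the defender's DEF, never dropping below 1 damage.
  const damage = Math.max(1, Math.round(raw - defender.def));
  return { damage, isCrit };
}

// Illustrative numbers: only the slime's base damage and the crit values come from the text.
const player = { baseDamage: 8, critChance: 0.10, critMultiplier: 1.5, def: 2 };
const slime = { baseDamage: 4, critChance: 0, critMultiplier: 1, def: 0 };

const normalHit = rollAttack(player, slime, () => 0.5);  // roll falls outside the crit window
const critHit = rollAttack(player, slime, () => 0.05);   // roll falls inside the crit window
const slimeHit = rollAttack(slime, player, () => 0.5);   // slimes never crit in this setup
```

With every number living in a stat object, a balance request like the one above reduces to editing two or three fields rather than touching the combat code.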

The first prompt that works

The first prompt is the most important decision in the whole workflow. A good first prompt locks the setting, the classes, the combat style, the world layout, the visual style, and the scope in one paragraph. A bad first prompt produces a generic “here is a Phaser starter” the agent then spends an hour reshaping into your idea.

The shape that works for an RPG:

Make a top-down 2D RPG in Phaser 4. The player is a young wizard who can
choose one of three classes at the start: Mage (high MP, fragile), Knight
(high HP, low MP), and Ranger (balanced, high SPD). Arrow keys move,
spacebar interacts with NPCs and chests, Z attacks, X opens the menu.
Turn-based combat: when the wizard touches an enemy on the overworld,
transition to a combat scene with a menu of ATTACK / SPELL / ITEM / RUN.
Five primary stats: HP, MP, ATK, DEF, SPD. XP curve doubles each level
(100, 200, 400, 800). Inventory is an unlimited list with equip slots
for weapon, armor, accessory. Pixel-art look, 32x32 tiles, 32x32
character sprites. World: a small village with three NPCs (innkeeper,
merchant, quest-giver), an overworld with a forest path, and a starter
dungeon with three slime enemies and a treasure chest containing a
healing potion. Save state to localStorage on a hotkey. Make it
actually playable, not just a render loop.

Notice what’s in there: setting (top-down 2D RPG), engine (Phaser 4), classes (three with distinct profiles), input (arrow keys, spacebar, Z, X), combat style (turn-based with the menu spelled out), stat names, XP curve formula, inventory shape, visual style (exact tile and sprite sizes), world scope (one village, one overworld, one dungeon), specific NPCs and items, and “actually playable” as a tag to nudge the agent into wiring up real menus and a real save system. What’s not in there: model picking, file structure, build tooling. The agent handles all of that.
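
The XP curve in that prompt is simple enough to sketch directly. The version below assumes the listed numbers (100, 200, 400, 800) are the cost of each individual level rather than cumulative totals, which the prompt leaves open; the function names are hypothetical.

```javascript
// XP needed to advance from `level` to `level + 1`.
function xpToNext(level, base = 100, multiplier = 2) {
  return Math.round(base * Math.pow(multiplier, level - 1));
}

// Walk the curve to find which level a given total XP reaches.
// Assumes base and multiplier are positive, otherwise the loop would not end.
function levelForTotalXp(totalXp, base = 100, multiplier = 2) {
  let level = 1;
  let remaining = totalXp;
  while (remaining >= xpToNext(level, base, multiplier)) {
    remaining -= xpToNext(level, base, multiplier);
    level += 1;
  }
  return { level, xpIntoLevel: remaining };
}

const doubling = [1, 2, 3, 4].map((lvl) => xpToNext(lvl)); // [100, 200, 400, 800]
const progress = levelForTotalXp(350);                      // { level: 3, xpIntoLevel: 50 }
```

Because the multiplier is a parameter, steepening the curve later is a one-number change.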

About sixty to a hundred and twenty seconds after sending the prompt, the agent returns a file tree with the project scaffolded. Resist the urge to read every file. Hit Run instead. WizardGenie launches the project in a side preview pane. The wizard is probably a colored square, the slimes might be larger colored squares, the dialog might sound like an LLM — that’s all normal. Confirm the menus open and the combat resolves, then go straight into the iteration phase.

Iterating on combat, stats, and quests

This is where vibe coding earns its name. Instead of opening files and editing variables, you describe what’s wrong and what you want, and the agent does the edit. A real iteration sequence for the wizard RPG:

Figure: Iteration log showing four agent rounds: scaffold, tune combat, world and quests, polish, with a chat-bubble mockup for each step.
Four iteration rounds, four chat messages. The agent edits across whatever files it needs; your job is to play the build and report what’s still wrong.
  1. “Slimes hit for 6 damage and the wizard has 30 HP. Combat is over in five turns. Lower slime base damage to 4 and give them a 10 % glancing-hit chance for half damage.” Three-line diff in the slime stat block. Run a fight, feel the difference.
  2. “The Mage is too fragile. Bump the Mage starting HP from 18 to 24 and give them a once-per-battle Shield spell that costs 4 MP and absorbs the next hit.” One stat tweak plus one new spell entry in the spell list. The Mage suddenly has a real identity.
  3. “Add a critical-hit system: every attack rolls a d100 against the attacker’s crit chance (default 5 %, +1 % per point of SPD). Crits do 1.5x damage and show a yellow CRIT label.” Two-file change: the damage calculator and the combat-log renderer. Combat now has texture.
  4. “Add a side quest: the merchant in the village is missing his shipment of healing potions. He says it was raided by goblins in the forest. Killing the goblin party (three goblins, one boss) and returning to him gives 100 XP, 50 gold, and unlocks a new healing potion in his shop. Track quest state in the save file.” The agent writes the dialog tree, the goblin encounter, the quest log entry, and the merchant’s after-quest dialog. Fifteen lines of JSON, one new combat scene. About thirty seconds round-trip.
  5. “The game is too easy past level 4. Steepen the XP curve from doubling to 2.5x and bump the level-5 enemies’ HP by 30 %.” Two numbers, one chat message. The mid-game stops feeling trivial.
  6. “Add a status-effect system: poison (3 turns of 2 damage), burn (2 turns of 4 damage), freeze (skip one turn). Show a small icon next to the affected character’s name in the combat HUD. Spells and items can apply these.” One new module, one HUD update, one spell addition. The combat depth doubles.
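
For a sense of scale, the status-effect module from the last request could be as small as the sketch below. The durations and damage values come from the message; the object shapes, the refresh-on-reapply rule, and ticking at the start of the affected character's turn are assumptions, and the agent's real module may be organized differently.

```javascript
// Effect table taken from the request: poison 3x2, burn 2x4, freeze skips one turn.
const STATUS_EFFECTS = {
  poison: { turns: 3, damagePerTurn: 2, skipsTurn: false },
  burn: { turns: 2, damagePerTurn: 4, skipsTurn: false },
  freeze: { turns: 1, damagePerTurn: 0, skipsTurn: true },
};

// Re-applying an effect refreshes its duration (an assumed stacking rule).
function applyStatus(character, effectName) {
  const effect = STATUS_EFFECTS[effectName];
  if (!effect) throw new Error(`Unknown status effect: ${effectName}`);
  character.statuses[effectName] = effect.turns;
}

// Call at the start of the character's turn. Applies damage, counts down
// durations, and returns whether the character is able to act this turn.
function tickStatuses(character) {
  let canAct = true;
  for (const name of Object.keys(character.statuses)) {
    const effect = STATUS_EFFECTS[name];
    character.hp = Math.max(0, character.hp - effect.damagePerTurn);
    if (effect.skipsTurn) canAct = false;
    character.statuses[name] -= 1;
    if (character.statuses[name] <= 0) delete character.statuses[name];
  }
  return canAct && character.hp > 0;
}

const knight = { name: "Knight", hp: 30, statuses: {} };
applyStatus(knight, "poison");
applyStatus(knight, "freeze");
const turn1 = tickStatuses(knight); // frozen: takes poison damage and skips the turn
const turn2 = tickStatuses(knight); // poison only
const turn3 = tickStatuses(knight); // last poison tick, then the effect expires
```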

Each of these is one chat message. The agent does the cross-file editing; you play the build between each round and report what’s still off. The loop is fast — often two or three iterations a minute on a cheap executor model — and it’s genuinely how the rest of the game gets built.
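
The merchant side quest from round 4 shows how dialog nodes, flags, and quest rewards tend to fit together. What follows is a sketch of one possible JSON dialog shape with flag checks, not the format WizardGenie emits; the node ids, flag names, dialog text beyond the quest premise, and the "greater-healing-potion" item id are all invented for the example.

```javascript
// Hypothetical dialog format: each node has text and choices;
// a choice may require a flag to be visible and may set a flag when picked.
const merchantDialog = {
  start: {
    text: "My shipment of healing potions was raided by goblins in the forest.",
    choices: [
      { label: "I'll get it back.", next: "accepted", setFlag: "potionQuestAccepted" },
      { label: "The goblins are dead.", next: "reward", requiresFlag: "goblinsDefeated" },
    ],
  },
  accepted: { text: "Thank you. Be careful out there.", choices: [] },
  reward: { text: "You have my thanks, and my best stock.", choices: [] },
};

// Only show choices whose required flag (if any) is set.
function availableChoices(node, flags) {
  return node.choices.filter((c) => !c.requiresFlag || flags[c.requiresFlag] === true);
}

// Picks a visible choice, sets its flag if it has one, and returns the next node id.
function choose(dialog, nodeId, choiceIndex, flags) {
  const choice = availableChoices(dialog[nodeId], flags)[choiceIndex];
  if (!choice) throw new Error(`No choice ${choiceIndex} at node ${nodeId}`);
  if (choice.setFlag) flags[choice.setFlag] = true;
  return choice.next;
}

// Reward from the example: 100 XP, 50 gold, and a new item in the shop. Pays out once.
function grantPotionQuestReward(state) {
  if (state.flags.potionQuestRewarded) return state;
  state.xp += 100;
  state.gold += 50;
  state.shopItems.push("greater-healing-potion");
  state.flags.potionQuestRewarded = true;
  return state;
}

const state = { xp: 0, gold: 0, shopItems: ["healing-potion"], flags: {} };
const visibleBefore = availableChoices(merchantDialog.start, state.flags).length;
const nextNode = choose(merchantDialog, "start", 0, state.flags);
state.flags.goblinsDefeated = true; // in a real project the combat scene would set this
const rewardNode = choose(merchantDialog, "start", 1, state.flags);
if (rewardNode === "reward") grantPotionQuestReward(state);
```

Since the flags live inside the same state object that gets saved, tracking quest progress in the save file comes for free.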

Art for an RPG when you can’t draw

The agent’s placeholders work but they look like placeholders. The reason most beginners give up on RPGs is that the art workload is enormous — a small starter dungeon alone needs a player sheet, three enemy sheets, an NPC sheet, a tileset with at least two surface types, a portrait for every speaking character, and a half-dozen item icons. None of this requires drawing skills anymore.

  • Player and enemy sprite sheets. For the pixel-art look most browser RPGs use, Quick Sprites is the fastest path — prompt-to-frame-grid in roughly two minutes per character. The Four-Angle-Walking style produces a 4-direction, 4-frame sheet that drops cleanly into a top-down RPG. For non-pixel styles (hand-painted, anime, full-color), Auto-Sprite v2 generates a character with AI image gen, animates it with AI video, and converts the clip into a clean sprite sheet.
  • Tilesets. Tileset Forge takes raw AI dungeon or overworld art and produces a tile-grid-aligned tileset PNG plus a tile-index map ready for Phaser tilemaps. Stone-floor, grass-floor, dungeon-wall, ice, lava — prompt a style, get a usable sheet in under five minutes. Seamless Tile Gen handles the textures that need to repeat across a large overworld.
  • Portraits and item icons. AI Image Gen covers the rest of the art — NPC portraits for the dialog box, item icons for the inventory grid, the title-screen background, the world map. The reference-image workflow is the trick for keeping every NPC in a coherent style; pick one portrait you like, drop it in as a reference, and prompt the rest of the cast against it.
  • Touch-ups. The browser-only Canvas editor is enough for the small fixes — cleaning a stray pixel on a sprite, adjusting a portrait’s background color, drawing a tiny UI heart icon. No desktop editor required.

The whole loop closes in one tab. You ask the agent to “use this sprite for the wizard” and paste a Sorceress URL; it writes the loader and swaps the placeholder. No download-then-upload step, no asset-pipeline configuration, no engine to learn.
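
Under the hood, "writes the loader" usually amounts to a few calls on the scene's loader. The sketch below assumes the Phaser 3 style calls load.spritesheet(key, url, frameConfig), load.image(key, url), and load.audio(key, url) carry over to Phaser 4; the manifest shape is hypothetical and the URLs are placeholders, not real Sorceress links. A recording stand-in replaces the real loader so the example runs without Phaser.

```javascript
// Hypothetical asset manifest; keys and placeholder URLs are for illustration only.
const manifest = [
  { type: "spritesheet", key: "wizard", url: "https://example.com/wizard.png", frameWidth: 32, frameHeight: 32 },
  { type: "image", key: "dungeon-tiles", url: "https://example.com/dungeon-tiles.png" },
  { type: "audio", key: "battle-theme", url: "https://example.com/battle-theme.mp3" },
];

// `loader` is expected to look like a Phaser scene's `this.load`.
function queueAssets(loader, entries) {
  for (const entry of entries) {
    if (entry.type === "spritesheet") {
      loader.spritesheet(entry.key, entry.url, {
        frameWidth: entry.frameWidth,
        frameHeight: entry.frameHeight,
      });
    } else if (entry.type === "image") {
      loader.image(entry.key, entry.url);
    } else if (entry.type === "audio") {
      loader.audio(entry.key, entry.url);
    } else {
      throw new Error(`Unsupported asset type: ${entry.type}`);
    }
  }
}

// Stand-in that records calls instead of fetching anything.
function createRecordingLoader() {
  const calls = [];
  const record = (method) => (...args) => calls.push({ method, args });
  return { calls, spritesheet: record("spritesheet"), image: record("image"), audio: record("audio") };
}

const fakeLoader = createRecordingLoader();
queueAssets(fakeLoader, manifest);
```

In an actual scene the call would sit in preload(), roughly as queueAssets(this.load, manifest), and swapping a placeholder for real art means changing one URL in the manifest.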

Music, SFX, and voice for the world

An RPG soundscape is bigger than a platformer’s. You need at least a town theme, an overworld theme, a battle theme, a boss theme, a victory stinger, and a small library of UI and combat SFX. The whole audio side fits inside the Sorceress suite without ever opening a DAW.

  • Music. Music Gen turns a text prompt into full instrumental tracks. “Looping medieval town theme, gentle, 80 bpm, lute and recorder” yields two thirty-second variations. Same prompt shape for battle (“tense battle theme, 130 bpm, orchestral”), boss (“dark boss theme, 140 bpm, choir and percussion”), and victory stingers.
  • SFX. SFX Gen handles every short audio cue: sword swings, spell casts, item pickups, level-up chimes, menu clicks, footsteps on different surfaces. Batch a whole pack (“fifteen RPG combat hit sounds, ten footstep variants on stone and grass”) in one session. Trim and master each clip in SFX Editor if needed.
  • NPC voices. If you want voiced lines for important NPCs (the king’s opening monologue, the shopkeeper’s greeting), Speech Gen turns text into voiceover in whichever voice and language you choose. Voice cloning keeps the same character voice consistent across hundreds of lines.

The agent fetches every URL the same way it fetches sprites — paste a Sorceress link in chat, and it writes the loader and the trigger code. The audio half of the project usually takes about an hour to assemble and another hour to balance.

Picking the right model for the agent

Inside WizardGenie, the agent itself is driven by an AI coding model that you pick from a dropdown. The current lineup includes Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro, DeepSeek V4 Pro, Kimi K2.5, MiniMax M2.7, and Grok 4.2 (verified May 7, 2026 against the model picker source). Pricing and quality vary across roughly two orders of magnitude. The right pick depends on what kind of turn you’re taking.

  • One-shot scaffolding. The first prompt — the one that creates the entire project from nothing — benefits from a frontier reasoner. Claude Opus 4.7 or GPT-5.5 produce the cleanest Phaser 4 scaffolds in a single pass and are worth the cost for the moment that defines the whole project.
  • Iterative back-and-forth. Once the project exists and you’re tuning combat numbers, writing dialog, and adjusting quests, switch to a cheap executor. DeepSeek V4 Pro, Kimi K2.5, MiniMax M2.7, and Gemini 3.1 Flash are dramatically less expensive per token and fast enough that the chat loop feels real-time.
  • Big architectural changes. “Refactor the combat system into a state machine” or “move all the dialog into a graph database with branching flag-checks” — go back to a frontier reasoner for that single message, then drop straight back to a cheap executor.

If you’ve read about the Planner+Executor pattern in Best AI Model for Coding, the rule is strict: never put a frontier-priced model on the typing side of the pair. The cost ratio collapses if you do, because the pattern only works when the expensive model thinks and the cheap one types. A Sonnet-4.6-on-the-typing-side RPG costs roughly five times what the same project costs on a DeepSeek-V4-Pro executor with an Opus 4.7 planner, with no measurable quality gain on a tuning-heavy task like an RPG balance pass.

What WizardGenie won’t do for you

The agent is excellent at scaffolding, refactoring, and one-shot fixes. It’s mediocre at the parts of RPG development that require actual taste:

  • Story and character voice. The agent can write a starter dialog tree, but the second draft has to be yours. NPC voice, pacing, the emotional arc that makes a 20-hour RPG land — those are designer decisions. The agent will happily generate text; whether the merchant sounds like a person or a chatbot is your call.
  • Balance and difficulty curves. The scaffolded numbers are placeholders. Whether level-5 boss damage feels fair, whether the third dungeon is too long, where to put the mid-game power spike — those judgments come from playing the game. The agent executes whatever change you describe; it won’t tell you the design is wrong.
  • Coherent art direction. The Sorceress asset tools generate game-ready sprites, tilesets, and portraits, but choosing a coherent palette, deciding what reads at small sizes, and making the wizard’s silhouette legible against a busy tileset is your call. “Make everything pixel art” is not a style; “16-color GBC palette, four-frame walks, hard outlines” is.
  • Distribution. The output is a normal Phaser project — JavaScript files, an asset folder, a Vite config. npm run build produces a static dist/ folder you can drop on any static host (your own domain, GitHub Pages, an itch.io browser-game upload, Netlify). The build process is yours to run on the day you ship; the WizardGenie web app does not auto-publish the finished game anywhere on your behalf.

None of this is a flaw in the workflow — it’s where human work meaningfully begins. The agent removes the parts of RPG development that were always painful and never fun (asset bottlenecks, boilerplate scaffolding, file plumbing) and leaves the part that was always the actual game design.

Frequently Asked Questions

What is the easiest way to make an RPG in 2026?

Describe the RPG in one prompt to a browser-native AI game agent like WizardGenie, then iterate by chat. The agent writes a Phaser project with a stat block, basic inventory, a tilemap world, an NPC dialog system, and a turn-based or action-combat loop wired up. You refine by talking to the agent ("slow the player walk speed and add a stamina meter") instead of editing every system by hand. There is no engine install, no SDK, and no command line while you build and play; the one terminal step, npm run build, only comes up when you ship.

Can I make an RPG with AI if I can’t draw at all?

Yes — that is the entire point of the Sorceress workflow. Quick Sprites turns a text prompt into a clean pixel-art character with idle, walk, and attack frames in roughly two minutes. Tileset Forge turns AI-generated dungeon or overworld art into a tile-grid-aligned tileset, and AI Image Gen handles portraits, item icons, title screens, and UI art. The agent drops every URL into the project’s asset folder for you, so the workflow stays in one browser tab.

Do I need Unity, Godot, RPG Maker, or GameMaker to make an RPG?

No. WizardGenie ships a Phaser-based runtime that runs entirely in your browser tab, with no engine install, no SDK, and no command line while you build and play. Phaser 4 (released April 10, 2026; verified May 7, 2026) is the actively maintained library most browser-first 2D games use today. RPG Maker, Unity, Godot, and GameMaker are still good choices if you want a desktop editor and an established asset pipeline, but they are not required to ship a playable RPG.

Can the AI design quests and dialog, not just code?

Partially. The agent will produce starter quest text, NPC barks, and a basic dialog tree, and you can ask it for variations ("give the merchant a longer haggling exchange", "add a side quest to retrieve the lost amulet"). Where AI is still weak is in pacing, character voice, and the long arc that makes a story land. You iterate with the agent by playing the build and reporting what feels off; the writing eventually becomes yours, even if the first draft is the agent’s.

What about art and music for the RPG?

Use Quick Sprites for the player and enemy sprite sheets, Tileset Forge for the dungeon and world tiles, AI Image Gen for portraits and item icons, Music Gen for the overworld and battle themes, SFX Gen for combat hits and UI clicks, and Speech Gen for any voiced lines. All six tools live in the same Sorceress account, and the WizardGenie agent can drop the resulting URLs into the project’s asset folder for you. No drawing required.

Which AI model should drive the WizardGenie agent for an RPG?

For one-shot scaffolding (the first prompt that creates the entire project), Claude Opus 4.7 or GPT-5.5 produce the cleanest Phaser RPG scaffolds in a single pass. For iterative back-and-forth ("add a status effect system", "fix the inventory drop bug"), the cheaper executor models — DeepSeek V4 Pro, Kimi K2.5, MiniMax M2.7, Gemini 3.1 Flash — are dramatically more cost-effective and fast enough that the loop feels real-time. Never put a frontier-priced model on the typing side of a Planner+Executor pair.

Can the RPG I build with WizardGenie be a real published game?

Yes. The output is a normal Phaser project — JavaScript files, an asset folder, a Vite or webpack config — and you can host it anywhere a static site works (your own domain, GitHub Pages, an itch.io browser-game upload, Netlify, Cloudflare Pages). It’s your code; nothing in the workflow locks the project to Sorceress, and you can keep editing and shipping updates long after the agent is done.

Sources

  1. Role-playing video game (Wikipedia)
  2. Experience point (Wikipedia)
  3. Tile-based video game (Wikipedia)
  4. Phaser official documentation
  5. Vibe coding (Wikipedia)
Written by Arron R. · 3,664 words · 16 min read
