AI Voice Generator Character Voices (Indie NPC Cast 2026)

By Arron R. · 15 min read
AI voice generator character voices for indie game NPCs in 2026: cast the merchant, king, healer, and lead from 17 Sorceress Speech Gen presets, write dialogue

Shipping an indie game’s full NPC cast is the bottleneck most jam builds hit somewhere between week six and week twelve in 2026, and an AI voice generator character voices pipeline that finishes the cast in an afternoon is the way through it. The pattern is familiar: the vibe coder with placeholder “NPC says X” subtitle boxes, the jam team three days from submission with a silent merchant who really should be talking, the hobbyist writer who voiced the entire prototype as themselves and now needs the king to actually sound like a king. The honest fix is no longer hiring four actors and booking studio time. Sorceress Speech Gen ships 17 preset character voices, 8 per-line emotion controls, two model tiers, and one-tap voice cloning for the lead, all in a single browser tab at fractions of a cent per line. This post walks the honest pipeline: the four pieces every indie voice cast needs, the casting pattern that maps presets to NPC roles, the dialogue-batching pattern that keeps iteration cheap, the 400-credit clone that locks the lead, and the SFX Editor trim that makes every line sound shipped.

Four-panel diagram of an AI voice generator character voices pipeline for indie games — the cloned lead, the preset party, the recurring NPCs with emotion control, and the engine-ready MP3 export
The four pieces of an indie game voice cast (cloned lead, preset party, recurring NPCs with emotion control, MP3 export) shipped from Sorceress Speech Gen in a single afternoon.

What “AI voice generator character voices” actually means in 2026

An AI voice generator for character voices is a different problem than a generic speech synthesis tool. Generic TTS reads a paragraph in one voice for an audiobook, an accessibility overlay, or a YouTube narration. Game character voices have to hold up under harder constraints: every non-player character needs a distinct voice from the others, every line needs to convey the emotion the script intends (fear, anger, surprise, calm), and the lead character has to sound consistent across hundreds of dialogue lines written months apart. A flat, neutral TTS read makes every NPC sound like a robot reading a recipe, which breaks immersion the second the merchant and the king sound identical.

That difference is why the 2026 indie voice-cast workflow is built on four primitives that don’t exist in generic TTS. Preset voice variety. The Sorceress Speech Gen library ships 17 named character voices (verified at the PRESET_VOICES array on lines 156 to 174 of src/app/speech-gen/page.tsx on June 12, 2026), enough to give every recurring NPC in a 6-hour indie game its own distinct sound. Per-line emotion conditioning. The 8-emotion picker (Neutral, Happy, Calm, Sad, Angry, Fearful, Disgusted, Surprised, verified on lines 179 to 188) attaches to the line, not the voice, so the same merchant can say a Happy greeting and a Sad farewell. Voice cloning for the lead. A single recording of up to 4 minutes 59 seconds, 400 credits, and a permanent voice ID lock in the protagonist’s sound for every future line. Engine-agnostic export. Every generated line is an MP3 the engine’s audio loader reads natively, the same way it reads music and SFX. Voice cloning and emotion control are what separate a finished indie cast from a tech demo with placeholder reads.

The four pieces every indie game voice cast needs

Naming the four pieces up front gives the casting decision a clean target for each role and keeps the cast from sprawling into 12 indistinguishable voices the player can’t tell apart. Every shipped indie game from a 90-minute jam build to a 40-hour RPG has roughly the same four voice categories.

Piece 1: The lead character. One voice. This is the protagonist, the player avatar, or the first-person narrator. The player hears this voice 60 to 80 percent of the time, so the trade-off is consistency over variety. The right move is to clone a single real voice (the developer’s own, a friend’s with permission, or a partner’s) once at 400 credits and reuse the voice ID for every line the lead ever speaks.

Piece 2: The party / supporting cast. Two to four voices. The questgiver, the rival, the mentor, the love interest. These voices recur but always alongside the lead, so they need to be distinct from the lead and distinct from each other. The right move is to pick four sharply contrasting presets (one deep male, one bright male, one calm female, one bright female) and lock the mapping early so the player builds the voice-to-character association in the first chapter.

Piece 3: The recurring NPCs. Five to eight voices. The town merchant, the inn-keeper, the guard captain, the village healer, the local rival. These appear in multiple scenes and need their own voices so the player recognizes them on return visits, but they don’t need the lead’s cloned-fidelity polish. Standard preset voices with archetype-matched picks handle the entire roster.

Piece 4: The throwaway NPCs. Two to three voices used at random. The random villager, the random soldier, the random shopkeeper in a town the player visits once. These get rotated across many one-line characters. Pick two or three presets the player won’t hear in the recurring cast and rotate them at random so two adjacent generic NPCs don’t sound identical.

The math for a typical 6-hour indie game ends up at: 1 cloned lead + 4 supporting cast + 6 recurring NPCs + 3 throwaway voices = 14 distinct voice IDs total, 13 of them preset (zero per-voice cost) and 1 cloned (400 credits). With roughly 30 minutes of total dialogue at HD quality, that’s about 60 credits of TTS plus the 400 clone = 460 credits, which a single $10 Starter pack covers with credit left over.
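The cast arithmetic above can be written down as a small sketch. The `CastPlan` shape and the `castBudget` helper are hypothetical names used only for illustration; the 400-credit clone cost, the 60-credit dialogue estimate, and the $0.01-per-credit price are the figures quoted in this post.

```typescript
// Flat clone cost quoted in the article (VOICE_CLONE_CREDITS in page.tsx).
const VOICE_CLONE_CREDITS = 400;
// Assumption: the article's standard price of $0.01 per credit.
const CREDITS_PER_USD = 100;

// Hypothetical shape for the four cast categories described above.
interface CastPlan {
  clonedLeads: number;
  supporting: number;
  recurringNpcs: number;
  throwaways: number;
}

interface CastBudget {
  presetVoices: number;
  totalVoices: number;
  totalCredits: number;
  usd: number;
}

// Preset voices carry no per-voice cost; only clones and dialogue are billed.
function castBudget(plan: CastPlan, dialogueCredits: number): CastBudget {
  const presetVoices = plan.supporting + plan.recurringNpcs + plan.throwaways;
  const totalVoices = plan.clonedLeads + presetVoices;
  const totalCredits = plan.clonedLeads * VOICE_CLONE_CREDITS + dialogueCredits;
  return { presetVoices, totalVoices, totalCredits, usd: totalCredits / CREDITS_PER_USD };
}

const sixHourGame = castBudget(
  { clonedLeads: 1, supporting: 4, recurringNpcs: 6, throwaways: 3 },
  60 // the article's estimate for 30 minutes of HD dialogue
);
console.log(sixHourGame); // 14 voices, 13 of them presets, 460 credits
```

Changing the plan (for example, a second cloned character) shows immediately how much of the budget is clone cost rather than dialogue.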

Step 1 — Cast your voices from the 17 Sorceress Speech Gen presets

The casting step at sorceress.games/speech-gen is where most first-time indie voice projects go wrong. The default reflex is to pick whichever voice sounds “best” in isolation and assign it to the protagonist, which leaves a depleted preset bench for the supporting cast and forces compromises later. The right pattern is to cast the whole game in one sitting, working from the back of the cast to the front.

The Speech Gen preset library splits the 17 voices into 9 male and 8 female options (verified on lines 156 to 174 of src/app/speech-gen/page.tsx). The male roster: Deep Voice Man, Casual Guy, Patient Man, Young Knight, Determined Man, Decent Boy, Imposing Manner, Elegant Man, Friendly Person. The female roster: Wise Woman, Calm Woman, Inspirational Girl, Lively Girl, Lovely Girl, Abbess, Sweet Girl, Exuberant Girl. Each name is a direct archetype cue: the labels are deliberately descriptive, so a first casting shortlist can be made by reading the list, not by auditioning all 17 against the same line.

The archetype-to-preset mapping that ships well for fantasy and modern indie games: merchant → Friendly Person; king or boss → Imposing Manner or Determined Man; elder mentor → Patient Man or Wise Woman; young hero rival → Young Knight or Inspirational Girl; rogue or thief → Casual Guy or Lively Girl; villain → Deep Voice Man or Determined Man; healer → Wise Woman or Calm Woman; cleric or nun → Abbess; child NPC → Decent Boy or Sweet Girl; court noble → Elegant Man or Lovely Girl; bubbly companion → Exuberant Girl. Write the mapping into the project’s design doc before generating a single line so the same NPC always uses the same voice ID across the entire script.
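One way to keep that mapping out of people's heads is to store it as data next to the design doc. The sketch below is an illustration only: the `Archetype` keys, `CASTING_TABLE`, and `castRole` are hypothetical names, while the preset names and pairings are the ones listed above.

```typescript
type PresetVoice =
  | "Deep Voice Man" | "Casual Guy" | "Patient Man" | "Young Knight"
  | "Determined Man" | "Decent Boy" | "Imposing Manner" | "Elegant Man"
  | "Friendly Person" | "Wise Woman" | "Calm Woman" | "Inspirational Girl"
  | "Lively Girl" | "Lovely Girl" | "Abbess" | "Sweet Girl" | "Exuberant Girl";

// Hypothetical short keys for the archetypes named in the article.
type Archetype =
  | "merchant" | "king" | "mentor" | "rival" | "rogue" | "villain"
  | "healer" | "cleric" | "child" | "noble" | "companion";

const CASTING_TABLE: Record<Archetype, PresetVoice[]> = {
  merchant: ["Friendly Person"],
  king: ["Imposing Manner", "Determined Man"],
  mentor: ["Patient Man", "Wise Woman"],
  rival: ["Young Knight", "Inspirational Girl"],
  rogue: ["Casual Guy", "Lively Girl"],
  villain: ["Deep Voice Man", "Determined Man"],
  healer: ["Wise Woman", "Calm Woman"],
  cleric: ["Abbess"],
  child: ["Decent Boy", "Sweet Girl"],
  noble: ["Elegant Man", "Lovely Girl"],
  companion: ["Exuberant Girl"],
};

// Returns the first suggested preset that no other role has claimed yet.
function castRole(archetype: Archetype, taken: PresetVoice[]): PresetVoice | undefined {
  const options = CASTING_TABLE[archetype];
  for (let i = 0; i < options.length; i++) {
    if (taken.indexOf(options[i]) === -1) return options[i];
  }
  return undefined;
}

// Cast a small roster in order, never reusing a preset.
const roster: Archetype[] = ["mentor", "healer", "king", "villain"];
const castSheet: { [role: string]: PresetVoice | undefined } = {};
const claimed: PresetVoice[] = [];
for (let i = 0; i < roster.length; i++) {
  const voice = castRole(roster[i], claimed);
  castSheet[roster[i]] = voice;
  if (voice) claimed.push(voice);
}
console.log(castSheet);
```

Tracking the claimed presets matters because several archetypes share a fallback (Determined Man and Wise Woman each appear twice in the table), so casting order decides who gets the first choice.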

Open Speech Gen, click the voice picker dropdown, and audition each preset against a single line of placeholder dialogue (“The path through the mountains is closed by snow until spring” works as a neutral test line that exposes pace and timbre). Spend 5 credits total on the casting pass — one HD generation per voice you’re considering for a major role. Lock the cast, write the design-doc mapping, and move on.

Step 2 — Write dialogue lines with HD vs Turbo (8 emotions, 0.3 to 0.5 credits per 1K characters)

The Speech Gen dialogue workspace exposes two model tiers and an 8-emotion picker per line. HD costs 0.5 credits per 1000 characters and produces higher-fidelity TTS with cleaner consonants and more natural prosody. Turbo costs 0.3 credits per 1000 characters and produces faster, slightly thinner TTS that’s entirely shippable for background and ambient lines (verified at CREDITS_PER_1K_HD = 0.5 on line 28 and CREDITS_PER_1K_TURBO = 0.3 on line 29 of src/app/speech-gen/page.tsx on June 12, 2026; the minimum billing per generation is 1 credit per the MIN_TTS_CREDITS = 1 constant on line 30).

The two-pass pattern that keeps an indie voice cast under budget runs as follows. The first pass uses Turbo for the entire script, so iteration cost stays at 0.3 credits per 1000 characters and the team can revise wording, swap voice assignments, and reorder scenes without burning credits. In a 30-minute script, a voice that carries roughly 4500 characters of dialogue costs about 1.4 credits on Turbo (4.5 × 0.3), which is effectively free. The second pass uses HD for the keeper lines only: the tutorial monologue, the boss reveal, the ending cinematic, anywhere the player is paying full attention to the audio. The same 4500 characters on HD cost about 2.3 credits (4.5 × 0.5). Re-rendering the top 20 percent of the script on HD adds about 5 to 10 credits across the whole cast once the 1-credit minimum per generation is counted.
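The per-tier arithmetic can be checked with a short estimator. This is a sketch under one stated assumption: that each generation is billed as the larger of its per-character cost and the 1-credit minimum, with no rounding. The three constants are the ones quoted from page.tsx; `rawCredits` and `billedCredits` are hypothetical helper names.

```typescript
const CREDITS_PER_1K_HD = 0.5;
const CREDITS_PER_1K_TURBO = 0.3;
const MIN_TTS_CREDITS = 1;

type Tier = "hd" | "turbo";

// Per-character cost before any minimum is applied.
function rawCredits(chars: number, tier: Tier): number {
  const rate = tier === "hd" ? CREDITS_PER_1K_HD : CREDITS_PER_1K_TURBO;
  return (chars / 1000) * rate;
}

// Assumption: each generation bills max(raw cost, 1 credit), with no rounding.
function billedCredits(chars: number, tier: Tier): number {
  return Math.max(MIN_TTS_CREDITS, rawCredits(chars, tier));
}

// Thirty 80-character lines, generated one by one versus pasted as one block.
const lineCount = 30;
const charsPerLine = 80;
const oneByOne = lineCount * billedCredits(charsPerLine, "hd");
const asOneBlock = billedCredits(lineCount * charsPerLine, "hd");

console.log(rawCredits(4500, "turbo"), rawCredits(4500, "hd")); // about 1.35 and 2.25
console.log(oneByOne, asOneBlock); // 30 versus about 1.2
```

If the minimum really is applied per generation, short lines are dominated by the floor rather than the per-character rate, which is the practical argument for batching lines into larger text blocks.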

The emotion picker sits next to the voice picker and exposes 8 named emotions (Neutral, Happy, Calm, Sad, Angry, Fearful, Disgusted, Surprised, verified at the EMOTIONS array on lines 179 to 188). The emotion attaches per line, not per voice, so the same Imposing Manner king can deliver a Neutral opening greeting, an Angry threat in the middle scene, and a Sad concession line at the end of the boss fight. Pick the emotion that matches the dramatic intent before generating, not after — re-generating to correct a mismatched emotion costs the same 0.5 or 0.3 credits per 1000 chars as the original take.

The Speech Gen workspace also surfaces a project / batch system. Script the lines per character in a single text block, paste, set the voice ID, set the emotion, click Generate. The Gallery on the right of the workspace lists generated audio with waveform previews, a Play button, and download links. Audition every keeper line on headphones (not laptop speakers) and re-roll the takes that don’t land — AI-generated voice acting is iterative the same way human voice acting is iterative.
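The paste-per-character workflow is easier to keep tidy if the script lives as structured data first. The sketch below groups lines by voice and emotion into paste-ready text blocks; the `DialogueLine` and `GenerationBatch` shapes and the `batchLines` helper are hypothetical, and grouping by emotion as well as voice is an assumption based on the picker applying one emotion per generation.

```typescript
type Emotion =
  | "Neutral" | "Happy" | "Calm" | "Sad"
  | "Angry" | "Fearful" | "Disgusted" | "Surprised";

interface DialogueLine { id: string; voice: string; emotion: Emotion; text: string; }
interface GenerationBatch { voice: string; emotion: Emotion; lineIds: string[]; text: string; }

// Groups lines that share a voice and an emotion, preserving script order.
function batchLines(lines: DialogueLine[]): GenerationBatch[] {
  const byKey: { [key: string]: GenerationBatch } = {};
  const order: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const key = line.voice + "|" + line.emotion;
    if (!byKey[key]) {
      byKey[key] = { voice: line.voice, emotion: line.emotion, lineIds: [], text: "" };
      order.push(key);
    }
    const batch = byKey[key];
    batch.lineIds.push(line.id);
    batch.text += (batch.text ? "\n" : "") + line.text;
  }
  return order.map(function (key) { return byKey[key]; });
}

const sceneBatches = batchLines([
  { id: "king_01", voice: "Imposing Manner", emotion: "Neutral", text: "You stand before the throne." },
  { id: "king_02", voice: "Imposing Manner", emotion: "Angry", text: "You dare return empty-handed?" },
  { id: "king_03", voice: "Imposing Manner", emotion: "Neutral", text: "Speak, then." },
]);
console.log(sceneBatches.length); // 2 batches: Neutral (two lines) and Angry (one line)
```

A batched generation presumably comes back as one audio file, so the individual lines would still need to be cut apart in the editor; the `lineIds` array records which file names each cut should receive.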

Two-column diagram of the 17 Sorceress Speech Gen preset voices — 9 male presets including Deep Voice Man and Imposing Manner, 8 female presets including Wise Woman and Lively Girl, plus the 8 per-line emotion controls
The full Sorceress Speech Gen preset library: 9 male and 8 female named character voices, plus 8 per-line emotion controls that attach to the line rather than the voice.

Step 3 — Clone the lead character’s voice (400 credits, 4:59 recording)

Voice cloning is the step that separates a preset-only voice cast from a cast with a consistent lead. The Speech Gen clone flow takes a single MP3 or M4A recording up to 4 minutes 59 seconds long and up to 20 MB in file size (verified at VOICE_CLONE_CREDITS = 400 on line 31, MAX_CLONE_DURATION = 299 on line 32, and MAX_CLONE_SIZE = 20 * 1024 * 1024 on line 33 of src/app/speech-gen/page.tsx on June 12, 2026) and returns a permanent voice ID at a flat 400-credit cost. Once the voice ID exists, every future line generated against it bills at the normal HD or Turbo per-character rate — no per-line clone surcharge, no expiration on the voice ID.

The recording instructions ship inside Speech Gen itself: a teleprompter script that runs about 7 minutes at a natural reading pace, designed to expose every phoneme the cloner needs. Because the upload cap is 4 minutes 59 seconds, a full read-through has to be trimmed to fit before upload. The minimum useful recording is 90 seconds of clean speech; the model latches onto the voice fingerprint faster than the cap suggests. Practical recording rules: record in a quiet room (a closet full of clothes is a perfectly acceptable booth for indie devs), maintain consistent distance from the microphone (8 to 12 inches), don’t edit out the natural breaths between sentences (they’re part of the voice fingerprint), and export at 128 kbps MP3 or M4A at the source sample rate. The 20 MB cap accommodates roughly 22 minutes of 128 kbps MP3 audio, well beyond what the cloner actually needs.
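A pre-upload check against the quoted limits saves a failed upload. The sketch below is illustrative: `checkCloneRecording` and `minutesUnderSizeCap` are hypothetical helpers, the two caps are the constants quoted from page.tsx, and the 90-second figure is the practical minimum suggested above rather than a limit the tool is known to enforce.

```typescript
const MAX_CLONE_DURATION = 299; // seconds (4:59)
const MAX_CLONE_SIZE = 20 * 1024 * 1024; // bytes (20 MB)
const MIN_USEFUL_SECONDS = 90; // the article's practical minimum, not a hard limit

interface CloneRecording { durationSeconds: number; sizeBytes: number; extension: string; }

// Returns a list of human-readable problems; an empty list means "ready to upload".
function checkCloneRecording(rec: CloneRecording): string[] {
  const problems: string[] = [];
  const ext = rec.extension.toLowerCase();
  if (ext !== "mp3" && ext !== "m4a") problems.push("format must be MP3 or M4A");
  if (rec.durationSeconds > MAX_CLONE_DURATION) problems.push("longer than 4:59");
  if (rec.sizeBytes > MAX_CLONE_SIZE) problems.push("larger than 20 MB");
  if (rec.durationSeconds < MIN_USEFUL_SECONDS) problems.push("shorter than 90 seconds of speech");
  return problems;
}

// Minutes of constant-bitrate audio that fit under the size cap.
function minutesUnderSizeCap(kbps: number): number {
  return (MAX_CLONE_SIZE * 8) / (kbps * 1000) / 60;
}

// A full 7-minute teleprompter read at 128 kbps (16,000 bytes per second).
console.log(checkCloneRecording({ durationSeconds: 420, sizeBytes: 420 * 16000, extension: "mp3" }));
console.log(minutesUnderSizeCap(128)); // roughly 21.8 minutes
```

At 128 kbps the duration cap is the binding constraint by a wide margin; the size cap only starts to matter at much higher bitrates.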

Upload the recording in the Voice Clones panel, name the voice (“Protagonist”, “The Player”, the character’s name — whatever the design doc uses), confirm the 400-credit charge, and wait for the status to move from processing to succeeded. The voice ID then appears in the same voice picker dropdown as the 17 presets and is selectable for every future TTS generation in the project. For a solo dev who voices their own protagonist by cloning their own voice, this is the one-time setup that lets the protagonist deliver hours of dialogue in the dev’s natural voice without the dev having to record every individual line.

The consent rule: cloning your own voice is unambiguously yours to deploy. Cloning a friend, a partner, or a hired voice actor requires written consent before the cloned voice ships in any commercial build — the same legal bar a traditional studio would have to clear for sampled dialogue. Sorceress doesn’t enforce this at the technical layer (the cloner accepts whatever audio you upload), so the responsibility sits with the project owner. Get the consent in writing before recording, store it in the project repo, and never clone a public figure’s voice from a YouTube rip.

Step 4 — Trim, fade, and master in SFX Editor before engine export

Raw TTS output isn’t shipping-ready audio. Speech Gen lines often have 100 to 300 milliseconds of silence at the head, a hard cutoff at the tail, and inconsistent loudness between takes. The Sorceress SFX Editor at /sfx-editor handles the trim, fade, and normalize pass that converts raw generations into engine-ready audio. The same browser-native editor that handles SFX one-shots and music loops handles dialogue lines without any format conversion.

The standard dialogue post-processing pass takes about 30 seconds per line. Trim the head silence. The first audible word should land within 50 milliseconds of the file start so the engine doesn’t play 300 ms of dead air before the dialogue begins. Drag the left handle to the start of the first phoneme. Trim the tail breath. If the take ends with an audible exhale or lip click, drag the right handle to the final consonant of the last word. Add a 30 to 50 ms fade-out. Hard cutoffs at the end of a line click audibly when the engine plays the file; a short fade smooths the transition into silence. Normalize to -3 dBFS. Different presets and different emotions render at noticeably different loudness levels; normalizing every dialogue file to the same peak level keeps the in-game mix balanced so the player isn’t scrambling for the volume knob between scenes.
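For teams that want to script the same pass outside the editor, the three operations reduce to simple array math on decoded samples. This is a minimal sketch, not how the SFX Editor is implemented: it assumes mono PCM samples in the range -1 to 1, and the silence threshold, the 10 ms pre-roll, and the 40 ms default fade are illustrative choices within the ranges described above.

```typescript
// All functions take mono PCM samples in [-1, 1] and return a new array.

// Drops leading samples below the threshold, keeping a short pre-roll.
function trimHeadSilence(samples: Float32Array, sampleRate: number,
                         threshold = 0.01, preRollMs = 10): Float32Array {
  let first = 0;
  while (first < samples.length && Math.abs(samples[first]) <= threshold) first++;
  if (first === samples.length) return samples.slice(0); // nothing audible: leave as-is
  const preRoll = Math.round((sampleRate * preRollMs) / 1000);
  return samples.slice(Math.max(0, first - preRoll));
}

// Applies a linear fade over the last fadeMs milliseconds, ending at exactly 0.
function fadeOut(samples: Float32Array, sampleRate: number, fadeMs = 40): Float32Array {
  const out = samples.slice(0);
  const n = Math.min(out.length, Math.round((sampleRate * fadeMs) / 1000));
  for (let k = 0; k < n; k++) {
    out[out.length - n + k] *= 1 - (k + 1) / n;
  }
  return out;
}

// Scales the signal so its absolute peak sits at targetDbfs (peak normalization).
function normalizePeak(samples: Float32Array, targetDbfs = -3): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) peak = Math.max(peak, Math.abs(samples[i]));
  const out = samples.slice(0);
  if (peak === 0) return out;
  const gain = Math.pow(10, targetDbfs / 20) / peak;
  for (let i = 0; i < out.length; i++) out[i] *= gain;
  return out;
}

// Synthetic take at a 1 kHz sample rate: 300 ms of silence, then 200 ms at 0.5.
const rawTake = new Float32Array(500);
for (let i = 300; i < 500; i++) rawTake[i] = 0.5;
const polished = normalizePeak(fadeOut(trimHeadSilence(rawTake, 1000), 1000));
console.log(polished.length); // 210 samples: 10 ms pre-roll plus 200 ms of signal
```

Note that this normalizes peak level, which is what a -3 dBFS target describes; matching perceived loudness across presets may need a loudness-based measure on top of it.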

For longer monologues (tutorial intros, cinematic narration), the SFX Editor also handles splice-and-stitch edits. If the take fumbles a word in the middle, regenerate just the fumbled sentence as a separate Speech Gen line, splice the new sentence into the original take at the appropriate timestamp, and crossfade across the splice point for 50 ms. The seam disappears.
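The splice itself is an overlap-and-blend between two segments. The sketch below shows a linear crossfade over the 50 ms window mentioned above; `spliceWithCrossfade` is a hypothetical helper operating on mono PCM sample arrays, and it joins one pair of segments, so replacing a sentence in the middle of a take would apply it twice (once at each seam).

```typescript
// Joins head and tail, blending the end of head into the start of tail.
function spliceWithCrossfade(head: Float32Array, tail: Float32Array,
                             sampleRate: number, fadeMs = 50): Float32Array {
  const n = Math.min(head.length, tail.length, Math.round((sampleRate * fadeMs) / 1000));
  const out = new Float32Array(head.length + tail.length - n);
  // Untouched part of the head.
  out.set(head.subarray(0, head.length - n), 0);
  // Overlap region: head fades out while tail fades in (weights sum to 1).
  for (let i = 0; i < n; i++) {
    const t = (i + 1) / (n + 1);
    out[head.length - n + i] = head[head.length - n + i] * (1 - t) + tail[i] * t;
  }
  // Untouched remainder of the tail.
  out.set(tail.subarray(n), head.length);
  return out;
}

// Helper for the demo: an array filled with one value.
function constantSignal(length: number, value: number): Float32Array {
  const a = new Float32Array(length);
  for (let i = 0; i < length; i++) a[i] = value;
  return a;
}

// Two 100 ms segments at a 1 kHz sample rate, joined with a 50 ms overlap.
const joined = spliceWithCrossfade(constantSignal(100, 1), constantSignal(100, 0), 1000);
console.log(joined.length); // 150 samples
```

Because the two weights always sum to 1, splicing two segments of equal level produces no dip or bump at the seam.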

Export the polished file as MP3 for web and mobile builds (smaller files, faster to download) or as WAV if the project is desktop-only and wants lossless audio. The Speech Gen native export is MP3; the SFX Editor handles WAV and OGG export for projects that prefer the open Vorbis format. Drop the exported files into the engine’s audio directory, write a naming convention (npc_merchant_greeting_01.mp3, npc_merchant_greeting_02.mp3) that the engine’s audio loader can index, and the cast is ready.
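A naming convention holds up better when a function produces it. The sketch below generates names in the pattern shown above; `dialogueFileName` and `slug` are hypothetical helpers, and the zero-padded two-digit take number follows the example file names.

```typescript
type AudioExt = "mp3" | "wav" | "ogg";

// Lowercases and replaces anything outside a-z / 0-9 with single underscores.
function slug(text: string): string {
  return text.trim().toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

// Builds names like npc_merchant_greeting_01.mp3.
function dialogueFileName(npc: string, lineName: string, take: number,
                          ext: AudioExt = "mp3"): string {
  const takeLabel = take < 10 ? "0" + take : String(take);
  return "npc_" + slug(npc) + "_" + slug(lineName) + "_" + takeLabel + "." + ext;
}

console.log(dialogueFileName("Merchant", "greeting", 1)); // npc_merchant_greeting_01.mp3
console.log(dialogueFileName("Guard Captain", "Halt!", 12, "ogg")); // npc_guard_captain_halt_12.ogg
```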

Pair the voice cast with Music Gen + SFX Gen for the full audio pipeline

An indie game with finished character voices, no music, and no SFX is still a tech demo. The Sorceress audio suite at /tools/audio-suite pairs Speech Gen with Music Gen for the soundtrack and SFX Gen for the hits and ambient audio, and the three generators share the same Gallery, project metadata, and SFX Editor post-processing pass.

The full indie audio budget for a 6-hour game with the cast described above: 460 credits for voices (400 clone + 60 dialogue), 140 credits for music (14 Music Gen runs at 10 credits each covering a title theme, 6 level loops, 4 boss tracks, 3 ambient pads), 60 credits for SFX (20 SFX Gen batches at 3 credits each covering hits, pickups, menu clicks, doors, footsteps). That’s 660 credits total, or about $6.60 at the standard $0.01 per credit. The Sorceress Lifetime tier at $49 (verified at LIFETIME_PRICE = 49 in src/app/plans/page.tsx) front-loads $50 of credits and never expires; the $20 Creator credit pack buys 2,000 credits, enough for three indie games’ worth of voices, music, and SFX. The 100 free starter credits at signup cover the entire voice-casting audition pass plus the first hour of dialogue work.
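The line items above add up as follows. This is only a worked check of the post's own figures; the `BudgetLine` shape and `sumCredits` helper are hypothetical names.

```typescript
interface BudgetLine { label: string; units: number; creditsEach: number; }

// Figures quoted in the article for a 6-hour game.
const audioBudget: BudgetLine[] = [
  { label: "voice clone", units: 1, creditsEach: 400 },
  { label: "dialogue TTS (estimate)", units: 1, creditsEach: 60 },
  { label: "Music Gen runs (1 title + 6 loops + 4 boss + 3 pads)", units: 14, creditsEach: 10 },
  { label: "SFX Gen batches", units: 20, creditsEach: 3 },
];

function sumCredits(lines: BudgetLine[]): number {
  let sum = 0;
  for (let i = 0; i < lines.length; i++) sum += lines[i].units * lines[i].creditsEach;
  return sum;
}

const totalAudioCredits = sumCredits(audioBudget);
const totalAudioUsd = totalAudioCredits / 100; // at $0.01 per credit
const gamesPerCreatorPack = Math.floor(2000 / totalAudioCredits);

console.log(totalAudioCredits, totalAudioUsd, gamesPerCreatorPack); // 660 credits, about $6.60, 3 games
```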

The wiring pattern for engine-side audio is the same regardless of source. Phaser loads dialogue, music, and SFX through the same this.load.audio() path and plays through the same this.sound.add() API. Three.js routes everything through the Web Audio API, with AudioBufferSourceNode for playback, PannerNode for spatial audio, and GainNode for per-channel mixing. Native HTML5 routes the simpler cases through the <audio> element (an HTMLMediaElement). The Sorceress output is always standard MP3, WAV, or OGG: no proprietary format, no engine lock-in, no per-tool integration to wire.
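Because every engine path ends in "register a key against a URL", the voice files can be described once and queued into whichever loader the project uses. The sketch below is framework-agnostic: `AudioLoaderLike` is a hypothetical minimal interface shaped like Phaser's `this.load.audio(key, url)` call, and the `assets/voice` folder is an assumed project layout.

```typescript
// Minimal structural stand-in for a loader with an audio(key, url) method.
interface AudioLoaderLike { audio(key: string, url: string): unknown; }

interface VoiceAsset { key: string; url: string; }

// Derives a loader key from each file name by dropping the extension.
function toVoiceAssets(fileNames: string[], baseUrl: string): VoiceAsset[] {
  const base = baseUrl.replace(/\/+$/, "");
  return fileNames.map(function (file) {
    const dot = file.lastIndexOf(".");
    const key = dot === -1 ? file : file.slice(0, dot);
    return { key: key, url: base + "/" + file };
  });
}

// Queues every asset on the supplied loader.
function queueVoiceAssets(loader: AudioLoaderLike, assets: VoiceAsset[]): void {
  for (let i = 0; i < assets.length; i++) loader.audio(assets[i].key, assets[i].url);
}

// Demo with a fake loader that records what would be requested.
const queuedRequests: string[] = [];
const fakeLoader: AudioLoaderLike = {
  audio: function (key, url) { queuedRequests.push(key + " <- " + url); return undefined; },
};
queueVoiceAssets(fakeLoader, toVoiceAssets(
  ["npc_merchant_greeting_01.mp3", "npc_merchant_greeting_02.mp3"],
  "assets/voice/"
));
console.log(queuedRequests);
```

In a Phaser scene, the same call would look like `queueVoiceAssets(this.load, assets)` inside `preload()`, after which each line plays by its key through `this.sound.add(key).play()`.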

Side-by-side comparison of an indie voice cast made the studio path vs the Sorceress browser path — left lane shows hire actors, book studio time, $500 per role, weeks per cast; right lane shows pick presets, clone the lead, $5 full cast, minutes per line
The traditional studio path (left) versus the Sorceress browser path (right) for an indie voice cast: weeks of casting and booking, or an afternoon of preset picks plus a single 400-credit clone for the lead.

The verdict on the right AI voice generator character voices pipeline in 2026

The right AI voice generator character voices pipeline for an indie game in 2026 is the same four-step sequence every shipped indie audio pass follows — cast the roles from the 17 Sorceress Speech Gen presets, write dialogue against the Turbo model for iteration and the HD model for keeper lines, clone the lead at 400 credits once, then trim and master each line in SFX Editor before engine export. The four pieces (lead, party, recurring NPCs, throwaways) cover the entire cast of a 6-hour indie RPG. The 8 emotion controls cover every line in the script without the developer ever leaving the browser. The 460-credit total budget for voices puts a full game cast at under $5 in credits.

The trade is honest. AI-generated voice acting is a starting point that the developer’s ear has to finish — bad takes have to be re-rolled, emotion choices have to match the script’s intent, the trim-and-fade pass in the SFX Editor isn’t optional. The Speech Gen output isn’t a replacement for a unionized voice cast on a triple-A budget; it’s the replacement for the silent NPCs every indie ships with anyway. That trade favors the indie developer hard. The studio that would have spent $5,000 booking four actors, two recording sessions, and a sound editor now spends 5 hours casting, generating, and trimming the same cast, and ships a game with talking NPCs instead of a text-box-only fallback.

The next step is opening Speech Gen in a tab, picking a character from the project, auditioning three presets against the character’s defining line, and locking the assignment in the design doc. The 100 starter credits cover the casting audition plus the first thirty lines of dialogue. The SFX Editor at /sfx-editor handles the trim pass without leaving the browser. The full audio suite at Sorceress Sound Studio wraps Speech Gen, Music Gen, SFX Gen, and SFX Editor into a single workspace; the tool catalog at /tools-guide covers the rest of the asset stack (sprites, tilesets, 3D models, animation) that completes the game around the voice cast. For the sister pipeline-level treatment of NPC dialogue, the NPC voice pipeline post walks the wiring patterns; the video game music post covers the soundtrack side. The silent-NPC problem is no longer the blocker. The next blocker is writing dialogue worth voicing.

Frequently Asked Questions

What is the best AI voice generator for game character voices in 2026?

Sorceress Speech Gen is the right default for indie game character voices in 2026 because it bundles the three things an NPC cast actually needs in a single browser tool: a wide preset voice library (17 named voices split 9 male and 8 female, verified at the PRESET_VOICES array on lines 156 to 174 of src/app/speech-gen/page.tsx on June 12, 2026), per-line emotion control (8 emotions: Neutral, Happy, Calm, Sad, Angry, Fearful, Disgusted, Surprised, verified at the EMOTIONS array on lines 179 to 188), and one-tap voice cloning for the lead character (400 credits per clone, verified at VOICE_CLONE_CREDITS = 400 on line 31). The credit math at 0.5 credits per 1000 characters on HD and 0.3 on Turbo (lines 28 and 29) keeps a full RPG dialogue pass under 100 credits. The combined toolset replaces a $22 per month commercial TTS subscription stack at the cost of one Creator credit pack ($20 buys 2000 credits, enough for several indie projects worth of dialogue).

How much does AI voice cloning cost on Sorceress Speech Gen in 2026?

Voice cloning costs a flat 400 credits per clone on Sorceress Speech Gen (verified at VOICE_CLONE_CREDITS = 400 on line 31 of src/app/speech-gen/page.tsx on June 12, 2026). At the standard Sorceress credit price of $0.01 per credit, that is $4 per cloned voice ID. The clone is permanent: once the voice ID is generated, every future TTS line you write against that voice ID is billed at the same per-character TTS rate as the preset voices (0.5 credits per 1000 chars on HD, 0.3 on Turbo), with no per-line clone surcharge. The recording cap is 4 minutes 59 seconds (verified at MAX_CLONE_DURATION = 299 on line 32) and 20 MB total file size (MAX_CLONE_SIZE on line 33). For an indie game with a single lead protagonist who voices the tutorial, the loading-screen quips, and 30 minutes of in-game dialogue, the 400-credit one-time clone plus roughly 60 credits of TTS dialogue lands at under $5 in total credits.

Can AI-generated character voices be used commercially in an indie game?

Sorceress Speech Gen output is delivered to the generating user without watermarks and is intended for commercial use in the user's projects, including paid games on Steam and itch.io, jam submissions, and in-app dialogue audio shipped to end users. The honest framing for any AI-generated audio in 2026 is that copyright case law is still settling worldwide, so the prudent indie dev move is to keep proof of the prompt and the voice ID used for each generated line (the Speech Gen Gallery logs every prompt against the audio file) so the project can demonstrate the line is AI-generated rather than copied from a real voice actor's commercial recording. For voice clones built from your own recording, the rights are unambiguously yours; for voice clones built from a recording of another person, you need their written consent before deploying the clone in any shipped game. That consent step is the same compliance bar a traditional studio would have to clear for sampled dialogue.

How many voices can I cast in one indie game with Sorceress Speech Gen?

There is no cap on the number of distinct voice IDs an indie project can use in a single game on Sorceress Speech Gen. The 17 preset voices (9 male: Deep Voice Man, Casual Guy, Patient Man, Young Knight, Determined Man, Decent Boy, Imposing Manner, Elegant Man, Friendly Person; 8 female: Wise Woman, Calm Woman, Inspirational Girl, Lively Girl, Lovely Girl, Abbess, Sweet Girl, Exuberant Girl, all verified on lines 156 to 174 of src/app/speech-gen/page.tsx on June 12, 2026) cover the archetype range every NPC roster actually needs. A typical 6 to 8 hour indie RPG casts five to seven of the presets plus one cloned lead, which is the natural ceiling because beyond that the audience starts confusing similar voices. The right pattern is to assign one preset per role archetype (merchant: Friendly Person; king: Imposing Manner; healer: Wise Woman; rogue: Casual Guy; villain: Determined Man) and reserve the cloned voice for the protagonist whose voice the player hears the most.

What is the difference between the HD and Turbo models in Sorceress Speech Gen?

HD costs 0.5 credits per 1000 characters and produces higher fidelity, slower-to-render TTS suitable for the lead character's tutorial monologue and major story beats. Turbo costs 0.3 credits per 1000 characters and produces faster-to-render TTS that is good enough for ambient barks, generic NPC dialogue, and any line that plays in the background of gameplay (verified at CREDITS_PER_1K_HD = 0.5 and CREDITS_PER_1K_TURBO = 0.3 on lines 28 and 29 of src/app/speech-gen/page.tsx on June 12, 2026, with a 1-credit floor at MIN_TTS_CREDITS = 1 on line 30). The practical pattern: use Turbo for the first pass on every line during scripting and playtesting so iteration costs stay low, then re-render the keeper lines that survive the playtest in HD for the final build. Both models accept the same emotion conditioning (8 emotions on lines 179 to 188) and run against the same 17 preset voices and any cloned voice IDs the project owns.

How do I make an AI character voice sound angry, sad, or scared for an NPC line?

Sorceress Speech Gen exposes per-line emotion conditioning as a dropdown next to the voice picker, with 8 named emotions (Neutral, Happy, Calm, Sad, Angry, Fearful, Disgusted, Surprised, verified at the EMOTIONS array on lines 179 to 188 of src/app/speech-gen/page.tsx on June 12, 2026). Pick the emotion that matches the dramatic intent of the line and the model adjusts pitch contour, pace, and timbre to match. For an angry NPC threat line, select the Angry emotion plus a deep male voice like Imposing Manner or Determined Man. For a frightened villager, select Fearful plus a higher-pitched female voice like Lively Girl. The emotion choice applies per-line, not per-voice, so the same NPC can have a Neutral greeting line, a Happy reaction line when the player completes a quest, and a Sad farewell line at the story's end. For nuanced emotions outside the 8 presets (sarcastic, longing, smug), the right move is to script the line text itself with punctuation and word choice that suggests the tone, then audition the Neutral, Happy, and Calm presets and pick the closest take.

Can I clone my own voice to play every NPC in my game?

Yes, technically. The Sorceress Speech Gen voice-cloning flow accepts up to 4 minutes 59 seconds of clean MP3 or M4A audio at up to 20 MB total file size (verified at MAX_CLONE_DURATION = 299 and MAX_CLONE_SIZE = 20 * 1024 * 1024 on lines 32 and 33 of src/app/speech-gen/page.tsx on June 12, 2026) and returns a stable voice ID that can be referenced from any future TTS generation. Solo developers who voice their own NPCs by cloning themselves get a consistent acting performance across the entire cast and never pay another 400 credits after the initial clone. The narrative trade is that every NPC sounds like a variant of the same voice, which works in stylized projects (think every NPC voiced by the same indie auteur as a stylistic choice) but reads as cheap in projects that lean realistic. The cleaner pattern is to clone your own voice for the protagonist (whom the player hears 60 to 80 percent of the time) and use the 17 preset voices for the rest of the cast, so the audience gets variety without you needing to act out the merchant, the king, and the healer in separate recording sessions.

How do I export AI-generated character voices into Phaser, Three.js, or any browser game?

Sorceress Speech Gen exports MP3, which loads natively into every browser-based game framework via the same audio loader path as music tracks and SFX. Phaser at https://phaser.io loads dialogue via this.load.audio('npc_merchant_greeting', 'npc_merchant_greeting.mp3') in the scene preload, then plays with this.sound.add('npc_merchant_greeting').play() when the player interacts with the merchant NPC. Three.js loads via its AudioLoader, which decodes through the Web Audio API: audioLoader.load('npc_merchant_greeting.mp3', buffer => audio.setBuffer(buffer)). Native HTML5 plays the file via the standard audio element. For lip-sync, generate the audio file with Speech Gen, then run it through a separate viseme generator (the audio file format is standard, so common viseme tools can read it). For positional audio (the merchant's voice attenuates with distance from the player), use the Web Audio API PannerNode with the dialogue file as the source: the same per-line MP3 that plays as flat 2D dialogue in the UI plays as spatialized 3D audio when wired through a PannerNode, no re-export required.
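To get a feel for how quickly a spatialized line falls off, the inverse distance model that the Web Audio API defines for PannerNode (its default model) can be computed directly. The function below is a plain-math sketch for tuning values, not a replacement for the node itself; `inverseDistanceGain` is a hypothetical helper name, and the defaults of 1 for both parameters mirror the PannerNode defaults.

```typescript
// Inverse distance model from the Web Audio API:
// gain = ref / (ref + rolloff * (max(distance, ref) - ref))
function inverseDistanceGain(distance: number, refDistance = 1, rolloffFactor = 1): number {
  const d = Math.max(distance, refDistance);
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}

// Merchant dialogue heard from 1, 3, and 10 world units away.
const distances = [1, 3, 10];
for (let i = 0; i < distances.length; i++) {
  console.log(distances[i], inverseDistanceGain(distances[i]));
}
```

With the defaults, the line plays at full level inside one unit, at a third of that level at three units, and at a tenth at ten units; raising `refDistance` widens the zone in which the merchant is heard at full volume.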

Written by Arron R.·3,291 words·15 min read
