⚡ AI Providers & Local Models | WizardGenie Wiki | Sorceress

WizardGenie can use several kinds of AI backends: API-key providers, subscription-backed coding agents, Cursor-powered agents, local models through Ollama or LM Studio, Ollama Cloud models, and self-hosted compatible servers. This page explains what each option is for, how to set it up, and how to troubleshoot model visibility or runtime problems.

What AI providers do in WizardGenie

The selected model powers the WizardGenie chat agent. Depending on the backend and model, it can inspect your project, write and edit files, run supported tools, work with image attachments, and use game-specific integrations when available.

WizardGenie supports these broad model types:

API-key models — connect a provider account with an API key.
Subscription models — use supported desktop coding-agent subscriptions instead of a direct per-token API key.
Cursor models — route turns through Cursor’s local agent support using your Cursor API key.
Local models — run models on your own machine through Ollama or LM Studio.
Ollama Cloud models — use Ollama-hosted models without downloading weights locally.
Self-hosted compatible models — connect WizardGenie to your own compatible chat server.

Not every model supports every feature. In the provider panel and model picker, look for visible labels such as vision support, cloud/local status, context size, VRAM guidance, reasoning/thinking support, download state, and whether a local runtime is running.

Where to configure providers

Most setup happens in the Providers & Agents panel or the settings area; day-to-day model selection happens in the chat model picker.

Typical setup flow:

Open WizardGenie.
Open the provider/settings panel.
Add the needed API key, connect a subscription, or install/start a local runtime.
Use Refresh, Recheck, or reopen the model picker if the list does not update immediately.
Select a visible, ready model in chat.

For local and cloud catalog entries, WizardGenie lets you hide or show individual rows. Hiding a row removes it from the chat picker without uninstalling or deleting anything.

API-key providers

API-key providers are the standard way to use hosted models. After you add a key, WizardGenie sends chat turns to the selected provider when you choose one of its models.

Supported provider families shown in the current app include:

Anthropic
OpenAI
Gemini
DeepSeek
xAI
MiniMax
Moonshot
OpenRouter
opencode / Zen
Kilo Gateway
Z.ai GLM
Z.ai GLM Coding Plan

Some provider families have custom handling for tool use, vision, reasoning, or prompt caching. Some gateway-style providers can list models dynamically from your account; WizardGenie then filters that list so it shows only models suitable for interactive, tool-using chat.

Add an API key

Open the provider/settings panel.
Find the provider you want to use.
Paste your API key into that provider’s key field.
Save or let the panel apply the setting.
Use Refresh, Recheck, or reopen the model picker if the model list does not update.
Select a model from that provider in chat.

Dynamic model lists

Some gateways expose a model list through your account. WizardGenie only shows models that appear usable for normal chat and project editing. It filters out model IDs that look like image-generation-only, audio/music, moderation or safety classifiers, deep research, OCR, speech recognition, video, or code-apply-only models.

If a dynamic provider returns only limited metadata, WizardGenie may show models with incomplete context, pricing, or capability information. Before relying on one for project edits, confirm that the model supports streaming chat and tool/function calling.
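
The ID-based filtering described in this section can be pictured as a simple pattern check. This is a minimal sketch and not WizardGenie's actual filter: the pattern list and the names `looks_chat_capable` and `filter_models` are assumptions chosen to mirror the categories named above.

```python
import re

# Hypothetical substrings that suggest a non-chat model.
# WizardGenie's real patterns are not documented on this page.
NON_CHAT_PATTERNS = [
    "image", "audio", "music", "tts", "whisper", "moderation",
    "guard", "deep-research", "ocr", "video", "apply",
]
_NON_CHAT_RE = re.compile("|".join(re.escape(p) for p in NON_CHAT_PATTERNS),
                          re.IGNORECASE)


def looks_chat_capable(model_id: str) -> bool:
    """Return True when the model ID matches none of the non-chat patterns."""
    return _NON_CHAT_RE.search(model_id) is None


def filter_models(model_ids):
    """Keep only IDs that look usable for interactive chat, preserving order."""
    return [m for m in model_ids if looks_chat_capable(m)]
```

Substring matching like this is deliberately crude. It can hide a usable model whose ID happens to contain one of the patterns, which is one reason to check a gateway's own model page when an expected model is missing.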

Vision support

Models marked as vision-capable can receive image attachments. For text-only providers or models, WizardGenie may hide vision rows, or it may strip image blocks from the turn and add a note so the turn can continue as text.

DeepSeek’s currently wired chat models are treated as text-only in WizardGenie. If you need the model to see attached images, switch to a vision-capable model before sending the turn.
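
The "remove image blocks and add a note" behavior can be sketched as follows. The message shape (a list of dicts with a `type` key), the note wording, and the function name are all assumptions for illustration; WizardGenie's internal message format is not described on this page.

```python
def strip_images_for_text_model(blocks):
    """Drop image blocks and append a short note when any were removed.

    `blocks` is a hypothetical list of content blocks such as
    {"type": "text", "text": "..."} or {"type": "image", "data": "..."}.
    """
    kept = [b for b in blocks if b.get("type") != "image"]
    removed = len(blocks) - len(kept)
    if removed:
        kept.append({
            "type": "text",
            "text": f"[Note: {removed} image attachment(s) removed "
                    "because the selected model is text-only.]",
        })
    return kept
```

The note matters because it tells the model that something was omitted, so it does not answer as though it had seen the image.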

Subscription-based coding agents

WizardGenie can route coding turns through supported local subscription-backed coding agents. These are intended for users who already use a supported coding subscription and want that agent inside WizardGenie.

Subscription-backed options require the desktop app, because WizardGenie must communicate with a local command-line agent on your machine. They are hidden in web/headless deployments.

Codex / ChatGPT plan option

The Codex subscription option appears in the model picker under subscription models on desktop when available.

What it supports:

Browser-based sign-in through the provider’s own login flow.
One ongoing coding-agent conversation per WizardGenie chat.
Streaming assistant text into WizardGenie.
Tool activity shown in the chat UI, including command execution, file changes, dynamic tool calls, web search, image generation, and connected external tools when the underlying agent reports them.
Project file edits inside the opened project workspace.
Stop/cancel for in-progress turns.
Account and rate-limit status updates when available.
Image attachments when supported by the selected agent/model.
Game/project integrations for supported project types, including available Roblox and Unreal connections when the active project supports them.

How to use it:

Install the required local coding-agent command-line tool if WizardGenie reports it is missing.
Make sure the command works from your system environment.
Open the WizardGenie model picker.
Click the Codex / ChatGPT-plan model row.
Complete the browser sign-in flow.
Open or create a project.
Select the subscription model and start chatting.

If the row reports that the local tool is missing, use the install guidance shown in WizardGenie or install it manually using the provider’s official instructions. WizardGenie can also show an update hint when your installed local agent is older than the recommended version.

Claude subscription option

The Claude subscription option appears in the model picker under subscription models on desktop when available.

What it supports:

Browser-based sign-in with a Claude subscription account.
Reused sessions per WizardGenie chat.
Streaming assistant text.
Tool start/result display.
Image attachments.
Optional model aliases in the picker such as Opus, Sonnet, or Haiku when supported by the installed local adapter.
Stop/cancel for active turns.
Project file edits inside the opened workspace.
Connected external tools, including built-in game/project integrations and user-configured servers when available.

How to use it:

Open the WizardGenie model picker.
Click the Claude Code (Claude plan) row.
If prompted, install the required local components.
Complete the browser sign-in.
Return to WizardGenie and select the subscription model.

If WizardGenie says Claude Code is not signed in, open the model picker and click the Claude subscription row again to reconnect. WizardGenie also tries to avoid false sign-outs from temporary account-check failures, but a real logout still requires reconnecting.

Subscription model limits

Subscription-backed agents use their own harness and account limits. WizardGenie does not treat them exactly like direct API models:

Some WizardGenie-native planner/executor or provider-specific features may not apply.
Tool behavior is controlled partly by the subscription agent’s own environment.
Local command-line tools should be kept updated.
Desktop is required.

Cursor-powered agents

WizardGenie can run chat turns through Cursor’s local agent support when a Cursor API key is configured.

What it supports:

Cursor model selection from your available Cursor models.
A preferred model shortlist based on your Cursor IDE preferences when available.
A fallback preferred shortlist if Cursor IDE preferences cannot be read.
Model variants shown with badges such as Thinking, Fast, Low, Medium, High, Extra High, or Max when Cursor exposes those options.
Image attachments for models marked as vision-capable.
Persistent agent sessions per WizardGenie chat when possible.
Stop/cancel for in-progress runs.
Tool call and tool result display.
Connected external tools, including Pixel Lab, Sorceress tool access, WizardGenie management tools, supported Roblox and Unreal project integrations, and enabled user-configured servers.

How to use it:

Get a Cursor API key from your Cursor account.
Open WizardGenie settings.
Add the key in Cursor API Key.
Refresh the model list if needed.
Select a Cursor model in the chat picker.
Open or create a project before sending a Cursor-backed turn.

Troubleshooting:

“Cursor API key not configured” — add the key in settings.
Invalid key — generate a new key and paste it again.
Model-list rate limited — wait a minute or click refresh later. Chat may still work if the key is valid.
Cursor needs an open project — open or create a project before using Cursor models.

Local models with Ollama

Ollama support lets WizardGenie run local models on your own machine. This is useful when you want local execution or privacy, or when you want to keep working without a hosted API model once the model has been downloaded.

WizardGenie only installs or starts Ollama when you explicitly click the related controls. It does not silently install the runtime.

Ollama runtime status

The Ollama manager shows:

Running with a version number when the local runtime is available.
Stopped when Ollama appears installed but is not running.
Not installed when no local Ollama install is found.
Detected GPU name and total VRAM when available.
Currently loaded models, with a Free GPU control to unload models from GPU/RAM.

The VRAM value is total GPU memory, not currently free memory. On some systems, especially those that rely on fallback GPU detection, the value may be an estimate.
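
If you want to check the runtime yourself, Ollama exposes a local HTTP API, by default at `http://localhost:11434`, with `/api/version` and `/api/ps` endpoints. The sketch below probes those two endpoints. Whether WizardGenie's manager uses the same calls is an assumption, and the function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def http_get_json(path, timeout=2.0):
    """GET a JSON document from the local Ollama server."""
    with urllib.request.urlopen(OLLAMA_URL + path, timeout=timeout) as resp:
        return json.load(resp)


def runtime_status(fetch=http_get_json):
    """Report whether Ollama answers, its version, and the loaded models.

    `fetch` is injectable so the logic can be exercised without a server.
    """
    try:
        version = fetch("/api/version")["version"]
    except (OSError, KeyError, ValueError):
        # Connection refused, timeout, or an unexpected response body.
        return {"state": "unreachable", "version": None, "loaded": []}
    try:
        loaded = [m["name"] for m in fetch("/api/ps").get("models", [])]
    except (OSError, KeyError, ValueError):
        loaded = []
    return {"state": "running", "version": version, "loaded": loaded}
```

An HTTP probe alone cannot distinguish Stopped from Not installed, since both look like a refused connection. Telling them apart requires also looking for the Ollama binary on disk.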

Install or start Ollama

Open the Providers & Agents panel.
Expand the Ollama / local models area.
If Ollama is missing, click Install Ollama.
If Ollama is installed but stopped, click Start Ollama.
Click Recheck after installation or startup.

If the built-in install fails, use the Download page button. When shown, Copy command copies the install command so you can run it manually in a terminal.

Download a local model

Make sure Ollama is running.
In Downloadable models, find the model you want.
Check the disk size and recommended VRAM shown on the row.
Click Download.
Watch the progress status and percentage.
Use Cancel if you need to stop the download.
When the row says Installed / Ready, select it in the chat picker.

WizardGenie marks models as Low VRAM or Tight when your detected GPU memory may be below the model’s minimum or recommended target. These labels are guidance; actual performance depends on your system, drivers, current GPU load, and whether the model can partially fit in system memory.
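
One plausible reading of these labels is a two-threshold comparison. In this sketch the mapping of Low VRAM to "below minimum" and Tight to "below recommended" is an assumption, as is the function name; the page does not state the exact rule.

```python
def vram_label(detected_gb, min_gb, recommended_gb):
    """Return a guidance label, or None when the model should fit comfortably.

    Assumed rule: below the minimum -> "Low VRAM";
    between minimum and recommended -> "Tight".
    """
    if detected_gb is None:
        return None  # no GPU detected, so there is nothing to compare
    if detected_gb < min_gb:
        return "Low VRAM"
    if detected_gb < recommended_gb:
        return "Tight"
    return None
```

Because the detected value is total rather than free VRAM, a model with no label can still run slowly if other applications are already using the GPU.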

Local Ollama catalog

The current local Ollama catalog includes:

Gemma 4 variants, including E2B, E4B, a lighter E4B quantized option, 26B, and multiple 31B quantized variants.
Qwen 3.6 27B and 35B.
Nemotron 3 33B.
Granite 4.1 3B, 8B, and 30B.

Rows may show context size, vision support, thinking/reasoning support, approximate disk size, minimum VRAM, recommended VRAM, quantization, and coding-score guidance.

Gemma 4 version requirement

Some Gemma 4 tags require a newer Ollama build. If WizardGenie shows a warning that Gemma 4 needs a newer Ollama version, update Ollama from the tray app or the official download page, then click Recheck.
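
A version check of this kind has to compare numeric components, not strings, because "0.9.2" sorts after "0.10.0" as text. The sketch below shows one way to do that. The minimum version is passed in as an argument because this page does not state which Ollama build Gemma 4 needs, and the function names are hypothetical.

```python
import re


def parse_version(text):
    """Turn 'v0.12.3-rc1' into (0, 12, 3); pre-release suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip().lstrip("v"))
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_update(installed, minimum):
    """Return True when `installed` is older than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '0.10' and '0.10.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b
```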

Free GPU / unload model

Local models can remain loaded for faster follow-up turns. If you need VRAM back for a game, editor, renderer, or another model:

Open the Ollama manager.
Look for the Loaded: status row.
Click Free GPU.

This unloads the active Ollama model from GPU/RAM. The next use may take longer because the model must load again.
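
Outside WizardGenie, Ollama's HTTP API can unload a model when it receives a generate request with `keep_alive` set to `0`. The sketch below only builds that request. Whether the Free GPU button issues the same call is an assumption, and `build_unload_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_unload_request(model, base_url="http://localhost:11434"):
    """Build a POST to /api/generate that asks Ollama to unload `model`."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen`, using a model name that appears in the Loaded: row.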

Ollama Cloud models

Ollama Cloud entries appear separately from downloadable local models. They run server-side through Ollama rather than downloading weights to your machine.

The current Ollama Cloud catalog includes:

Gemma 4 31B BF16 (Cloud)
Qwen3 Coder Next (Cloud)
Qwen 3.5 Flagship (Cloud)
Nemotron 3 Nano 30B (Cloud)

How to use them:

Make sure you are signed in to Ollama as required by your Ollama account.
In WizardGenie, show the desired Ollama Cloud model if it is hidden.
Select it in the chat picker.

Cloud rows do not need local model downloads and do not require local GPU VRAM, but they depend on your Ollama account and cloud availability.

LM Studio local server

WizardGenie can connect to an LM Studio local server using its compatible local server mode.

Use it when you want to run a model managed by LM Studio instead of Ollama.

General setup:

Start LM Studio.
Load a chat-capable model.
Start LM Studio’s local server.
In WizardGenie, configure or enable the LM Studio provider.
Refresh dynamic models if needed.
Select an LM Studio model in the chat picker.

WizardGenie treats LM Studio as a local compatible provider. Model capability depends on the model you load and the local server behavior. LM Studio rows may appear with limited metadata because the local server often reports only model IDs.
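
To see what the local server reports, you can query its model list directly. LM Studio's server speaks an OpenAI-style API, by default at `http://localhost:1234/v1`. The sketch below assumes that default address and that response shape; the helper names are hypothetical and this is not necessarily how WizardGenie performs the lookup.

```python
import json
import urllib.request

LM_STUDIO_URL = "http://localhost:1234/v1"  # default; change it if you moved the port


def model_ids(payload):
    """Extract sorted model IDs from an OpenAI-style /models response."""
    return sorted(e["id"] for e in payload.get("data", []) if "id" in e)


def list_lm_studio_models(base_url=LM_STUDIO_URL, timeout=2.0):
    """Fetch the model list from a running LM Studio local server."""
    with urllib.request.urlopen(base_url + "/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

The response carries little beyond each model's ID, which is consistent with rows appearing without context size or capability labels.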

Self-hosted compatible endpoint

WizardGenie includes a custom/self-hosted compatible endpoint option for users running their own chat server. This is useful for rented GPU instances, local inference servers, or other compatible deployments.

What to configure:

Endpoint URL.
API key if your server requires one.
A model ID supported by your server.

Important behavior:

WizardGenie uses text-based tool calling for this backend, which helps with models that do not reliably emit structured tool calls.
Vision depends on your server and model.
Local and self-hosted-style streams have stall protection, so a long-running response that stops producing output does not hang the chat indefinitely.
If the server outputs malformed tool JSON, WizardGenie attempts to repair common formatting mistakes, but no model is guaranteed to be reliable.
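
Repairing malformed tool JSON usually means handling a few predictable mistakes. The sketch below covers two of them: prose or code fences wrapped around the object, and trailing commas. The specific repairs and the name `repair_tool_json` are assumptions for illustration, not WizardGenie's documented repair list.

```python
import json
import re


def repair_tool_json(raw):
    """Try to recover a JSON object from model output; return None on failure."""
    # Keep only the outermost {...}, which drops surrounding prose and fences.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = raw[start:end + 1]
    try:
        return json.loads(candidate)  # already valid, so no repair is needed
    except json.JSONDecodeError:
        pass
    # Remove trailing commas before a closing brace or bracket.
    repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None
```

The trailing-comma pattern is applied only after strict parsing fails, because it could otherwise alter a string value that legitimately contains a comma followed by a brace.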

Token analytics

WizardGenie includes a Token Analytics panel with:

Overview totals.
Per Chat breakdown.
Live usage feed with pause/resume.
Time range selector for Today, 7 days, or 30 days.
Input/output token counts.
Cache read/write accounting for providers that report cache tokens.
Local-call indicators.
Tool-call counts.
Latency.

Local models and subscription-backed models may show zero or incomplete direct API cost because they do not behave like normal per-token API calls.
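
The Per Chat breakdown amounts to grouping usage records by chat and summing their counters. The sketch below shows that aggregation. The `UsageEvent` record and its field names are hypothetical; the panel's actual data model is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageEvent:
    """One model call. Cache fields stay 0 when the provider reports none."""
    chat_id: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0
    is_local: bool = False


def per_chat_totals(events):
    """Sum token counters and call counts for each chat ID."""
    totals = defaultdict(lambda: {
        "input": 0, "output": 0, "cache_read": 0, "cache_write": 0,
        "calls": 0, "local_calls": 0,
    })
    for event in events:
        row = totals[event.chat_id]
        row["input"] += event.input_tokens
        row["output"] += event.output_tokens
        row["cache_read"] += event.cache_read_tokens
        row["cache_write"] += event.cache_write_tokens
        row["calls"] += 1
        row["local_calls"] += int(event.is_local)
    return dict(totals)
```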

Common problems and fixes

A model does not appear in the picker

Try the following:

Confirm the provider is configured or signed in.
Click refresh/recheck in the provider panel.
Make sure the model is not hidden.
For local Ollama models, confirm the model is downloaded.
For Ollama Cloud, confirm you are signed in to Ollama if required.
For dynamic gateways, confirm your key can list models.
For desktop subscription models, confirm the required local command-line tool is installed and available.
For Cursor, confirm the API key is valid and wait if the model-list check is rate limited.

Local model says Ollama is stopped

Click Start Ollama, then Recheck. If it still does not respond, open Ollama manually from your OS menu or tray and try again.

Local model download fails

Common causes:

Ollama is not running.
Disk is full.
The model tag requires a newer Ollama version, or your installed version cannot find a newer catalog tag.
The model is gated and requires signing in to Ollama.
Network connection failed.

Update Ollama, sign in if needed, free disk space, then retry.

Local model is very slow on the first reply

A local model may need to load into GPU/RAM before the first token. Large models can take a long time, especially on lower-VRAM systems. Follow-up turns are faster while the model remains loaded.

Chat stalls with a local/self-hosted model

WizardGenie has stall detection for local/self-hosted-style streams. If a connection drops or stops producing tokens, retry the message. If it happens repeatedly, restart the local runtime or choose a smaller model.
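
Stall detection generally means timing the gap between chunks, not the whole response. The sketch below does this with a background thread and a queue. The names `iter_with_stall_timeout` and `StreamStalled` are hypothetical, and WizardGenie's actual mechanism and thresholds are not documented on this page.

```python
import queue
import threading


class StreamStalled(Exception):
    """Raised when the stream produces nothing for too long."""


_DONE = object()  # sentinel marking the end of the stream


def iter_with_stall_timeout(chunks, stall_seconds):
    """Yield items from `chunks`, raising StreamStalled on a long gap."""
    buffer = queue.Queue()

    def pump():
        try:
            for chunk in chunks:
                buffer.put(chunk)
        finally:
            buffer.put(_DONE)

    # A daemon thread, so a hung producer never blocks interpreter exit.
    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = buffer.get(timeout=stall_seconds)
        except queue.Empty:
            raise StreamStalled(f"no data for {stall_seconds} seconds") from None
        if item is _DONE:
            return
        yield item
```

The timer resets after every chunk, so a slow but steady local model is not cut off. Only a gap longer than `stall_seconds` ends the turn, which is the case where retrying or restarting the runtime helps.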

Image attachments are ignored

The selected model or provider may be text-only. Choose a model marked as vision-capable if you want it to inspect images.

A gateway free model is blocked

Some free gateway models require enabling the provider’s prompt-logging or data-policy setting on that provider account. Paid models may not have the same requirement. Adjust the account privacy setting on that provider, then retry.

Subscription model says not signed in

Open the model picker and click the subscription model row again to start or refresh the sign-in flow. If a local CLI update is suggested, update it and retry.

Subscription agent can run commands but cannot edit files

Update the local subscription agent tool. WizardGenie sets up the active project as the writable workspace, but older local agent versions may have bugs around file edits or approval handling.

Tips

Use API-key frontier models for the most reliable tool calling and complex edits.
Use subscription agents if you already pay for that ecosystem and want coding-agent behavior inside WizardGenie.
Use Cursor models if your Cursor workflow and preferred model variants are already set up.
Use Ollama for local execution and privacy, but choose a model that fits your VRAM.
Use Free GPU after local-model work if you need VRAM for game engines or rendering.
Prefer smaller local models for quick iteration and larger local/cloud models for complex architecture or debugging.
Keep local subscription tools and Ollama updated; many failures are fixed by newer runtime versions.
If you attach images, double-check that the selected row is marked vision-capable.