Back to LearnGuide

Local AI Setup: From Zero to Production

Running AI agents locally gives you zero API costs, complete privacy, and offline capability. This guide walks you through the entire setup process — from picking hardware to running your first local agent in production.

Step 1: Hardware Selection

Your GPU determines which models you can run and how fast. Here are realistic tiers for 2026:

TierGPUModelsCost

EntryGTX 1060 6GB1-4B, Q4_K_M~$100 used

Mid-rangeRTX 3060 12GB4-8B, Q4_K_M~$250 used

ProfessionalRTX 4090 24GB8-32B, Q4_K_M~$1,600 new

ServerA100 80GB70B+, fp16~$10,000+

For most developers, an RTX 3060 12GB hits the sweet spot: runs 7-8B models comfortably in Q4_K_M quantization with room for context processing. A GTX 1060 6GB is the budget champion — handles 4B models with adapters perfectly for code review, documentation, and audit tasks.

Step 2: Install Ollama

Ollama is the standard tool for running LLMs locally. It handles model downloading, quantization, GPU acceleration, and provides an OpenAI-compatible API.

# Linux / WSL2

curl -fsSL https://ollama.com/install.sh | sh

# Verify GPU detection

ollama run llama3.2:1b --verbose

If Ollama detects your GPU, you will see CUDA being used in the verbose output. If not, ensure your NVIDIA drivers are up to date (version 525+ for CUDA 12 support).

Step 3: Pull and Test Models

Start with a small model to verify your setup, then move to a larger one:

# Quick test (1B, ~700MB)

ollama pull llama3.2:1b

# Daily driver coding model (4B, ~2.5GB)

ollama pull qwen3:4b

# Heavy lifter (8B, ~5GB if available VRAM)

ollama pull llama3.1:8b

Step 4: Configure OpenClaw for Local Models

OpenClaw connects to Ollama automatically. Add this to your OpenClaw config:

{ "providers": { "ollama": { "baseUrl": "http://localhost:11434/v1", "apiKey": "ollama" } }, "defaultModel": "ollama/qwen3:4b" }

Now OpenClaw will use your local Qwen3-4B model for all agent tasks. No API keys, no internet required.

Step 5: Run Your First Local Agent

With Ollama running and OpenClaw configured, you are ready to run agents locally:

Browse the FlickClaw agent catalog and select an agent.
Choose “Ollama” as the export format.
Drop the generated agent files into your OpenClaw workspace.
The agent runs against your local model with zero configuration needed.

Performance tip: For local models, use agents with quality gates enabled. Quality gates are deterministic (no model involvement), so they add reliability without adding latency.

Troubleshooting Common Issues

Ollama does not detect GPU

Ensure nvidia-smi works in your terminal. On WSL2, install the NVIDIA CUDA WSL2 package. On Linux, verify nvidia-container-toolkit is installed.

Out of memory errors

Reduce context size. Most models default to 2048 tokens. For code review, 4096 is practical. For full-repository analysis, 8192+ may be needed — but you will need more VRAM.

Slow inference (<10 tokens/sec)

Ensure GPU acceleration is active. Check that the model fits entirely in VRAM — if it spills to system RAM, performance drops 10-50x.

Back to Learn