FlickClaw
    AgentsPacksDownloadBlogLearnPricingDocsFAQ
    Sign In
    Back to LearnGuide

    Local AI Setup: From Zero to Production

    Running AI agents locally gives you zero API costs, complete privacy, and offline capability. This guide walks you through the entire setup process — from picking hardware to running your first local agent in production.

    Step 1: Hardware Selection

    Your GPU determines which models you can run and how fast. Here are realistic tiers for 2026:

    TierGPUModelsCost
    EntryGTX 1060 6GB1-4B, Q4_K_M~$100 used
    Mid-rangeRTX 3060 12GB4-8B, Q4_K_M~$250 used
    ProfessionalRTX 4090 24GB8-32B, Q4_K_M~$1,600 new
    ServerA100 80GB70B+, fp16~$10,000+

    For most developers, an RTX 3060 12GB hits the sweet spot: runs 7-8B models comfortably in Q4_K_M quantization with room for context processing. A GTX 1060 6GB is the budget champion — handles 4B models with adapters perfectly for code review, documentation, and audit tasks.

    Step 2: Install Ollama

    Ollama is the standard tool for running LLMs locally. It handles model downloading, quantization, GPU acceleration, and provides an OpenAI-compatible API.

    # Linux / WSL2
    curl -fsSL https://ollama.com/install.sh | sh
    # Verify GPU detection
    ollama run llama3.2:1b --verbose

    If Ollama detects your GPU, you will see CUDA being used in the verbose output. If not, ensure your NVIDIA drivers are up to date (version 525+ for CUDA 12 support).

    Step 3: Pull and Test Models

    Start with a small model to verify your setup, then move to a larger one:

    # Quick test (1B, ~700MB)
    ollama pull llama3.2:1b
    # Daily driver coding model (4B, ~2.5GB)
    ollama pull qwen3:4b
    # Heavy lifter (8B, ~5GB if available VRAM)
    ollama pull llama3.1:8b

    Step 4: Configure OpenClaw for Local Models

    OpenClaw connects to Ollama automatically. Add this to your OpenClaw config:

    { "providers": { "ollama": { "baseUrl": "http://localhost:11434/v1", "apiKey": "ollama" } }, "defaultModel": "ollama/qwen3:4b" }

    Now OpenClaw will use your local Qwen3-4B model for all agent tasks. No API keys, no internet required.

    Step 5: Run Your First Local Agent

    With Ollama running and OpenClaw configured, you are ready to run agents locally:

    1. Browse the FlickClaw agent catalog and select an agent.
    2. Choose “Ollama” as the export format.
    3. Drop the generated agent files into your OpenClaw workspace.
    4. The agent runs against your local model with zero configuration needed.

    Performance tip: For local models, use agents with quality gates enabled. Quality gates are deterministic (no model involvement), so they add reliability without adding latency.

    Troubleshooting Common Issues

    Ollama does not detect GPU

    Ensure nvidia-smi works in your terminal. On WSL2, install the NVIDIA CUDA WSL2 package. On Linux, verify nvidia-container-toolkit is installed.

    Out of memory errors

    Reduce context size. Most models default to 2048 tokens. For code review, 4096 is practical. For full-repository analysis, 8192+ may be needed — but you will need more VRAM.

    Slow inference (<10 tokens/sec)

    Ensure GPU acceleration is active. Check that the model fits entirely in VRAM — if it spills to system RAM, performance drops 10-50x.

    Back to Learn
    FlickClaw

    AI Agent Launcher for serious builders. Browse, export, run.

    Social

    Product

    • Agents
    • Packs
    • Pricing
    • Quality
    • Docs
    • Changelog

    Resources

    • FAQ
    • Status
    • Download
    • About
    • Contact
    • Sitemap

    Legal

    • Privacy Policy
    • Terms of Service
    • Refund Policy
    • Cookie Policy
    • AI Agents
    • No Token Fees

    FlickClaw © 2026. AI Agent Launcher platform.

    v0.6.41
    HTTPSTLS 1.3 encryptedSecureCSP · HSTS · X-FrameGDPREU compliant
    PrivacyTermsRefund Policy