Run Claude Code with a Free Local Model — Qwen 3.5 + Ollama Setup

Claude Code is powerful, but it costs money. Every prompt burns API tokens, and your code is sent to external servers. What if you could run the same Claude Code workflow but powered by a free local model on your own machine?
Meet Qwen 3.5
Qwen 3.5 is a 27-billion-parameter model distilled from Claude 4.6 Opus reasoning traces. The benchmarks are impressive:
- Beats Claude Sonnet 4.5 on SWE-bench
- Retains 96.91% HumanEval accuracy
- Slashes chain-of-thought bloat by 24% (faster, more focused responses)
- Fits on a single GPU with just 16GB of VRAM
- Over 300,000 downloads on Hugging Face
And thanks to Ollama, it plugs directly into Claude Code.
What You Need
- A GPU with at least 16GB VRAM (even a mid-range card works)
- Ollama (to run the model)
- Claude Code (the coding agent)
Setup Steps
1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
2. Install Claude Code
Follow the standard Claude Code installation, then add its install directory to your PATH in ~/.bashrc.
3. Pull the Qwen 3.5 Model
ollama pull qwen3.5
This downloads a 4-bit quantized version that fits in 16GB of VRAM while keeping 97% of the original accuracy.
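The 16GB figure checks out with some back-of-envelope math. A minimal sketch (the parameter count and 4-bit quantization are from the article; the note about runtime overhead is a rough assumption):

```python
# Rough VRAM estimate for a 27B-parameter model quantized to 4 bits per weight.
params = 27e9            # 27 billion parameters (from the article)
bits_per_param = 4       # 4-bit quantization
weight_bytes = params * bits_per_param / 8

gib = weight_bytes / 2**30
print(f"weights alone: ~{gib:.1f} GiB")

# The KV cache and runtime buffers add a few more GiB on top (rough
# assumption), which is why a 16GB card is the practical floor here.
```

That puts the weights at roughly 12.6 GiB, leaving a few gigabytes of headroom on a 16GB card for context and overhead.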
4. Launch Claude Code with the Local Model
claude --model ollama:qwen3.5
Ollama has built-in support for Claude Code. One command and you're coding with local AI. Same Claude Code workflow, just running on your GPU.
What It Can Do
Test 1: Writing Code
I asked it to write an async Python function. It produced clean code with proper error handling, type hints, and saved the file automatically. Not just a chatbot -- it's an agent that writes and manages files.
Test 2: Bug Fixing
I gave it broken code for merging sorted arrays. It read the file, found the bug, explained why it fails, and fixed it in place. All locally, no API call.
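For context, the classic bug in this exercise is dropping the leftover tail once one input runs out. A minimal sketch of the fixed version (my reconstruction of the kind of fix the agent made, not its literal output):

```python
def merge_sorted(a: list[int], b: list[int]) -> list[int]:
    """Merge two already-sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    # The buggy version stopped here, silently truncating the result
    # whenever the inputs had different lengths. The fix: append both tails.
    out.extend(a[i:])
    out.extend(b[j:])
    return out

print(merge_sorted([1, 3, 5], [2, 4, 6, 8]))  # [1, 2, 3, 4, 5, 6, 8]
```

Spotting an off-by-the-tail bug like this requires actually reading and reasoning about the file, which is exactly what the local agent did.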
Test 3: Building a Landing Page
I asked it to create a modern landing page with Tailwind CSS -- hero section, features grid, footer, dark theme, responsive. It created a complete, professional-looking page. Built by Claude Code running on a local model.
Why This Matters
- Free forever -- no API costs, no rate limits
- Private -- your code never leaves your machine
- Same workflow -- identical Claude Code experience, just a different engine
- Opus-level reasoning -- distilled from Claude 4.6 Opus traces
Qwen 3.5 + Ollama + Claude Code = full agentic AI coding with Opus-level reasoning, running locally on a single GPU, free forever.