Run Claude Code with a Free Local Model — Qwen 3.5 + Ollama Setup

Claude Code is powerful, but it costs money. Every prompt burns API tokens, and your code is sent to external servers. What if you could run the same Claude Code workflow but powered by a free local model on your own machine?
Meet Qwen 3.5
Qwen 3.5 is a 27-billion-parameter model distilled from Claude 4.6 Opus reasoning traces. The benchmarks are impressive:
- Beats Claude Sonnet 4.5 on SWE-bench
- Retains 96.91% HumanEval accuracy
- Slashes chain-of-thought bloat by 24% (faster, more focused responses)
- Fits on a single GPU with just 16GB of VRAM
- Over 300,000 downloads on Hugging Face
And thanks to Ollama, it plugs directly into Claude Code.
What You Need
- A GPU with at least 16GB VRAM (even a mid-range card works)
- Ollama (to run the model)
- Claude Code (the coding agent)
Setup Steps
1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
2. Install Claude Code
Follow the standard Claude Code installation, then add its install directory to your PATH in ~/.bashrc.
3. Pull the Qwen 3.5 Model
ollama pull qwen3.5
This downloads a 4-bit quantized version that fits in 16GB of VRAM while keeping 97% of the original accuracy.
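The 16GB figure checks out with some back-of-envelope math. A minimal sketch (the parameter count and 4-bit quantization are from the article; the note about runtime overhead is a rough assumption):

```python
# Rough VRAM estimate for a 27B-parameter model quantized to 4 bits per weight.
params = 27e9            # 27 billion parameters (from the article)
bits_per_param = 4       # 4-bit quantization
weight_bytes = params * bits_per_param / 8

gib = weight_bytes / 2**30
print(f"weights alone: ~{gib:.1f} GiB")

# The KV cache and runtime buffers add a few more GiB on top (rough
# assumption), which is why a 16GB card is the practical floor here.
```

That puts the weights at roughly 12.6 GiB, leaving a few gigabytes of headroom on a 16GB card for context and overhead.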
4. Launch Claude Code with the Local Model
claude --model ollama:qwen3.5
Ollama has built-in support for Claude Code. One command and you're coding with local AI. Same Claude Code workflow, just running on your GPU.
What It Can Do
Test 1: Writing Code
I asked it to write an async Python function. It produced clean code with proper error handling, type hints, and saved the file automatically. Not just a chatbot -- it's an agent that writes and manages files.
Test 2: Bug Fixing
I gave it broken code for merging sorted arrays. It read the file, found the bug, explained why it fails, and fixed it in place. All locally, no API call.
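For context, the classic bug in this exercise is dropping the leftover tail once one input runs out. A minimal sketch of the fixed version (my reconstruction of the kind of fix the agent made, not its literal output):

```python
def merge_sorted(a: list[int], b: list[int]) -> list[int]:
    """Merge two already-sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    # The buggy version stopped here, silently truncating the result
    # whenever the inputs had different lengths. The fix: append both tails.
    out.extend(a[i:])
    out.extend(b[j:])
    return out

print(merge_sorted([1, 3, 5], [2, 4, 6, 8]))  # [1, 2, 3, 4, 5, 6, 8]
```

Spotting an off-by-the-tail bug like this requires actually reading and reasoning about the file, which is exactly what the local agent did.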
Test 3: Building a Landing Page
I asked it to create a modern landing page with Tailwind CSS -- hero section, features grid, footer, dark theme, responsive. It created a complete, professional-looking page. Built by Claude Code running on a local model.
Why This Matters
- Free forever -- no API costs, no rate limits
- Private -- your code never leaves your machine
- Same workflow -- identical Claude Code experience, just a different engine
- Opus-level reasoning -- distilled from Claude 4.6 Opus traces
Qwen 3.5 + Ollama + Claude Code = full agentic AI coding with Opus-level reasoning, running locally on a single GPU, free forever.