# Replacing Claude with Local LLMs: Gemma 4 & Qwen 3.6 on macOS
For many developers, Claude has been the gold standard for coding tasks due to its reasoning capabilities and large context window. However, the shift towards local LLMs is accelerating. With the release of models like Gemma 4 and Qwen 3.6 (35B A3B), it’s now possible to achieve near-frontier performance entirely on your macOS machine—without subscription fees or privacy concerns.
In this guide, I’ll show you how to set up these models using LM Studio and integrate them with OpenCode, while solving the most common “gotchas” that plague local setups.
## The Local Powerhouse Duo

### Gemma 4
Gemma 4 brings state-of-the-art reasoning and a deep understanding of modern programming paradigms. It is particularly effective at refactoring and architectural advice, making it a viable alternative to Claude’s high-end models for complex logic.
### Qwen 3.6 (35B A3B)
The Qwen 3.6 series continues to dominate coding benchmarks. The 35B A3B model strikes a perfect balance between the agility of smaller models and the depth of massive ones, excelling at boilerplate generation and debugging across multiple languages.
## Setup on macOS with LM Studio

- Install LM Studio: Download and install from lmstudio.ai.
- Load Your Models:
  - Search for `gemma-4` or `qwen3.6-35b-a3b`.
  - Choose a quantization that fits your RAM (Q4_K_M is generally the sweet spot for M-series Macs).
- Configure the Local Server:
  - Navigate to the Local Server tab (the `<->` icon).
  - Select your model and click Start Server.
  - Note the base URL: `http://localhost:1234/v1`.
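Before wiring the server into OpenCode, it's worth confirming it actually responds. Here's a minimal sketch using only the standard library, assuming the default port and the model names above (adjust to whatever LM Studio reports for your loaded model):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server address


def list_models(base_url: str = BASE_URL) -> list[str]:
    """Ask the OpenAI-compatible /models endpoint which models are loaded."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return [m["id"] for m in json.load(resp)["data"]]


def chat(prompt: str, model: str = "qwen3.6-35b-a3b",
         base_url: str = BASE_URL) -> str:
    """Send a single chat-completion request and return the reply text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

If `list_models()` raises a connection error, the server isn't running or is listening on a different port.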
## Integrating with OpenCode

To use these models as your primary coding agent, add them to your `opencode.json` (global or project-specific).
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio Local",
      "options": {
        "baseURL": "http://127.0.0.1:1234/v1"
      },
      "models": {
        "gemma-4": {
          "name": "Gemma 4 (Local)",
          "limit": {
            "context": 32768,
            "output": 8192
          }
        },
        "qwen3.6-35b-a3b": {
          "name": "Qwen 3.6 35B (Local)",
          "limit": {
            "context": 32768,
            "output": 8192
          }
        }
      }
    }
  }
}
```
## Fixing Common Local LLM Errors
Running models locally can be unstable if the configuration doesn’t match the hardware capabilities. Here is how to fix the two most common errors:
### 1. “SSE read timed out”
This error occurs when the model takes too long to start generating a response (TTFT - Time To First Token), causing the client to drop the connection.
The Fixes:
- GPU Offloading: In LM Studio, ensure “GPU Offload” is set to Max. Since M-series Macs use Unified Memory, this keeps the model resident in memory instead of swapping to disk.
- Quantization Check: If you see this frequently with a 35B model on 16GB or 24GB RAM, switch to a lower quantization (e.g., from Q6 to Q4_K_M).
- Server Stability: Restart the LM Studio server and ensure no other memory-intensive apps (like Chrome with 50 tabs) are competing for resources.
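To check whether you're actually hitting a TTFT problem (rather than a flaky network or server), you can time the arrival of the first streamed token. A rough sketch against the OpenAI-compatible streaming endpoint, where the model name is an assumption (use whichever model you loaded):

```python
import json
import time
import urllib.request


def measure_ttft(prompt: str, model: str = "gemma-4",
                 base_url: str = "http://localhost:1234/v1") -> float:
    """Return seconds from request start until the first SSE data line arrives."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # stream so we can observe the very first token
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=300) as resp:
        for line in resp:
            if line.startswith(b"data:"):  # first chunk of the SSE stream
                return time.monotonic() - start
    raise RuntimeError("stream ended before any token arrived")
```

If TTFT regularly exceeds your client's read timeout, apply the fixes above (offloading, lower quantization, freeing memory) before reaching for longer timeouts.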
### 2. “Context size has been exceeded”
This happens when OpenCode sends a prompt that is larger than what the LM Studio server is configured to handle.
The Fixes: You must synchronize the context limits in two places:

- LM Studio Server Settings: In the Local Server tab, look for Context Length. Set this to your desired limit (e.g., `32768`). If you set it too high, you may experience OOM (Out of Memory) crashes.
- OpenCode Config: Ensure the `limit.context` in your `opencode.json` matches exactly what is set in LM Studio.

```json
// In opencode.json
"limit": {
  "context": 32768 // Must match LM Studio's Context Length setting
}
```
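As a quick sanity check before sending a large prompt, you can estimate whether it fits the configured window. This sketch uses the crude ~4 characters per token heuristic (an approximation, not a real tokenizer) and the 32768/8192 limits from the config above:

```python
def fits_context(prompt: str, context_limit: int = 32768,
                 output_reserve: int = 8192) -> bool:
    """Rough check: does the prompt plus reserved output fit the context window?"""
    est_tokens = len(prompt) // 4  # ~4 chars/token is a crude English/code heuristic
    return est_tokens + output_reserve <= context_limit


# A short prompt fits comfortably; a ~120k-character prompt does not.
print(fits_context("Refactor this function."))  # True
print(fits_context("x" * 120_000))              # False
```

For anything serious, use the model's actual tokenizer, but this heuristic is enough to catch the common case of pasting an entire file into a 32k window.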
## Local vs. Claude: The Verdict
| Feature | Claude (Frontier) | Gemma 4 / Qwen 3.6 (Local) |
|---|---|---|
| Privacy | Cloud-based | 100% Local |
| Cost | Monthly Sub/API Fee | Free |
| Latency | Network Dependent | Hardware Dependent |
| Context | Massive (200k+) | Limited by RAM (up to 32k-128k) |
| Control | Fixed Model | Full control over Quantization/Temp |
By combining the reasoning power of Gemma 4 and Qwen 3.6 with the agentic capabilities of OpenCode, you can build a world-class development environment that respects your privacy and your wallet.