Replacing Claude with Local LLMs: Gemma 4 & Qwen 3.6 on macOS

For many developers, Claude has been the gold standard for coding tasks due to its reasoning capabilities and large context window. However, the shift towards local LLMs is accelerating. With the release of models like Gemma 4 and Qwen 3.6 (35B A3B), it’s now possible to achieve near-frontier performance entirely on your macOS machine—without subscription fees or privacy concerns.

In this guide, I’ll show you how to set up these models using LM Studio and integrate them with OpenCode, while solving the most common “gotchas” that plague local setups.

The Local Powerhouse Duo

Gemma 4

Gemma 4 brings state-of-the-art reasoning and a deep understanding of modern programming paradigms. It is particularly effective at refactoring and architectural advice, making it a viable alternative to Claude’s high-end models for complex logic.

Qwen 3.6 (35B A3B)

The Qwen 3.6 series continues to dominate coding benchmarks. The 35B A3B model is a mixture-of-experts design (35B total parameters, with roughly 3B active per token), which gives it the agility of a small model at inference time with the depth of a much larger one; it excels at boilerplate generation and debugging across multiple languages.

Setup on macOS with LM Studio

  1. Install LM Studio: Download and install from lmstudio.ai.
  2. Load Your Models:
    • Search for gemma-4 or qwen3.6-35b-a3b.
    • Choose a quantization that fits your RAM (Q4_K_M is generally the sweet spot for M-series Macs).
  3. Configure the Local Server:
    • Navigate to the Local Server tab (<-> icon).
    • Select your model and click Start Server.
    • Note the base URL: http://localhost:1234/v1.
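
Before downloading, it helps to sanity-check whether a given quantization will actually fit in RAM. The sketch below uses common back-of-the-envelope bits-per-weight figures for GGUF quant types; these are approximations, not exact file sizes:

```python
# Rough memory-footprint estimate for a model at a given quantization.
# Bits-per-weight values are approximate averages for common GGUF quant
# types, not exact sizes for any specific file.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.5,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
}

def estimate_gb(params_billions: float, quant: str) -> float:
    """Approximate weight memory in GB for the given parameter count."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9

# A 35B model at Q4_K_M needs roughly 20 GB for weights alone --
# already tight on a 24 GB Mac once the OS and KV cache are counted.
print(f"{estimate_gb(35, 'Q4_K_M'):.1f} GB")  # ~19.7 GB
```

This is why Q4_K_M is the practical ceiling for 35B-class models on most consumer M-series machines.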

Integrating with OpenCode

To use these models as your primary coding agent, add them to your opencode.json (global or project-specific).

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio Local",
      "options": {
        "baseURL": "http://127.0.0.1:1234/v1"
      },
      "models": {
        "gemma-4": {
          "name": "Gemma 4 (Local)",
          "limit": {
            "context": 32768,
            "output": 8192
          }
        },
        "qwen3.6-35b-a3b": {
          "name": "Qwen 3.6 35B (Local)",
          "limit": {
            "context": 32768,
            "output": 8192
          }
        }
      }
    }
  }
}
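
If OpenCode can't reach the model, it's worth verifying the endpoint shape independently. LM Studio serves the OpenAI-compatible chat completions API, so a plain-stdlib request looks like the sketch below (the `chat` helper only works while the server is running; nothing here is OpenCode-specific):

```python
import json
import urllib.request

# LM Studio's local server speaks the OpenAI chat completions API, so
# any OpenAI-compatible client -- or plain urllib -- can talk to it.
BASE_URL = "http://127.0.0.1:1234/v1"

def build_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for a chat completion call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body).encode()

def chat(model: str, prompt: str) -> str:
    """Send the request; requires the LM Studio server to be started."""
    url, body = build_request(model, prompt)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

url, body = build_request("qwen3.6-35b-a3b", "Write hello world in Go.")
print(url)  # http://127.0.0.1:1234/v1/chat/completions
```

The model identifier in the request must match the key under `models` in opencode.json and the model loaded in LM Studio.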

Fixing Common Local LLM Errors

Running models locally can be unstable if the configuration doesn’t match the hardware capabilities. Here is how to fix the two most common errors:

1. “SSE read timed out”

This error occurs when the model takes too long to produce its first token (a long TTFT, or time to first token), so the client gives up and drops the streaming connection.

The Fixes:

  • GPU Offloading: In LM Studio, set “GPU Offload” to Max. Apple Silicon Macs share Unified Memory between CPU and GPU, so full offload keeps the model resident in memory instead of swapping to disk.
  • Quantization Check: If you see this frequently with a 35B model on 16GB or 24GB RAM, switch to a lower quantization (e.g., from Q6 to Q4_K_M).
  • Server Stability: Restart the LM Studio server and ensure no other memory-intensive apps (like Chrome with 50 tabs) are competing for resources.
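
On the client side, the same symptom can also be mitigated by retrying with a progressively larger timeout instead of failing on the first dropped read. This is a generic pattern, not an OpenCode or LM Studio feature; `fake_request` below stands in for any request function that accepts a timeout and raises `TimeoutError` when it expires:

```python
# Client-side mitigation for slow time-to-first-token: retry with a
# progressively larger timeout budget instead of failing immediately.
def with_retries(call, timeouts=(30, 120, 300)):
    last_err = None
    for t in timeouts:
        try:
            return call(timeout=t)
        except TimeoutError as err:
            last_err = err  # model still loading or swapping; retry longer
    raise last_err

# Fake request that "succeeds" only with at least 100 s of budget,
# simulating a model whose first token is slow to arrive.
def fake_request(timeout):
    if timeout < 100:
        raise TimeoutError(f"read timed out after {timeout}s")
    return "first token"

print(with_retries(fake_request))  # first token
```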

2. “Context size has been exceeded”

This happens when OpenCode sends a prompt that is larger than what the LM Studio server is configured to handle.

The Fixes: You must synchronize the context limits in two places:

  1. LM Studio Server Settings: In the Local Server tab, look for Context Length. Set this to your desired limit (e.g., 32768). If you set it too high, you may experience OOM (Out of Memory) crashes.
  2. OpenCode Config: Ensure the limit.context in your opencode.json matches exactly what is set in LM Studio.
// In opencode.json
"limit": {
  "context": 32768 // Must match LM Studio's Context Length setting
}
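
You can also guard against this error on the client side by estimating the prompt size before sending. The sketch below uses the rough ~4 characters-per-token heuristic (real counts depend on the model's tokenizer) and reserves headroom for the configured output limit:

```python
# Rough guard against "Context size has been exceeded": estimate prompt
# tokens with the common ~4 chars/token heuristic and reserve room for
# the output. An approximation only -- the real tokenizer may differ.
CONTEXT_LIMIT = 32768   # must match LM Studio's Context Length
OUTPUT_LIMIT = 8192     # matches limit.output in opencode.json

def fits_in_context(prompt: str) -> bool:
    est_tokens = len(prompt) // 4
    return est_tokens + OUTPUT_LIMIT <= CONTEXT_LIMIT

print(fits_in_context("x" * 50_000))   # ~12.5k tokens + 8k output -> True
print(fits_in_context("x" * 200_000))  # ~50k tokens -> False
```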

Local vs. Claude: The Verdict

| Feature | Claude (Frontier)   | Gemma 4 / Qwen 3.6 (Local)          |
|---------|---------------------|-------------------------------------|
| Privacy | Cloud-based         | 100% Local                          |
| Cost    | Monthly Sub/API Fee | Free                                |
| Latency | Network Dependent   | Hardware Dependent                  |
| Context | Massive (200k+)     | Limited by RAM (up to 32k-128k)     |
| Control | Fixed Model         | Full control over Quantization/Temp |
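
The “limited by RAM” row is mostly about the KV cache, which grows linearly with context length. Here is a hedged sketch of the standard formula; the layer and head counts below are hypothetical placeholders, not the published architecture of Gemma 4 or Qwen 3.6:

```python
# KV cache memory grows linearly with context:
#   2 (K and V) * layers * kv_heads * head_dim * context * bytes_per_value
# The architecture numbers used below are illustrative placeholders.
def kv_cache_gb(layers, kv_heads, head_dim, context, bytes_per_value=2):
    return 2 * layers * kv_heads * head_dim * context * bytes_per_value / 1e9

# e.g. a hypothetical 48-layer model, 8 KV heads, head_dim 128, fp16 cache:
print(f"{kv_cache_gb(48, 8, 128, 32_768):.1f} GB")   # ~6.4 GB at 32k
print(f"{kv_cache_gb(48, 8, 128, 131_072):.1f} GB")  # ~25.8 GB at 128k
```

Quadrupling the context quadruples the cache, which is why a model that runs comfortably at 32k can OOM at 128k on the same machine.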

By combining the reasoning power of Gemma 4 and Qwen 3.6 with the agentic capabilities of OpenCode, you can build a world-class development environment that respects your privacy and your wallet.