We are living in an era where large language models (LLMs) shape the way we work. Even local LLMs fine-tuned for coding have become increasingly capable, letting developers and data professionals use them as personal coding assistants in their own environments. This approach is often preferable, since running models locally keeps data private and avoids per-call API costs.
These local coding LLMs now have a variety of applications that weren't practical before, as they bring hands-on AI assistance directly into the developer workflow. This, in turn, enables inline autocompletion, code debugging, and even reasoning across entire projects. If you are interested, there are many ways to run an LLM locally that are worth exploring.
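For instance, here is a minimal sketch of chatting with a local coding model through the Ollama Python client. The `qwen2.5-coder` tag is just one example of a code model available in the Ollama library; any pulled model works the same way.

```python
# Minimal local chat with a coding model via the Ollama Python client.
# Assumes the Ollama server is installed and running, and that a code
# model has been pulled first, e.g. `ollama pull qwen2.5-coder`.
import ollama

response = ollama.chat(
    model="qwen2.5-coder",  # swap in whichever local code model you pulled
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that reverses a linked list.",
        }
    ],
)
print(response["message"]["content"])
```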
Even for non-developers or people without technical backgrounds, local coding LLMs have fueled a new trend called vibe coding, which you can try to master yourself. Data scientists can also take a look at a few projects to build with vibe coding.
As local coding LLMs become more prominent, it’s helpful to know which options you can run yourself. In this article, we explore some of the best local coding LLMs that fit into local workflows and highlight why they stand out from the rest.
# 1. GLM-4-32B-0414
Zhipu AI, an AI company spun out of Tsinghua University, recently introduced the open-source GLM-4-0414 series, headlined by GLM-4-32B-0414, a 32-billion-parameter model comparable to GPT-4o and DeepSeek-V3. The model was pretrained on 15T tokens of reasoning-heavy data, then refined through human preference alignment, rejection sampling, and reinforcement learning. This helps the model follow instructions and produce well-structured outputs.
The model excels at handling complex code generation, code analysis, and function-call–style outputs. Thanks to its training, it can perform multi-step reasoning in code—such as tracing logic or suggesting improvements—better than many models of similar or larger size. Another advantage is its relatively large context window, up to 32k tokens, allowing GLM-4 to process large chunks of code or multiple files without issues. This makes it useful for tasks like analyzing entire codebases or providing comprehensive refactoring suggestions in a single run.
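As a hedged sketch of what local usage could look like, the following assumes the `THUDM/GLM-4-32B-0414` Hugging Face repo id and enough GPU memory for a 32B model (in practice, multiple GPUs or aggressive quantization):

```python
# Sketch: prompting GLM-4-32B-0414 through Hugging Face transformers.
# The repo id is assumed; a 32B model needs substantial VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/GLM-4-32B-0414"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Ask for the kind of multi-step code analysis the model is strong at
# (note: the toy function keeps odd numbers, a bug for the model to catch).
messages = [
    {
        "role": "user",
        "content": (
            "Trace the logic of this function and suggest improvements:\n\n"
            "def evens(xs):\n    return [x for x in xs if x % 2]"
        ),
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
```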
# 2. DeepSeekCoder V2
DeepSeekCoder V2 is a mixture-of-experts (MoE) coding LLM trained specifically for programming work. It is released in two open-weight variants: a 16B "Lite" model and a 236B model. The model was pretrained on an additional 6T tokens on top of DeepSeek-V2 and expands language coverage from 86 to 338 programming languages. The context window also extends to 128k tokens, which is useful for whole-project comprehension, code infilling, and cross-file refactors.
Performance-wise, the model shows top-tier results, as demonstrated by a strong Aider LLM leaderboard score, placing it alongside premium closed models for code reasoning. The code is MIT-licensed, and the model weights are available under DeepSeek’s model license, which permits commercial use. Many run the 16B Lite locally for fast code completion and vibe-coding sessions, while the 236B is aimed at multi-GPU servers for heavy code generation and project-scale reasoning.
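For the code-infilling use case mentioned above, a minimal sketch with the 16B Lite base model might look like this; the repo id and the fill-in-the-middle (FIM) control tokens are taken from DeepSeek's model card, so verify them against the version you download.

```python
# Sketch: fill-in-the-middle completion with the 16B Lite base model.
# Repo id and FIM control tokens follow the DeepSeek-Coder-V2 model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

# The model fills in the code where the fim-hole marker sits.
prompt = (
    "<｜fim▁begin｜>def binary_search(arr, target):\n"
    "    lo, hi = 0, len(arr) - 1\n"
    "<｜fim▁hole｜>\n"
    "    return -1<｜fim▁end｜>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```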
# 3. Qwen3-Coder
Qwen3-Coder is a code-focused LLM developed by Alibaba Cloud's Qwen team, trained on 7.5T tokens, about 70% of which is code. It uses a mixture-of-experts (MoE) transformer; the flagship release has 480B total parameters with roughly 35B active per token. Its coding performance rivals GPT-4-level models and Claude 4 Sonnet, and it brings a 256k context window (extendable to 1M via YaRN). This allows the model to handle entire repositories and long files in a single session. It also understands and generates code in over 350 programming languages and is built with agentic coding tasks in mind.
The 480B model demands heavy hardware, such as multiple H100-class GPUs or high-memory servers, but its MoE design means only a subset of parameters is active per token. For lighter requirements, the FP8 release and the smaller MoE variants in the family can run on a single high-end GPU for local usage. The model's weights are openly available under the Apache 2.0 license, making Qwen3-Coder a powerful yet accessible coding assistant, from foundational coding tasks to advanced agentic ones.
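Since the weights are open, one common local setup is to serve the model behind an OpenAI-compatible endpoint and query it from scripts or editors. Here is a sketch assuming a vLLM server already running on localhost:8000; for the 480B flagship that means a multi-GPU server, while a smaller variant can sit on one GPU.

```python
# Sketch: querying a locally served Qwen3-Coder via an OpenAI-compatible API.
# Assumes a server was started beforehand, e.g.:
#   vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",  # must match the served model
    messages=[
        {
            "role": "user",
            "content": "Refactor this into a generator: "
            "def squares(n): return [i * i for i in range(n)]",
        }
    ],
)
print(response.choices[0].message.content)
```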
# 4. Codestral
Codestral is a dedicated code-generation model developed by Mistral AI and tuned for 80+ programming languages. It was introduced in two variants: a 22B transformer with a 32k context window and Codestral Mamba, a 7B model built on the Mamba architecture. Both are designed for low latency relative to their size, which is useful during live editing. The weights are downloadable under Mistral's Non-Production License (free for research and testing), and commercial use requires a separate license.
For local coding, the 22B is competent and fast enough in 4- or 8-bit quantization on a single strong GPU for everyday usage, and it remains capable of longer generations for bigger projects. Mistral also offers hosted Codestral endpoints, but if you're staying fully local, the open weights plus common inference stacks (such as llama.cpp) are already enough.
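As a sketch of that fully local route, here is how a quantized Codestral GGUF might be run with llama-cpp-python. The file name is hypothetical; you would download a community GGUF conversion first (keeping the Non-Production License in mind).

```python
# Sketch: running a quantized Codestral GGUF locally with llama-cpp-python.
# The model_path is a hypothetical local file; download a GGUF conversion first.
from llama_cpp import Llama

llm = Llama(
    model_path="./codestral-22b-q4_k_m.gguf",  # hypothetical file name
    n_ctx=32768,       # Codestral's full 32k context window
    n_gpu_layers=-1,   # offload every layer to the GPU if VRAM allows
)
result = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Add type hints to: def add(a, b): return a + b"}
    ],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```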
# 5. Code Llama
Code Llama is a family of coding models from Meta, fine-tuned from Llama 2, with multiple sizes (7B, 13B, 34B, 70B) and variants (base, Python-specialized, and Instruct). Depending on the version, the models handle their specialized usage, such as infilling or Python-specific tasks, reliably even on very long inputs (up to roughly 100k tokens with long-context techniques). All are available as open weights under Meta's community license, which allows broad research and commercial usage.
Code Llama is a popular baseline for local coding agents and IDE copilots because the 7B/13B sizes run comfortably on single-GPU laptops and desktops (especially when quantized), while the 34B/70B sizes offer stronger accuracy if you have more VRAM. With the various versions come many application possibilities: the Python model is well-suited to data and machine learning workflows, while the Instruct variant works well with conversational and vibe-coding flows in editors.
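For instance, the infilling capability of the 7B base model can be exercised through the `<FILL_ME>` marker supported by its Hugging Face tokenizer; a minimal sketch:

```python
# Sketch: fill-in-the-middle with Code Llama 7B using the <FILL_ME> marker
# supported by its Hugging Face tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result'
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)
generated = model.generate(input_ids, max_new_tokens=128)
filling = tokenizer.batch_decode(
    generated[:, input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(prompt.replace("<FILL_ME>", filling))
```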
# Wrapping Up
As a reference for what we discussed above, here is an overall comparison of the models covered.

| Model | Sizes | Context window | License |
| --- | --- | --- | --- |
| GLM-4-32B-0414 | 32B | 32k | Open weights |
| DeepSeekCoder V2 | 16B (Lite), 236B | 128k | MIT code; DeepSeek model license (commercial use permitted) |
| Qwen3-Coder | 480B MoE (~35B active) | 256k (up to 1M via YaRN) | Apache 2.0 |
| Codestral | 22B, 7B (Mamba) | 32k | Mistral Non-Production License |
| Code Llama | 7B, 13B, 34B, 70B | up to ~100k | Meta community license |
Depending on your requirements and available hardware, these models can support your work effectively.
I hope this has helped!
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and written media. Cornellius writes on a variety of AI and machine learning topics.