The Best Open-Source Model for Agentic Coding? Meet Mistral’s Devstral

5 Jun 2025

Baptiste Fernandez

What Happened: Devstral Topped SWE-Bench Verified (in the Open-Weight Section)

Mistral AI has announced Devstral, a new open-weights large language model for software development, built in collaboration with All Hands AI (the team behind the OpenHands agent framework). Devstral is a 24-billion-parameter LLM optimized for agentic software engineering tasks, able to tackle GitHub issues across entire codebases. This size stands out, particularly in light of the comparisons Simon Willison shared in his AI Engineer World's Fair keynote.

Released under the permissive Apache 2.0 license, Devstral’s weights are freely available for both research and commercial use. Notably, the model runs on accessible hardware (a single NVIDIA RTX 4090 GPU, or even a 32GB Mac!) owing to its relatively compact size and efficient design.
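To make the hardware claim concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. The precision levels are an assumption for illustration (the announcement, as summarized here, does not say which one the hardware figures refer to), and the estimate ignores the KV cache, activations, and runtime overhead. The 24 GB budget is the VRAM of an RTX 4090.

```python
# Weights-only memory estimate for a 24B-parameter model.
# Assumption: the precisions below are illustrative, not taken from the article.
PARAMS = 24e9  # parameter count reported for Devstral

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


footprints = {name: weight_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

# An RTX 4090 has 24 GB of VRAM; some of it must stay free for the KV cache
# and activations, so we require the weights to be strictly smaller.
GPU_BUDGET_GB = 24.0
fits_on_gpu = {name: gb < GPU_BUDGET_GB for name, gb in footprints.items()}

for name, gb in footprints.items():
    print(f"{name}: {gb:.0f} GB weights, fits on 24 GB GPU: {fits_on_gpu[name]}")
```

Under these assumptions, the single-GPU setup is only plausible with some form of quantization, since half-precision weights alone exceed 24 GB.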

Critically, Devstral performs strongly on SWE-Bench Verified (a dataset of 500 real-world GitHub issues with tests for correctness): it achieved a 46.8% success rate on this benchmark, more than 6 percentage points higher than any previously published open-source model.
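For a sense of scale, the percentages above can be translated into issue counts. This is plain arithmetic on the two figures quoted in this article (500 issues, 46.8%); the helper name is just for illustration.

```python
# Figures quoted in the article.
TOTAL_ISSUES = 500
DEVSTRAL_RATE = 0.468


def resolved_count(rate: float, total: int = TOTAL_ISSUES) -> int:
    """Convert a success rate into a whole number of resolved issues."""
    return round(rate * total)


devstral_resolved = resolved_count(DEVSTRAL_RATE)

# On a 500-issue benchmark, one percentage point corresponds to 5 issues,
# so a lead of 6 points means at least 30 additional issues resolved.
issues_per_point = TOTAL_ISSUES / 100
min_lead_in_issues = 6 * issues_per_point

print(devstral_resolved, min_lead_in_issues)
```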

Devstral even outperformed some much larger models, including DeepSeek’s 671B-parameter V3 model and Alibaba’s 235B-parameter Qwen3, despite its much smaller size. Mistral’s Devstral has thus taken the lead as the best open-weight model optimized for coding agents. While other models remain highly relevant for tasks like chat and reasoning, Devstral marks a new specialization in agentic workflows.

Community Reactions: Devstral Hits Sweet Spot Between Performance & Accessibility

The developer community’s response to Devstral has been largely positive. Many developers are impressed by the model’s capabilities given its size and openness. On Hacker News, one user remarked that Devstral’s 46.8% benchmark score “is better than [Anthropic] Claude 3.6 (with an open scaffold) … and considering you can run this for almost free, this is an extraordinary model.”

This combination of accessibility and high performance resonated with developers who have been seeking an alternative to expensive API-based tools.

Open licensing has also earned significant goodwill. The full weight release under Apache 2.0 has been lauded as a positive step for the open-source AI ecosystem, reinforcing trust that developers can use and integrate the model without legal hurdles. Early user experiences with Devstral are emerging as people test it in their workflows. The ability to handle a 128k-token context (which can cover an entire repository’s code) also drew positive attention, as it allows Devstral to consider much more context than typical local models.

There are also words of caution and curiosity about limits. For example, devs have pointed out that cutting-edge closed models still have an edge, albeit at far greater cost and resource requirements. A power user on Reddit noted that Devstral, being tuned for the OpenHands agent use case, “seems a bit too specialized” and “it is clear Devstral in most tasks cannot compare to DeepSeek 671B, which is my current daily driver”, though they immediately added that DeepSeek is “too slow … hence why I am looking into smaller models.”

Overall, the response has been optimistic, and with the release of Mistral Code, an AI-powered coding assistant built on Devstral, Codestral, Codestral Embed, and Mistral Medium, there’s a growing sense of momentum and excitement around Mistral’s expanding toolset.

The AIND Take: A Noteworthy Development in the Open-Weights Realm

At AI Native Dev, we’re always keeping an eye on innovations pushing the boundaries of code generation. We recently took Devstral for a spin, and my colleague and friend Richard described it as the “best experience so far” among the open-weights models we’ve explored.

DevStral is the first open weights model I’ve tried that didn’t crash out of the gate. The codegen wasn’t perfect, but it followed the spec, which is a huge step forward. If you’re looking for an Apache 2 open-weights model, I'd recommend giving DevStral a try.
Richard Tweed, Senior Platform Engineer @Tessl.io

Looking ahead, Devstral’s debut marks a noteworthy development in the open-weights realm. Mistral has already hinted that a “larger agentic coding model” is in the works for release in the coming weeks, which could further boost performance. If that upcoming model extends Devstral’s capabilities (potentially at 30B+ parameters), we may see open-source AI reaching deeper into territory that was, until now, dominated by proprietary giants (yay!).

We appear to be moving from AI as a sidebar tool, like code autocomplete or chat assistants, to AI as an integrated development team member (this is happening so fast!). In practice, this may evolve into development workflows where devs define a problem or high-level design, and an agentic system powered by Devstral takes on the heavy lifting of implementing and iterating on the solution.

In that same thread, we are hearing a lot more about agent parallelization. It's reminiscent of how software engineering embraced automation in the past: version control and continuous integration automated tedious integration tasks, and now continuous “autonomous coding” could become a new layer in the pipeline. Developers will routinely assign an AI agent a ticket (a bug fix or feature request), and some have already begun; you can see more of this in GitHub’s Copilot Agent release. That said, current AI models, including Devstral, still have limitations in reasoning and can introduce errors or insecure code if left unchecked.

This is why it’s encouraging to see that Devstral’s design uses test cases as part of the loop via the OpenHands scaffold (e.g. AI agents that verify their own output against real unit tests). For developers out there, we recommend experimenting in a controlled environment: do not use Devstral on critical issues, and try it in a sandbox repository first to gauge its suggestions.

Make sure to perform code reviews on contributions, just as you would for a human developer. With Mistral Code now available for VS Code and JetBrains, you can leverage Devstral for agentic coding and experiment with various refactoring and debugging workflows.
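The idea of an agent checking its own output against unit tests can be sketched in a few lines. This is a toy illustration of the pattern only, not the OpenHands or Devstral API: `agent_loop`, `run_tests`, and `fake_model` are hypothetical names, and the "model" is a hard-coded stand-in that returns a buggy function first and a corrected one once it receives failure feedback.

```python
from typing import Callable, Optional

Candidate = Callable[[int, int], int]


def run_tests(candidate: Candidate) -> list[str]:
    """Run the unit tests; return failure messages (empty list = all pass)."""
    cases = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]
    failures = []
    for args, expected in cases:
        try:
            got = candidate(*args)
        except Exception as exc:  # a crashing candidate counts as a failure
            failures.append(f"add{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"add{args} returned {got}, expected {expected}")
    return failures


def agent_loop(
    propose: Callable[[list[str]], Candidate], max_attempts: int = 3
) -> tuple[Optional[Candidate], int]:
    """Ask for a candidate, test it, and feed failures back until tests pass."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        feedback = run_tests(candidate)
        if not feedback:
            return candidate, attempt
    return None, max_attempts  # give up and hand over to a human reviewer


def fake_model(feedback: list[str]) -> Candidate:
    """Hypothetical stand-in for a model call: buggy first, fixed after feedback."""
    if not feedback:
        return lambda a, b: a * b  # wrong on purpose
    return lambda a, b: a + b


accepted, attempts = agent_loop(fake_model)
print("accepted" if accepted else "rejected", "after", attempts, "attempt(s)")
```

Even in this simplified form, the loop only guarantees that the tests pass, not that the change is correct or secure, which is why human review still matters.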

As a final thought, Mistral’s been relatively quiet compared to the buzz surrounding some of the incumbents. With Devstral and Mistral Code, they’ve re-entered the spotlight. And I have to admit, as a Frenchie, there’s something quite satisfying about seeing a homegrown effort holding its own in such a talented and competitive global space. That said, innovation has no nationality, and Mistral is clearly here to play.
