The AI Native Dev Landscape exists to answer a simple but practical question: in a world where new tools launch every week and hype moves faster than reality, what are developers actually using?
Looking at both traffic patterns and our own curation, 7 tools emerged as particularly interesting. Not because they're the flashiest (or have the biggest marketing budgets), but because they represent distinct approaches to how AI can fit into the development workflow.
Before diving into specific tools, it's worth framing them through two lenses: trust and change. Trust indicates how much confidence you need in the tool getting it right for it to be valuable. Change indicates how much you need to alter your existing workflow to use it.
The tools gaining traction typically operate in "high adoption" territory—they fit existing workflows and produce verifiable results. But the most interesting tools push boundaries, asking developers to work differently because the value proposition justifies the friction.
Conductor tackles a fundamental problem that emerges as agents become more capable: how do you manage multiple coding agents working simultaneously without losing your mind? The tool runs on your Mac and orchestrates parallel workspaces - each agent gets its own isolated git worktree to work in.
What makes this interesting is the orchestration layer. Rather than just spawning agents and hoping they don't conflict, Conductor provides visibility into what each agent is working on and structured review mechanisms for merging their changes. It supports both Claude Code and Codex, working with however you're already authenticated.
This sits firmly in attended workflows. You're not trusting agents to autonomously merge changes—you're reviewing their work. But the parallel execution model acknowledges a reality: as agents get better at focused tasks, developers will want to run multiple simultaneously. Conductor makes that tractable by handling the coordination overhead.
The trust requirement is moderate because you control the merge step. If you're already using coding agents, the change requirement is low: Conductor just makes it possible to use them at scale.
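Conductor's orchestration layer is the product, but the isolation primitive underneath is plain git. Here is a minimal sketch of the one-worktree-per-agent pattern, with made-up repo paths and branch names (Conductor's actual layout is its own):

```python
# Sketch of the isolation pattern Conductor builds on: one git worktree per
# agent, so parallel agents never edit the same checkout. Paths and branch
# names are placeholders, not Conductor's actual layout.
import subprocess
from pathlib import Path

REPO = Path("~/code/my-app").expanduser()  # assumption: an existing git repo

def create_agent_workspace(agent_id: str, base_branch: str = "main") -> Path:
    """Create an isolated worktree and branch for one agent."""
    workspace = REPO.parent / "agents" / agent_id
    branch = f"agent/{agent_id}"
    subprocess.run(
        ["git", "-C", str(REPO), "worktree", "add", "-b", branch,
         str(workspace), base_branch],
        check=True,
    )
    return workspace  # point the coding agent's working directory here

def remove_agent_workspace(agent_id: str) -> None:
    """Tear the worktree down once its changes are reviewed and merged."""
    workspace = REPO.parent / "agents" / agent_id
    subprocess.run(
        ["git", "-C", str(REPO), "worktree", "remove", str(workspace)],
        check=True,
    )

if __name__ == "__main__":
    for agent in ("fix-login-bug", "add-csv-export"):
        print("workspace ready:", create_agent_workspace(agent))
```

Each agent works against its own checkout and branch, and you review and merge from the main worktree as usual.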

Graphite is rethinking code review for teams shipping faster with AI. The core insight: traditional PR workflows weren't designed for the velocity AI enables. When you can generate significant code quickly, the bottleneck shifts to review and merge coordination.
Graphite introduces stacked PRs - breaking larger changes into sequenced, smaller chunks that can be reviewed independently. This addresses a real pain point: with AI assistance, you can create large features quickly, but reviewing massive PRs is slow and error-prone. Stacking lets you ship incrementally without waiting for each review to complete.
The platform includes an AI agent that operates directly in your PR page. It can resolve CI failures, suggest fixes, and help you iterate without context switching. The merge queue is stack-aware, landing PRs in order while keeping branches green.
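To make "stack-aware" concrete, here is a toy model of the idea (not Graphite's implementation): each PR records the PR beneath it in the stack, and the queue only lands a PR once its parent has landed and its checks are green.

```python
# Toy model of a stack-aware merge queue: land PRs bottom-up, only merging a
# PR once the PR beneath it has landed and its checks are green. Illustrative
# only; this is not Graphite's implementation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PR:
    number: int
    parent: Optional[int]  # the PR directly beneath this one in the stack
    ci_green: bool = True

def land_stack(prs: list[PR]) -> list[int]:
    landed: list[int] = []
    remaining = {pr.number: pr for pr in prs}
    while remaining:
        progressed = False
        for pr in list(remaining.values()):
            parent_done = pr.parent is None or pr.parent in landed
            if parent_done and pr.ci_green:
                # A real queue would rebase onto the updated trunk and wait
                # for CI again before merging.
                landed.append(pr.number)
                del remaining[pr.number]
                progressed = True
        if not progressed:
            break  # a red or orphaned PR blocks everything stacked above it
    return landed

# A three-PR stack: 101 <- 102 <- 103, handed to the queue out of order.
print(land_stack([PR(103, parent=102), PR(101, parent=None), PR(102, parent=101)]))
# -> [101, 102, 103]
```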
What's clever here is recognizing that AI changes both sides of the equation. Developers can write code faster, but reviewers still need time and context. Graphite optimizes for the new bottleneck by making reviews faster and less blocking. The stack-aware merge queue ensures velocity doesn't compromise stability.
In other words, Graphite's GitHub sync means the process fits existing workflows. The value is immediate if your team already struggles with review velocity.

Google launched Code Wiki to tackle documentation's oldest problem: it becomes outdated the moment you write it. Code Wiki generates comprehensive documentation for repositories and regenerates it automatically after every change.
The system scans the full codebase, maintains links to every symbol, and creates interactive documentation where you can navigate from high-level explanations to specific code locations. A Gemini-powered chat interface uses the always-current documentation as context for answering questions about the codebase.
What makes this approach interesting is treating documentation as a continuously regenerated artifact rather than something developers maintain manually. The public preview works for open-source projects. Google is developing a Gemini CLI extension for private repositories - particularly valuable where legacy code is poorly documented and institutional knowledge has eroded.
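As a toy illustration of the regenerate-instead-of-maintain idea (emphatically not Code Wiki's implementation), a CI job could rebuild a symbol index, with links back to exact code locations, on every push:

```python
# Toy illustration of "docs as a regenerated artifact": rebuild a symbol index
# from source on every run (for example, in CI on each push). This is not Code
# Wiki's implementation; it only shows the regenerate-instead-of-maintain idea.
import ast
from pathlib import Path

def build_symbol_index(repo_root: str) -> str:
    lines = ["# Symbol index (auto-generated; do not edit by hand)", ""]
    for path in sorted(Path(repo_root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                summary = (ast.get_docstring(node) or "No docstring.").splitlines()[0]
                # Link every symbol back to its exact location in the code.
                lines.append(f"- `{node.name}`: {summary} ({path}:{node.lineno})")
    return "\n".join(lines)

if __name__ == "__main__":
    Path("SYMBOLS.md").write_text(build_symbol_index("src"), encoding="utf-8")
```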
The challenge with any auto-documentation system is whether the generated content is accurate enough to trust for important technical decisions. Code Wiki includes the standard disclaimer that Gemini can make mistakes. But the linking to actual code makes verification straightforward.
This targets a genuine pain point. New contributor onboarding, understanding legacy decisions, and maintaining architectural knowledge are persistent challenges. If AI-generated documentation proves reliable enough, it removes significant friction.
The question is whether teams will trust it enough to act on it without verification—that determines whether it stays in attended workflows or moves toward autonomy.
The broader challenge here is giving agents the right documentation context at the right time. Tessl's open source registry tackles a related problem: providing version-aware library documentation, coding styleguides, and reusable workflows that agents can reliably pull from, treating documentation as structured context for steering rather than just reference material for humans.

Kilo positions itself as the open alternative in the coding agent space. The pitch: switch between 500+ models, bring your own API keys, see exactly what models are being used, and inspect every prompt and tool call.
The platform includes orchestrator mode that breaks complex projects into subtasks and coordinates between different agent modes. It features Context7 integration—automatically looking up library documentation to reduce hallucinations. The debug mode systematically traces through your codebase to locate bugs.
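The orchestrator idea itself generalizes. A rough sketch of the pattern, with a placeholder model call standing in for whichever provider and key you bring (this is not Kilo's code):

```python
# Rough sketch of the orchestrator/sub-agent pattern: split a project into
# subtasks, route each one to a specialised mode, and collect the results.
# This is not Kilo's code; call_model is a stand-in for whatever model,
# provider, and API key you bring.
from dataclasses import dataclass

@dataclass
class Subtask:
    description: str
    mode: str  # e.g. "architect", "code", or "debug"

def call_model(mode: str, prompt: str) -> str:
    """Placeholder for a real model call via your chosen provider."""
    return f"[{mode}] result for: {prompt}"

def orchestrate(project_goal: str) -> list[str]:
    # A real orchestrator would ask a model to produce this plan; it is
    # hard-coded here to keep the sketch self-contained.
    plan = [
        Subtask("outline the modules and interfaces", mode="architect"),
        Subtask("implement the API layer", mode="code"),
        Subtask("trace and fix the failing integration test", mode="debug"),
    ]
    return [call_model(t.mode, f"{project_goal}: {t.description}") for t in plan]

print("\n".join(orchestrate("Add CSV export to the reporting service")))
```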
What's notable is the transparency emphasis. No silent context compression, no hidden model switching, no locked-in providers. This matters for teams that want control over costs and model choice. The open-source plugin under Apache 2.0 license means you can see and modify how Kilo works.
The parallel mode capability acknowledges that complex projects benefit from multiple agents working simultaneously—similar to Conductor's insight but integrated at the platform level rather than as a separate orchestration layer.
This appeals to developers who've hit friction with closed platforms. The transparent pricing model (pay exact list price from providers, Kilo makes money on Teams/Enterprise) and bring-your-own-keys approach addresses concerns about vendor lock-in and hidden costs.
If you're already using VS Code or JetBrains, Kilo integrates as a plugin, fitting into existing workflows while offering model flexibility that proprietary tools don't provide.
From an adoption standpoint, this is incremental: the trust requirement is moderate because you're reviewing agent output, and the workflow change is minimal.

Letta released Context-Bench, a benchmark evaluating how well language models handle "agentic context engineering": when agents themselves strategically decide what context to retrieve and load.
The benchmark measures agents' ability to chain file operations, trace entity relationships, and manage multi-step information retrieval in long-horizon tasks. Questions require multiple tool calls and strategic information management—agents can't answer correctly without navigating file relationships.
What makes this valuable is addressing a critical challenge: as agents tackle longer tasks, determining what information should be in the context window at any moment becomes crucial. Too much causes context rot; too little causes hallucinations.
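To make the measured skill concrete, here is a toy version of the kind of task Context-Bench poses: the agent can read only one file at a time, so answering correctly means deciding which file to open next rather than loading everything up front. The files and helper below are invented for illustration.

```python
# Toy version of the skill Context-Bench measures: the agent decides which
# file to pull into context next, following references hop by hop instead of
# loading the whole corpus. Files and helpers are invented for illustration.
FILES = {
    "orders.txt":    "Order #42 was placed by customer C7. See customers.txt.",
    "customers.txt": "Customer C7 is Dana Ortiz, account A3. See accounts.txt.",
    "accounts.txt":  "Account A3 is enterprise tier, owned by Dana Ortiz.",
}

def open_file(name: str) -> str:
    """The only tool available: read one file into context on demand."""
    return FILES[name]

def gather_context(start: str, max_hops: int = 5) -> list[str]:
    """Chain file reads until no further reference is found; return the trail."""
    context, current = [], start
    for _ in range(max_hops):
        text = open_file(current)
        context.append(f"{current}: {text}")
        # A real agent would let the model choose the next file; this sketch
        # just follows the explicit "See <file>." pointer.
        if "See " in text:
            current = text.split("See ")[1].rstrip(".")
        else:
            break
    return context

# Answering "what tier is the account behind order #42?" takes three hops.
for line in gather_context("orders.txt"):
    print(line)
```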
The findings reveal interesting patterns. Claude Sonnet 4.5 leads at 74% accuracy, demonstrating that models explicitly trained for context engineering excel. But open-weight models are closing the gap—GLM-4.6 achieves 56.83%, Kimi K2 scores 55.13%. Even top models miss 25-30% of questions, indicating substantial room for improvement.
This benchmark matters because it measures a specific capability that's critical for production agents but often overlooked: the meta-problem of managing what information you need to solve the actual problem. As tasks extend beyond native context windows, models that excel at context engineering will handle long-horizon work more reliably.
For teams building with agents, Context-Bench provides data for model selection when context management is critical. For model developers, it highlights a training dimension that differentiates performance on real-world agentic tasks.
While Context-Bench focuses on measuring raw context utilization, other approaches, like the specs eval framework, examine how structured context translates into practical task completion, offering a different lens on the same fundamental question of context value.
https://www.letta.com/blog/context-bench

Google launched Antigravity as an "agent-first" development platform alongside Gemini 3. Rather than treating AI as a sidebar feature, Antigravity gives agents dedicated workspace with direct access to editor, terminal, and browser.
The platform splits into two modes: Editor View for hands-on coding with AI assistance, and Agent Manager for deploying agents that autonomously plan and execute complex tasks. Agents communicate work via Artifacts—screenshots, task lists, implementation plans—that you can review and comment on without stopping execution.
What's architecturally interesting is the inverted paradigm. Instead of agents embedded within surfaces, surfaces are embedded into the agent. This reflects a bet that models like Gemini 3 are capable enough to operate across multiple environments simultaneously.
Antigravity includes learning as a core primitive. Agents save useful context and code snippets to a knowledge base, improving future task execution. It runs cross-platform (Mac, Windows, Linux) and supports Gemini 3 Pro, Claude Sonnet 4.5, and OpenAI's GPT-OSS models with generous rate limits. In light of Google's acqui-hire of Windsurf back in July 2025, Antigravity also carries more than a whiff of Windsurf's DNA; more on this here.
With Antigravity, you're trusting agents to plan, execute, and verify work across your development environment. You're adapting to a task-oriented interaction model rather than line-by-line coding. The Artifacts approach attempts to make agent work reviewable without overwhelming you with tool call details.
Antigravity represents Google's vision for agent-era development. Available free in public preview, it's an ambitious platform play—not just adding AI features to existing tools but reimagining the developer experience around autonomous agents. Time will tell whether devs embrace this paradigm shift or prefer incremental AI assistance in familiar environments.

What makes LocalAI architecturally interesting is its positioning as a privacy-first, self-hosted alternative that mimics OpenAI's API. You run LLMs, generate images, and use agents locally or on-prem with consumer-grade hardware. No GPU required for many use cases.
The 3.8.0 release significantly upgrades support for agentic workflows via the Model Context Protocol (MCP), making them practical for teams that need to keep data on-premises. It also adds live action streaming: watch agents "think" in real time, seeing tool calls, reasoning steps, and intermediate actions as they happen rather than waiting for final output.
LocalAI supports multiple backend types (llama.cpp, diffusers, transformers, and more) and can run models from various sources including Hugging Face, Ollama, and standard OCI registries. The platform automatically detects GPU capabilities and downloads appropriate backends.
This addresses a real deployment constraint. Many organizations can't send code or data to external APIs but still want to leverage AI capabilities. LocalAI provides the infrastructure to run agents locally while maintaining API compatibility with OpenAI-style integrations.
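Because the API surface mimics OpenAI's, pointing existing client code at LocalAI is mostly a base-URL change. A minimal sketch; the port, path, and model name below are assumptions that depend on your deployment:

```python
# Point existing OpenAI-style client code at a local LocalAI instance instead
# of a hosted API. The base URL, port, and model name are assumptions that
# depend on your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # LocalAI's OpenAI-compatible endpoint
    api_key="not-needed-locally",          # placeholder; no key leaves the machine
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # whichever model you have pulled into LocalAI
    messages=[{"role": "user", "content": "Summarise this repo's build steps."}],
)
print(response.choices[0].message.content)
```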
The trust requirement is moderate to high depending on your use case. The change requirement is low if you're already running local inference—LocalAI just makes agentic patterns more practical. For organizations new to self-hosted AI, there's infrastructure overhead, but the MCP support and live streaming features reduce the gap between local and cloud agent experiences.

Looking across these seven tools, patterns emerge around where current development is focused:
- Orchestration and coordination (Conductor, Kilo, Antigravity): As agents become more capable, managing multiple agents and their interactions becomes the bottleneck. Tools are emerging to handle parallel execution, workspace isolation, and coordination complexity.
- Code review velocity (Graphite): AI accelerates code generation, but review remains a human bottleneck. Tools that optimize review workflows and reduce blocking will be critical as teams ship faster.
- Agent-first platforms (Antigravity): The boldest bets involve reimagining development around autonomous agents rather than incrementally adding AI features to existing tools.
- Documentation as infrastructure (Code Wiki): Treating documentation as continuously regenerated rather than manually maintained could solve the staleness problem if AI-generated content proves reliable enough.
- Context management (Letta Context-Bench): As tasks extend beyond native context windows, the meta-skill of managing what information to retrieve becomes differentiating for model performance.
- Transparency and control (Kilo, LocalAI): Not all teams want black-box solutions. Demand exists for open-source alternatives, model flexibility, and visibility into costs and behavior.
We need both incremental tools that improve current workflows and speculative platforms exploring new paradigms. The ecosystem benefits from having options across the trust/change spectrum. What's less visible but equally important: tools that haven't gained traction yet but represent useful approaches.
The AI development landscape is still taking shape and evolving fast. Our weekly tool spotlight helps you stay ahead by discovering one new AI-native dev tool at a time. This isn’t about sprinkling AI onto existing workflows; it’s about redefining how software is built from the ground up.
Explore the AI Native Dev Landscape, track the shifts happening across categories, and stay plugged into the tools developers are truly adopting.
If you’re an AI native developer, make it a habit: explore one new tool each week and level up your workflow!