20 Nov 2025 · 12 minute read

Every organisation working with AI models has encountered the gap between “can generate code” and “can deploy and manage code reliably”. That gap is widening as repositories grow, infrastructure evolves and expectations shift from ad-hoc tools to full-stack workflows.
Google’s newest releases land directly in that space: Gemini 3, its latest multimodal language model, and Antigravity, an agent-first development platform. Together they give a good indication of what the next era of AI-assisted software looks like, according to Google at least: less autocomplete, more autonomous execution for those lacking the bandwidth or skill set to build full systems themselves.
“We want Antigravity to be the home base for software development in the era of agents,” Google wrote in its blog post. “Our vision is to ultimately enable anyone with an idea to experience liftoff and build that idea into reality.”
Gemini 3 is described by Google CEO Sundar Pichai as its “most intelligent model, built to grasp depth and nuance”.
To back this up, Demis Hassabis (CEO, Google DeepMind) and Koray Kavukcuoglu (CTO of DeepMind and Chief AI Architect at Google) presented a suite of benchmark results across text, reasoning, science and multimodal tasks.
The data shows that Gemini 3 scored (at the time of writing) 1,498 Elo on the LMArena leaderboard (text), a head-to-head evaluation where models compete on human-judged tasks, placing it among the top publicly tested systems. On Humanity’s Last Exam, a high-difficulty reasoning benchmark, it achieved 37.5% without tools, while on GPQA Diamond — which measures graduate-level scientific reasoning — it scored 91.9%. The model also reached 23.4% on MathArena Apex, a mathematics benchmark designed to stress symbolic and multi-step reasoning rather than pattern recall.
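For readers unfamiliar with Elo-style leaderboards like LMArena, the rating maps directly onto a predicted head-to-head win probability. The sketch below uses the generic Elo expected-score formula as an illustration; it is not LMArena’s exact implementation, and the ratings plugged in are only examples.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Two equally rated models are predicted to split wins 50/50.
even_matchup = elo_expected_score(1498, 1498)

# A 100-point Elo gap implies roughly a 64% win rate for the higher-rated model.
clear_edge = elo_expected_score(1498, 1398)
```

The practical point is that a score like 1,498 only means something relative to the other models on the board: it is the gap between ratings, not the absolute number, that carries the information.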

It's worth noting that many of these preliminary scores carry a "contamination" warning, indicating that the benchmark was public at the time of evaluation and models may have had prior exposure to the prompts or solutions, limiting their value as clean measures of generalization.
Beyond text, Google says Gemini 3 scored 81% on MMMU-Pro—a multimodal, university-level exam spanning charts, diagrams and scientific visuals—and 87.6% on Video-MMMU, which evaluates temporal reasoning across sequences and a model’s ability to track entities and causality over time, though these results don't yet seem to be reflected on the public benchmark leaderboards.
Caveats aside, the results point to strong performance on long-horizon reasoning across varied inputs, a direction reflected in the model’s broader capabilities.
Google is integrating it immediately into its core products, from the Gemini app to AI Studio for prototyping and Vertex AI for enterprise-scale model development and deployment.
For developers, however, Gemini 3’s value lies in coordinating work across tools and modalities: tracing dependencies, maintaining state through longer tasks, and keeping context intact across evolving repositories. That orientation pushes the model toward orchestration, a shift that becomes visible in Antigravity.
Antigravity, in a nutshell, is a workspace where agents can act across tools with enough structure and traceability for developers to audit what happened and why.
At its core, it functions as a full IDE, but with an additional “Agent Manager” surface that acts like a coordination layer for background agents—handling backlog tasks, codebase research and long-running work without forcing the developer to constantly context-switch.

It treats software development as a collaboration between human architect and autonomous agents: models plan workflows, verify their own output and hand tasks off to new agents when ready, operating across surfaces like the editor, terminal and browser.
For example, a user can start a new workspace to build, say, a flight-tracking application and describe the desired behaviour in natural language. The agent responds by generating a structured implementation plan, creating the initial project tasks, and outlining the files needed to begin development.

It then moves into execution, carrying out that plan step by step—creating tasks, generating scaffold code and logging actions as artifacts such as Implementation Plan and Task—before applying those changes to the project.

The platform produces what Google calls Artifacts (no relation to Claude’s Artifacts): task lists, implementation plans, screenshots or browser recordings—human-verifiable traces of agent behaviour rather than raw model outputs. Users switch between an Editor View that looks like a familiar IDE and a Manager View that orchestrates multiple agents across workspaces.
For teams with sprawling, tangled codebases, this matters. As systems age and grow, simple autocomplete tools break down; tracing dependencies, managing workflows and verifying output become just as important as generation. An agent-first platform suggests a path toward multi-step autonomous workflows with oversight baked in, rather than isolated prompts.
Alongside this core workflow, Antigravity also supports asynchronous execution, persistent workspaces and browser-level actions, with safety and verification layers framing how work is carried out.
It’s worth looking at the origins of Antigravity. Back in July 2025, Windsurf — the agentic IDE startup — found itself at the centre of a high-stakes talent and IP scramble. First, Google absorbed Windsurf’s founders (including CEO Varun Mohan) and top R&D personnel under a non-exclusive licensing deal pegged at more than $2 billion; a few days later, Cognition acquired Windsurf’s remaining product, brand, IP and staff in a definitive agreement.
That talent didn’t vanish into the ether; some of it has clearly resurfaced in Antigravity’s development. Kevin Hou, part of Windsurf’s founding team before joining Google DeepMind in July, fronted the official Antigravity demo video, and Anshul Ramachandran, another former Windsurfer, confirmed he had been deeply involved in developing Antigravity during his four months at DeepMind.
So it may come as little surprise that Antigravity carries more than a whiff of Windsurf’s DNA, a theme that has dominated much of the online discussion around Google’s latest IDE.
One example came from Aiden Bai, who pointed out that Antigravity’s interface still exposes a command labelled “Search for files edited by Cascade,” referring to Windsurf’s original agent.
“Insane that the Windsurf founders exited, left the product, users, and old team for dead… and still managed to forget removing "Cascade" (windsurf's old agent) in Antigravity,” he wrote on X.

Separate screenshots shared by an early tester also show Windsurf-related identifiers — such as MIGRATE_WINDSURF and WIND_SURF_BROWSER — in the packaged JavaScript files.
“Looks like he forgot to do CMD + Shift + R,” the tester wrote in response to the Antigravity launch post on X from Windsurf’s former CEO, Varun Mohan.

These echoes of Windsurf sit alongside mixed early impressions of the product itself. Developer Simon Willison initially described Antigravity as “yet another VS Code-fork Cursor clone.” However, he added that “when you look closer it’s actually a fair bit more interesting than that,” highlighting that the app spans three coordinated surfaces — an agent manager dashboard, a traditional editor, and a browser integration layer — though his early tests encountered reliability issues, including agents terminating under load.
These reactions feed into a broader sense of fatigue and fragmentation around Google’s AI software development tooling. Indeed, anyone who’s been paying attention will have encountered a number of tangential tools already in the Google stable, such as Project IDX back in 2023, while in 2025 alone we’ve seen Firebase Studio, Jules and Gemini CLI enter the fray.
That proliferation has led some developers to openly question where Antigravity fits. Gergely Orosz, AKA the Pragmatic Engineer, took to X this week to lament exactly that.
Orosz argued that Google is launching too many similar products without a clear point of focus. “This is what I mean by no coherent strategy at Google btw,” he wrote. “Which one is the focus of Google? Do they even know?”
He noted that Antigravity is the third AI-powered IDE the company has introduced recently, summarising the fatigue that comes with that pace: “I am all for launches, but hard to get excited by the 3rd AI IDE launch in ~6 months by Google.”
Whether Antigravity consolidates that stack, or merely expands an already fragmented portfolio, will hinge on whether Google can align these tools into a coherent, durable strategy.
For those wanting to check it out, Antigravity is available now in public preview across macOS, Windows and Linux, with no charge to access the environment itself (model usage is still metered and billed separately).
It's also worth noting that Antigravity supports multiple models at launch, including Gemini 3 Pro, Anthropic’s Claude Sonnet 4.5, and OpenAI’s GPT-OSS, offering developers a degree of model optionality.