
Faster Reviews, Faster Shipping

Merrill Lutsky
Co-founder & CEO, Graphite

Instant PR Feedback Without Leaving GitHub

with Merrill Lutsky

Chapters

Trailer
[00:00:00]
The Story of Graphite and Reinventing Code Review
[00:01:20]
Inside Graphite Agent: AI That Reviews Your Code
[00:04:00]
What AI Code Review Gets Right (and Still Gets Wrong)
[00:06:30]
The Explosion of AI-Written Code Across the Industry
[00:10:30]
What Makes Graphite Stand Out From Every Other AI Tool
[00:16:00]
Human vs AI Review: Collaboration, Learning, and Trust
[00:25:00]
The Future of Developers in an AI-Driven World
[00:41:00]

In this episode

In this episode of AI Native Dev, host Guy Podjarny sits down with Merrill Lutsky, CEO and co-founder of Graphite, to explore how AI is revolutionizing code review in software development. They discuss the transformation of the "outer loop," focusing on Graphite's AI reviewer, Graphite Agent, which enhances productivity by catching logic errors and enforcing standards while allowing human reviewers to focus on architectural decisions and systemic risks. Discover how to effectively integrate AI-assisted reviews into your workflow, codify standards, and maintain a collaborative PR process that leverages both AI and human expertise.

Code review is being reimagined in the age of AI. In this episode of AI Native Dev, host Guy Podjarny talks with Merrill Lutsky, CEO and co-founder of Graphite, about how AI is transforming the “outer loop” of software development—particularly code review—and what it takes to make AI a productive collaborator rather than noise. From Graphite’s origins with stacked PRs to its AI reviewer, Graphite Agent (formerly “Diamond”), Merrill shares what’s working, what isn’t, and how teams can operationalize AI-assisted review at scale.

From Stacked PRs to AI-Assisted Review: Unblocking the Outer Loop

Graphite started by focusing squarely on the outer loop: everything that happens after code is “ready” and needs to get through review, CI, and merge. Drawing inspiration from internal tooling at Google and Meta, Graphite popularized stacked PRs—breaking a large change into small, reviewable slices and decoupling development from review so both can proceed in parallel. That early bet paid off by reducing merge conflicts, getting faster feedback on smaller diffs, and making it easier for teams to keep shipping even as complexity grows.
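
As a rough mental model (not Graphite's actual implementation), a stack is a chain of small dependent PRs where updating a parent triggers a "restack" of everything above it. A minimal Python sketch, with invented branch names:

```python
from dataclasses import dataclass, field

@dataclass
class StackedPR:
    """One small, reviewable slice of a larger change."""
    branch: str
    title: str
    parent: "StackedPR | None" = None
    children: list["StackedPR"] = field(default_factory=list)

    def on_parent_updated(self) -> None:
        # When a parent PR absorbs review feedback, each child is
        # rebased ("restacked") onto the new parent tip, so review of
        # later slices continues in parallel without merge conflicts.
        print(f"restack {self.branch} onto {self.parent.branch}")
        for child in self.children:
            child.on_parent_updated()

# A large feature split into three small, independently reviewable PRs.
base = StackedPR("feat/schema", "Add new DB schema")
mid = StackedPR("feat/api", "Expose API endpoint", parent=base)
top = StackedPR("feat/ui", "Wire up the UI", parent=mid)
base.children.append(mid)
mid.children.append(top)

# Review feedback lands on the bottom PR; the rest of the stack follows.
for child in base.children:
    child.on_parent_updated()
```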

The AI inflection point—spurred by tools like Cursor and Claude Code, and by model upgrades like Claude 3.5 Sonnet—multiplied the volume of changes in flight. Engineers now spin up multiple AI-assisted workstreams concurrently, pushing out far more PRs than before. This made Graphite's original problem 10x more pressing: senior reviewers were getting buried in mountains of diffs. In response, Graphite introduced Graphite Agent, an AI reviewer designed to take the first pass on PRs, catch issues quickly, and reduce the human burden.

Crucially, Graphite doesn’t treat the PR page as a static checkpoint. The Agent’s chat is embedded directly in the PR, allowing authors and reviewers to ask questions, request changes, and have the AI propose or apply updates in context. It can search the codebase for relevant examples and supporting files, turning the PR into a live collaboration space rather than a one-way gate.

What Graphite Agent Does Well (and Where Humans Still Shine)

On strengths, Merrill says the Agent is particularly good at finding logic errors—including subtle bugs that slip past type checkers and careful human reading. Think boundary conditions, incorrect default values, asynchronous missteps (like missing awaits), misuse of APIs, or error paths that look fine at a glance but fail in edge cases. It also reliably enforces style guides and catches security footguns, offering feedback in seconds so authors can iterate before looping in a human.
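
For instance, a missing `await` is exactly the kind of subtle logic error that reads fine at a glance. A minimal Python illustration (the function names are hypothetical):

```python
import asyncio

async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a real DB or API call
    return {"id": user_id, "active": True}

async def count_active(user_ids: list[int]) -> int:
    active = 0
    for user_id in user_ids:
        # BUG: missing `await`. fetch_user(...) returns a coroutine
        # object, which is always truthy, and subscripting it raises
        # TypeError only at runtime, on this code path.
        user = fetch_user(user_id)  # should be: await fetch_user(user_id)
        if user and user["active"]:
            active += 1
    return active
```

A fast first-pass reviewer flags the unawaited coroutine in seconds; a human skimming a large diff can read straight past it.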

That said, system-level architecture decisions remain a human domain. The Agent can read a diff and retrieve local context, but architecture involves trade-offs, performance characteristics, and organizational constraints that often live in disparate places—or in someone’s head. Today, senior engineers still need to set direction, evaluate cross-cutting concerns, and arbitrate design choices that span services and teams.

There are also uneven spots in model coverage. Languages underrepresented in training data, such as Kotlin in some mobile repos, yield lower accuracy today. And while end-to-end validation (spin up environment, click through, record a video, and attach it to the PR) is a promising near-term direction, it’s not yet a turnkey capability. The right pattern is clear: use the Agent for the high-signal first pass and keep humans in the loop for architecture, product risk, and final accountability.

Context Over IQ: Feeding the Reviewer the Right Signals

Is the main limiter AI reasoning or missing context? Merrill’s view: given perfect information, models can already reason well enough to be useful at higher levels. The harder problem is product and workflow: corralling the right context from scattered sources—design docs in Notion, old decision records, code comments, tickets, and tribal knowledge—into the PR surface so the AI (and humans) can make sound judgments.

Graphite tackles this by retrieving repository context and ingesting codified standards wherever they live. If a repo has a CLAUDE.md or Cursor rules file, Graphite pulls that into the Agent’s context to ensure consistent advice across tools. In the PR chat, developers can ask the Agent to search the codebase for prior art, relevant helpers, or historical diffs, making it easier to validate a change against established patterns.
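
As a rough illustration of what "ingest codified standards wherever they live" can look like, here is a minimal sketch that gathers common rule files into one prompt-ready context blob. The file locations are common community conventions, not a confirmed list of what Graphite reads:

```python
from pathlib import Path

# Common places teams codify standards for AI tools (illustrative list;
# adjust to whatever your repo actually uses).
RULE_SOURCES = ["CLAUDE.md", ".cursorrules", ".cursor/rules"]

def collect_review_context(repo_root: str = ".") -> str:
    """Concatenate codified standards into one prompt-ready context blob."""
    root = Path(repo_root)
    chunks = []
    for source in RULE_SOURCES:
        path = root / source
        if path.is_file():
            chunks.append(f"## {source}\n{path.read_text()}")
        elif path.is_dir():
            for rule_file in sorted(p for p in path.rglob("*") if p.is_file()):
                chunks.append(f"## {rule_file.relative_to(root)}\n{rule_file.read_text()}")
    return "\n\n".join(chunks)

# The blob rides along with the diff in the reviewer's prompt, so the
# same house rules apply in the editor and on the PR page.
print(collect_review_context())
```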

For developers, the takeaway is to make context machine-consumable and PR-attached. Link relevant tickets and design docs in the PR description, summarize intent and constraints, and cite performance or security requirements up front. Treat “change provenance” as a first-class object—what problem this solves, why this approach, and what trade-offs were considered—so the Agent and human reviewers evaluate the same facts.
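
One lightweight way to make "change provenance" first-class is a small CI-style check that the PR description actually carries those facts. The required sections below are hypothetical and should match your own template:

```python
import re

# Hypothetical provenance sections a PR description must carry so that
# AI and human reviewers evaluate the same facts.
REQUIRED_SECTIONS = {
    "Problem": r"(?im)^#+\s*problem\b",
    "Approach & trade-offs": r"(?im)^#+\s*(approach|trade-?offs)\b",
    "Links (ticket / design doc)": r"https?://\S+",
}

def check_provenance(pr_description: str) -> list[str]:
    """Return the provenance sections missing from a PR description."""
    return [
        name for name, pattern in REQUIRED_SECTIONS.items()
        if not re.search(pattern, pr_description)
    ]

description = """\
## Problem
Checkout requests time out under load.

## Approach
Cache price lookups; trade-off: prices may be up to 60s stale.
Ticket: https://example.com/TICKET-123
"""
missing = check_provenance(description)
print("missing sections:", missing or "none")
```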

The AI Code Wave: Volume, Quality, and the Human-in-the-Loop

Across Graphite customers, engineers are changing about 70% more lines of code per week than at the end of 2023—a surge likely attributable to AI copilots. That translates into more PRs, more context-switching, and more pressure on senior reviewers. AI-assisted review relieves the immediate bottleneck by triaging routine issues, but organizational process has to evolve alongside it.

One challenge: there’s no agreed-upon way to mark AI involvement in Git history. Without a standard trailer or metadata convention, it’s hard to measure downstream impacts of AI-authored code or tailor review heuristics accordingly. While public stats suggest more than half of code is now AI-generated—and likely trending toward two-thirds or more—Graphite (and the industry) can’t reliably distinguish authorship post hoc.

Quality varies by baseline. In top-tier engineering orgs, AI-generated code often isn’t yet at the median reviewer’s bar, reinforcing the need for human refinement. In other environments, AI can exceed the average contributor on routine tasks. The operational model that works across contexts is consistent: let the Agent run a fast, thorough first pass, then have humans focus on architectural coherence, systemic risk, and product correctness.

Codifying Quality: Custom Rules, Shared Standards, and Feedback Loops

Graphite Agent supports organization-specific standards via custom rules written in plain English or regex. Teams can encode naming conventions, security policies, dependency hygiene, error handling expectations, and other house rules. The Agent also ingests existing config where possible (e.g., Cursor or Claude rule files) to reduce duplication and keep guidance consistent across local editors and the PR surface.
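
To make that concrete, here is a hypothetical sketch of what such a rule set might look like, expressed as Python data. The schema is invented for illustration; Graphite's actual rule format may differ:

```python
# Hypothetical rule definitions, not Graphite's actual schema.
# Regex rules give precise, lint-like matches; natural-language rules
# let the model judge intent and context.
CUSTOM_RULES = [
    {
        "name": "no-console-log",
        "kind": "regex",
        "pattern": r"console\.log\(",
        "message": "Remove console.log before merging; use the logger.",
    },
    {
        "name": "pinned-dependencies",
        "kind": "natural_language",
        "instruction": "New dependencies must be pinned to an exact "
                       "version and justified in the PR description.",
    },
    {
        "name": "error-handling",
        "kind": "natural_language",
        "instruction": "Network calls must handle timeouts explicitly; "
                       "never swallow exceptions silently.",
    },
]
```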

Transparency matters for trust. Graphite attributes its comments to the rule that triggered them and exposes insights on which rules fire most often and where the Agent is catching issues versus being ignored. That makes it easier to tune thresholds, de-noise low-value checks, and spot policy gaps that deserve promotion from “advice” to “must.”

Implementation best practices: start with a short list of high-signal rules (security, breaking changes, API contracts, error handling), use regex for precise lint-like checks, and use natural-language rules for style and policy that benefit from flexibility. Wire rules into PR templates and contributor docs, and revisit your rule set monthly to eliminate noisy checks, add new patterns surfaced by incidents, and keep your signal-to-noise ratio high.
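
Continuing that sketch, here is a minimal engine for the regex half, with per-rule fire counts you could review monthly to retire noisy checks. Everything here is hypothetical scaffolding, not Graphite's implementation:

```python
import re
from collections import Counter

# One regex rule, in the same hypothetical schema as above.
RULES = [
    {"name": "no-console-log", "kind": "regex",
     "pattern": r"console\.log\(",
     "message": "Remove console.log before merging; use the logger."},
]

fire_counts = Counter()

def run_regex_rules(diff_lines, rules):
    """Apply regex rules to added lines of a diff, recording which rules fire."""
    findings = []
    for lineno, line in enumerate(diff_lines, start=1):
        if not line.startswith("+"):  # only review added lines
            continue
        for rule in rules:
            if rule["kind"] == "regex" and re.search(rule["pattern"], line):
                fire_counts[rule["name"]] += 1
                findings.append(f"line {lineno}: [{rule['name']}] {rule['message']}")
    return findings

diff = ["+console.log('debug');", "+const total = price * qty;"]
print("\n".join(run_regex_rules(diff, RULES)))
print(dict(fire_counts))  # review monthly; retire rules that only add noise
```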

Key Takeaways

  • Treat the PR as a collaborative workspace. Embed an AI reviewer in the PR, enable chat, and let it propose changes, search the codebase, and enforce rules before humans step in.
  • Use AI for the first pass, not the final say. Let the Agent catch logic bugs, style drift, and security footguns quickly; keep humans focused on architecture, product risk, and system-level trade-offs.
  • Feed the model context, not just code. Link tickets and design docs, summarize intent and constraints in PR descriptions, and centralize “change provenance” so both AI and humans judge the same facts.
  • Codify standards once and reuse them everywhere. Write custom rules in natural language or regex, ingest existing Cursor/Claude rule files, and keep guidance consistent across IDEs and PRs.
  • Measure and tune the feedback loop. Track which rules fire and which are ignored, reduce noise, and iterate monthly. Watch PR throughput, time-to-merge, and “AI catch rate” as leading indicators.
  • Plan for known gaps. Expect weaker performance in underrepresented languages (e.g., Kotlin in some repos) and on architecture. Add targeted human review and, where possible, higher-level tests or E2E validations.
  • Push toward provenance for AI-generated code. Until there’s a standard, consider internal conventions (e.g., commit trailers) to tag AI involvement and inform review policies and post-merge analysis; see the sketch after this list.
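
As a sketch of such an internal convention, suppose commits carry a made-up `AI-Assisted:` trailer; the share of tagged commits can then be measured from standard `git log` output. The trailer name is an assumption, not an industry standard:

```python
import re
import subprocess

# Hypothetical internal convention: commits carry a trailer like
#   AI-Assisted: claude-code
# Git trailers are just "Key: value" lines at the end of the message.
TRAILER = re.compile(r"(?m)^AI-Assisted:\s*(\S+)")

def ai_assisted_share(repo: str = ".") -> float:
    """Fraction of commits whose messages carry the AI-Assisted trailer."""
    # %B = raw commit body; %x00 = NUL separator so multi-line messages
    # split cleanly.
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%B%x00"],
        capture_output=True, text=True, check=True,
    ).stdout
    messages = [m for m in out.split("\x00") if m.strip()]
    tagged = sum(1 for m in messages if TRAILER.search(m))
    return tagged / len(messages) if messages else 0.0

print(f"AI-assisted commits: {ai_assisted_share():.0%}")
```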
