
[00:00:00] Intro: Before we jump into this episode, I wanted to let you know that this podcast is for developers building with AI at the core. So whether that's exploring the latest tools, the workflows, or the best practices, this podcast's for you. A really quick ask. 90% of people who are listening to this haven't yet subscribed.
[00:00:24] Intro: So if this content has helped you build smarter, hit that subscribe button and maybe a like, alright, back to the episode.
[00:00:32] Simon Maple: Hey, I'm here still at Q Con and look who I bumped into in the corridor. Itamar Friedman, CEO of Qodo. Itamar, how are you?
[00:00:40] Itamar Friedman: Great to bump into you. Absolutely. Such a great conference.
[00:00:43] Itamar Friedman: Yeah, absolutely. I'm having fun here.
[00:00:44] Simon Maple: Of course, you've been on AI Native Dev twice, I think. I think so, twice. But on the road, first time on the podcast. Yeah, true. First time on the podcast. What was your session about today, Itamar? I think you've just given it.
[00:00:56] Itamar Friedman: Yeah, I just came back from it. So, multi-agent systems: I gave the motivation behind why you want them to start with, specifically for software development, talking about code quality, bad code, good code, the software development lifecycle in general, and how we want to cover it.
[00:01:15] Itamar Friedman: And that was the first half of the motivation. And then we dived into different architectural decisions that you need to make when you're implementing multi-agent systems, when you're developing a multi-agent system for your software development lifecycle. Interesting. There was a paper by Google just recently, so I even managed to squeeze that in. It's five days old.
[00:01:35] Simon Maple: Oh really? Wow. That's nice. Good timing. So when talking about multiple or multi-agent systems, are you talking about multiple agents that can then therefore code in parallel or are you talking about different agents that do different roles, different jobs, look after each other, where one codes the other one code reviews, that kind of thing?
[00:01:51] Itamar Friedman: Exactly that. Oh, exactly.
[00:01:53] Itamar Friedman: That's one. I mean, you can have multi coding agents, all of them writing code in parallel or sequential, and you need to decide how you want to do that. You can have a diversity of agents with different roles. And we talked about why you need them at all.
[00:02:09] Itamar Friedman: And then how do you sequence them: do they run in parallel with the coding agents, or after, or before? So roughly speaking, there are multiple considerations you need to have. Let me give you a concrete example. Let's say that you have an agent that helps you with the planning and then you have an agent that helps you with the coding, and then they don't agree with each other.
[00:02:25] Itamar Friedman: How do you decide? Do you have a third agent that acts as an arbiter and makes the decision, et cetera? There are different considerations that you need to have.
[00:02:39] Simon Maple: And is there value in saying I'm going to use different models, different agents, different vendors of agents in terms of doing different roles? I guess it's almost like having diversity in a team, right? Where you have different people thinking differently and almost contradicting each other where you say, "Well, actually this is how I think this should be done." If you have all agents that are the same agents, but still having those different roles, what's the contrast between those two scenarios?
[00:03:04] Itamar Friedman: I think some people, when they're thinking about different agents, the first thing that pops into mind is "Okay, I give it a different prompt," or something like that. That architecture is basically an architecture of one single agent that rules them all.
[00:03:19] Itamar Friedman: All you need to do is give it different tools and a different prompt. That's one option. The other option is that the agents are much more different than that. For example, their entire architecture, their graph of how they build might be completely different.
[00:03:36] Itamar Friedman: For example, if you're dealing with a coding agent, then in order to really do its job, it's proven that you need to enable its creativity. Because we developers, we bump into a lot of issues and we bypass them. The fact that the core is an LLM making the decision is a good architecture for a coding agent.
[00:03:56] Itamar Friedman: At least what we're seeing, for example, in SWE-bench, if you're familiar. And then when you're talking about other types of agents, for example, you want to have a security agent or you want to have a code review agent. That agent might want to have much more structure checking A, B, C, D.
[00:04:12] Itamar Friedman: For example, you might have 100 rules. We can talk about how you get them. You want to make sure those rules are obeyed. And then you want that agent, the review agent, to be much more structured. There is an LLM there, but in the graph of that agent the LLM might be involved in only one step out of ten; it's quite structured there. And by the way, you might give it different permissions, et cetera.
[00:04:31] Itamar Friedman: So overall, there's a discussion: do you have one agent that rules it all or multiple agents?
[00:04:46] Itamar Friedman: What we're seeing, of course, is that when you're dealing with enterprise-grade, heavy-duty brownfield environments where compliance and standardization are involved, you must have different agents that are completely different from each other and have different contexts.
[00:04:59] Simon Maple: And do you find different vendors, so maybe Claude or something like that?
[00:05:05] Simon Maple: Do you find that will be better in a specific role? Is there a specific vendor that is best for code reviews or security testing?
[00:05:20] Itamar Friedman: It's a good point to continue from my last point: different vendors have different DNA and a different point of view on the world. What Anthropic believes, at least as I understand it, is that the model is everything. You need to give that model the brains for everything, and you need to give it tools in order to operate in the world. Those tools need to be quite simple.
[00:05:35] Itamar Friedman: On the fly, why do you need to start doing retrieval augmented generation? Just give it a simple tool, like an AST graph, et cetera. That's a pretty good structure and way of thinking for coding agents. But then when you look at a security review agent, and you take a look at the prompt:
[00:06:00] Itamar Friedman: "You are now a security review agent," and then there is a list of exclusions: "Don't look at DDoS, don't look at that." When you're doing a security agent, you need the one-minus of that. You need: "Here is everything that you need to check rigorously, and then you can be creative on all the rest."
[00:06:21] Itamar Friedman: Yeah. When you're doing a coding agent, you want to relax it and say what not to do. But when you're doing a security review agent, then you need to do the one-minus of what they're doing. So I would use Claude, for example, as a vendor when you're talking about a coding agent.
[00:06:38] Itamar Friedman: But I don't think you can just set up a different prompt to be all types of agents out there. You might need a different vendor.
[00:06:45] Simon Maple: Absolutely. And very briefly, because I see people coming out of sessions now, so it might get a bit noisy, but in terms of context: security's a great example where there's a ton of context that you can provide an agent. At what point does the task that you give an agent just become so overwhelming that it starts doing a poor job?
[00:07:00] Simon Maple: Security review is a great example, or code review with a large number of styling rules. Where's the limit for that?
[00:07:13] Itamar Friedman: Context is extremely important. I'd say context is the king and the kingdom. It's extremely important. I talked about how somewhere between 33 and 80% of developers, depending on which survey and how you ask it, think that context is the major problem behind bad quality and hallucination.
[00:07:35] Itamar Friedman: That's their number one request from coding agents to improve. The thing is that context is not just one thing. It's not just a codebase. It could be a database of vulnerabilities. Qodo is a leading code review tool; for example, we have a database related to best practices and rules that we project depending on the stack that you have.
[00:07:51] Itamar Friedman: And we learn it from different interactions with developers in the context engine. So we collect the database related to best practices and rules. And then you really need to fetch the right context.
[00:08:07] Itamar Friedman: And it's not just one thing: when you're thinking about context, on one hand you can have too much and it's not accurate; on the other hand you can have exactly and only what you need. In the middle, for example, you might have all the relevant material plus a lot that's non-relevant, et cetera. Yeah.
[00:08:24] Itamar Friedman: So it's a spectrum. And of course, one of the hardest things is bringing the right context and then making sure only that is there, in the right priority order. And that can define a lot.
[00:08:41] Itamar Friedman: For example, as I mentioned, security companies might have context relevant for security that regular coding agents do not have. Code review solutions like Qodo might have additional context that is relevant to best-practice standards that regular coding agents don't have. That's the way I look at it: how do you bring the right context and then cut it so you don't have too much? Because there are the context wars. If you want to read more about it, Anthropic and other companies talk about the context wars.
[00:09:06] Simon Maple: Yeah. Super interesting. And so right as well, because giving too much context, you think, "Okay, I've got all these rules that I need to follow," you throw it all in one go and it's just going to forget what it wants to forget or it's not going to add it.
[00:09:21] Itamar Friedman: If I connect all the dots we talked about, context is very important. Maybe one of the most important things. And then it's how you use it. When you upload context to Qodo or Cursor, it becomes part of the context, and then they just keep pushing that into the context in their graph.
[00:09:40] Itamar Friedman: If you give Cursor the same prompt with the same context seven times, you get seven different solutions. So I do want to say context is important, but so is how you use it; other tools will use it very differently, or in a workflow verifying different things, et cetera.
[00:10:00] Itamar Friedman: Context is extremely important. And not completely following the context is also a matter of how you build the agent. So that's another point to consider.
[00:10:11] Simon Maple: Itamar, it's been awesome. Hello, I'm still at QCon, not in New York this time, but joining me here is Robert Brennan, who is the CEO of OpenHands. Robert, good to see you here.
[00:10:22] Simon Maple: And of course, I saw you in New York only a few weeks ago for AI Native DevCon. Good to see you again. What's your session going to be about today?
[00:10:36] Robert Brennan: So we're going to be talking about how to scale code maintenance with AI agents.
[00:10:39] Robert Brennan: I would say there's a lot of tasks where developers, for their day-to-day feature development, are used to pairing with Claude Code or the OpenHands CLI locally on their laptop. That works great for your ad hoc work; you pick up a ticket, you need to work on it.
[00:10:54] Robert Brennan: But there are certain tasks that are super repeatable, super automatable, where you can get another level of automation and productivity boost by running multiple agents in the cloud to take down these tech-debt oriented tasks.
[00:11:09] Simon Maple: Awesome. And I guess when we talk about maintenance, what kind of things are we talking about? Typical maintenance tasks that a developer would do?
[00:11:15] Robert Brennan: Yeah. So very typical things would be dependency management, and resolving open source vulnerabilities that show up in your code base.
[00:11:23] Robert Brennan: And then some meatier tasks like maybe upgrading from Python 2 to Python 3, or upgrading from an old version of Java to a new version of Java, maybe migrating from COBOL to Java or another more modern programming language. There's a lot of these toil tasks that developers don't really like doing.
[00:11:40] Robert Brennan: They don't involve a lot of creativity or critical thinking. It's mostly just a lot of churning out lines of code. But they're still too big for a single agent to one-shot in many cases. Right? And so doing what we call agent orchestration can allow you to put some forethought into how you're going to solve this repeatable problem in a way that can scale across many code bases.
[00:12:02] Robert Brennan: And then just automate it 90% of the way there. And just have a human in the loop for the very little bit where you need that last bit of verification.
[00:12:11] Simon Maple: So let's talk about agent orchestration then. When we talk about agent orchestration, we're obviously talking about multiple agents.
[00:12:17] Simon Maple: Do they have to be the same agent or can they just be different vendors, that kind of thing?
[00:12:21] Robert Brennan: Yeah, I think it's helpful to be working within a single agent framework. There's nothing necessarily stopping one agent from talking to another agent, and there's the agent-to-agent protocol and things like that.
[00:12:32] Robert Brennan: But what we've tried to build with OpenHands SDK is a single framework that allows you to create multiple different agents with different system prompts, different sets of tools that they have access to, different MCP servers that they have access to. And maybe different behavior patterns.
[00:12:48] Robert Brennan: And then getting those agents talking to each other gives you kind of a single framework for thinking about how to coordinate multiple agents together.
[00:12:59] Simon Maple: Right. And when we start with a large task, typically one that a number of agents can all work on.
[00:13:05] Robert Brennan: Yeah.
[00:13:05] Simon Maple: We don't obviously just throw that large task at all these group of agents. We need to break that task down. What's the process of breaking that down? Is there an amount of human interaction in that or can we somewhat automate that through AI as well?
[00:13:16] Robert Brennan: Yeah, I think that's the most critical step: how are you going to break this problem down into sub-pieces that individual agents can execute on? So I think that's where the most human thought goes into the process. But you can work with an LLM like Claude, for instance, to talk through the problem and get some help from an LLM to break down the problem into manageable bite-sized tasks.
[00:13:39] Robert Brennan: But you're also going to have to really lean on your own intuition for what is actually solvable. What's reviewable by a human, what can you give to the agent that you can then look at on the other end when it gives you some output and quickly verify: did it do the thing or did it not?
[00:13:53] Simon Maple: How important is parallelization in this? Because I guess when we think about breaking down that task, not all of it can run all at the same time. So we need some level of dependency where one thing relies upon or depends on the output of the other. How's that graph worked out?
[00:14:10] Robert Brennan: Yeah. I think the more you can parallelise the better, especially for these very big tasks. A really good example is we have one client that's literally fielding thousands of new CVE announcements per day across their very massive code base.
[00:14:27] Robert Brennan: And what they do is they send out agents in parallel to go solve those CVEs across the entire code base. And if they were trying to do that iteratively, it would just take forever. But being able to parallelise this means they go a lot faster, and means that if one of these CVEs fails to resolve, maybe there's just no way to resolve it, we don't get stuck.
[00:14:42] Robert Brennan: Maybe 90% gets solved, and then you can focus on the 10% that didn't quite make it through. But yeah, it's very tricky to figure out how to decompose a task so that you get the most out of parallelization.
[00:14:58] Simon Maple: And I guess there's an amount of human-to-agent interaction here as well. At what stage does the human get involved, and is it more chat or more up-front preparation? Does the human just lead it and then wait for all the results to come in? How much interaction is there?
[00:15:13] Robert Brennan: I think it's very task dependent. And that's when we talk about agent orchestration and the skill that goes into agent orchestration. A big consideration you need to take into account as you use agents to solve a particular problem is where am I going to stick the human in the loop?
[00:15:30] Robert Brennan: So that I can understand that these are working well, that the agents aren't just spending hours and hundreds of dollars or even more going down a wrong path. You want to make sure that there are regular check-ins to make sure that the agents are on the right track, and that you're not shipping anything into production that a human hasn't properly reviewed and QA tested and things like that.
[00:15:52] Simon Maple: And there would be trust issues there as well. When we're thinking, "Okay, let's fix some new CVEs and update some dependencies," there's an amount of production code that's being touched, and there's always a danger. From my background at Snyk, whenever there was a change, there was an amount of effort required, and that change could potentially cause an issue in production.
[00:16:12] Simon Maple: In your experience do you see people onboarding to this type of mentality of "I'm just going to allow these agents to go and do their thing"? Is there a buildup or growth of trust that needs to occur?
[00:16:25] Robert Brennan: Yeah, I would definitely encourage folks to start small.
[00:16:27] Robert Brennan: Dependency updates are a great one that we're actually pretty used to that being automated with things like Dependabot. AI agents allow you to get another level of automation there for when there are breaking API changes and things like that.
[00:16:41] Robert Brennan: There might be a few lines of code that need to change.
[00:16:43] Simon Maple: Mm-hmm.
[00:16:44] Robert Brennan: Starting with smaller tasks that are easier for the human to verify, easier to QA, is definitely the way to get your feet wet with this sort of thing. And then as you build trust and as you build intuition for what agents do well and what they don't do well, then you can increase the scale of your efforts here as you start to build that trust.
[00:17:03] Simon Maple: Awesome. So if a developer wanted to get started with this and use OpenHands to do it, where should they go to get started, and what would the first step be?
[00:17:16] Robert Brennan: Yeah, certainly the first thing would be to identify some repeatable problem that happens in your organization. A very common one would be: we have a bunch of repos that are using Java version 8, which is way out of support, and we want to get everybody to Java version 21.
[00:17:28] Robert Brennan: Maybe we've got a hundred repos that have this problem. I would start by picking one of those repos and using the OpenHands CLI or our web UI to just iterate through the problem yourself. Try to prompt your way to success and see what that takes. And you'll get more intuition for the problem and more intuition for how agents are able to help you solve this problem.
[00:17:48] Robert Brennan: Mm. So I think once you've built up that intuition for the problem by doing it manually and prompting your way to it, then you can use an SDK like the OpenHands SDK to actually encode that as Python or some other programming language.
[00:18:08] Robert Brennan: You can turn that intuition into something enforced and build an orchestrated pipeline for achieving that across many different repositories.
[00:18:14] Simon Maple: Something that could be repeated, and that's a really nice parallelization route as well. It's the same thing but just across many, many repos. And I love the idea of learning that by yourself, the more manual way and then trying to scale it out because you learn, as a developer, you'll know "Ah, there's a gotcha here that I wasn't thinking about."
[00:18:27] Simon Maple: If I had scaled that straight away, it would've been a problem. And one final question. What are the nicest scenarios for a developer to start with this kind of thing? Obviously, they can probably kick it off through the APIs themselves.
[00:18:40] Robert Brennan: Yeah.
[00:18:41] Simon Maple: Are there ways in which we can integrate this through triggers, maybe pull requests that have been opened, and those types of things as well?
[00:18:46] Robert Brennan: Totally. OpenHands integrates basically anywhere developers live: Slack, GitHub, JIRA, GitLab, Bitbucket. Basically anywhere there's information that you might want to pull into an agent, you can just say "@OpenHands, do the thing." And I would say for more advanced use cases, we also have an API where you can build some automation so that say anytime a certain event happens in Datadog.
[00:19:09] Robert Brennan: You can then use that to kick off an OpenHands session via the API and fix the underlying error. So even if that error happens at two in the morning, you wake up to a pull request that fixes it the next day.
[00:19:18] Simon Maple: Amazing. Well, Robert, you've got a session to go to in about 40 minutes, so looking forward to hearing more about that. But pleasure to see you here.
[00:19:23] Robert Brennan: Thank you for having me. Cheers.
In this episode from QCon, host Simon Maple speaks with Itamar Friedman, CEO of Qodo, and Robert Brennan, CEO of OpenHands, about advancing AI in software development through multi-agent systems. They explore how specialised, role-based agents can enhance code quality, manage context effectively, and automate repetitive tasks at scale, moving beyond singular AI "copilot" models. Discover how these systems can transform code maintenance in the cloud and why designing distinct agent roles, leveraging the right models, and orchestrating workflows are key to optimising developer productivity.
From the QCon hallway track, host Simon Maple catches up with two builders shaping how developers work with AI: Itamar Friedman, CEO of Qodo, on designing multi‑agent systems for the software development lifecycle, and Robert Brennan, CEO of OpenHands, on scaling code maintenance with AI orchestration in the cloud. The throughline: moving beyond single “copilot” usage into structured, role‑based agent systems that improve code quality, tame context, and automate the boring but necessary parts of engineering at scale.
Friedman frames multi‑agent systems as a direct response to code quality and lifecycle coverage. Instead of relying on one monolithic agent to “do everything,” software teams benefit from role‑specialised agents—planners, coders, reviewers, and security checkers—that mirror how effective human teams distribute work. This matters most in brownfield enterprise environments where compliance, standardization, and reliability are non‑negotiable. Recent research (including a new Google paper) and benchmarks like SWE‑bench reinforce the architectural choices needed to make these agents productive on real codebases.
Design choices start with execution patterns: agents that run in parallel for throughput, or sequentially for tighter control. Planning versus coding is a prime example—what happens when those agents disagree? Adding an arbitration step (a separate “decider” agent or a deterministic policy) prevents deadlock and ensures forward progress. Friedman emphasises that developing a multi‑agent system is less about stacking prompts and more about assembling complementary capabilities: different agents, different graphs, different contexts, different permissions.
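To make the arbitration step concrete, here is a minimal sketch in Python. The `run_planner`, `run_coder`, and `agree` functions are hypothetical stubs rather than any framework's API; the point is the control flow: run both agents, detect disagreement, and fall back to a deterministic decider, which a third "arbiter" agent could replace.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    agent: str
    plan: str          # e.g. a high-level approach or a diff summary
    confidence: float  # self-reported score in [0, 1]

def run_planner(task: str) -> Proposal:
    # Hypothetical stub: in a real system this would call the planning agent.
    return Proposal("planner", f"refactor module before fixing: {task}", 0.7)

def run_coder(task: str) -> Proposal:
    # Hypothetical stub: the coding agent may prefer a more direct patch.
    return Proposal("coder", f"patch the bug in place: {task}", 0.8)

def agree(a: Proposal, b: Proposal) -> bool:
    # Toy agreement check; a real system would compare plans semantically.
    return a.plan == b.plan

def arbitrate(a: Proposal, b: Proposal) -> Proposal:
    # Deterministic fallback policy: prefer the higher-confidence proposal.
    # A dedicated arbiter agent could replace this function entirely.
    return max((a, b), key=lambda p: p.confidence)

def decide(task: str) -> Proposal:
    plan, code = run_planner(task), run_coder(task)
    if agree(plan, code):
        return plan
    return arbitrate(plan, code)

if __name__ == "__main__":
    print(decide("null pointer in checkout flow"))
```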
A common anti‑pattern is dressing a single agent up with multiple prompts and calling it a multi‑agent system. That can work for lightweight tasks, but complex SDLC workflows benefit from truly distinct agents with their own architectures and control flows. For coding, Friedman argues the core should preserve LLM creativity—developers constantly navigate ambiguities and invent workarounds. In practice, a coding agent’s graph leans heavily on the LLM’s generative reasoning, with access to tools like AST analysis or repo introspection.
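As a small illustration of the "simple tool" idea, the sketch below exposes an AST inspection helper that a coding agent could call to ground itself in a file's structure. It uses only Python's standard `ast` module; how a particular agent framework registers tools is deliberately left out.

```python
import ast

def list_functions(source: str) -> list[dict]:
    """Return the name, line number, and arguments of every function in a file.

    A coding agent can call a tool like this instead of re-reading the whole
    file, keeping its context focused on structure rather than raw text.
    """
    tree = ast.parse(source)
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append({
                "name": node.name,
                "lineno": node.lineno,
                "args": [a.arg for a in node.args.args],
            })
    return functions

if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    print(list_functions(sample))  # [{'name': 'add', 'lineno': 1, 'args': ['a', 'b']}]
```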
By contrast, review and security agents should be much more structured. Think explicit checklists of standards and rules, deterministic validations, and narrow tool permissions. Instead of a loose “don’t do X” prompt, a security agent needs a “one‑minus” stance: enumerate exactly what must be checked (e.g., 100 rules), then allow creativity only outside those constraints. This shift from “creative generation” to “rigorous verification” often means the LLM’s role is smaller in the graph, and control logic does more of the heavy lifting.
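A minimal sketch of that shift, assuming a hypothetical `llm_review` stub for the creative remainder: the enumerated rules are checked deterministically first, so nothing on the checklist can be "forgotten", and only what falls outside them is left to the model.

```python
import re

# An explicit, enumerable checklist: each rule is (rule id, pattern, message).
# In practice these would come from a curated rules database, not be hard-coded.
RULES = [
    ("SEC001", re.compile(r"\beval\("), "use of eval() is forbidden"),
    ("SEC002", re.compile(r"password\s*=\s*['\"]"), "hard-coded password"),
    ("SEC003", re.compile(r"verify\s*=\s*False"), "TLS verification disabled"),
]

def check_rules(diff: str) -> list[str]:
    """Deterministic pass: every rule is checked, none can be skipped."""
    findings = []
    for rule_id, pattern, message in RULES:
        for lineno, line in enumerate(diff.splitlines(), start=1):
            if pattern.search(line):
                findings.append(f"{rule_id} line {lineno}: {message}")
    return findings

def llm_review(diff: str) -> list[str]:
    # Hypothetical stub: the LLM handles everything *outside* the checklist.
    return []

def review(diff: str) -> list[str]:
    # Structured first, creative second: the "one-minus" stance.
    return check_rules(diff) + llm_review(diff)

if __name__ == "__main__":
    print(review('requests.get(url, verify=False)\npassword = "hunter2"'))
```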
The choice of model vendor and tool strategy also changes by role. Friedman notes Anthropic’s philosophy—keep tools simple, let the model do more of the thinking—maps well to coding agents. But don’t expect a single model plus a new prompt to excel equally at security review. Different vendors, prompts, tools, and permissioning should be treated as independent dials you tune per agent role.
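One way to keep those dials independent is to make the per-role choices explicit configuration rather than buried prompt edits. The model names, tools, and permissions below are illustrative placeholders, not recommendations for any specific vendor.

```python
from dataclasses import dataclass, field

@dataclass
class RoleConfig:
    model: str                 # vendor/model chosen for this role
    system_prompt: str         # role-specific instructions
    tools: list[str] = field(default_factory=list)
    permissions: list[str] = field(default_factory=list)

# Illustrative placeholders: each dial is tuned per role instead of reusing one setup.
ROLES = {
    "coder": RoleConfig(
        model="vendor-a/large-model",
        system_prompt="Implement the task; be creative about workarounds.",
        tools=["ast_inspect", "run_tests"],
        permissions=["read_repo", "write_branch"],
    ),
    "security_review": RoleConfig(
        model="vendor-b/structured-model",
        system_prompt="Check every rule in the checklist; report violations.",
        tools=["rule_checker", "dependency_audit"],
        permissions=["read_repo"],   # the reviewer gets no write access
    ),
}

if __name__ == "__main__":
    for role, cfg in ROLES.items():
        print(role, cfg.model, cfg.permissions)
```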
Developers consistently report context as the number one source of poor quality and hallucination in coding agents—by some surveys, 33–80% identify context gaps as the primary pain. Friedman’s advice: treat context as a system, not a single blob of “more code.” Useful context spans the codebase, vulnerability databases, best‑practice and style rules, dependency metadata, and stack‑specific standards. Qodo, for example, maintains a rules/best‑practices database per stack and learns from developer interactions to adapt what to surface.
But more context isn’t always better. There’s a spectrum: too little starves the agent; too much overwhelms it; a noisy middle includes the relevant material but pollutes it with non‑relevant details. The job of a context engine is retrieval and curation—ranking, cutting, and prioritising so the agent sees just what it needs, in the right order. This is the essence of the “context wars”: bandwidth is limited, and indiscriminate stuffing reduces quality.
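A toy sketch of that curation step: candidate snippets from several sources are scored for relevance and packed under a budget, so the agent sees the highest-priority material and nothing beyond the cut-off. The keyword-overlap scoring is a stand-in; a real context engine would use embeddings, code structure, and learned priorities.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str   # "codebase", "vuln_db", "best_practices", ...
    text: str

def score(snippet: Snippet, task: str) -> float:
    # Toy relevance score: keyword overlap between the task and the snippet.
    task_words = set(task.lower().split())
    snippet_words = set(snippet.text.lower().split())
    return len(task_words & snippet_words) / (len(task_words) or 1)

def build_context(snippets: list[Snippet], task: str, budget_chars: int) -> str:
    # Rank by relevance, then pack greedily until the budget is spent.
    ranked = sorted(snippets, key=lambda s: score(s, task), reverse=True)
    picked, used = [], 0
    for s in ranked:
        if used + len(s.text) > budget_chars:
            continue
        picked.append(f"[{s.source}] {s.text}")
        used += len(s.text)
    return "\n".join(picked)

if __name__ == "__main__":
    snippets = [
        Snippet("codebase", "def parse_token(header): ..."),
        Snippet("vuln_db", "CVE affecting token parsing in version 1.2"),
        Snippet("best_practices", "never log raw auth tokens"),
        Snippet("codebase", "unrelated billing helper"),
    ]
    print(build_context(snippets, "fix token parsing vulnerability", budget_chars=120))
```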
Equally critical is how an agent uses context. Friedman points out variability across tools; give Cursor the same prompt and context repeatedly and you may see different outputs. That variability can be mitigated by more structured agent graphs—e.g., enforce checklists, verify against rules, and gate steps on concrete signals (tests pass, static analysis clean, policy checks satisfied). In other words, don’t just retrieve better context; make agents prove they consumed it correctly.
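A minimal gate of that kind, with hypothetical check stubs: the agent's output is accepted only when every concrete signal passes, however plausible the change looks on its own.

```python
from typing import Callable

def tests_pass(diff: str) -> bool:
    # Hypothetical stub: run the test suite against the patched branch.
    return True

def static_analysis_clean(diff: str) -> bool:
    # Hypothetical stub: lint, type-check, and security-scan the change.
    return True

def policy_checks_satisfied(diff: str) -> bool:
    # Hypothetical stub: verify the change obeys the stated rules and context.
    return True

GATES: list[Callable[[str], bool]] = [
    tests_pass,
    static_analysis_clean,
    policy_checks_satisfied,
]

def accept(diff: str) -> bool:
    """Accept an agent's output only if every gate passes."""
    failures = [gate.__name__ for gate in GATES if not gate(diff)]
    if failures:
        print("rejected:", ", ".join(failures))
        return False
    return True

if __name__ == "__main__":
    print(accept("--- a/app.py\n+++ b/app.py\n..."))
```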
Brennan shifts the focus from day‑to‑day pairing on a laptop (e.g., OpenHands CLI or Claude Code) to high‑leverage automation in the cloud. Many maintenance tasks are repetitive and automatable across repositories: dependency management, open source vulnerability remediation, and larger migrations like Python 2→3, Java upgrades, or even COBOL→modern languages. These are exactly the kinds of jobs where agent orchestration shines and where running many agents in parallel pays dividends.
The goal is not to “one‑shot” a massive change, but to encode a repeatable strategy that scales: break down the work, parallelise across codebases, and automate 90% of the path with a human in the loop at the end for verification. Treat it like a production system—schedule runs, observe outcomes, and iterate. Over time, that orchestration layer becomes a factory for tech‑debt reduction, turning what used to be dreaded chores into predictable workflows.
This approach pairs well with enterprise CI/CD: agents open branches, run build/test pipelines, and submit PRs with linked evidence (test results, changelogs, CVE references). Humans focus on oversight—reviewing deltas, approving merges, and handling edge cases—rather than doing the rote work themselves. The productivity gain compounds when you apply the same playbook across dozens or hundreds of services.
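The "PR with linked evidence" idea can be as simple as a structured payload the orchestrator renders into every pull request description. The field names below are illustrative; how they map onto the GitHub, GitLab, or Bitbucket APIs is left out.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class PullRequestEvidence:
    repo: str
    branch: str
    summary: str
    test_results: str                  # e.g. "142 passed, 0 failed"
    cve_references: list[str] = field(default_factory=list)
    changelog: list[str] = field(default_factory=list)

def render_pr_body(evidence: PullRequestEvidence) -> str:
    """Produce a PR description a human reviewer can verify at a glance."""
    body = [
        f"## {evidence.summary}",
        f"- Tests: {evidence.test_results}",
        f"- CVEs addressed: {', '.join(evidence.cve_references) or 'none'}",
        "### Changelog",
        *[f"- {entry}" for entry in evidence.changelog],
    ]
    return "\n".join(body)

if __name__ == "__main__":
    # Placeholder values for illustration only.
    evidence = PullRequestEvidence(
        repo="payments-service",
        branch="agent/upgrade-lib-2.4.1",
        summary="Bump vulnerable dependency and adapt to breaking API change",
        test_results="142 passed, 0 failed",
        cve_references=["CVE-2024-0000"],
        changelog=["lib 2.3.0 -> 2.4.1", "renamed client.connect() call site"],
    )
    print(render_pr_body(evidence))
    print(json.dumps(asdict(evidence), indent=2))
```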
On the implementation side, Brennan advocates for a unified framework that defines multiple agents—each with its own system prompt, toolset, and behavior—inside a single orchestration layer. OpenHands SDK enables this, including access to MCP servers to expose external tools and data sources, and the ability to wire agents together. While agent‑to‑agent communication is possible via emerging protocols, putting them in one framework simplifies coordination, logging, and control.
A practical orchestration pattern looks like this (see the sketch after this list):
- Identify a repeatable problem, for example dozens of repos stuck on an out-of-support Java version.
- Solve it once by hand in a single repository using the CLI or web UI, prompting your way to success and noting the gotchas.
- Encode that intuition as a script or pipeline (for example, with the OpenHands SDK in Python), with verification criteria at each stage.
- Fan the pipeline out in parallel across the remaining repositories.
- Route every resulting change through human review and QA before it ships.
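A compressed sketch of that fan-out using only Python's standard library: each repository gets its own agent run, runs execute in parallel, and anything that fails its verification gate is queued for a human instead of blocking the batch. `run_agent_on_repo` is a hypothetical stub, not the OpenHands SDK's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

REPOS = ["billing", "auth", "search", "reports"]  # illustrative repo names

def run_agent_on_repo(repo: str) -> dict:
    # Hypothetical stub: launch an agent session that performs the migration
    # in this repo, runs the verification gates, and reports the outcome.
    succeeded = repo != "reports"   # pretend one repo needs a human
    return {"repo": repo, "ok": succeeded, "pr": f"{repo}/agent-migration"}

def orchestrate(repos: list[str]) -> None:
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_agent_on_repo, repos))
    done = [r for r in results if r["ok"]]
    needs_human = [r for r in results if not r["ok"]]
    print(f"automated: {len(done)}/{len(results)} PRs opened")
    for r in needs_human:
        print("escalate to human review:", r["repo"])

if __name__ == "__main__":
    orchestrate(REPOS)
```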
Crucially, this isn’t just about “more agents.” It’s about role clarity, guardrails, and the right data at the right time. Designers should define explicit permissions (which repos, which tools, which environments), decide parallel vs. sequential phases, and set verification criteria per stage. When that foundation is in place, the choice of model vendor per role (e.g., Claude for coding; a different vendor for structured review) becomes a tactical optimization.
