
Why 95% of Agents Fail
Can Agentic Engineering Really Deliver Enterprise-Grade Code?
Transcript
[00:00:00] Simon: Before we jump into this episode, I wanted to let you know that this podcast is for developers building with AI at the core. So whether that's exploring the latest tools, the workflows, or the best practices, this podcast is for you. Now, we've had amazing guests like Olivier Pomel, the CEO of Datadog, Mati Staniszewski, the co-founder of ElevenLabs.
[00:00:27] Simon: Victor Riparbelli, the CEO and co-founder at Synthesia, and of course, Tessl’s very own, Patrick Debois and they're all sharing lessons right from the frontier of our industry. A really quick ask, 90% of people who are listening to this haven't yet subscribed. So if this content has helped you build smarter, hit that subscribe button and maybe a like, it really does help us keep bringing the folks who are actually pushing this space forward and helping build that AI-native future.
[00:00:53] Simon: Alright, back to the episode. Hello and welcome to another episode of the AI Native Dev. My name's Simon [00:01:00] Maple and I'm going to be the host for today's episode. And joining me today is Reuven Cohen. People may know him better as Ruv, and Ruv is an agentic engineer. He's the founder of the Agentics Foundation and also creator of Claude Flow, which you may have heard about, getting a lot of noise and a lot of adoption just recently.
[00:01:19] Simon: So, Reuven, welcome to the show. How are you?
Reuven: I'm great. Thanks for having me today.
Simon: Absolutely. Pleasure. Whereabouts are you calling in from today, Ruv?
[00:01:26] Reuven: I am just outside of Toronto, Canada. And it's a nice sunny late summer day here, so wonderful.
[00:01:33] Simon: Wonderful. Sounds good. Sounds good. Now, agentic, when we think about Ruv, we always think it's agentic foundation, agentic development.
[00:01:42] Simon: With all the things that you've done in and around SPARC and in and around Claude Flow, you've been working with agentic processes for years now. Tell us, let's start really talking about the state of agentic before we move into guardrails, SPARC and Claude Flow. [00:02:00] We'd love to talk about the basics.
[00:02:02] Simon: What is agentic coding? What is the core difference between agentic coding and non-agentic coding? What makes the difference?
[00:02:10] Reuven: Well, it's been a bit of an evolution for us in the space. To tell you about agentics, I have to tell you sort of the brief history of agentics, and for some people it’s a long history.
[00:02:23] Reuven: You know, the idea of agents in technology has been around for almost as long as there have been servers and computers, where you'd have systems that essentially ran autonomously. So the idea of an agent really speaks to the ability to operate with little to no human oversight.
[00:02:43] Reuven: Now, a lot of the early agents you would see in applications, things like networks and other components, were very procedural. They would run specifically within a very confined structure. You know, one plus one equals two. But as soon as you went outside of that deterministic perspective, they fell down.
[00:03:06] Reuven: They weren't smart, they weren't capable, they just did specifically what they were programmed to do and nothing else. And what we saw really with the emergence of the GPT models, really GPT-3 and beyond, was the ability to use language alongside these systems that were then able to use the tools, the infrastructure, and the environment in new and interesting ways.
[00:03:30] Reuven: And the major breakthrough we had probably around two years ago was this idea of a recursive loop. Now a lot of the early agents and prompt engineering were focused on a kind of chain of thought, right? Do these things and execute based on a kind of language abstraction.
[00:03:51] Reuven: But it turned out what really made these systems work well was recursion, feeding errors, logs, successes, and failures back into the system so the system could understand the context of what was happening. Once we determined that the context of what was happening was more important than what was happening, we could then solve the problems.
[00:04:16] Reuven: If there's an error in code, which almost always happens, you would be able to resolve those errors because the AI or the agents themselves could understand what those errors are and resolve them through a feedback loop. So here we are, 2025. The agentic engineer didn't even exist probably three years ago.
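To make that loop concrete, here is a minimal TypeScript sketch of the recursion Ruv describes: generate, verify, and feed failures back in as context for the next pass. The `generate` and `verify` callbacks are illustrative stand-ins for an LLM call and a test runner, not any particular framework's API.

```typescript
// Sketch of the recursive feedback loop: act, observe, feed errors back, retry.
type TestResult = { passed: boolean; errors: string[] };

async function agenticLoop(
  task: string,
  generate: (task: string, feedback: string[]) => Promise<string>, // e.g. an LLM call
  verify: (code: string) => Promise<TestResult>,                   // e.g. compile + run the tests
  maxAttempts = 5,
): Promise<string> {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = await generate(task, feedback); // act
    const result = await verify(code);           // observe
    if (result.passed) return code;              // verified, stop here
    feedback = result.errors;                    // adapt: errors become context for the next pass
  }
  throw new Error(`no passing solution after ${maxAttempts} attempts`);
}
```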
[00:04:40] Reuven: And one of the things, as we started building a lot of this technology, is we ended up seeing other like-minded individuals coalescing around myself and others. We had Reddit groups, WhatsApp, Discords, and then we saw large companies adopting the same terminology, agent and agent engineering.
[00:05:01] Reuven: So it's been a kind of wild ride over the last few years here.
[00:05:05] Simon: The idea of this iterative process, whereby we try something, if it doesn't work, we feed it back in and loop this multiple times before it provides an answer.
[00:05:19] Simon: What do you feel made 2025, or was it 2024, the year of the agent? When we talk about the year of the agent, is it the ability for humans to trust AI to do these things by itself? Because when we think about how developers would use AI and LLMs in their coding, it started very much with autocomplete-style suggestions from an assistant.
[00:05:48] Simon: And then we gradually allowed it to do more, whether it was Cursor or other tools that did much more for us across multiple files and things like that. What was the thing, do you feel, that really allowed us to turn agents on in our code generation?
[00:06:06] Simon: Is it context, is it trust from humans in the process? Is it the capabilities of LLMs?
[00:06:12] Reuven: I don't think it's trust. I think anyone that's building these systems inherently doesn't trust the output and needs to spend quite a bit of time verifying that what it's actually given us is true and valid and functional and not just a kind of mock or simulated version of what we asked it.
[00:06:28] Reuven: So there's a tendency for the system to, you know, kind of take the shortest path to give you the outcome. So a lot of what an engineer is doing is working with solid structures, architectures, and other things to sort of validate what's being built is functionally true and correct.
[00:06:47] Reuven: Now, to your question of what makes 2025 the year of sort of agentics, if that's even a thing, one of the big breakthroughs last year was again this idea of recursion. [00:07:00] So, a group of us created these systems that could operate on a long horizon. These are agents that could run for hours or days at a time and then eventually complete whatever problem they were given.
[00:07:13] Reuven: Now, the drawback to that, we could run it for, I think, a 36-hour test, mostly just to show that I could run an agent for 36 hours. But the costs were prohibitively expensive. And what we saw as we scaled these agents is that to effectively run this, you're looking at even on a minimal basis around $4,000 a day.
[00:07:36] Reuven: And when we really cranked it up to dozens of agents running concurrently, we were looking at $7,500 US an hour to run a 10-agent swarm concurrently. And it is just substantially cheaper to hire a person at that point to do that work.
[00:07:56] Reuven: Even though you could do a ton of output, [00:08:00] most applications didn't require that level of scale or capacity. So in April of this year, what we saw was Anthropic and Claude Code came out, and they started offering a sort of unlimited, all-you-can-eat buffet for tokens and capability.
[00:08:18] Reuven: And the first thing we did is we reimagined the previous iterations of what we call the SPARC protocol, which allowed for that recursion, and suddenly we were able to spawn these swarms that could run for hours on end and cost a flat fee, often starting at $20 for an entire month, which went from thousands of dollars to $20, thousands of dollars an hour to $20 a month.
[00:08:45] Reuven: So we saw this interesting inflection of both capability and cost. Suddenly, we were able to build things dramatically faster and with a lot more capability at an exponentially lower cost. Which is an interesting sort of, you don't generally see that happen all at once, and that was basically May.
[00:09:03] Simon: Yeah.
[00:09:03] Simon: And that's super interesting because I think it's both making it more affordable but also the parallelization. It really opens up the quality of what we can actually get back as a response. I want to talk about that in just a second before we move on to swarms and SPARC specifically.
[00:09:21] Simon: I'd love to talk a little bit about the foundation that you created, the Agentics Foundation. First of all, the Agentics Foundation isn't super new, I guess. Tell us a little bit about what the need was for that foundation. What's the problem it's trying to solve?
[00:09:40] Reuven: It is an organic sort of evolution of the work.
[00:09:44] Reuven: And I'm going to tell you a little backstory, which is probably going to make me sound absolutely crazy, but I'm going to say it anyway. I was early into ChatGPT. I was lucky enough to be an early beta tester. I think they called it a VIP program back in 2022 with OpenAI. And, you know, [00:10:00] I basically got it maybe three or four weeks before everybody else.
[00:10:03] Reuven: One of the first things I asked it is, and this is going to sound really egotistical, how can I be the most influential person in AI? And it gave me this step-by-step process that said, first of all, what are you good at?
[00:10:21] Reuven: What do you like and where do you want to be in a few years? And it suggested that anything worth doing takes years of prep to get to that point. It gave me this outline based on a kind of Q and A that you'd have with ChatGPT, and most of you who have used ChatGPT know the routine of asking open-ended questions.
[00:10:42] Reuven: And it said basically you're going to have to build a social media following and you should be narrow in that focus. I asked, what does that actually mean? It said, what do you like to do? I like to build autonomous things. I've been building cloud infrastructure for nearly 25 years.
[00:11:00] Reuven: And it set me down this path and said, first of all, you need to create a subreddit. Now I know nothing of Reddit or subreddits, but I did it. I followed the guidance, and three and a half years later I've got a hundred thousand plus subscribers on the subreddit. I'm not really sure exactly why or what I'm going to do with it.
[00:11:20] Reuven: It's, but it's popular. Then it said I needed to, and the key was consistency. It told me that I needed to build a following by constantly posting interesting things I'm doing and sharing those publicly on my GitHub, creating weekly livecasts where I can interact with people.
[00:11:40] Reuven: I followed the guidance, and next thing you know, I have thousands of people showing up for these livecasts, following me, and essentially buying my time on a kind of OnlyFans-for-geeks approach where people can essentially buy my time on an hourly basis.
[00:11:57] Reuven: And the craziest part of the whole [00:12:00] story is, it actually worked. I'm in a fortunate position where I can claim more than a hundred customers, 20 Fortune 500 clients. I'm literally one guy, one company. My wife helps with some of the non-technical parts, but generally speaking, it's me and my bots.
[00:12:21] Reuven: And the entire business plan and process was literally set forth by ChatGPT, which is in itself just amazing to me.
[00:12:30] Simon: Did the LLM tell you to create the foundation, or was that something that you’ve..
[00:12:33] Reuven: Well, the foundation was an evolution of those early conversations. No, it didn't exactly say that.
[00:12:39] Reuven: And what happened was, as the community coalesced around the concepts of agentics and the practical approaches we were taking to implement the agentic systems, it became pretty clear that we were in the midst of a new profession. And when you look historically at different professions, and let me be honest,
[00:13:00] Reuven: AI gives you delusions of grandeur often. But in this particular case, there was a group of us realizing that we are all practicing a similar approach to engineering AI. When we looked at it, we saw a spectrum. On one end we had these vibe coders.
[00:13:19] Reuven: These vibe coders are not, I'm not saying vibe coding isn't important. It's a great ideation, learning, and discovery mechanism, but it's very freeform. It's very fluid in what you're doing without much of a plan. What we were doing was different. We were creating architectures, processes, and approaches that were repeatable with a defined outcome.
[00:13:41] Reuven: That was an engineering activity. What we saw was the term agentic and agentic engineering was starting to be co-opted by large companies who were essentially AI washing the concept. They were saying it was an agent, but it was a chatbot or a vibe coding system.
[00:13:59] Reuven: And we saw this as an opportunity for us to say, this is our approach. This is what we believe an agentic engineer as a professional, or as a profession should encompass both technically and as a sort of guild for our group of engineers. We are the stonemasons for this new emerging field of agentic engineering for AI.
[00:14:24] Reuven: And the group formed around that. It's open. It's the sort of antithesis of some of the corporate-led open groups, OpenAI sort of being the other end of the spectrum, dominated by corporate interests. We said we need to make this for people to empower them professionally, and maybe there's some aspirational, but as the society around AI takes shape, what does that look like?
[00:14:50] Reuven: And how do we protect the interests of the people ultimately that are gonna be affected by it most? So we want, anyway, to act as a kind of hedge against trillion [00:15:00] dollar companies, if we can do that. And that was formed in March of this year. And next thing we know, we are in something like 60 cities.
[00:15:07] Reuven: We've got multiple events, sort of organic startups or meetups all around the world happening every week. And,
[00:15:14] Simon: And people can just join that. What’s the best way for people to get involved?
[00:15:18] Reuven: Well, a lot of the communication and community happens around either our Discord channel, discord.agentics.org, or WhatsApp. WhatsApp's maxed out.
[00:15:29] Reuven: So I'm not gonna give you the address because when you hit a certain threshold, you basically can't add anybody. But most of it, we've shifted to Discord at this point. And agentics.org, it's a member-led organization, meritocracy, if you will. So if you want to get involved, create a new chapter in a different city, or do a meetup, basically, you just volunteer.
[00:15:52] Reuven: We make it happen.
[00:15:53] Simon: So if we were to look at the state of agents today, what would you say are the things that are creating that ceiling for agentic today? What are the biggest things causing agentic development to fail, or at least causing problems to developers trying to use agentic development today?
[00:16:14] Reuven: The limiting factors of the space right now, and there was an interesting report from MIT a couple weeks ago, that basically said 95% of agentic projects fail.
[00:16:27] Reuven: It was a fairly sensational title. When you dig into that a little bit, there are two ways to think about those types of stats. One, 95% of projects fail, which is probably true, but the reason for that failure is likely 99% of engineers, programmers, developers, project managers don't know how to actually build agentic systems, and it's a byproduct of a new emerging space.
[00:17:02] Reuven: And it, so the limiting factor is the fact that it's really hard to look beyond the agentic washing of products and the people associated with it to determine what the capabilities of the products and the people implementing these products really are.
[00:17:21] Reuven: So when you see this high failure rate, it speaks to the fact that we're in a new emerging space. If you're old enough to have been around at the beginning of the internet, you would've seen the same thing with a lot of the internet projects that corporations took on back in the late nineties.
[00:17:37] Reuven: And there was this idea that the internet wasn't gonna cut it for most business-type applications, and then they were wrong. It was the fact that we didn't, as an industry, understand exactly how to build user-friendly internet-based applications. Over time, there were models to follow.
[00:17:55] Reuven: Amazons and the eBays showed up, and we were like, okay, this is how you create a web-based service that people can easily interact with. It's not just recreating software. We have the same problem 25 or 30 years later in the agentic and AI space: there's a tendency for people to build applications the way they've always built them, with human-centric models, review cycles, long drawn-out sprints, and traditional tactics optimized for a world where we built slowly over time.
[00:18:29] Reuven: Now we're in a world where we can literally copy anything anywhere at a moment's notice. The quality of the code, although important, is less important than the momentum you get in terms of speed and time to market. Companies embracing this don't just replace their developers; that might be a byproduct, but it augments them in ways those developers were never able to do before.
[00:18:58] Reuven: So you're empowering them with a kind of superpower to create much more effectively and quickly, which creates a variety of secondary problems. But ultimately, this is the empowerment of developers to do more with less.
[00:19:17] Simon: And a lot of the ways that can be done in a more predictable way is to use rails or guardrails. One of the things that came out of the foundation, I believe, was SPARC, S-P-A-R-C. Talk us through what was the driving force behind SPARC. Was it pain in terms of seeing how people, or yourselves, were developing code and what different perspectives you needed from that?
[00:19:54] Simon: Because I know it's very specification-based, but there are also a number of different things beyond just specifications, which are important to allow your agent processes to build in a predictable way.
[00:20:04] Reuven: And what we saw, if you rewind about a year and a half, a lot of the early work within these coding systems, even before agents became popular, were limited by the specifications themselves.
[00:20:19] Reuven: And a lot of the development was happening on a function-by-function basis. Fix this, copy and paste this into the system. As you said, it kind of auto-complete, which was fine. That was not a bad starting point. But once we start looking at creating larger, more complex applications with interconnected relationships, data structures, user experience components, and infrastructure, it became important to create a specification that the AI could follow, implement, test, and verify.
[00:20:49] Reuven: Part of that was an homage to some of the work I did 25 years ago with Sun Microsystems and their SPARC processor, at least the name, SPARC with a C. And AI is pretty good at creating acronyms; as anyone can see, RUV stands for something new every other day for me.
[00:21:07] Reuven: But ultimately, the specification said: here's what I'm building, a PRD, if you will. Rather than building the entire specification, what I ended up doing was creating a pseudo-code outline. I'm not trying to build the entire code.
[00:21:25] Reuven: Obviously that's gonna happen at a later stage. I wanted to create scaffolding that allows the AI to understand the general structure of what the code needs to look like before it implements it. The second stage was basically a rudimentary scaffolding of all the parts of the application.
[00:21:42] Reuven: Then once we had that, we had a defined architecture based on the code. Once we had that, then we could go through a refinement phase. That was the iteration. The refinement provides a mechanism to say, now we have a pseudo code outline and a defined architecture, [00:22:00] which includes all the various parts of the application.
[00:22:02] Reuven: Now we can actually start building that. The refinement turned out to be a really interesting part when we got to the point of a parallel or concurrent development process. So now that we know exactly all the parts of the application, we can get different agents to work in different parts of the overall application independently of each other, although connected through the architecture itself, using a test-driven development approach, which is the completion, the test-driven development approach.
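As a rough illustration of the flow just described (not an official SPARC or Claude Flow schema; the field names are invented for this sketch), the five phases can be written down as a simple data structure, each one producing the artifact the next phase consumes:

```typescript
// Illustrative outline of the SPARC phases as described in the conversation above.
interface SparcPhase {
  name: string;
  produces: string; // the artifact handed to the next phase
}

const sparcPhases: SparcPhase[] = [
  { name: "Specification", produces: "PRD: requirements, flows, data structures, acceptance tests" },
  { name: "Pseudocode",    produces: "language-agnostic outline of what the code needs to look like" },
  { name: "Architecture",  produces: "scaffolding of every part of the application and how the parts connect" },
  { name: "Refinement",    produces: "parallel, agent-per-part implementation against that architecture" },
  { name: "Completion",    produces: "test-driven verification that every defined behaviour passes" },
];
```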
[00:22:30] Reuven: And I'm gonna give you a quick background. For those who aren't familiar with test-driven development, there are essentially two different approaches to test-driven development. The traditional test-driven development approach, sometimes referred to as the Detroit school, which is essentially creating the test as you create the code.
[00:22:45] Reuven: And that's what most test-driven development today does. You have a function or a part of an application, you create a test, you write the code, it passes, it fails. That was great for people, but it's not great for autonomous agents building things. The other approach [00:23:00] is referred to as the London School of Test-Driven Development.
[00:23:04] Reuven: Now the London School of Test-Driven Development uses scaffolding. Developers hate it because you're basically creating the entire application several times. One, you're creating the entire outline of all the parts and how all the parts work before you ever build the application, much like what I defined in the SPARC architecture.
[00:23:23] Reuven: Then you're building the application, it fails, it always fails, then you do it again. So you're literally building the application at least three times, possibly four times. But what's interesting about this approach when you have AI agents is that they don't actually complain about having to do the same work four times.
[00:23:42] Reuven: They, you know, they're AI, and you're doing it really, really quickly and concurrently. So what would take months or many weeks now can happen in an afternoon. And that provides, again, a guide for the AI to follow. Once all those tests pass, [00:24:00] generally speaking, the application works.
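For readers unfamiliar with the distinction, here is a toy TypeScript example of the London (mockist) style: the collaborator is specified as an interface and mocked by hand, so the test can exist, and fail, before any real implementation does. All names are illustrative.

```typescript
import assert from "node:assert";

// The collaborator is defined only as an interface; no real implementation yet.
interface PaymentGateway {
  charge(amountCents: number): Promise<"ok" | "declined">;
}

// The unit under test is written against that interface (outside-in).
async function checkout(gateway: PaymentGateway, amountCents: number): Promise<boolean> {
  return (await gateway.charge(amountCents)) === "ok";
}

// London-style test: a hand-rolled mock records the interaction the spec demands.
async function testCheckoutChargesTheGateway(): Promise<void> {
  const calls: number[] = [];
  const mockGateway: PaymentGateway = {
    charge: async (amount) => { calls.push(amount); return "ok"; },
  };
  assert.strictEqual(await checkout(mockGateway, 4999), true);
  assert.deepStrictEqual(calls, [4999]); // behaviour verified through the mock, not real state
}

testCheckoutChargesTheGateway().then(() => console.log("spec satisfied"));
```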
[00:24:02] Reuven: Different languages work better than others. A lot of people ask, what's the optimal language for this? Well, you know, there's Python, Node, JavaScript, TypeScript, things like that. But the problem with those languages is they're permissive, meaning that they allow you to build with a lot of flaws and will basically work even if they're not right.
[00:24:28] Reuven: And then you have languages like Rust, which are very explicit in their definition, deterministic. It's really hard to build bad things in Rust because it just doesn't work if it's not built correctly. By applying a rigid specification to a less permissive programming language, the combination gives you the ability to build really complicated, broad projects with reliable results when you're done.
[00:24:55] Simon: Yeah. And let's dig into a few of those pieces in a bit more depth. 'Cause, [00:25:00] you know, there are so many interesting things that you kinda just outlined there. First of all, with specifications, when you look at your specifications and talk about the capabilities, are you effectively adding high-level test cases as part of the specification?
[00:25:17] Simon: Or is that something that is really done at the end?
[00:25:20] Reuven: I do it at the beginning, right at the beginning. When you follow the SPARC spec, there are different ways, and a lot of people will massage that to their own needs, but basically what you're doing is creating requirements.
[00:25:34] Reuven: The, you know, what does the product need to do? How does it need to solve the problem? What are the algorithms, what are the flows, the data structures, the authentication, you know, all the things you'd want in the application. Then what you're doing is saying, how do I validate, how do I test that this thing actually does what it's supposed to do?
[00:25:48] Reuven: That is the biggest challenge you'll run into in any of these automated systems. At the end, it says it's done, and it's not. It's a convincing forgery. [00:26:00] It's a fraud. Anyone who's ever let an AI run for a while and looked at it would see this isn't true, this isn't right.
[00:26:09] Reuven: And that's a byproduct of a specification that doesn't have the tests adequately defined to determine what is true and what is not. Right. So when you integrate that as part of the development flow, you can mitigate the potential for fake or fraudulent results.
[00:26:27] Simon: Yeah. And, and it also feels like it's a far easier thing to describe when you're talking about the specs and you're talking about the behavior, you're effectively describing tests.
[00:26:36] Simon: It's like, it’s hard to do one without the other. You then mentioned pseudo code, which is really, really interesting. When we talk about tests, we talk about specs. We are constantly thinking, okay, should the specs care about the implementation at all? How much should it go deeper into the implementation, or how something should be created?
[00:26:59] Simon: And there's always a [00:27:00] fine balance between the, what and the how. And it sounds like with the pseudo code, it's interesting that, kinda like, doesn't necessarily provide language specific, information, but it does provide that ability to be able to define parts of the infrastructure.
[00:27:14] Simon: Define how I architect this application to be created. Couple of questions. First of all, while that provides the LLM a better way to actually create implementation, do you find there's a balance between how accurate it can be with the implementation and how prescriptive you're being with the LLM? Does that provide benefits as well as drawbacks?
[00:27:44] Simon: Or do you find that's the best-of-both-worlds solution?
[00:27:50] Reuven: I think the prescription, the prescriptive, I should say, is a good description of it. What you're trying to do is the more narrow the task, the better the result. So when you look at, you know, agent-based development, a lot of times people take the analogy of replacing a developer or a persona.
[00:28:12] Reuven: It's not like that. You're not, they don't operate like people at all. And this idea that you're trying to replace, you know, traditional human tasks and development approaches with agent-based approaches just doesn't really work. What you need to look at is like a multi-threaded development process where each thread is specifically tailored to a particular narrow problem you're trying to solve within a much larger group of threads.
[00:28:38] Reuven: And so one thread might be literally one function that calls data or a CLI or some other particular part. Once that's done, it's functional. An application is made up of many threads that are working concurrently, each being tested. And as you say, it's prescriptive in how it's defined.
[00:29:00] Reuven: Does it actually do what it's intended to do? Did I do a good enough job of defining what I wanted originally? Sometimes I don't. Sometimes you're not really sure what you want to build. And that's also quite exciting, to explore the unknown and see what I can build, kind of in a more vibe style.
[00:29:20] Reuven: But that said, I still need to be prescriptive in the approach and the tactics I'm using to build those things. So this led to things like Claude Flow, which allowed me to share my development approach in a way that was open enough to be interpreted and used in a variety of different architectures, methodologies, and applications, but narrow enough to do so in a way that allows people who don't necessarily know all the things they don't know to do it in a way that wasn't gonna result in a bunch of spaghetti that doesn't work.
[00:29:55] Simon: Yeah, absolutely. And I guess the second piece to this, when I first started playing with Kiro as a good example, one of the first things I noticed was when they use specifications, the lifecycle of the specification tends to be the change they're trying to make.
[00:30:13] Simon: So if I were to make a change to an existing application, I would say, this is what I'm trying to do. It would create a specification based on the changes it needs to make. It would create a design style spec as well, and a list of tasks it needs to do to enact that change.
[00:30:32] Simon: If I were to then carry on, I would probably create a new specification for a new change. The lifecycle is based on that change, whereas what I'm hearing from you is more that the lifecycle of the specification is the application. Or is it the specification? Is that a fair representation of how you think of specifications and how Claude Flow works with specifications?
[00:30:59] Reuven: One of the things I'm doing, when you look at my development: a lot of these systems are based on the idea of Markdown. You would create a docs folder and a plans folder and create specifications. Those specifications would define multiple phases of where you are in the implementation strategy, and it would iterate through those phases and Markdown files.
[00:31:22] Reuven: Now the issue that we faced when looking at that approach was the fact that it lost context. It wasn't contextually aware, beyond the last MD file or whichever other MD files it happened to look at, of where it was in the actual development process. So what I ended up doing was using the approach of a kind of shared memory space.
[00:31:44] Reuven: And I made it simple. A lot of people were focused on using vectors and complex similarity searches. It turns out you don't need any of that. Turns out what you really need is a common, simple SQLite storage environment that allows all the agents to save their current state as well as access the previous state, in a way that is easy for any agent participating in the swarm to see and interpret.
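A minimal sketch of that shared-memory idea, assuming a Node environment with the better-sqlite3 package; the table layout here is illustrative, not Claude Flow's actual schema.

```typescript
import Database from "better-sqlite3";

// One small SQLite file that every agent in the swarm can read and write.
const db = new Database("swarm-memory.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS agent_state (
    agent_id   TEXT    NOT NULL,
    key        TEXT    NOT NULL,
    value      TEXT    NOT NULL,
    updated_at INTEGER NOT NULL,
    PRIMARY KEY (agent_id, key)
  )
`);

// An agent saves its current state...
function saveState(agentId: string, key: string, value: unknown): void {
  db.prepare(
    `INSERT INTO agent_state (agent_id, key, value, updated_at)
     VALUES (?, ?, ?, ?)
     ON CONFLICT(agent_id, key) DO UPDATE SET value = excluded.value, updated_at = excluded.updated_at`,
  ).run(agentId, key, JSON.stringify(value), Date.now());
}

// ...and any other agent can read everyone's latest state for a given key.
function readAll(key: string): Array<{ agentId: string; value: unknown }> {
  const rows = db
    .prepare(`SELECT agent_id, value FROM agent_state WHERE key = ?`)
    .all(key) as Array<{ agent_id: string; value: string }>;
  return rows.map((row) => ({ agentId: row.agent_id, value: JSON.parse(row.value) }));
}

saveState("coder-1", "progress", { file: "auth.rs", testsPassing: true });
console.log(readAll("progress"));
```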
[00:32:12] Reuven: Then that opened up two general directions. One was a swarm. A swarm uses the concepts of independently operating agents or sometimes I call 'em threads. But these agents were essentially operating with a common direction, kind of orchestrator of sorts that would tell them generally what to do.
[00:32:32] Reuven: And then each agent would be sort of autonomous in the sense that they would be able to make their own determination of how to solve that problem, and then they'd update their status through that shared memory. The other approach, which uses the concept of a hive mind, which sounds straight up sci-fi, creates the ability for each agent to share the exact context of what's happening at any given point, rather than running autonomously. An ant colony might be an example of the first approach: each ant has a task. They've got leadership and are orchestrated through the pheromones of the queen, but they're autonomous entities in the sense that they don't share, at least I don't think they do, conscious constructs between each of them.
[00:33:17] Reuven: The hive mind allows for a common, I'm gonna call it consciousness, it's probably the wrong term, a common sort of mind that understands what's happening at any given point and then adjusts. It solves different problems with totally different development tactics. The hive mind approach is generally better for broad early development.
[00:33:37] Reuven: And, you know, you're gonna build a bunch of things quickly, you know, hundreds of thousands of lines of code, and you want it all to be more integrated in a sense. Swarm is a little more targeted. So I think of it like a sculptor might. You get a big, giant piece of granite, and you're blasting away at it at the beginning.
[00:33:56] Reuven: That's the hive mind. When I need something more narrow, more detailed, when I need to do the nose and the ears and the eyes, I'm gonna use swarms on specific parts to really target the different parts of the thing I wanna build. I'm building the face, I'm not building the body at this point. The swarm's great for that.
[00:34:12] Reuven: So you change your tactics based on where you are in the development of your application and the tools. Yeah.
[00:34:20] Simon: Yeah, absolutely. And I guess in terms of, you know, this thinking of if I'm gonna make a change or if I'm gonna add something to my specification, I don't want it to necessarily regenerate everything.
[00:34:33] Simon: Is the hive mind also good at recognizing what needs to be regenerated and what doesn't need to be regenerated, to make sure you almost preserve that history of context of how everything else has been built? What does that workflow look like?
[00:34:50] Reuven: It's adaptive, which is the most interesting part of these things.
[00:34:56] Reuven: They're given a level of autonomy and a declarative approach to solve those problems based on the requirements of the overall project, the current state of the implementation, what doesn't work, what's left to do, and a sort of human in the loop that guides that process. And ultimately, what we're doing here is, I don't know, it's gonna sound crazy, a kind of symbiotic relationship between the AI itself and me, the developer, the swarm master, whatever you wanna call it, who helps guide it through the steps.
[00:35:30] Reuven: What I'm not generally doing is letting this thing run for six hours. Sometimes I do, but more often I'm doing sprints of an hour or two, letting it solve the hard problems, ideating, optimizing, and then I'm reviewing, then I'm moving on to the next phase. I'm the reviewer, and that's an important part of the process.
[00:35:53] Simon: And as part of that, if you asked for something and it goes away and actually completely rebuilds something, as long as the tests pass, do you care? Or is that something that you would consider a risk?
[00:36:05] Reuven: I don't trust anything. I validate and verify everything that happens.
[00:36:10] Reuven: Just because it tells me the tests have passed doesn't necessarily mean I believe it. Now that I've gotten pretty good at doing this, when the tests pass it generally means it's right, but I assume it's wrong, and I need a heavy level of proof to validate that it's built in the way that I like, that it's secure, that it's scalable, that it hasn't hard-coded any env values in the source code.
[00:36:34] Reuven: And one of the other challenges that we have with a lot of these systems, Claude Code in particular, is the context engineering itself. It starts off great, and after, you know, 45 minutes of doing whatever it's doing, it's compacted the conversation a few times and it's lost its context. So the re-injection or reminding, a kind of power steering of sorts for the system, is really important.
[00:36:57] Reuven: And I think tools like, what was it, Roo Code did power steering, where you could have a known reminder of where it is every now and again; that works really well. These systems need to be reminded of their purpose and focus.
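A minimal sketch of that "power steering" idea: every few steps, a short summary of the spec is prepended to the prompt so the agent is reminded of its purpose after compactions. The `callModel` parameter is an assumed placeholder for whatever LLM call the harness actually makes.

```typescript
// Periodically re-inject the spec summary so long runs don't drift off purpose.
async function runWithReminders(
  specSummary: string,
  steps: string[],
  callModel: (prompt: string) => Promise<string>, // placeholder for the real LLM call
  remindEvery = 5,
): Promise<string[]> {
  const outputs: string[] = [];
  for (let i = 0; i < steps.length; i++) {
    const reminder =
      i > 0 && i % remindEvery === 0
        ? `Reminder of purpose and constraints:\n${specSummary}\n\n`
        : "";
    outputs.push(await callModel(reminder + steps[i]));
  }
  return outputs;
}
```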
[00:37:12] Simon: Yeah. And this is something that actually, that kind of Claude Flow does very, very nicely as well.
[00:37:19] Simon: I'd love to jump into Claude Flow now. Tell us a little bit about it. This is a project, a tool, that you created. When did you first release it? Actually, it was fairly recent, in the last couple of months.
[00:37:32] Reuven: It was June 15th. Yeah. It was funny. There are a few things that I learned. I've been on an epic development sprint over the last two months, and I've lost count, but last I checked it was like several million lines of code and a lot of different rough projects.
[00:37:51] Reuven: So the first version was essentially, I discovered something called the Batch Tool within Claude Code. The batch tool was an undocumented part of their system that allowed me to move away from a sequential implementation of the agent. If you've ever used Claude Code or a lot of these other systems, you'll notice that it does a task, finishes the task, moves to the next, and moves to the next.
[00:38:12] Reuven: What the batch tool allowed me to do was create subagents, which, interestingly enough, is the same term Anthropic later adopted. Thank you, no credit to me, but whatever. The subagents essentially acted as threads. Each of these things could run in parallel, all at once. So rather than sequentially, I could spawn 10, or I think up to 150, of these agents.
[00:38:36] Reuven: I generally run in smaller numbers, and odd numbers wherever possible. And I'm not sure why odd numbers work better, but they do. Essentially what that means is one agent is doing a task and another is doing another task, and they can all run at the same time. A project that would take me three or four hours I can now do in half an hour, because I had 10 agents running concurrently rather than sequentially over three or four hours.
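A toy illustration of the sequential-versus-concurrent difference described here; `runAgent` is a purely hypothetical stand-in for dispatching one narrow task to one subagent.

```typescript
type AgentTask = { id: string; prompt: string };

// Sequential: each task waits for the previous one to finish.
async function runSequential(tasks: AgentTask[], runAgent: (t: AgentTask) => Promise<string>) {
  const results: string[] = [];
  for (const task of tasks) results.push(await runAgent(task));
  return results; // wall-clock time ≈ sum of all task durations
}

// Concurrent: all tasks are dispatched at once, like the batch-tool subagents.
async function runConcurrent(tasks: AgentTask[], runAgent: (t: AgentTask) => Promise<string>) {
  return Promise.all(tasks.map(runAgent)); // wall-clock time ≈ the slowest single task
}
```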
[00:39:02] Reuven: So the speed was dramatic. It was basically a linear increase in capability, or I guess a decrease in time: 10 agents is 10 times less. But there was also a sort of max. I noticed that I really wasn't seeing much improvement beyond around 20 agents, unless it was maybe doing things like machine learning models and optimization.
[00:39:27] Reuven: But for most development-centric things, it kind of maxed out around 10 or 15 agents, and I wasn't seeing really any improvement beyond that. At which point I said, okay, I think this is something that others would want. And because I was doing a lot of work in Rust, I concurrently developed a micro neural network architecture.
[00:39:47] Reuven: One of the first projects I did in the V1 of these batch-style agents was basically porting existing complex libraries into Rust. I'm like, what's the most complicated thing I can possibly build? That's literally what I was thinking. And the first one was the Fast Artificial Neural Network project, which I think is referred to as FANN sometimes.
[00:40:12] Reuven: I ported that to Rust and made it a WASM, a WebAssembly component, that could run in the browser or on CLIs or servers without any requirements. So it's portable, it's light, and it was, I don't know the exact number, but substantially faster than the traditional C and C++ implementation.
[00:40:30] Reuven: I packaged that up and put it into an npx package, so I can deliver it through a common npm or npx delivery. If you've ever installed modules for TypeScript or Node.js, you've run npx. So I can deliver these really complex Rust-based components as a TypeScript module, meaning I can get it to anyone, anywhere, pretty much.
[00:40:54] Reuven: And I created the first version that combined the idea of an adaptive neural network along with a swarm component. I called it Claude Flow, launched it, and suddenly, within a matter of hours, I had tens of thousands of downloads of this thing. And we were sort of inundating Anthropic, and I don't think they were particularly happy about this.
[00:41:18] Reuven: So within about a week, I had about a hundred thousand downloads. And we were doing, you know, just billions and billions of tokens through the system. And I'm fairly certain we were a large part of why they revised their model. I'm sorry, it wasn't the plan, you know, but they said unlimited.
[00:41:37] Reuven: So in fairness, I was taking advantage of that. And then that led to a sort of recursive development where I could improve the things I was building even faster, because now I had this adaptive neural network that allowed me to optimize specific kinds of synaptic components: micro neural networks that were interconnected to each other and self-optimizing.
[00:41:59] Reuven: So where traditional neural networks were monolithic, you'd have billion-plus parameter models. What I was doing was creating very narrowly focused neural networks that were explicitly built for a task: trade this one stock in this certain way, understand a particular customer segment or whatever it needed to do.
[00:42:18] Reuven: And so I was training 10,000- or maybe 100,000-parameter neural network models and then applying them to the actual application to solve the problem. So we can get really, really smart at that one thing, trade that stock better, and I didn't need a GPU 'cause I'm using sub-100,000-parameter models; no GPU is required.
[00:42:37] Reuven: I can create a neural network in seconds, based on the Rust library, that does the one thing, and no GPU is required at all. Which basically changes the paradigm from a monolith to a distributed sort of synaptic network of many small interconnected neural networks that can all collaborate with each other, which is straight-up sci-fi.
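As a back-of-the-envelope illustration of how small such task-specific networks can be (a toy, not the Rust/FANN port discussed above): a network with 16 inputs, one 32-unit hidden layer, and a single output has 16×32 + 32 + 32 + 1 = 577 parameters, cheap enough to evaluate on any CPU.

```typescript
// Toy "micro" network: 16 inputs -> 32 ReLU hidden units -> 1 output (577 parameters).
function forward(input: number[], w1: number[][], b1: number[], w2: number[], b2: number): number {
  const hidden = w1.map((row, j) =>
    Math.max(0, row.reduce((sum, w, i) => sum + w * input[i], b1[j])), // ReLU(W1·x + b1)
  );
  return hidden.reduce((sum, h, j) => sum + h * w2[j], b2); // w2·h + b2
}
```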
[00:42:58] Simon: Yeah, no, absolutely. Walk us through if you can, or show us if you can, the Claude Flow in action. So I'd love for the folks and listeners to almost like see the use cases. Would you say it's more for people who are creating from scratch or would you say it's more for people who are trying to convert existing applications into a spec, more spec-centric approach?
[00:43:25] Simon: What would you say is the core use case that people pick up Claude Flow?
[00:43:31] Reuven: It depends. I’ve been working on a new gamified version of the system that allows users to learn how to use it by participating in challenges and different types of gamified versions of it. So, you know, it's broad.
[00:43:45] Reuven: It's for everyone from early users to master agentic engineers and hopefully everything in between. Let me give you a little demo here. Hopefully. So in a second, what I'm gonna show you here is my Flow Nexus development environment, which is a new project I've been working on.
[00:44:04] Reuven: Hopefully I'll launch it later this week. Let's see here. What I'm showing you right now is a code space running in GitHub. When you run swarms, remember you're running in a kind of dangerously skipped permissions mode within Claude, which means you probably don't wanna run this on a local environment.
[00:44:21] Reuven: You can, if you're using Windows with WSL, and Mac is probably a little more forgiving. I'm running in an Ubuntu Codespace on GitHub in this case. Now, to install, as I mentioned before, I'm using npx or npm. You can install either way. npx, for those who aren't familiar, allows you to run a kind of remotely instantiated version.
[00:44:43] Reuven: So you're always getting the most recent version; you don't get a frozen moment in time. An npm-style install, which looks something like this, installs it globally but doesn't update it automatically. So generally what people are doing is running npx, and to install it, all you're doing is running npx claude-flow@alpha.
[00:45:05] Reuven: This alpha part basically means you're getting my alpha version. It's alpha until I'm confident that there are no bugs, or until I stop developing, I suppose. But you're getting the alpha version at that point. Now, to verify, all you're gonna do is run version. This is going to instantiate it; you can see I'm running alpha 91.
[00:45:24] Reuven: Now once we have that, we can run the help command. So this is CLI, but the CLI is actually optimized not for you to use directly. And this is the part that sometimes people get confused by. Most of the capabilities of the system are actually optimized for the system itself to use.
[00:45:47] Reuven: So for example, if I run a command to update the memory, I could invoke that command directly. In this case, I'm sharing my screen on Riverside. So I'm gonna update the memory. I'm just gonna type this command: session, generate summary, persistent. But the AI is intended to do that.
[00:46:05] Reuven: So you can see here, I've been running this for 9,700 minutes and I've done 559 edits. So this is basically giving the context, but if I want to actually build something, I've got a couple different ways to build. I've got my wizard, so I can go, you can see here that I've got different options.
[00:46:23] Reuven: So I've got training, I've got verification. There's a ton of stuff in here. So I can start, I'm just gonna run the wizard this time. So I'm just gonna copy this and this is gonna guide you through the process of running the wizard. In this case, I'm just gonna go and choose Alpha, and I'm gonna do the hive-mind wizard.
[00:46:42] Reuven: And this guides you through different options for creating swarms and viewing the swarm status. So these are the more human parts of the interface. So I'm gonna create a new swarm. To impress the livecast viewers there's an optional name, but I'm just gonna let it do whatever it wants.
[00:47:03] Reuven: Now I've got different approaches for the Queen. Interestingly enough, she named herself. I didn't; she named herself Serepina, which is interesting. When I told my wife that my swarm gave herself a name, well, she's now our mascot.
[00:47:21] Reuven: So I'm gonna choose strategic, I'll say seven agents. I can choose which agents I want. And then I can go and choose how the agents come up with their consensus for making decisions. So I can choose different types. I generally use majority. So if I've got a seven-agent swarm, four agents will determine the direction.
[00:47:45] Reuven: You generally use odd numbers; even numbers, for...
[00:47:50] Simon: Yeah. You mentioned that before. Why do you think that is?
[00:47:52] Reuven: Well, even numbers just, two outta four or five outta 10 just doesn't really work for consensus.
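A tiny sketch of why odd swarm sizes help: with two options and an odd number of voters, a tie is impossible, so a strict majority always emerges. Illustrative only, not Claude Flow's consensus code.

```typescript
// Returns the option with a strict majority, or null if the vote deadlocks.
function majorityDecision(votes: string[]): string | null {
  const tally = new Map<string, number>();
  for (const v of votes) tally.set(v, (tally.get(v) ?? 0) + 1);
  for (const [option, count] of tally) {
    if (count > votes.length / 2) return option; // strict majority reached
  }
  return null; // no majority, e.g. an even swarm split two against two
}

majorityDecision(["merge", "merge", "rework", "merge", "rework"]); // => "merge" (5 agents)
majorityDecision(["merge", "merge", "rework", "rework"]);          // => null   (4 agents, deadlock)
```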
[00:48:01] Simon: It's funny how when you actually look at nature, you see so many odd numbers or Fibonacci, for example, just being used very naturally throughout nature's choices.
[00:48:10] Simon: So it's really interesting; it seems very poignant to me that it naturally chooses odd numbers, or works better with odd numbers.
[00:48:20] Reuven: Yeah. It seems to. So I'm just gonna do majority. Auto-scaling? Sure, why not. And then, launch the dashboard?
[00:48:27] Reuven: No, I'm not gonna bother with that. It's still building. Oh, I had an error. I'm in my dev environment, so no worries. We're gonna invoke that anyway. So in this case, I'm going to spawn, and then I can give it a title: create a five-agent swarm to research swarms.
[00:48:58] Reuven: And then I'm gonna add the Claude option to invoke Claude in this case. I'm gonna go right here. Oh, I'm having some trouble with that. I'm gonna use a swarm instead. That's the problem with showing you a dev environment. All right, so we're gonna go here and we're going to do swarm.
[00:49:17] Reuven: I'm gonna use the same, so there's two modes. The Hive Mind, which apparently is having an issue at the moment, and the swarm. So in this case, I'm just gonna do that for this one. Hopefully that solves that problem. And there we go. So now we're launching an agent. You see that I injected a bunch of information.
[00:49:37] Reuven: This is basically telling it the MCP tools to use and where we've got different options. Now, I'm not getting to do anything. I'm just showing you the process of spawning the swarm itself. So now it's initializing the swarm. We'll see that it'll make use of the CLI and MCP components on the fly.
[00:49:59] Reuven: [00:50:00] And so here we are, dev environment. I should have probably prepped a little bit more, but this is trying to invoke the swarm types. We'll see. It's self-fixing. So, we'll see what happens. There we go. It figured itself out. Now when you see these, this indicates that it's an agent.
[00:50:20] Reuven: So we have color-coded agents. You can modify all these things by going into the agents folder in Claude here, or the commands, which are the slash commands, and you'll see that I've got all the different commands available to it. So it's invoking its own persistence. It's creating the task, and it's using the MCPs and CLI to do the coordination before it then launches the swarm itself.
[00:50:46] Simon: And when you talk about the swarm, would you use the swarm primarily because you wanted to have different roles and different perspectives, or do you see it more as a parallelization task where you're effectively trying to get 10 things running out in the space of one?
[00:51:04] Reuven: It depends what you're trying to do. So in this case, I gave it a pretty generic sort of guide. I told it to research swarms, which isn't really telling it to do much; it was more just to invoke it than anything else. Depending on what you're trying to do, you can have different processes that allow it to deploy and work in different manners.
[00:51:26] Reuven: So in this case, if I go back to the MPX for a moment, you'll see different options. We've got options for training, this handles all the neural network components, so you can optimize, monitor, and do other tasks. I have a verification system that checks whether it's actually building what it's supposed to, whether it's truthful, and so on.
[00:51:49] Reuven: And then I also have the ability to integrate people into the process using a collaborative, pair-based programming approach, which brings in participants that aren't AI. So there are lots of components, depending on what you're trying to do.
[00:52:09] Reuven: So the answer really depends on the problem you're trying to solve. That's part of the idea of an agent engineering system. You are the active orchestrator engineer of the process, and you can make it your own. I made this for engineers, not necessarily for vibes directly.
[00:52:32] Simon: Yeah, absolutely. And interestingly, when you mentioned vibes there: when we lean more into specs with things like this, and you can fire off a number of different agents to run things, we get much longer-running processes. I've thought about vibe speccing as something whereby we are still vibing, but we're losing contact with our specs a little, in terms of reading them firsthand. Is that something that you've also felt, whereby you're more able to interact with an LLM on your specs and have everything generated from there?
[00:53:09] Simon: Or have you found that the new way of development is about this direct interaction that we are having with specifications?
[00:53:18] Reuven: The specs are the most important part of the process. You know, people ask me how much time I’m spending on the spec versus letting it run. Now I'm spending all my time on the spec.
[00:53:31] Reuven: Letting it run is just letting it run, but telling it how to run is a particularly important part of the process. Without a decent PRD or technical specification, you're not going to get decent output.
[00:53:47] Simon: So you are very spec-centric now, where the source of truth is the spec.
[00:53:52] Simon: How much do you touch code after generation from the specification?
[00:53:58] Reuven: How much do I touch? It depends. One thing you'll notice with AI-centric development is there tends to be a cognitive offload. It makes you lazy in some regards.
[00:54:14] Reuven: And I'm not different from anyone else. Certain things, if I want to change a title or some text somewhere, it's quicker to just edit it. But even then, sometimes I have a tendency to say, Claude, go do that, which is kind of silly. I could literally open the file and edit the code in five seconds.
[00:54:34] Reuven: And in that scenario, Claude Code is going to take longer, but I'm just lazy. Go do that. That's a challenge. AI has a tendency to make you powerful, but also a little lazy in certain regards.
[00:54:54] Simon: Yeah, super interesting. What do you think are the next steps from an agent engineering point of view? Where do you see the agentic future of software development heading?
[00:55:02] Reuven: I think we're going to get better fidelity. We're in the early stages, and what I'm showing is a leading indicator of what's possible. There are two schools of thought. When people see the stuff that I'm building, they call it AI slop.
[00:55:20] Reuven: It's like, there's no way this could build so much stuff so quickly and have it be of real value. But the fact that I was able to get hundreds of thousands of users within days indicates that it's probably not, though it's hard.
[00:55:39] Reuven: You know, there's so much there, and I understand why people would think that. There's no way for them to review millions of lines of code. So the assumption that it's bad is understandable. But the real litmus test is whether it actually solves the problem it's intended to solve.
[00:56:06] Reuven: Like, does it do the thing you want it to do? And secondly, does it do it in a way that's effective, secure, and optimized? If you can check off those things, that's what matters. Is it artisan code, optimized line by line by someone with 50 years of coding experience?
[00:56:31] Reuven: No, it isn't. It's a different thing. But it is something I could create in a moment to illustrate, or I can create every possible solution to a problem and choose the best one. Where previously I'd have to choose, I'd have to make an assumption and go with that assumption.
[00:56:51] Reuven: Whether it was right or wrong, I couldn't test every assumption and choose the best of the assumptions, 'cause there just wasn't enough time in the day, or it was cost prohibitive. I can do all [00:57:00] that now. I don't even need a team of people. I just need to ask the right swarm the right question.
[00:57:05] Simon: And as a final question, what would you say is your biggest bugbear right now with either agentic flows or, I guess, LLMs within agentic flows? What would you say is the biggest problem, the one that, if you could click your fingers and make it disappear, or have a workaround or a solution, is today causing you the most problems with this kind of
[00:57:31] Simon: you know, nice workflow, whether it's through Claude Flow or agentic development today?
[00:57:37] Reuven: Validation, I think, is still a challenge. And I think we've discussed this a few times today. Is it true? Is it real? Does it actually do what it needs to do in the way that I wanted it to do that thing?
[00:57:52] Reuven: And ultimately, that is the key here. Anyone using Claude Flow, pretty much anyone using any of these systems, suffers through that a little bit. Regardless of Claude or whatever else, they all suffer from that same problem.
[00:58:10] Simon: Yeah, no, I think it's a very fair assessment, actually, and I think it's something that we'll hear again and again with validation.
[00:58:15] Simon: I think it's super important for trust and actually just making sure that what is kinda busy building and busy developing is fit for purpose and actually what we want to deliver. Ruv, this has been super useful. I would love for you to kind of mention where people can access the foundation in terms of next steps. If people want to learn more about SPARC or Claude Flow, where do they start?
[00:58:45] Reuven: I created a new landing page, finally. For a while I was like the shoemaker who doesn't have any shoes, but yeah, I did end up creating a landing page, and it's done quite well for me: ruv.io. You can see all my various projects and things that I'm building at any given time. I am very prolific on GitHub too, github.com/ruvnet.
[00:59:08] Reuven: And you'll see all my various projects, which run the gamut from useful to just crazy. I'm just trying to push the boundaries of what I can build and what these systems can do just because.
[00:59:23] Simon: Valuable to crazy? You could call that valuable to inspiring, I guess.
[00:59:27] Simon: Let's do that, shall we? Yeah.
[00:59:29] Reuven: Yeah. I know I'm on the right track when people say it's not possible and that I'm crazy. That's perfect. I'm looking at the right things then. Exactly. If you go back like three and a half years ago when we started using the term agentic and agentic engineering, everyone's like, no, that's stupid.
[00:59:49] Reuven: It doesn't mean anything. It sounds like a scientific or medical procedure. That only inspires us to try more and do more. You know, we're on the right track.
[01:00:00] Simon: Amazing. Ruv, it's been an absolute pleasure. Thank you very, very much for the insights, and I hope our listeners enjoyed it as much as I did chatting with you.
[01:00:08] Simon: So, thanks again.
Reuven: Yeah, my pleasure. And thanks for having me today.
Simon: Absolutely. And for those listening, please, first of all, subscribe and like the video, and tune into the next episode as well. Thank you very much.
In this episode
In this episode of AI Native Dev, host Simon Maple chats with Reuven “Ruv” Cohen, founder of the Agentics Foundation, to explore the breakthrough of agentic engineering and its impact on AI development. They discuss the evolution from deterministic scripts to autonomous systems that leverage recursion and feedback loops, highlighting how recent economic shifts have made long-horizon agentic swarms feasible. Discover how this new engineering discipline is reshaping the industry and creating a professional identity centered on verification and repeatable outcomes.
- Agents, recursion, and a sudden flip in the cost curve: in this episode of AI Native Dev, host Simon Maple sits down with agentic engineer Reuven “Ruv” Cohen, founder of the Agentics Foundation and creator of Claude Flow, to unpack why 2025 feels like the year agents broke through. They cover how practical agentic systems differ from old-school automation, the breakthrough of recursive feedback loops, the economics that made long-horizon swarms feasible, and why a new professional identity, agentic engineer, is emerging.
From Deterministic Scripts to Agentic Engineering
Ruv traces agents back to their roots: early software “agents” were deterministic automations that operated within narrow, procedural bounds. They worked when the world behaved exactly as expected, and failed the moment reality deviated. The agentic shift began when large language models, starting around GPT‑3, enabled systems to use natural language alongside tools and infrastructure. That language layer unlocked flexible problem-solving, but the real catalyst wasn’t language alone.
The defining difference, Ruv argues, is agentic coding’s focus on autonomy with feedback. Agentic systems don’t just follow a chain of instructions; they operate in the environment, observe outcomes, and adapt through iterative loops with minimal human oversight. The agent doesn’t merely “do steps”; it uses the results of those steps (successes, errors, and logs) to decide what to do next. This changed the work itself, giving rise to a new practitioner: the agentic engineer.
Agentic engineering is not freeform “vibe coding.” It’s the deliberate design of architectures, processes, and controls that make outcomes repeatable and verifiable. The role centers on structuring context, tool use, and feedback so autonomy becomes reliable enough for production workflows.
Recursion, Feedback Loops, and Long‑Horizon Execution
The breakthrough Ruv emphasizes is recursion: feeding execution artifacts (compiler errors, stack traces, test failures, logs, and partial successes) back into the system. Early prompt-engineering patterns fixated on chain-of-thought, but the real power came from closing the loop. Once the agent understands the context of what happened, not just the planned procedure, it can diagnose, repair, and continue. That’s what enables long-horizon behavior, where agents run for hours or even days to complete complex tasks.
Practically, that means instrumenting everything. Developers should capture stdout/stderr, diff outputs, test results, and any external tool responses, then persist that context and re-prompt the model with exactly what happened. Treat the feedback loop as first-class: budget and time constraints, retry policies, error classifiers, and structured error parsing all matter. The agent’s effectiveness is a product of how well you expose it to reality and how precisely you feed that reality back.
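A minimal sketch of that instrumentation, using only the Python standard library and assuming pytest is on the PATH; `run_and_capture`, `parse_pytest_failures`, and the commented-out `call_model` are illustrative names, not part of Claude Flow, SPARC, or any specific framework:

```python
import json
import re
import subprocess

def run_and_capture(cmd: list[str]) -> dict:
    """Run a command and capture ground-truth artifacts for the next prompt."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "cmd": " ".join(cmd),
        "exit_code": proc.returncode,
        "stdout": proc.stdout[-4000:],  # keep the tail to respect the context budget
        "stderr": proc.stderr[-4000:],
    }

def parse_pytest_failures(output: str) -> list[dict]:
    """Turn raw test output into structured fields the model can reason about."""
    # Matches summary lines like "FAILED tests/test_api.py::test_login - AssertionError: ..."
    pattern = re.compile(r"FAILED (\S+)::(\S+) - (.+)")
    return [
        {"file": m.group(1), "test": m.group(2), "reason": m.group(3)}
        for m in pattern.finditer(output)
    ]

result = run_and_capture(["pytest", "-q"])
failures = parse_pytest_failures(result["stdout"])

# Re-prompt with exactly what happened, not with what was planned.
feedback_prompt = (
    "The last run produced these results. Diagnose and propose a fix.\n"
    + json.dumps({"run": result, "failures": failures}, indent=2)
)
# call_model(feedback_prompt)  # placeholder for whatever LLM client you use
```

The point is that the next prompt contains observed exit codes and failure reasons, not the plan the agent believed it executed.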
Trust isn’t the lever; verification is. Ruv is explicit: practitioners do not inherently trust model outputs. They build verification harnesses and guardrails because LLMs take the shortest path to an answer, which can mean “simulated” success. Engineers counter that with automated testing, environment sandboxes, deterministic tool interfaces, and explicit acceptance criteria. The model learns from hard, unambiguous signals (passing tests, zero runtime errors, validated API responses) fed back through the recursive loop.
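One way to turn those signals into a hard gate, sketched under the assumption of a pytest-based project; the module name in the smoke check is a placeholder, not a reference to any real project:

```python
import subprocess

def tests_pass() -> bool:
    """Acceptance signal 1: the test suite exits cleanly."""
    return subprocess.run(["pytest", "-q"], capture_output=True).returncode == 0

def app_starts() -> bool:
    """Acceptance signal 2: the service imports and boots without a runtime error."""
    probe = subprocess.run(
        ["python", "-c", "import app"],  # replace with your real smoke check
        capture_output=True,
    )
    return probe.returncode == 0

def accept_change() -> bool:
    # The agent's own claim of success is ignored; only observable signals count.
    return tests_pass() and app_starts()
```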
The Economics Flip: Unlimited Tokens, Claude Code, and the SPARC Protocol
Agentic swarms, multiple coordinated agents tackling a problem, worked in prototypes but were economically brutal. Ruv’s team proved they could run agents for 36 hours, but even minimal operations cost around $4,000 per day. Scaling to concurrent swarms hit roughly $7,500 per hour for 10 agents. It was often cheaper to hire humans.
Then the cost curve flipped. In April, Anthropic’s Claude Code introduced “all-you-can-eat” style plans, effectively removing the hard token ceiling that had kept long-horizon recursion and swarms in the lab. Ruv and collaborators reimagined their SPARC protocol, which orchestrates recursion in agentic workflows, and suddenly hours-long, high-parallelism runs became feasible for a flat monthly fee, often starting near $20. Capability jumped while cost collapsed, an inflection you rarely see in software.
For developers, that unlocks new patterns: parallelize subtasks aggressively and use cross-verification among agents to lift quality; run longer horizons for complex refactors or multi-service changes; tolerate iterative retries because budget no longer melts on every loop. Still, put budgets and watchdogs in place: cap concurrent agents, enforce timeouts, and log token usage, so scale doesn’t surprise you.
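As a rough illustration of those guardrails, here is a sketch using asyncio primitives; the agent body is a stub and the numbers are placeholders, not recommendations from the episode:

```python
import asyncio
import time

MAX_AGENTS = 4          # hard cap on concurrent agents
TASK_TIMEOUT_S = 900    # per-task watchdog timeout, in seconds
token_usage = {"total": 0}

async def run_agent(task_id: int) -> str:
    # Placeholder agent body; a real one would call your model and tools here.
    await asyncio.sleep(0.1)
    token_usage["total"] += 1200  # record whatever usage your client actually reports
    return f"task {task_id} done"

async def supervised(task_id: int, gate: asyncio.Semaphore) -> str:
    async with gate:  # concurrency cap
        return await asyncio.wait_for(run_agent(task_id), timeout=TASK_TIMEOUT_S)

async def main() -> None:
    gate = asyncio.Semaphore(MAX_AGENTS)
    start = time.time()
    results = await asyncio.gather(*(supervised(i, gate) for i in range(10)))
    print(len(results), "tasks,", token_usage["total"], "tokens,",
          f"{time.time() - start:.1f}s wall clock")

asyncio.run(main())
```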
Defining the Agentic Engineer: Beyond Vibe Coding
Alongside the technical shift, a professional identity is forming. The Agentics Foundation emerged from Ruv’s realization, shared by a growing community, that they were practicing a distinct engineering discipline. While “vibe coding” is great for ideation, agentic engineering prioritizes architectures and processes that are repeatable and outcome-driven. It’s about designing for autonomy with verification, not just prompting for inspiration.
The foundation also responds to AI-washing. Big vendors label chatbots as “agents,” diluting the term. The community’s view: if it doesn’t operate autonomously with tool use, feedback loops, and verifiable outcomes, it’s not an agent. The foundation operates as a member-led, meritocratic guild to articulate standards and protect practitioners’ interests as trillion-dollar companies shape the narrative.
Ruv’s own path underscores the moment. Early access to ChatGPT led him to ask a blunt question (how to become influential in AI) and then follow the model’s advice with remarkable consistency: build a focused community, post openly on GitHub, run weekly livecasts, let people buy your time. The outcome: 100k+ subreddit members, 100+ customers, including 20 Fortune 500s, largely “one guy and bots.” It’s a vivid example of agentic leverage in both code and career. Developers can plug into the community via agentics.org and the public Discord.
A Practical Playbook to Build Agentic Systems Today
Start with the loop. Build an orchestrator that: (1) plans a step, (2) executes via tools or code, (3) captures ground-truth artifacts (errors, logs, test results), (4) re-prompts with that context, and (5) repeats until acceptance criteria are met or budgets are exhausted. Make error handling explicit: parse stack traces and compiler output into structured fields the model can reason about. Persist a transcript of attempts so each iteration sees the full context, as in the sketch below.
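A skeletal version of that loop in Python; `call_model` and `execute` are stubs standing in for your model client and tool layer, and none of the names come from Claude Flow or SPARC:

```python
import json

MAX_ITERATIONS = 8
transcript: list[dict] = []  # persisted attempt history so each iteration sees full context

def call_model(prompt: str) -> dict:
    # Placeholder for your LLM client; a real call would return a structured plan.
    return {"action": "edit", "target": "app.py", "note": "stubbed plan"}

def execute(plan: dict) -> dict:
    # Placeholder: apply the plan via tools/code and capture ground-truth artifacts.
    return {"tests_passed": False, "runtime_errors": 0, "stderr": ""}

def acceptance_met(artifacts: dict) -> bool:
    # The hard signals described above, not the model's own claim of success.
    return artifacts["tests_passed"] and artifacts["runtime_errors"] == 0

def run_loop(goal: str) -> bool:
    for i in range(MAX_ITERATIONS):              # explicit budget, not blind trust
        context = json.dumps({"goal": goal, "history": transcript})
        plan = call_model(context)               # (1) plan a step
        artifacts = execute(plan)                # (2)+(3) execute and capture reality
        transcript.append({"iteration": i, "plan": plan, "artifacts": artifacts})
        if acceptance_met(artifacts):            # (5) stop at the acceptance criteria
            return True                          # (4) otherwise the next prompt sees it all
    return False                                 # budget exhausted: escalate to a human

print(run_loop("make the login tests pass"))
```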
Design for verification, not trust. Run work in sandboxes with strict permissions. Use deterministic tool APIs (e.g., “run_tests”, “apply_patch”, “get_logs”) and check return codes. Gate progress behind tests or runtime checks. Detect “simulated success” by validating side effects: files really changed, services really deployed, endpoints really responded as expected. Treat passing tests and clean logs as the agent’s north star.
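A sketch of the “validate the side effect, not the report” idea; the tool functions and the file name are invented for illustration and assume pytest is available:

```python
import hashlib
import subprocess
from pathlib import Path

def apply_patch(path: str, new_text: str) -> dict:
    """Deterministic tool: write a file and report a checksum, not a claim."""
    before = hashlib.sha256(Path(path).read_bytes()).hexdigest() if Path(path).exists() else None
    Path(path).write_text(new_text)
    after = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return {"path": path, "changed": before != after, "sha256": after}

def run_tests() -> dict:
    """Deterministic tool: return the exit code and raw output, nothing interpreted."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return {"exit_code": proc.returncode, "tail": proc.stdout[-2000:]}

# Gate on observed side effects: the file really changed AND the suite really passed.
patch_report = apply_patch("example_module.py", "def add(a, b):\n    return a + b\n")
test_report = run_tests()
accepted = patch_report["changed"] and test_report["exit_code"] == 0
print(accepted)
```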
Exploit the new economics. When single-agent throughput stalls, shard the problem and spawn a small swarm (parallel agents exploring different solution paths or handling different files/components), then converge via a final reconciliation pass. Set hard caps: maximum iterations, maximum wall-clock time, and maximum concurrent agents. If you have access to models or IDE integrations with unlimited token plans (e.g., Claude Code), target long-horizon tasks like multi-file refactors or docstring and test generation at repository scale. Keep cost telemetry so you know when to roll back to a single agent or a human-in-the-loop.
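A sketch of the shard-and-reconcile pattern using a thread pool; the worker is a stub, the shard names are made up, and the scoring rule is purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_AGENTS = 3   # hard cap on parallelism
MAX_WALL_CLOCK_S = 600      # hard cap on how long we wait for any shard

def agent_worker(shard: str) -> dict:
    # Placeholder: a real worker would run the full plan/execute/verify loop on its shard.
    return {"shard": shard, "score": len(shard), "patch": f"patch for {shard}"}

def reconcile(candidates: list[dict]) -> dict:
    # Final reconciliation pass: converge parallel work instead of merging blindly.
    return max(candidates, key=lambda c: c["score"])

shards = ["auth_service", "billing_service", "web_frontend"]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_AGENTS) as pool:
    futures = [pool.submit(agent_worker, s) for s in shards]
    results = [f.result(timeout=MAX_WALL_CLOCK_S) for f in futures]

print(reconcile(results))
```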
Key Takeaways
- Recursion is the agentic unlock: feed real errors, logs, and outcomes back into the model so it can self-correct across long horizons.
- Verification beats trust: rely on tests, runtime checks, and deterministic tool APIs. Guard against “simulated success.”
- Cost changed the game: unlimited token plans let you run hours-long loops and small swarms for a flat fee, making parallelization practical.
- Engineer, don’t vibe-code: design architectures and processes with defined outcomes. Agents ≠ chatbots.
- Start small, scale wisely: cap iterations, time, and concurrency; log everything; converge parallel work with a final reconciliation pass.
- Join the guild: the Agentics Foundation is a member-led community setting standards for agentic engineering. Explore agentics.org and the Discord to learn, share patterns, and find collaborators.
Related episodes
- Why AI Coding Agents Are Here To Stay (17 Jul 2025, with Patrick Debois)
- Can AI Really Build Enterprise-Grade Software? (26 Aug 2025, with Maor Shlomo)
- What If Fixing Code Wasn’t Your Job Anymore? (22 Jul 2025, with Jonathan Schneider)