
Transcript
[00:00:00] Simon: Let's go into agents a little bit deeper in terms of the way we use them. You mentioned Cursor, you mentioned Claude Code. Very different usage patterns, in that one uses a terminal UI, one uses a more classical IDE. If we look at others, in fact quite a few have favored
[00:00:20] Simon: either IDE or terminal, but there are a surprising number that favor terminal these days. Is there a best way to use an agent? Is there a way that is, you know, easier for us to give an agent information or context, or is it really up to the user, depending on how the user prefers to interact with the agent?
[00:00:40] Max: Yeah, so I think the interactivity is the differentiating factor. When you are using an IDE with an extension, like Cursor, visible to you, then you are agreeing to the scenario where you are in the loop: you are always [00:01:00] checking what happens, and you are doing this interactively.
[00:01:03] Max: So UI elements simplify and, you know, accelerate the onboarding for you as a developer. It doesn't replace you as a developer. It just exists here on the side to help you achieve specific tasks, like helping you to write a specific code file or specific unit tests and so on.
[00:01:25] Max: It's very helpful in the sense that there is a UI. You don't need to know any commands, because there are buttons, and it's just a box for you to type the text and execute. So I think the cool part of Cursor or Windsurf is that they are giving you this IDE with lots of UI elements to help you onboard to using these tools proactively, in your everyday use case.
[00:01:56] Max: While for AI agents in the terminal, by definition, you [00:02:00] don't need buttons, right? Complex things can be done just with a combination of keys or commands. So the terminal is for more advanced users, for those who are ready to accelerate even further, because they don't need to
[00:02:19] Max: have visualization. They are okay just living in the terminal and manipulating all the commands in the terminal, because if at some point you're advanced with Vim or Nano or whatever editor, or with Git, you actually don't need UIs, right? You can achieve the hard scenarios
[00:02:38] Max: with the terminal directly, and it saves you time. But again, it's a bit harder because it's less interactive, because it's easier to lose track of what happens. And the amount of generated information sometimes may be overwhelming, so the terminal is hard to navigate just because so many [00:03:00] things are happening, and it requires some skill to learn how to use it properly.
[00:03:01] Simon: Yeah, absolutely. And I think things like Cursor, for example, where you see the changes and you can flick from file to file very, very easily. You obviously miss that from the terminal point of view. But I dunno what it is about just being at a terminal that actually makes me feel faster, more productive.
[00:03:19] Simon: Yeah. And of course, when you're at the terminal, you get the power of the terminal, right? Yeah. Tell us a little bit about how you can effectively use the terminal beyond just Claude Code.
[00:03:30] Max: Yeah, that's true. And another super advantage of the terminal solution is that it gives you the batteries to run your agent in the background, right?
[00:03:41] Max: You can run Claude Code or Gemini CLI in non-interactive mode, just from the CLI, given a prompt of what you want this agent to do, and it'll spawn a background process and, fully on its own, start executing this complex request [00:04:00] which we asked of it. But I think nowadays any AI assistant tool, such as Cursor, is also moving in this direction, and they're adding more and more capabilities.
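The workflow Max describes can be scripted from any language. A minimal sketch in Python, assuming an agent CLI that accepts a one-shot prompt via a `-p` flag (both Claude Code and Gemini CLI do at the time of writing, but check your tool's `--help`); output goes to a log file instead of the screen:

```python
import subprocess

def run_agent_in_background(prompt, agent="claude", log_path="agent.log"):
    """Launch a coding agent non-interactively and let it work in the background.

    Assumes the agent binary accepts a one-shot prompt via "-p"; verify the
    flag against your agent's own documentation before relying on it.
    """
    cmd = [agent, "-p", prompt]
    log = open(log_path, "w")
    # Detach from the terminal: the agent's output streams to the log file,
    # and the returned Popen handle lets you poll or wait on completion.
    return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
```

You would then check on it with `proc.poll()` or tail the log file; substituting `echo` for the agent binary is a cheap way to test the plumbing.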
[00:04:09] Max: For example, a couple of months ago, Cursor couldn't have access to a terminal. So if you ran something, you needed to copy and paste the error message, for example, directly into the text box of Cursor. But now it actually has access to a terminal, so it has access to the information in the terminal you are using.
[00:04:31] Max: So that's one thing. And another is that Cursor also launched their own agent, which can also be non-interactive, a CLI. So they are covering this space as well. You know, they want to get both sides: people who want to have more interactivity, and people who are okay to work just with a CLI.
[00:04:54] Simon: It's funny how I think we're fairly familiar, you and I, with trying out a bunch of these. And I think everything that we've [00:05:00] mentioned so far I've definitely played with, I'm sure you have, and there's obviously a bunch of others. Gemini CLI, I think you actually mentioned. Codex is another.
[00:05:08] Simon: Where would you say, for those listening who are kind of trying to follow this 101, working out, well, where do I actually start, is a good place to start? Is there one good place, or does it really depend on where a user is most consistently doing those tasks, and is it better they try and find an agent in that environment?
[00:05:30] Simon: What would you recommend?
[00:05:32] Max: I think it depends on the level of experience, and maybe on the role of the person in the company, if we're talking about technical people. Because there are definitely no-code solutions, right? Like Lovable, for example, that provides you a vibe coding platform for developing backends and frontends instantly, without any knowledge of the code.
[00:05:53] Max: So it requires zero knowledge. It just requires you to be there to validate and approve the content [00:06:00] generated by the AI assistant or whatever. And the closer you are to the code, and the more scale you need, the more advanced tools you need to use, which potentially require a lot of
[00:06:17] Max: manual resolution of the problems. So for example, if you are a software engineer and your goal is to develop a lot of software tasks and a lot of manipulations with the code, then you are okay to start with, first of all, a Copilot, just to see if the Copilot models
[00:06:36] Max: are helping you to at least accelerate your productivity as a developer. So does it help you to write better, faster code? If you're happy with the Copilot, then you go to the next extension and go with the Cursor IDE, where you start using these complex assistants as your companions helping you achieve the task, and now the scope [00:07:00]
[00:07:00] Max: of autonomy, and what assistants can do, grows. So you can delegate to it not just the function but entire files. And once you are okay with that, then you can move to the top of this hierarchy and start using Claude Code or Cursor or Gemini CLI directly in the terminal, because now you know the strong sides of this.
[00:07:22] Max: Now you know what you can delegate to it, and you're probably already familiar with the core capabilities and familiar with how these tools are working, so you can delegate even more scope in a fully autonomous pipeline or setup. So I think the reasoning should be like that: start with something super simple, which doesn't require any knowledge of code.
[00:07:44] Max: And if you already have the knowledge, then progress further and go into more and more advanced tools.
[00:07:52] Simon: I think that's really, really good advice, and there are two things here, I think. One is that there's a trust [00:08:00] implication, and it's important to be in control.
[00:08:03] Simon: Like when we start learning how to drive, we don't jump into a Ferrari and try and drive that Ferrari. It's about taking small steps. Correct. Once we are comfortable with it, once we trust ourselves as well as the tools that we have, we go on to the next step. And this growth in autonomy is all about that growth in trust.
[00:08:21] Simon: Yeah. I absolutely love that. And I like the focus there as well, in terms of sticking to where you are most familiar. In the sense that if you're familiar with an IDE, well, maybe GitHub Copilot or something like that in the IDE is the best place to start.
[00:08:39] Simon: And then maybe Cursor, if you wanna stay with an IDE. If you love terminals, maybe dip into something else from that. Really, really like that advice. Some of our listeners may have heard of subagents before. What's the difference between an agent and a subagent?
[00:08:53] Max: So a subagent, it's basically an instance,
[00:08:57] Max: another instance of an agent, just [00:09:00] spawned and controlled by the main agent. And you usually use that when your task is so complex that one agent is not enough to tackle it, and when there are different options or different paths that can be explored. So I think, yeah, two scenarios: where the problem is so complex that it needs to be split into smaller parts, or when you need to make a search, to just explore the space and
[00:09:29] Max: choose one of the options. So yeah, these two scenarios. But basically, a subagent is just another instance of a main agent, which is asked to solve a specific problem. So you can think of it as a master-slave architecture, right? Like if you are, for example, interacting with a master agent, the one which you have access to in your terminal,
[00:09:53] Max: and you ask it to develop a service that has a frontend and a backend, then it's a [00:10:00] great candidate for splitting and spawning subagents. One subagent will be working on the backend, another agent will be working on the frontend, and they will be doing their pieces independently, and then at some point they will go back to the master agent to communicate back the result.
[00:10:19] Max: The master will decide: are they okay, is the result fine, does it satisfy the overall goal? And if not, then it will circle the information back to them and give feedback, and they continue iterating or refining, based on the feedback which they receive, to improve this backend and frontend.
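The master/subagent loop Max walks through can be sketched with stubbed-out workers; here `subagent` and `satisfies_goal` stand in for real LLM calls and the master's review, which are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def subagent(task, feedback=""):
    # Stub for an LLM-backed worker; a real one would spawn another agent
    # process with its own context window and return its work product.
    result = f"implementation of the {task}"
    if feedback:
        result += f" (revised after feedback: {feedback})"
    return result

def satisfies_goal(result):
    # Stub acceptance check; a real master agent would review the work
    # against the overall goal before accepting it.
    return "revised" in result

def master(tasks, max_rounds=3):
    # Split the job, run subagents in parallel, then review each result and,
    # if it falls short, circle feedback back for another iteration.
    results = {t: "" for t in tasks}
    feedback = {t: "" for t in tasks}
    for _ in range(max_rounds):
        pending = [t for t in tasks if not satisfies_goal(results[t])]
        if not pending:
            break
        with ThreadPoolExecutor() as pool:
            fresh = list(pool.map(lambda t: subagent(t, feedback[t]), pending))
        for t, r in zip(pending, fresh):
            results[t] = r
            feedback[t] = "tighten error handling"  # the master's critique
    return results
```

The first round produces drafts, the review rejects them, and the second round incorporates the feedback, mirroring the iterate-until-satisfied loop described above.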
[00:10:36] Max: That's an example of how to use subagents for coding assistants. For tasks beyond coding, where you need exploration, subagents might just be parallel instances of the same agent, which are exploring the problem from different angles. For example, you want to search for some information, to answer some specific [00:11:00] question, and then you can
[00:11:01] Max: make a search with one type of query on Google, you can make a search over the articles on Wikipedia, you can do a search over the information available in your, I dunno, local database. And all those scenarios are about the search for information in different sources, but they're done by different agents, technically speaking.
[00:11:22] Max: At some point in the future there won't be a need for subagents, because the master agent will be so smart and will know how to resolve any challenging problem on its own. So it won't, you know, lose and won't forget the context. It will know how to navigate this complex structure on its own, and technically it won't need to spawn subagents.
[00:11:48] Max: But another motivation for subagents is just speed. We all know that agents are super slow. If you try to parallelize them, subagents are one way of doing this, you just get the result [00:12:00] faster. You just get, like, ten implementations of the same thing, and maybe one or two of them are good and working and doing what you are expecting.
[00:12:07] Max: So it's just, you know, latency and throughput versus idling time. Yeah.
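The speed argument is essentially best-of-N sampling: fan out N independent attempts in parallel and keep whichever ones work. A sketch with a stubbed attempt function (the pass/fail rule here is fabricated purely so the example runs):

```python
from concurrent.futures import ThreadPoolExecutor

def attempt_implementation(task, seed):
    # Stub for one subagent attempt; a real one would sample a model with a
    # different temperature or plan per attempt, then run the test suite.
    return {"code": f"candidate {seed} for {task}", "passes_tests": seed % 3 == 0}

def best_of_n(task, n=10):
    # Fan out n independent attempts in parallel, then keep only the ones
    # that actually pass, returning the first working candidate.
    with ThreadPoolExecutor() as pool:
        attempts = list(pool.map(lambda s: attempt_implementation(task, s), range(n)))
    working = [a for a in attempts if a["passes_tests"]]
    return working[0] if working else None
```

This trades tokens for wall-clock time: ten attempts cost roughly ten times as much, but the latency is close to that of a single attempt.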
[00:12:16] Simon: I think the time that you mentioned there and the parallelization, as well as the context, are both really good reasons to use them. In terms of the cons, the negatives that they can bring:
[00:12:31] Simon: is it just burning more tokens, or is there anything else we need to be cautious of?
[00:12:37] Max: Yeah, I mean, for sure costs will grow linearly, right? With more agents you will spend more. And sometimes there might be a situation where you, again, spawn all these agents, but they actually didn't converge to anything.
[00:12:52] Max: And another thing is that there are different ways to spawn, to launch a subagent. [00:13:00] One is that your main agent can launch a subagent on its own, because Claude Code actually now gives this capability. You can ask explicitly, saying explicitly in the prompt: use parallel agents to analyze this complex document.
[00:13:15] Max: Right? So you can explicitly state it in the prompt, and it will, on its own, spawn multiple versions of the agents. But the thing is that you might want to have specialised subagents, right? So instead of general search subagents, you want to have subagents focused on a specific
[00:13:34] Max: aspect or a specific angle of a problem. And here you need to have extra customization: you need to carefully design the instructions for that subagent in order to get a great result. So, speaking of downsides: the time for designing these subagents, it's also an art, and you need to make an effort. If you don't
[00:14:02] Max: do it, then probably the main agent will do it on its own, but the quality of the subagent might be different, because it might just miss some important bits in this instruction to the subagent. Yeah.
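At the time of writing, Claude Code reads custom subagent definitions from Markdown files in `.claude/agents/`, with metadata in YAML frontmatter and the carefully designed instructions Max mentions as the body. A sketch of such a specialised subagent (the reviewer role and its wording are invented for illustration; check the current Claude Code documentation for the exact fields):

```markdown
---
name: security-reviewer
description: Reviews code changes for injection risks and leaked secrets. Use proactively after edits.
tools: Read, Grep, Glob
---

You are a security-focused reviewer. Examine only the files you are pointed at.
Flag hard-coded credentials, unsanitized user input, and risky shell calls.
Report findings as a short prioritized list. Do not modify any files.
```

Restricting the `tools` list is part of the design effort: a read-only reviewer subagent cannot accidentally edit files, which narrows what can go wrong when the main agent delegates to it.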
[00:14:17] Simon: And in terms of the agents: other than being able to customise an agent, a subagent rather, and being able to say, hey, I want you to use this subagent when you're doing these types of things, or allowing it to choose the appropriate kind of subagent to do those tasks,
[00:14:33] Simon: I presume the main agent is pretty much entirely the communicator across subagents? There's no human interaction that bypasses the main agent?
[00:14:46] Max: Yeah, I mean, it depends on the design of your system, and I think in Claude Code it's very hard to control it. Usually the subagents are launched in the background, so you don't actually have direct access to see what they're doing.
[00:14:57] Max: You can only see it in the logs. So if you [00:15:00] open the Claude JSONL file to see what the commands were, you will see that agents were doing something in the background, but you won't necessarily see it in your terminal. You won't necessarily see the thinking process, the path, the planning for each of the subagents.
[00:15:17] Max: So yeah. And maybe another, I dunno, disadvantage, or even advantage, of this is that you trust it even more. You need to believe in the fact that it'll come up with something meaningful, even more so if you are entering the world of subagents.
[00:15:33] guy: If you're a developer, I think a lot of vibe coding, whether it is in platforms like Base 44, or if it's things that you do more locally, the Claude Codes and Cursors of the world...
[00:15:44] guy: In those worlds, you have a very tight relationship to the code. The LLM modifies, the agents modify the code, but when there's a problem, the supervision happens on the code. I think in Base 44 you can see the code, [00:16:00] but it is kind of a hidden entity. Yeah. It's not designed to put the code in your face too much.
[00:16:08] guy: How have you seen, and maybe what learnings are there about, how you have built for people who don't understand code to be able to interact and get the system over the hump when it fails to deliver something, right, when it fails to accomplish something? Is it all just: give it more guidance, give it more guidance, and hope it gets it right?
[00:16:29] moar: Yeah. Not only. I think the interaction when you are building an app in Base 44 versus, for example, if you're building an app using Claude Code or Cursor, it is a different interaction, and many times it's a different user. And a lot of times, currently, it's a different piece of software. Mm-hmm.
[00:16:50] moar: So
[00:16:50] moar: I'll start with obviously the way that, um, base 40 four's users do iterations and experimentations and so on, [00:17:00] has a lot to do with visual.
[00:17:02] moar: Like, you see the app in front of your face. You know if something is wrong: you'll see it in the interaction, or you'll see an error being thrown, or something like that, before you even look at the code. I think with platforms like Base 44, there's still obviously a limited set of software that they can build.
[00:17:21] moar: And this set is gonna expand further and further and further as time goes on, and that's what we're working on. But it's still important to be very honest and say there's a set of software that you'll be better off building using Claude Code than using Base 44. So for example, Base 44 itself: put aside the frontend, which is obviously built using Base 44, but the backend for Base 44 is not built using Base 44.
[00:17:48] moar: Right? It's built using AI coding tools and so on, because you need to look at the code and you need to understand the code. And so, yeah, this is the situation and reality right now. I [00:18:00] think this will aggressively change in the next few months even, definitely few years, as, visually, or with forms only, without looking at the code, you'll be able to build an increasing percentage of the software that you're currently building,
[00:18:21] moar: right?
[00:18:22] moar: So the tools that we're giving our users are a different set of tools for how they experiment and iterate with that software. It has a lot to do with the visual: you have to see the application all the time. It has a lot to do with trying to explain better what's going on. So for example, we have this small feature, but people really love it.
[00:18:47] moar: When you prompt something in Base 44 and it starts coding... obviously most of our users don't wanna see the code. It's terrifying for them, and they don't understand what's going on. And so we have another small LLM that [00:19:00] basically watches the coding agent, sees what it does, and kind of explains to you in real time what the coding agent is currently doing.
[00:19:09] moar: Mm-hmm. Like: now it's changing the UI from light theme to dark theme, and whatever. And so yeah, we're building a set of tools to help people build software without the need to look at code.
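The narration idea can be sketched as a thin translation layer between the coding agent's event stream and the user. The event shapes and templates below are invented; a real version would send each event to a small, fast LLM for a one-sentence plain-English summary instead of using canned strings:

```python
def summarize_action(event):
    # Stub for the small "narrator" LLM call; templates stand in for a model
    # asked to describe the coding agent's step in one friendly sentence.
    templates = {
        "edit_file": "Now updating {path}...",
        "create_file": "Now creating {path}...",
        "run_command": "Now running {command}...",
    }
    return templates.get(event["type"], "Working...").format(**event)

def narrate(agent_events, show=print):
    # Watch the coding agent's event stream and narrate it for a
    # non-technical user, instead of showing them raw diffs or shell output.
    for event in agent_events:
        show(summarize_action(event))
```

The design point is decoupling: the coding agent never knows it is being watched, so the narrator can be swapped, disabled, or localized without touching the agent itself.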
[00:19:23] guy: Yeah, that's so smart. Progress indication in general, and engagement, are super important.
[00:19:28] guy: So I think that makes sense to me. If I echo this back, I feel you're saying that there are sort of two domains where maybe Base 44 today is most helpful. One is cases where a lot of the validation and a lot of the decisions are much more product-oriented, and a lot more visually oriented,
[00:19:45] guy: as opposed to sort of technical, how-did-you-implement. And the second is that maybe a lot of them are places where that visual is also the innovation. You're not looking to make the world's [00:20:00] fanciest progressive web app that does a lot of things.
[00:20:03] guy: A lot of your creation is a lot more about the content and the navigation, and all the rest might be sort of similar and common, so the LLMs know how to do it well and you can do it very efficiently, but you're innovating on sort of the content and the looks more. Does that sound fair?
[00:20:21] guy: And then it will expand over time.
[00:20:23] moar: Definitely. But I'll add on top of that: I think it has more to do with the type of software that you are building, right? Mm-hmm. So if you are building nowadays something where performance is super important, you can't vibe code your way there. Like if you're building, I don't know, a VPN software, let's take this for example.
[00:20:46] moar: Um, yeah. So obviously it's gonna be really hard. Or something that relies very heavily on the backend and stuff like that. So yeah, the previous company: you probably can't vibe code [00:21:00] Snyk. Yeah. At least not yet. I think for another set of applications, it's not necessarily the visuals, but more like: if you are building an application that can do what it needs to do with
[00:21:14] moar: a database, a set of integrations, APIs, and so on, I think you can do a really good job using Base 44, and it'll likely be an easier experience than trying to code it via Cursor.
[00:21:30] moar: Right? Yeah.
[00:21:30] moar: So it has to do with the low-levelness of stuff, and the complexity on the backend side.
[00:21:38] moar: Uh, but we are getting better at that.
[00:21:40] guy: So I have one more question on Base 44, but maybe let's first switch to how you built it. We'll talk about your stack and then we'll combine a little bit. I'd like to touch a bit on longevity, like building a disposable app, which I think you're creative doing, versus what you build for the long term.
[00:21:57] guy: But take us behind the scenes a sec. So you built, as a [00:22:00] single person, a substantial platform, and also scaled it. I guess, tell us a bit of the tips and tricks. I heard you shared a few of, you know, what helped you create that. Maybe let's start with: do you think you could have done this?
[00:22:15] guy: I know you used AI in your creation. Was that critical? Do you feel you could have pulled off a Base 44 without AI dev help?
[00:22:24] moar: Yeah.
[00:22:24] guy: Uh, as a builder,
[00:22:26] moar: The short answer is definitely not. At least not in the same structure and way that I did it with Base 44. So when I started Base 44, even before knowing what I was gonna do, I knew that we're in a different era.
[00:22:43] moar: I've been leveraging AI, working with AI, in my previous company, both as AI inside the product and also coding with AI. Mm-hmm. And I felt strongly that we're in a new era where it's not anymore about the number of people building software. [00:23:00] If you have a unique take,
[00:23:03] moar: a unique product, and you can design software well (and I'm not speaking about UI design), then it's not anymore about: hey, throw as many people at the problem as you can, and those people will write more code faster, and that's how you get there. So I was like, I'm gonna do this alone
[00:23:24] moar: from day one. I'm gonna set up the repository and spend 20% of my time constantly on how I can further increase the velocity and speed. So a lot of what I did was experimentation with how the LLM will write code better. Obviously I did it for Base 44's product as well, right? But even for my own repository, right?
[00:23:47] moar: So I found those tips and tricks, like what I've spoken about, like JavaScript versus TypeScript. Maybe now it's changing with the newer Claude models and so on, but back then I could write [00:24:00] frontend at the speed of light, just because of the setup.
[00:24:03] moar: Mm-hmm.
[00:24:04] moar: Another thing is, I've built this from the backend side.
[00:24:09] moar: I've built this very high-level infrastructure that handles the database and user management and so on, so that even if I would've pivoted to a completely different idea, 80% of the code would stay the same. And so I had this really good infrastructure that stayed the same. And those were the parts of the code where it was slightly more me than the AI actually writing, and making sure that it's perfect.
[00:24:40] moar: And once you have that infrastructure, adding a new feature that, I don't know, has a new table in the database and has some interaction with other parts of the platform became super easy. The way I would measure that is: for the LLM to implement a new feature,
[00:24:58] moar: I wanted it to write [00:25:00] the least code possible to implement the feature end to end. And so I built a lot of those things, obviously, and the rules and so on, into the repository, so that I could run on my own but still keep the same velocity as my competitors, if not even better, faster. And I think, to some degree, I've done that.
[00:25:22] moar: Like, I think the competitors in this category are super heavily funded, and in many ways and aspects Base 44 executed extremely fast, if not faster than a lot of the players in the category. Right. And I give a lot of credit to the AI. Like, most of the credit today is to the AI.
[00:25:47] moar: Yeah, I wanna thank it.
[00:25:49] guy: It's like an Oscar.
[00:25:50] guy: I think, so, I love that. I guess, again, kind of echoing back a little bit, I think, you know, the first thing you're sort of touting is the value of a good architecture, where you [00:26:00] anticipate what future changes would be, and the ability to write compartmentalised code.
[00:26:06] guy: And I guess I'm hearing something that is almost similar to hiring junior developers or outsourcing: places in which, if you build the right architecture, then you could have clear instructions to someone else, who's maybe a little bit less trustworthy, to write pieces of software within it, and be able to assess whether they've done their job well, and, I guess, be able to guide them.
[00:26:33] guy: Does that sound like a fair analogy?
[00:26:35] moar: Yeah. Yeah. Exactly. 'Cause also, one other thing that I've noticed is, when you are leveraging AI for coding... think about it this way. Let's say you are building a simple full-stack web app, right? So you tell it: hey, I want it to be a to-do list app or whatever.
[00:26:55] moar: So the AI implements some of the routes and connects the frontend and so [00:27:00] on, and then you open up a new chat, right, in Cursor or something, and you wanna implement another feature, like an admin panel and so on. So it'll look at your code and it'll do the task, and AI is really good at that. But AI is not
[00:27:13] moar: really good at stopping and saying: oh, you know what, there are actually common concepts here that I could combine into this infrastructure or package that I can later use in other features. And: oh, I'm writing the same code again and again, let's stop, and before I do the task that you gave me, let's do some refactoring. Yeah, let's refactor the code so that I won't need to write as much code, and so on.
[00:27:39] moar: So I think this is something that a lot of developers currently don't notice about AI. It'll implement the task as you told it to, right, but it's still lacking the: hey, let's stop for a second, we need to refactor the code because I found this common layer or whatever. And I think that's, that's...
[00:27:56] guy: And so, that's nice.
[00:27:57] guy: And that's something you still did manually: you would [00:28:00] observe what the AI has built and you would kind of go off and modify it. A lot of what we see with AI, which I think touches on the topic you said right now, is about the initial creation, right? And it would go off. Yeah. And even there you sort of see this reward seeking, right?
[00:28:15] guy: It might burn down the house to be able to accomplish the thing you asked, like break all the other features, you know, just to make this feature you just asked for work. But the other aspect is indeed maintainability, it's indeed code duplication.
[00:28:29] guy: It's all these things that are sort of bad practices when it comes to quality software creation. So I guess I'm asking: I think you did it on your own in Base 44, for your own code base. How do you think about that in terms of building applications that are built by Base 44, to have longevity?
[00:28:53] moar: So, in two ways. One: when you build an app in Base 44 these days, there's [00:29:00] a lot of infrastructure and setup that happens before the LLM even writes code. Mm-hmm. That is in some ways limiting, but not really, not in terms of what it needs to do. It's kind of like guard rails that the LLM should keep to.
[00:29:14] moar: Mm-hmm. And as long as it keeps to that, then it should be fine, to some degree. So again, we automatically create the entire create, read, update, delete, kinda like the CRUD, SDK. And we make it really good, and we implement rate limits on top of that, and things like that, that LLMs wouldn't really care about.
[00:29:39] moar: And when you leverage, I don't know, APIs and integrations and so on, we keep a very strict schema. There are a lot of things that we implemented built into the application, so the LLM won't, can't be this reward-seeking of, like, I'll do whatever but I'll keep your app exposed, and so on. [00:30:00]
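The guard-rail idea, a platform-owned CRUD surface with rate limits baked in, can be sketched like this; the class and method names are illustrative inventions, not Base 44's actual SDK:

```python
import time

class RateLimitError(Exception):
    pass

class EntityAPI:
    """A generic CRUD layer generated by the platform, not by the LLM.

    The guard rails (here, a sliding-window rate limit) live in this layer,
    so generated app code can only reach data through a safe surface.
    """
    def __init__(self, name, max_calls_per_minute=60):
        self.name, self.rows, self.next_id = name, {}, 1
        self.max_calls = max_calls_per_minute
        self.calls = []  # timestamps of recent calls

    def _check_rate(self):
        now = time.monotonic()
        self.calls = [t for t in self.calls if now - t < 60]
        if len(self.calls) >= self.max_calls:
            raise RateLimitError(f"too many calls to {self.name}")
        self.calls.append(now)

    def create(self, **fields):
        self._check_rate()
        row = {"id": self.next_id, **fields}
        self.rows[self.next_id] = row
        self.next_id += 1
        return row

    def get(self, row_id):
        self._check_rate()
        return self.rows.get(row_id)

    def update(self, row_id, **fields):
        self._check_rate()
        self.rows[row_id].update(fields)
        return self.rows[row_id]

    def delete(self, row_id):
        self._check_rate()
        return self.rows.pop(row_id, None)
```

Because every path to the data goes through `_check_rate`, a reward-seeking agent that hammers the store in a loop hits the limit instead of taking the app down.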
[00:30:01] Maor: The other thing is, yes, at some point we automatically feed back to the LLM, whether the user knows about it or not, that it needs to refactor something. For example, one of the things that confuses LLMs the most, and that people feel gives the vibe coding category a bad name, is when they say: Hey, I asked it to do something,
[00:30:27] Maor: it didn't do it, it did the opposite, it deleted a feature, or whatever. It's when code files are getting too long. So you asked it initially, or early on: Hey, implement this page, create a to-do list app. And it did something simple, a very simple to-do list app, like any other React app, right?
[00:30:49] Maor: And then you start layering in features. So the LLM does what you asked it to, and it starts adding AI features to write the task descriptions, [00:31:00] then user management to assign different teams, and whatever. And at some point, if you're not careful, you might have this very large code file, right?
[00:31:10] Maor: So one of the things is that, behind the scenes, we are running refactoring checks to tell the LLM: Hey, you passed the threshold, you should now refactor this file. Even if the user asked you something, first refactor the file, then implement what the user asked you to. And I think there's a lot more to do there.
[00:31:30] Maor: But yeah, it's somewhat of a code quality agent that keeps nudging the LLM at certain points, if it finds that it's passed a refactoring threshold: Hey, there's this code here and this code here, do something, refactor that. Right?
[00:31:46] Simon: What was the thing, do you feel, that really allowed us to turn agents on in our code generation?
[00:31:52] Simon: Is it context? Is it trust from humans in the process? Is it the capabilities of LLMs?
[00:31:59] Reuven: I don't think it's [00:32:00] trust. I think anyone that's building these systems inherently doesn't trust the output, and needs to spend quite a bit of time verifying that what it's actually given us is true and valid and functional, and not just a kind of
[00:32:11] Reuven: mock or simulated version of what we asked it. There's a tendency for the system to take the shortest path to give you the outcome. So a lot of what an engineer is doing is working with solid structures, architectures and other things, to validate that what's being built is functionally true and correct.
[00:32:33] Reuven: Now, to your question of what makes 2025 the year of agentic AI, if that's even a thing: one of the big breakthroughs last year was, again, this idea of recursion. A group of us created these recursive systems that could operate on a long horizon. These are agents that could run for hours or days at a time, and then eventually complete whatever problem they were given.
[00:32:59] Reuven: Now, the drawback to that: I think I did a 36-hour test, mostly just to show that I could run an agent for 36 hours. But the costs were prohibitively expensive. What we saw as we scaled these agents is that, to effectively run this, you're looking at, even on a minimal basis, around $4,000
[00:33:23] Reuven: a day. And when we really cranked it up to dozens of agents running concurrently, we were looking at $7,500 US an hour to run a 10-agent swarm concurrently, basically. And it's just substantially cheaper to hire a person at that point to do that work.
[00:33:43] Reuven: Even though you could do a ton of output, most applications didn't require that level of scale or capacity. So in April of this year, what we saw was Anthropic and Claude Code come out, and they started offering a sort of [00:34:00] unlimited, all-you-can-eat buffet for tokens and capability.
[00:34:04] Reuven: And the first thing we did is we reimagined the previous iterations of what we call the SPARC protocol, which allowed for that recursion. Suddenly we were able to spawn these swarms that could run for hours on end and cost a flat fee, starting at $20 for an entire month. It went from thousands of dollars an hour to $20 a month.
[00:34:31] Reuven: So we saw this interesting inflection of both capability, we were suddenly able to build things dramatically faster, with a lot more capability, at an exponentially lower cost. Which is interesting; you don't generally see that happen all at once. And that was basically May. Yeah.
[00:34:50] Simon: And that's super interesting, because I think it's both making it more affordable, but also the parallelization. It really opens up the [00:35:00] quality of what we can actually get back as a response. I want to talk about that in just a second, before we move on to swarms and SPARC specifically.
[00:35:08] Simon: I'd love to talk a little bit about the foundation that you created, the Agentics Foundation. First of all, the Agentics Foundation isn't exactly brand new. Tell us a little bit about what the need was for that foundation. What's the problem it's trying to solve?
[00:35:26] Reuven: It's an organic sort of evolution of the work. And I'm going to tell you a little backstory, which is probably going to make me sound absolutely crazy, but I'm going to say it anyway. I was early into ChatGPT; I was lucky enough to be an early beta tester. I think they called it a VIP program, back in 2022, with OpenAI.
[00:35:44] Reuven: So basically, that means I got it maybe three or four weeks before everybody else. And one of the first things I asked it, and this is going to sound really egotistical, is: how can I be the most influential person in AI? That's what I asked ChatGPT v1, [00:36:00] pre-launch. And it gave me this
[00:36:02] Reuven: sort of step-by-step process that said, first of all: what are you good at, what do you like, and where do you want to be in a few years? It suggested that anything worth doing takes years of prep to get to that point, and it gave me this outline based on the kind of Q&A that you'd have with ChatGPT.
[00:36:23] Reuven: Most of you who have ever used ChatGPT know the asking-open-ended-questions sort of routine. And it said: basically, you're going to have to build a social media following, and you should be narrow in that focus. I'm asking, well, what does that actually mean? It says, well, what do you like to do?
[00:36:39] Reuven: And I'm like, I like to build autonomous things; I've been building cloud infrastructure for nearly 25 years, that sort of thing. And it set me down this path and said, first of all, you need to create a subreddit. Now, I knew nothing of Reddit or subreddits, but I did it. I followed the guidance, and three and a half years later I've got a hundred thousand plus [00:37:00] subscribers on the subreddit.
[00:37:02] Reuven: I'm not really sure exactly why, or what I'm going to do with this subreddit, but it's popular. And then it said what I needed to do, and the key was consistency. It told me that I needed to build a following by constantly posting interesting things that I'm doing, sharing those things publicly on my GitHub, and creating weekly livecasts where I can interact with people.
[00:37:26] Reuven: I followed the guidance, and next thing you know, I have thousands and thousands of people showing up for these livecasts, following me, and essentially buying my time, on a kind of OnlyFans-for-geeks sort of approach, where people can essentially buy my time on an hourly basis.
[00:37:43] Reuven: And the craziest part of the whole story is that it actually worked. I'm now in a fortunate position where I can claim more than a hundred customers, 20 Fortune 500 clients, and I'm literally one guy, one [00:38:00] company, a company of one. My wife actually helps with some of the non-technical parts.
[00:38:05] Reuven: But generally speaking, it's me and my bots. And the entire business plan and process was literally set forth by ChatGPT v1, which is in itself just amazing to me.
Simon: Did the LLM tell you to create the foundation, or was that something that you...
Reuven: Well, the foundation was an evolution of those early conversations.
[00:38:23] Reuven: No, it didn't exactly say that. What happened was, as the community coalesced around the concepts of agentics, and the practical approaches we were taking to implement these agentic systems, it became pretty clear that we were in the midst of a new profession. And when you look historically at different professions — and let me be honest:
[00:38:47] Reuven: AI often gives you sort of delusions of grandeur, but in this particular case there was a group of us, and we were realising that we were all practicing a similar approach to this sort of engineering of [00:39:00] AI. And when we looked at it, we saw a spectrum. On one end, we had these vibe coders.
[00:39:05] Reuven: I'm not saying that vibe coding isn't important; it's a great ideation, learning and discovery sort of mechanism, but it's very freeform, very fluid in what you're doing, without really much of a plan. What we were doing was different. What we were doing was creating architectures and processes and approaches that were repeatable, with a defined outcome.
[00:39:27] Reuven: That was an engineering activity. And what we saw was that the terms agentics and agentic engineering were starting to be co-opted by large companies who were essentially AI-washing the concept. They were saying it was an agent, but it was a chatbot, or it was a vibe coding system.
[00:39:45] Reuven: And we saw this as an opportunity for us to say: well, this is our approach, this is what we believe an agentic engineer, as a professional, or as a profession, should encompass, both technically and as the sort of idea of almost [00:40:00] a guild for our group of engineering, right? So we're the stonemasons for this new emerging field of agentic engineering for AI.
[00:40:11] Reuven: And the group formed around that. It's open; it's the sort of antithesis of some of the corporate-led open groups, OpenAI sort of being the other end of the spectrum, dominated by corporate interests. We said: we need to make this for people, to empower them professionally, and, maybe more aspirationally, as the society around AI takes shape, what does that actually look like?
[00:40:36] Reuven: And how do we protect the interests of the people that are ultimately going to be affected by it most? So we want to act as a kind of hedge against trillion-dollar companies, if we can do that. That was formed in March of this year, and next thing you know, we are in something like 60 cities.
[00:40:53] Reuven: We've got multiple events, sort of organic meetups, all around the [00:41:00] world, happening every week.
[00:41:01] Simon: And people can just join that? What's the best way for people to get involved?
[00:41:05] Reuven: Well, a lot of the communication and community happens around either our Discord channel, discord.agentics.org, or WhatsApp.
[00:41:14] Reuven: Our WhatsApp's maxed out, so I'm not going to give you the address, because when you hit a certain threshold, you basically can't add anybody. But most of it we've shifted to Discord at this point. And agentics.org, again, is a member-led organization, a meritocracy, if you will. So if you want to get involved, you want to create a new chapter in a different city, do a meetup, you basically just volunteer,
[00:41:38] Reuven: and we make it happen. So if we were to
[00:41:41] Simon: look at the state of agentic development today, what would you say are the things that are creating that ceiling for agentic development? What are the biggest things causing agentic development to fail, or at least causing problems for developers trying to use agentic [00:42:00] development today?
[00:42:00] Reuven: The limiting factors of the space right now: there was an interesting report from MIT a couple of weeks ago, I think it was, that basically said 95% of agentic projects fail. It was a fairly sensational title, and when you dig into it a little bit, there are two ways to think about those types of stats.
[00:42:25] Reuven: One: 95% of projects fail, which is probably true, but the reason for that failure is likely that 99% of engineers, programmers, developers and project managers don't know how to actually build agentic systems, and that's a byproduct of a new emerging space. So the limiting factor is that it's really hard to look beyond this, I'm going to call it AI washing, the agentic washing of products [00:43:00] and the people associated with them, to determine what the capabilities of the products, and the people implementing those products, really are.
[00:43:08] Reuven: So when you see this high failure rate, it speaks to the fact that we're in a new emerging space. If you're old enough to have been around at the beginning of the internet, you would have seen the same thing with a lot of the internet projects that corporations took on back in the late nineties.
[00:43:23] Reuven: There was this idea that the internet wasn't quite going to cut it for most business-type applications, and they were wrong. It was the fact that we didn't, as an industry, understand exactly how to build user-friendly internet-based applications. And over time, there were models to follow.
[00:43:41] Reuven: The Amazons and the eBays and whatnot showed up, and we were like, okay, this is how you create a web-based service that people can easily interact with, and it's not just recreating software. Same problem, 25, 30 years later, in the agentic AI space. There's a tendency for people to build applications the [00:44:00] way they've always built them, with human-centric models, review cycles, long drawn-out sprints, and different traditional tactics that were optimised for a world where we built slowly and gradually, over time.
[00:44:16] Reuven: Now we're in a world where we can literally copy anything, anywhere, at a moment's notice. And the quality of the code, although important, is less important than the momentum you get in terms of speed and time to market. So when you look at companies that are embracing this, they're embracing it in a way that doesn't just replace their developers.
[00:44:39] Reuven: That might be a byproduct, but it augments them in ways that those developers were never able to work before. You're empowering them with a kind of superpower, to create much more effectively and more quickly, which in itself creates a whole variety of secondary problems.
[00:44:57] Reuven: But ultimately, this is [00:45:00] the empowerment of developers to do more with less.
In this episode of AI Native Dev, host Simon Maple and guests Maksim Shaposhnikov, Maor Shlomo, and Reuven Cohen explore the evolving landscape of AI agents in software development. They discuss the trade-offs between IDE-based and terminal-first agents, the steps for adopting AI autonomy in development workflows, and the use of subagents to manage complex tasks. Tune in to discover practical insights on maximising AI efficiency while maintaining control and quality in your projects.
From IDE sidekicks to terminal-native power tools, this episode of AI Native Dev dives deep into how developers can work effectively with AI agents today—and how to scale their autonomy responsibly. Host Simon Maple is joined by guests Maksim Shaposhnikov, Maor Shlomo, and Reuven Cohen to unpack real-world workflows, the rise of subagents, and a practical ladder for adopting AI in your day-to-day development.
A central theme in the conversation is the trade-off between IDE-based agents (like Cursor and Windsurf) and terminal-first agents. IDE agents keep the developer “in the loop,” using UI elements to guide, visualise, and accelerate onboarding. With buttons, inline diffs, and file-by-file views, tools like Cursor make it easy to review changes, generate unit tests, and delegate small-to-medium tasks without losing situational awareness. This UI support is especially valuable when you’re new to AI tooling or want to minimise the learning curve.
Terminal-first agents, on the other hand, are designed for developers who live in the command line and prioritise speed, composability, and automation. If you’re comfortable with vim/nano, git, and shell workflows, you can bypass UI overhead and operate purely through commands and keybindings. The downside: verbosity and context can become overwhelming. Without a built-in visual diff or navigable change UI, it takes skill and discipline to track what’s happening and avoid losing important context.
Interestingly, the tooling ecosystem is converging. IDEs like Cursor are adding terminal access and even their own agent modes, while terminal agents now offer more interactive experiences. The line between “GUI-driven assistant” and “CLI-first agent” is blurring, giving teams the flexibility to pick the modality that best fits their task—and switch when needed.
Max outlines a pragmatic adoption path that scales with your experience and risk tolerance. If you’re non-technical or validating ideas quickly, start with no/low-code platforms like Lovable, which let you build frontends and backends via natural language. You stay in control by reviewing and approving the AI’s output, but you don’t need to write code yourself.
For developers, begin with lightweight copilots (e.g., GitHub Copilot) to boost typing speed and fill in boilerplate. Once you’re confident, move to IDE assistants like Cursor or Windsurf where you can delegate larger tasks: refactor an entire file, scaffold integrations, or generate tests across a module. Here, you’re still very much in the loop—reviewing diffs and iterating conversationally—but you’re expanding the scope of delegation.
Finally, when you understand the strengths and limitations of these tools, step into terminal-first agents and CLIs such as Claude Code, Genie CLI, or Gemini CLI. These agents can run background jobs, operate in non-interactive modes, and handle multi-step tasks autonomously. By this stage, you’ve built the trust, habits, and guardrails to safely delegate bigger chunks of work—sometimes across an entire service or subsystem. The overarching principle: grow autonomy as you grow trust, in yourself and in the tooling.
The terminal gives you the “batteries” for automation. Many CLIs support non-interactive modes where you pass a goal prompt and let the agent run in the background as a separate process. This makes it natural to wire agents into scripts, tmux panes, or background jobs that can continue while you work elsewhere. For longer tasks, tee logs to files and keep a transcript of agent actions; this becomes invaluable for auditing, debugging, and learning what to prompt next.
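The background-run-with-a-log pattern above can be sketched in a few lines of Python. Note that the agent command and its flags in the commented example are purely illustrative placeholders, not any real CLI's interface.

```python
import subprocess

def run_background(cmd, log_path):
    """Spawn a long-running command as a background process,
    teeing its combined stdout/stderr to a log file so the run
    can be audited (and debugged) after the fact."""
    log = open(log_path, "w")
    return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)

# Hypothetical agent invocation -- the CLI name and flags are
# placeholders, not a real tool's API:
# proc = run_background(
#     ["agent-cli", "--non-interactive", "--prompt", "write unit tests"],
#     "agent-run.log",
# )
# proc.wait()  # or poll it while you work elsewhere
```

Because the child process writes to the log file directly, the transcript survives even if your own session ends, which is exactly what you want for post-hoc auditing.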
Because visibility is limited compared to an IDE, adopt practices that preserve clarity. Always run agents in dedicated branches. Require agents to summarise planned changes before execution, and enforce commit granularity (e.g., one logical change per commit) so you can bisect or roll back safely. Use git diff and patch staging (git add -p) to keep fine-grained control over what lands. When the agent suggests running commands, prefer review-and-confirm flows or “dry run” flags to catch destructive actions early.
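A minimal review-and-confirm gate might look like the following sketch. The list of "destructive" markers is a hypothetical starting point you would tune for your own environment, not an exhaustive rule set.

```python
# Substrings that mark a command as destructive -- a hypothetical
# starter list; extend it for your own environment.
DESTRUCTIVE_MARKERS = ("rm ", "git push --force", "drop table", "truncate")

def review_command(cmd, confirm=input):
    """Return True if the command may run. Anything that looks
    destructive requires an explicit human 'y' to proceed."""
    if any(marker in cmd.lower() for marker in DESTRUCTIVE_MARKERS):
        answer = confirm(f"Agent wants to run: {cmd!r}. Proceed? [y/N] ")
        return answer.strip().lower() == "y"
    return True  # non-destructive commands pass straight through
```

Injecting `confirm` as a parameter keeps the gate testable and lets you swap the prompt for a Slack approval or a CI check later.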
Finally, make the terminal your glue. Chain agents with shell utilities, pass artifacts between steps, and use CI to validate outputs. For example: generate code with a CLI agent, run tests/lint in CI, then only allow merging if gates pass. This “agent + automation” pairing turns the terminal into a powerful hub for orchestrating autonomous work without sacrificing quality.
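As a sketch, that merge gate can be a small script that runs each check and fails fast; the check commands in the commented example are assumptions about a typical Python project, not fixed names.

```python
import subprocess

def gate(checks):
    """Run each check command in order; return True only if all
    succeed. A failing check blocks the agent's changes from merging."""
    for cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            print("gate failed:", " ".join(cmd))
            return False
    return True

# Hypothetical gate for a Python project (command names are assumptions):
# ok = gate([["pytest", "-q"], ["ruff", "check", "."]])
```

The same function works locally and in CI, so the agent's output faces identical quality bars in both places.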
When a task is too large or ambiguous for a single agent, subagents shine. Think of them as worker instances that the main agent (the orchestrator) spawns to tackle parts of the job or explore multiple options in parallel. Two scenarios stand out: divide-and-conquer (e.g., split a feature into backend and frontend subagents that iterate independently, then reconcile) and parallel exploration (e.g., multiple search strategies—Google queries, Wikipedia, internal docs—converging toward an answer).
This pattern boosts throughput and reduces latency, as the workers canvass the problem space simultaneously. The orchestrator evaluates their outputs, chooses winners, and routes feedback for further refinement. You can let the main agent autonomously spawn workers via prompt instructions (some tools, like Claude Code, can interpret “use parallel agents” natively), or you can predefine specialised subagents with tailored system prompts and tool access for higher quality.
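A rough sketch of that orchestrator/worker shape, with plain functions standing in for real subagent calls and `len` standing in for a real scoring function:

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(task, workers, score):
    """Fan one task out to several subagents in parallel, then let
    the orchestrator score each candidate and keep the winner."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        candidates = list(pool.map(lambda worker: worker(task), workers))
    return max(candidates, key=score)

# Toy subagents standing in for real agent calls: each explores the
# problem differently and returns a candidate answer.
def concise(task):
    return f"{task}: short answer"

def thorough(task):
    return f"{task}: a much longer, fully elaborated answer"

best = orchestrate("explain CRDTs", [concise, thorough], score=len)
```

In a real system each worker would be an agent invocation with its own system prompt and tool access, and `score` would encode your acceptance criteria rather than answer length.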
There are trade-offs. Costs scale roughly linearly with the number of workers, and quality isn’t guaranteed—subagents might fail to converge or duplicate effort. Moreover, designing effective subagent instructions is its own craft. General-purpose workers are fast to spin up but may miss crucial constraints; specialised workers demand more upfront prompt engineering but tend to produce higher-quality results. As foundation models improve, the need for multiple workers may decrease, but the orchestration pattern remains valuable for speed and robustness today.
Running many agents in parallel can burn tokens quickly, especially if each worker performs long tool-augmented traces. Control blast radius with explicit scope: define the goal, constraints, allowed files, and stop conditions. Impose timeouts and step limits. Require subagents to propose a plan before execution and to summarise deltas at each checkpoint to reduce meandering.
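One way to sketch those step and timeout limits in code; `step_fn` here is a hypothetical stand-in for whatever drives a single agent step in your setup.

```python
import time

def run_with_limits(step_fn, max_steps=10, timeout_s=60.0):
    """Drive an agent loop under an explicit step budget and
    wall-clock timeout, so a meandering worker cannot run forever.
    step_fn(step) returns a result when done, or None to continue."""
    deadline = time.monotonic() + timeout_s
    for step in range(max_steps):
        if time.monotonic() > deadline:
            return {"status": "timeout", "steps": step}
        result = step_fn(step)
        if result is not None:  # the agent signalled completion
            return {"status": "done", "steps": step + 1, "result": result}
    return {"status": "step_limit", "steps": max_steps}
```

Returning a structured status instead of raising makes it easy for an orchestrator to decide whether to retry, escalate to a human, or abandon the branch.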
Convergence is a real concern. Encourage structured decision-making: have subagents produce rationale alongside outputs, let the orchestrator score proposals against acceptance criteria, and iterate only on the most promising branches. Where appropriate, introduce a human-in-the-loop review before expensive steps (migrations, destructive ops) or before merging code. This keeps costs predictable and maintains quality.
Finally, choose your spawning strategy wisely. If you need breadth fast, allow the main agent to spin up generic workers (“parallel agents”) and pick the best outcome. If quality matters most—say, a security-sensitive backend—preconfigure specialised subagents with domain-specific instructions, tools, and test harnesses. The right balance depends on your tolerance for cost, latency, and risk.

Agents Explained: Beginner To Pro (28 Oct 2025), with Maksim Shaposhnikov
