
Agents Explained: Beginner to Pro

Maksim Shaposhnikov
AI Research Engineer, Tessl

AI Agents Beyond Context Limits

28 Oct 2025, with Maksim Shaposhnikov

Also available on

  • YouTube
  • Apple Podcasts
  • Spotify

Transcript

[00:00:00] Simon: Before we jump into this episode, I wanted to let you know that this podcast is for developers building with AI at the core. So whether that's exploring the latest tools, the workflows, or the best practices, this podcast is for you. A really quick ask, 90% of people who are listening to this haven't yet subscribed. So if this content has helped you build smarter, hit that subscribe button and maybe a like. Alright, back to the episode.

[00:00:24] Simon: Hello and welcome to another episode of the AI Native Dev. I'm super excited about this episode. We're gonna go, we're actually gonna talk about agents, and we're gonna do kind of like a little 101 with agents.

[00:00:43] Simon: I know there are a ton of people who have absolutely gone deep with agents, tried every single agent. But do you know what? As we found out at various conferences recently, there are a huge number of developers for whom, when we talk about agents, we actually need to do more content that's more of that 101.

[00:01:01] Simon: Let's talk about why agents are useful. What are the constituent parts of agents? How can we best use them? What are they good for? What are they not? And also what are the differences between agents and other types of AI, you know, pieces like an assistant or perhaps a bot or something like that.

[00:01:20] Simon: Joining me today is Maksim Shaposhnikov, who is gonna be talking about all of these wonderful Agent 101s. Max is a research engineer at Tessl. Max, welcome to the podcast. How are you?

[00:01:33] Max: Hi. Thank you very much for this great introduction. Yeah, happy to be here, and happy to talk about Agents 101 and share some insights, some details about that. So, yeah, let's dive in.

[00:01:43] Simon: And this is the kind of thing, Max is a very humble but very, very smart person. So while we're gonna try and fit this into an episode, this may be part one of seven or eight; we will see. But Max, why don't you tell us a little bit about yourself first: what you've been working on, what your passions are in the AI space.

[00:02:03] Max: Yeah, so basically like, it's my first experience working with Codegen here at Tessl. I work as a research engineer, as I mentioned. Our focus here is on making agents reliable and being able to produce robust code such that whenever your agents develop something, you can be confident and sure that it develops something good, something which you can trust, and you don't need to, you know, spend hours verifying the results.

[00:02:26] Max: So that's the thing which we're building here at Tessl. Beyond that, my previous experience was a lot about pre-training of the LLMs. I worked at big tech, and again, I was in the pre-training team in the foundational LLM team where we focused on figuring out how to train large-scale models that could be later fine-tuned for specific tasks.

[00:02:49] Max: And again, coding is one such task. I didn't work with that before Tessl. My specialization was natural language processing, text-to-speech, and other more common multi-modalities.

[00:03:01] Simon: Interesting. Very cool. And I think, when we think about agents, there's a bunch that's similar across them, irrespective of what the agent's trying to do, whether it's trying to generate code or not.

[00:03:11] Simon: And let's, first of all, position an agent against other AI entities. So I mentioned at the start that we have assistants, and when we think about AI code generation, typically the early-stage copilots were very much assistants to us. What's the difference between an agent, an assistant, and even a bot?

[00:03:35] Max: I think the right answer would be like, there are a couple of these types of entities, as you mentioned, and actually, I never thought about the copilot thing, but I think it's a fourth such modality. So in my picture, there are bots, copilots, AI systems, and AI agents. So that's like a more complete picture.

[00:03:57] Max: I would say bots are the oldest type in this space. Usually, they're automating super simple scripted tasks. They don't include multi-turn conversations. They have a predetermined dialogue tree or dialogue flow, and you can forecast and understand in advance which position or which node the bot will be in.

[00:04:21] Max: So you script all the logic. In some specific cases, there might be an ML model applied, but again, they're super simple, like classification tasks, named entity recognition, and that's it.
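To make the "predetermined dialogue tree" idea concrete, here is a minimal sketch of such a bot in Python; the tree contents and the email address are invented for illustration.

```python
# A scripted bot: a fixed dialogue tree, fully forecastable in advance.
# Every node and edge is hand-written; no model decides anything.
TREE = {
    "start":   ("Billing or support?", {"billing": "billing", "support": "support"}),
    "billing": ("Your balance is $0. Bye!", {}),
    "support": ("Please email support@example.com (hypothetical). Bye!", {}),
}

def run_bot() -> None:
    node = "start"
    while True:
        prompt, edges = TREE[node]
        print(prompt)
        if not edges:          # leaf node: the scripted flow ends here
            return
        answer = input("> ").strip().lower()
        node = edges.get(answer, node)  # unknown input: repeat the node

if __name__ == "__main__":
    run_bot()
```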

[00:04:36] Simon: We're talking about typical bots, website bots that we love to hate.

[00:04:40] Simon: Yeah. Yeah, I get you.

[00:04:41] Max: Yeah. At some point, I think that trend of building bots started with Alexa, maybe around 2015, right? When the first version of Alexa was released, then Siri was released. So there was a lot of hope that bots would be the future. Well, apparently we're still not there.

[00:04:58] Max: Bots are still not able to do lots of, you know, cool stuff. Again, they're super simple in their nature. They're scripted, they have specific patterns about how to perform specific tasks, and they're very limited. They don't have an environment. And environment is a very key, important word, which we'll talk about later.

[00:05:18] Max: So moving next, I think the next in this hierarchy is the copilot. Copilots are like auto-complete; usually we treat them as auto-complete models which assist you in finishing the specific context you are working with. So in terms of code, imagine you are typing some function.

[00:05:40] Max: It gathers the local context, for example, the previous information which is already in the file, the future information, and that's already enough for it to just complete a small piece in the middle for the function, which will work. Again, the scope is very small, it's super local. And the key feature for copilot models is that they should be instant, super fast, such that, you know, you can press tab and it works like an auto-complete.
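As an illustration of "completing a small piece in the middle" from local context, here is a sketch of how a fill-in-the-middle prompt is typically assembled; the sentinel tokens shown are StarCoder-style ones and stand in for whatever a given completion model actually uses.

```python
# Copilot-style completion sees the code before the cursor (prefix) and
# after it (suffix), and the model fills in the middle.
prefix = (
    "def median(xs: list[float]) -> float:\n"
    "    xs = sorted(xs)\n"
)
suffix = "    return mid\n"

# StarCoder-style sentinel tokens; other model families define their own.
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
print(fim_prompt)
# The completion model would generate the missing body here, e.g. the
# lines that compute `mid` from the sorted list.
```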

[00:06:05] Simon: And actually, when we were talking in the early stages of our podcast with people like TabNine, they were saying some super interesting things about how the speed at which an auto-complete, basically a copilot like you say, delivered an answer, and we're talking split seconds, massively changed the developer experience with those tools, whether people would want to use them or not.

[00:06:30] Simon: And typically it's because they provide small responses. They're short responses that I as an individual know what I should expect and know what I could actually type myself most likely, 'cause I'm not asking it to do something huge, it's just a single line or something that is very often based on the very local context of the method, etc.

[00:06:51] Max: Yeah, I think in the early days, these co-pilot models, the requirement for them to be super fast was extremely important because they were just not able to deliver complex stuff, and that's why people were very picky and serious about the speed of it. Now there's a relaxation of this requirement, and I think people are ready to wait more if these copilot models could, you know, achieve or really execute a very complex scenario.

[00:07:09] Max: I can think of an actual scenario where you are defining, let's say, a class and you are writing a docstring for a class. So the docstring can be quite complex, right? It can include a lot of details about your intentions, how this class will look like, what it'll do.

[00:07:31] Max: And I think now that agents, sorry, not agents, assistants, copilots are so smart, people are okay to wait a couple more seconds, and the tolerance to wait is growing. That's fine because their capability is just growing. So that's copilots; again, their core feature:

[00:07:53] Max: usually they're small, they're instant in response. The next generation of the models is AI assistants, and this is the first moment, I think, where agentic capabilities are already present, maybe not in the full picture, but still, a lot of them are already there. Their purpose is to assist with user tasks.

[00:08:15] Max: And this is the first moment where user tasks can be actually anything. It can be code generation, but it can be planning, it can be searching, it can be doing market research; the purpose of the assistant can be different. The core capability of these assistants is to respond to the user request.

[00:08:35] Max: So, as the word “assistant” implies, it waits for your request. It outputs the result of execution and then waits for another clarification or another command, so it interacts with you in a chat way. It doesn't generate a long plan.

[00:08:58] Max: It doesn't focus on long-horizon tasks; it's focused specifically on what you want to do right now. And another important detail is that the user makes the decisions. So the user is always in the loop, and that's the differentiating factor between the assistant and the agent.

[00:09:15] Max: And nowadays, yes, definitely AI agents are growing and becoming massively adopted, but in practice people still mix the term AI agent with AI assistant. If you are using Cursor with the interactive chat tab open on the right, then you are using an AI assistant, because you are always there to chat with a model and fix things.

[00:09:38] Max: And even if you are using Claude Code, which is technically an AI agent, you're still very likely using it as an assistant because you are typing commands, you validate the results, and if you're unhappy, then you ask it to redo something.

[00:09:50] Simon: So would you say they have that agentic capability, but typically the way we use them is as an assistant?

[00:09:58] Simon: So the difference between an agent and an assistant is almost the way we use it versus its capabilities. Different implementations and offerings will be better at certain things than others, whether it's the longer-term, more autonomous tasks or not, but it's typically about how we interact.

[00:10:23] Max: Correct. So if we want to highlight the border: assistants, again, interact with the user, waiting for the response; the user makes the decisions. AI agents, though, can usually be treated as proactive task systems which can do long-horizon, very challenging, multi-step, complex tasks,

[00:10:47] Max: sometimes not requiring a human in the loop. That's the definition of the AI agent, at least in this taxonomy of four core entities which we introduced. The AI agent is at the top because it technically may not require a human to be there to achieve the task.

[00:11:05] Simon: And let's go into agents a little bit deeper in terms of the way we use them.

[00:11:10] Simon: You mentioned Cursor, you mentioned Claude Code. Very different usage patterns in terms of one uses a terminal UI, one uses a more classical IDE. If we look at others, in fact, quite a few have almost favored either IDE or terminal. But there are a surprising number that favor terminal these days.

[00:11:31] Simon: Is there a best way to use an agent? Is there a way that is like, easier for us to give an agent information or context, or is it really up to the user depending on how the user prefers to interact with the agent?

[00:11:46] Max: Yeah, so I think the interactivity is the differentiating factor. When you're using an IDE with an extension like Cursor visible to you, then you are agreeing to the scenario where you are in the loop: you are always checking what happens, and you are doing this interactively.

[00:11:59] Max: So UI elements simplify and accelerate the onboarding for you as a developer. It doesn't replace you as a developer; it just exists on the side to help you achieve specific tasks, like helping you write a specific code file or a specific unit test and so on, and it's very helpful in the sense that there is a UI.

[00:12:28] Max: You don't need to know any commands, because there are buttons, and there's just a box for you to type the text and execute. So I think the cool part of Cursor or Windsurf is that they are giving you this IDE with

[00:12:52] Max: lots of UI elements to help you onboard to using these tools proactively in your everyday use case. While for AI agents in the terminal, by definition you don't need buttons, right? Complex things can be done just with a combination of keys or commands.

[00:13:16] Max: So the terminal is for more advanced users, for those who are already advanced, to accelerate even further, because they don't need a visualization. They are okay just living in the terminal and manipulating all the commands there, because if you're advanced with Vim or Nano or whatever editor, or with Git, you actually don't need UIs, right?

[00:13:40] Max: You can do very hard scenarios with a terminal directly, and it saves you time. But again, it's just a bit harder because it's less interactive, because it's easier to lose track of what happens. And I think the amount of generated information sometimes may be overwhelming, so it's hard to navigate in the terminal just because so many things are happening, and it requires some skill to learn how to use it properly.

[00:14:07] Simon: Yeah, absolutely. And I think things like Cursor, for example, where you see the changes and you can flick from file to file very, very easily. You obviously miss that from the terminal point of view. But I don't know what it is about, you know, just being a terminal, that actually just makes me feel faster, more productive at the terminal.

[00:14:24] Simon: Right? Yeah. And of course, when you're at the terminal, you get the power of the terminal, right? Tell us a little bit about how you can effectively use the terminal beyond just Claude Code.

[00:14:36] Max: Yeah, that's true. And another super advantage of the terminal solution is that it gives you the batteries to run your agent in the background, right?

[00:14:46] Max: You can run Claude Code, Codex, or Gemini CLI in non-interactive mode, just from the CLI, given a prompt of what you want this agent to do, and it will spawn a background process and, fully on its own, start executing the complex request which you asked of it. But I think nowadays any AI assistant tools, such as Cursor, are also moving in this direction, and they are adding more and more capabilities.
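As a sketch of that non-interactive mode, here is one way to spawn a CLI agent as a background process from Python. The -p/--print flag is how Claude Code documents headless runs, but treat the exact flags as an assumption to verify for your tool and version.

```python
import subprocess

# Spawn the agent headless: it gets one prompt, then works on its own
# in a background process while we do something else.
log = open("agent_run.log", "w")
proc = subprocess.Popen(
    ["claude", "-p", "Fix the failing tests under tests/ and show the diff"],
    stdout=log,
    stderr=subprocess.STDOUT,
)
print(f"agent running in the background, pid={proc.pid}")
proc.wait()  # or poll() periodically and keep working
```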

[00:15:14] Max: For example, a couple of months ago, Cursor couldn't have access to a terminal. So if you ran something, you needed to copy and paste the error message, for example, directly into the text box of Cursor. But now it actually has access to a terminal, so it has access to the information you are using in the terminal. That's one thing.

[00:15:35] Max: Also, Cursor launched their own agent, which can likewise be non-interactive, via CLI. So they are covering this space as well. They want to get both sides and both types of people: those who want more interactivity, and those who are okay working just with a CLI.

[00:15:59] Simon: It's funny how I think we are fairly familiar, you and I, with trying out a bunch of these and I think everything that we've mentioned so far, I have definitely played with, I'm sure you have, and there's obviously a bunch of others. Gemini CLI, you actually maybe mentioned Gemini. Codex is another.

[00:16:13] Simon: Where would you say, like for those listening who are kind of on the, you know, trying to follow this 101, where do I actually start? Where would you say it's a good place to start? Is there one good place, or is it really depending on where users are most consistently, you know, doing those tasks? Is it better they try and find an agent in that environment?

[00:16:35] Simon: What would you recommend?

[00:16:37] Max: I think it depends on the level of experience, and maybe on the role of the person in the company if we're talking about technical people, because there are definitely no-code solutions, right? Like Lovable, for example, which provides a vibe coding platform for developing backends and frontends instantly without any knowledge of code.

[00:16:58] Max: So it requires zero knowledge. It just requires you to be there to validate and approve the content generated by the AI assistant or whatever. And the closer you are to the code, and the more scale you need, then the more advanced tools you need to use, which potentially require a lot of manual resolution of the problems.

[00:17:22] Max: So for example, if you are a software engineer and your goal is to do a lot of software tasks and a lot of manipulations with the code, then it's fine to start with, first of all, a copilot, just to see if the copilot model is helping you at least accelerate your productivity as a developer.

[00:17:41] Max: So does it help you to write better, faster code? If you're happy with the copilot, then you go to the next extension and go with the Cursor IDE, where you start to use these complex assistants as your companions helping you achieve the task, and now the scope of autonomy and what assistants can do grows.

[00:18:05] Max: So you can delegate it to change not just the function but entire files. And once you are okay with that, then you can move to the top of this hierarchy and start using Claude Code and Cursor or Gemini CLI directly in the terminal because now you know the strong sides of this.

[00:18:28] Max: Now you know what you can delegate to it, and you are probably already familiar with the core capabilities and with how these tools work, so you can delegate even more scope in a fully autonomous pipeline or setup. So I think the reasoning should be like that: start with something super simple which doesn't require any knowledge of code, and if you already have knowledge, then progress further and go into more and more advanced tools.

[00:18:57] Simon: I think that's really, really good advice. There are two things here, I think. One, there is a trust implication, and it's important to be in control. When we start learning how to drive, we don't jump into a Ferrari and try to drive that Ferrari; it's about taking small steps. Correct. Once we are comfortable with it, once we trust ourselves as well as the tools that we have, we go on to the next step. And this growth in autonomy is all about that growth in trust.

[00:19:26] Simon: Yeah. I absolutely love that. And I like the focus there as well in terms of, you know, sticking to where you are most familiar, in that sense of if you're familiar with an IDE, well maybe let's start there. You know, when you go from there, maybe GitHub Copilot or something like that in the IDE is the best place.

[00:19:44] Simon: Or then maybe Cursor, if you want to stay with the IDE. If you love terminals, maybe dip into something else from there. Really, really like that advice. Some of our listeners may have heard of subagents before. What's the difference between an agent and a subagent?

[00:19:58] Max: A subagent is basically another instance of an agent, just spawned and controlled by the main agent. You usually use that when your task is so complex that one agent is not enough to tackle it, or when there are different options or different paths that can be explored. So, two scenarios: when the problem is so complex that it needs to be

[00:20:27] Max: split into smaller parts, or when you need to make a search to just explore the space and choose one of the options. So yeah, these two scenarios. But basically, a subagent is just another instance of the main agent which is asked to solve a specific problem. So you can think of it as a master-slave architecture.

[00:20:52] Max: Right? Like if you are, for example, interacting with a master agent, the one which you have access to in your terminal, and you ask it to develop a service that has a frontend and a backend, then it's a great candidate for splitting and spawning subagents. One subagent will be working on the backend, another subagent will be working on the frontend, and they will be doing their pieces independently.

[00:21:12] Max: At some point, they will go back to the master agent to communicate the results, and the master will decide, are they okay, or is the result fine? Does it satisfy the overall goal?

[00:21:30] Max: And if not, then it will circle the information back to them and give feedback, and they will continue iterating or refining based on the feedback they receive to improve this backend and frontend. That's an example of how to use subagents for coding assistance. For tasks beyond coding, where you need exploration, subagents might be just parallel instances of the same agent, exploring the problem from different angles.

[00:21:50] Max: For example, you want to search for some information to answer a specific question. You can make a search with one type of query in Google, you can make a search over the articles in Wikipedia, you can do a search over the information available in your, I dunno, like local database.

[00:22:12] Max: And all those scenarios are about searching for information in different sources, but they are done by different agents, technically speaking. At some point in the future there won't be a need for subagents, because the master agent will be so smart that it will know how to resolve any challenging problem on its own.

[00:22:36] Max: So it won't lose or forget the context. It will know how to navigate this complex structure on its own, and technically it won't need to spawn subagents. But another motivation for subagents is just speed.

[00:22:58] Max: We know that agents are super slow, so you try to parallelize them, and subagents are one way of doing this. You just get your result faster. You get, like, 10 implementations of the same thing, and maybe one or two of them are good, working, and doing what you are expecting. So it's about latency and throughput versus idling time.
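A minimal sketch of that fan-out pattern, reusing the non-interactive CLI invocation from earlier; the task strings and the "keep only the tail of the output as a summary" policy are illustrative assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Each subagent is just another agent instance with its own context;
    # the master keeps only a short result, not the full working trace.
    out = subprocess.run(
        ["claude", "-p", task], capture_output=True, text=True
    )
    return out.stdout[-2000:]  # crude "summary": the tail of the output

subtasks = [
    "Implement the backend described in spec.md",
    "Implement the frontend described in spec.md",
]

# Fan out in parallel: this is the latency win the conversation mentions.
with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    summaries = list(pool.map(run_subagent, subtasks))

# A master agent would now judge these summaries and iterate if needed.
for task, summary in zip(subtasks, summaries):
    print(f"--- {task} ---\n{summary[:200]}")
```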

[00:23:22] Simon: I think the time savings you mentioned there and the parallelization, as well as the context, are both really good reasons to use them. In terms of the cons, the negatives: is it just burning more tokens, or is there anything else we need to be cautious of?

[00:23:42] Max: Yeah, for sure costs will grow linearly, right? With more agents you will spend more. And sometimes there might be a situation where you spawn all these agents, but they actually didn't converge to anything. Another consideration is that there are different ways to launch the subagent.

[00:24:05] Max: One is that your main agent can launch subagents on its own, because Claude Code actually now gives this capability. You can say explicitly in the prompt, “use parallel agents to analyze this complex document,” right? So you can explicitly state it in the prompt, and it will on its own spawn multiple versions of the agents.

[00:24:26] Max: But the thing is, you might want to have specialized subagents, right? So instead of general search subagents, you want a subagent focused on a specific axis or a specific angle of a problem. And here you need extra customization: you need to carefully design the instructions for that subagent in order to get a great result.

[00:24:56] Max: So, speaking of downsides: the time for designing these subagents. It's also an art, and you need to make an effort. If you don't, then probably the main agent will do it on its own, but the quality of the subagent might be different, because it might just miss some important bits in the instruction to the subagent.

[00:25:22] Simon: And in terms of the agents: other than being able to customize a subagent and say, hey, I want you to use this subagent when you're doing these types of things, or allowing it to choose the appropriate kind of subagent for those tasks, I presume the main agent is pretty much entirely the communicator across subagents; there's no human interaction that bypasses the main agent.

[00:25:51] Max: Yeah, it depends on the design of your system, and I think in Claude Code it's very hard to control.

[00:25:56] Max: Usually the subagents are launched in the background, so you don't actually have direct access to see what they're doing; you can only see it in the logs. So if you open the Claude JSONL file to see what the commands were, you will see that agents were doing something in the background, but you won't necessarily see it in your terminal; you won't necessarily see the thinking process, the path, the planning for each of the subagents.

[00:26:14] Max: So maybe another disadvantage, or even advantage, of this is that you have to trust it even more. You need to believe that it'll come up with something meaningful, even more so, if you are entering the world of subagents.

[00:26:38] Simon: Cool. You mentioned context management, which I think is actually a really important part of subagents. Well, first of all, we should talk about the problems of context within agents and when they compact context and how perhaps subagents can alleviate at least some of that by running in their own little domains.

[00:27:00] Simon: Talk us through a little bit the need and the problems of context, or too much context, for an agent.

[00:27:06] Max: Yeah, so behind the scenes, all the agents are LLMs, right? Large language models. They are trained with some fixed maximum context length, and this context length is one of the main limitations; I'll talk about the other one, but it usually has some max size, like 256K tokens or 1 million tokens.

[00:27:33] Max: So there is a limit on how much information can be used in the context of the LLM. And when we start talking about agents: an agent is actually a wrapper around your LLM which constantly appends something new to the context, right? So imagine you are starting your session and asking Claude Code to develop something.

[00:27:58] Max: It creates, I dunno, a bunch of files, and all of this information goes into the context. So it already occupies some thousands of tokens in the context window. And every time Claude Code tries to fix a bug, it also appends this information to the context window.

[00:28:18] Max: So all these trace logs, tracebacks, all of this goes into the context, and it's crazy how fast you can reach the limit of the context memory. That's one place where subagents can help, because different subagents will have their own context windows, and that's how you increase the amount of available context: every single subagent will maintain its own context window, just prefilled with the information it gathered from the main agent about the instructions and the task it needs to accomplish, but the rest of the main agent's context won't be affected.

[00:29:08] Simon: Which, I think, when you talked about research: a subagent that does an amount of research on a topic, if it did a ton of research and actually only 10% of that was valid, the 90% wouldn't actually go back to the main agent. It would only report that 10% which it thinks is most valid.

[00:29:25] Simon: Correct. And then the agent, the main agent, can go, right, I've got exactly the information I need, and my context window isn't overwhelmed with nonsense.

[00:29:33] Max: Thanks for clarifying and making it clearly understandable. So yeah, out of the thousands of tokens which a subagent could generate, only a small bit of it,

[00:29:43] Max: a super, super small part, the result, will go into the memory of the main agent to proceed with the main task, and the rest of the memory will be cleared. The capacity, the limit of the memory, goes away pretty quickly. That's why we need to have some tools for compressing, and they are implemented;

[00:30:03] Max: compression tools are implemented in most of the modern agents. But that's actually a pretty challenging task, because it's not always obvious which information is no longer useful and which is actually still valid. You could put into the memory information about the initial state of a file at the beginning of your session, but then during your interaction you

[00:30:33] Max: completely switched topic, asking the agent to solve a different problem, and then referred back to this information which you put in at the beginning, and it could already be outdated for other reasons. So the agent needs to know that this happened, and it needs to spend extra tokens to understand: okay, my context has actually become outdated,

[00:30:53] Max: I need to clear it, it's safe to clean it and remove the information. So it does extra work to make this compression. Compression is a separate, challenging task, I would say.
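For illustration, a naive sketch of the compaction step described here; the 4-characters-per-token estimate and the keep-the-last-10-turns policy are assumptions, and `summarize` stands in for an LLM call.

```python
def compact(history: list[str], budget_tokens: int, summarize) -> list[str]:
    # Rough token estimate: ~4 characters per token is a common heuristic.
    def tokens(msgs: list[str]) -> int:
        return sum(len(m) for m in msgs) // 4

    if tokens(history) <= budget_tokens:
        return history  # still fits; nothing to do

    # Keep the recent turns verbatim and squash the older ones into a
    # summary. Deciding what is truly safe to drop is the hard part:
    # an old file state may be stale, or may still be load-bearing.
    old, recent = history[:-10], history[-10:]
    return [summarize("\n".join(old))] + recent
```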

[00:31:07] Simon: And that problem could happen again and again and again. Every single time you get your context window high enough that it has to really redo that compression.

[00:31:14] Simon: Yeah, correct. Let's talk a little bit about agents and what they're good at and what they're not good at, because I think we've talked a lot about how they do some of their things. But as users, and let's maybe focus on developers since this is a developer podcast: what should I expect an agent to be really good at most of the time, and

[00:31:39] Simon: not be surprised if it struggles with other tasks?

[00:31:43] Max: Very, very good question. If you can explain in very thorough detail the problem which you're trying to solve. So if we're talking about a coding problem, if you write a very careful specification capturing all the necessary aspects of

[00:32:00] Max: your product or your decisions, all the requirements there, then you can expect that it'll come up with a prototype that will be at least very close to what you want. I mean, that's already surprising, right? We are in the era where we can just write in plain language a huge document describing how something complex should work,

[00:32:21] Max: and the agent will come up with a roughly working solution on its own. So this is where we are in terms of what agents can do. There are lots of caveats to that: usually it won't be exactly what you want, and you will anyway need manual interventions and fixes to

[00:32:40] Max: correct where it made some wrong decisions. So the human in the loop is still essential. But again, it's not a bug, it's a feature, because our requirements are always loose and we are underspecifying stuff. And since we underspecified, the LLM has a choice to decide on its own, where the content is underspecified, how to proceed with it.

[00:33:04] Max: If you didn't say in your specification which database to use, like Postgres or Mongo, it's up to the LLM to come up with what it thinks. But it's not a problem of the LLM; it's just because you underspecified, right? So, to sum up: if you give a very clean instruction, very detailed information about how you see your solution working

[00:33:27] Max: and what things it should include, then it's already good enough at instruction following to develop you a prototype which matches your expectations, both in backend and frontend. I would say there might be some limitations to the solution it comes up with, but that can be because

[00:33:47] Max: you underspecified in the spec, or because of the task; sometimes it can come up with an inefficient solution which won't be scalable and which you will anyway need to refactor, but usually that also comes from your loose specification. So yeah, for sure, in one shot or a few shots, you can come up with a working game which runs in your browser.

[00:34:12] Max: You can come up with a server which can be hosted locally, and sometimes, I think, you can even go beyond that and do a deployment, also fully autonomously. That's what Lovable offers, at least, right? They fully maintain the life cycle of the development of a frontend and a backend and serve it for you.

[00:34:31] Max: And you don't need to make any decisions about this. So agents are that powerful.

[00:34:37] Simon: Yeah, absolutely. But now, since you mentioned deployment, let's talk about what we think about when we're happy to push something to production, something that we as professional developers want to use and build with.

[00:34:55] Simon: In fact, why don't we think first about the actual development environment that we as professional developers need to use? There are certain things like security issues. For example, if I want to use Claude Code locally, it's gonna have full access to my system if I wanna run things locally.

[00:35:11] Simon: Okay. So what happens now? Where do I need to, first of all, think from a professional point of view about locking down my system a little bit more?

[00:35:20] Max: Yeah, absolutely. So basically, if you are working on your local version of the repo and you are giving Claude Code full access to work on it,

[00:35:29] Max: at some point it can do an rm -rf accidentally, and if you don't do checkpoints, then you're in trouble. There are lots of stories on Reddit where Claude Code went that crazy and did this. So to secure the professional development experience, there are different approaches.

[00:35:49] Max: One, which you can do on your own: you can always spawn Claude Code in isolated environments. You can create a separate Docker container which has a replica of your repo, and you send the commands to this Docker container, and the agent, in this isolated environment, will be working on the replica.
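A sketch of that isolation setup, launched from the host; the image name is hypothetical, and mounting a copy of the repo rather than the original is the whole point.

```python
import subprocess

# Run the agent inside a throwaway container that only sees a replica
# of the repo, so a stray rm -rf hits the copy, not your checkout.
subprocess.run([
    "docker", "run", "--rm",
    "--network", "none",                 # optionally cut off the internet
    "-v", "/tmp/repo-replica:/work",     # mount a copy, never the original
    "-w", "/work",
    "agent-sandbox:latest",              # hypothetical image with the CLI agent
    "claude", "-p", "Implement the feature described in TODO.md",
], check=False)
# Inspect /tmp/repo-replica and copy changes back only after review.
```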

[00:36:11] Max: And actually that's one branch of solving these issues with permissions and with the fear of something being removed. So isolated environments: that's where you can solve some of the issues. Another type of security problem relates to API keys, for example.

[00:36:33] Max: What if you are asking your agent to write another agent, right? For example, you are a company which develops agents for some reason, and to test this agent, it might need access to the API token, right? And if you provide this API token, then it can go crazy, because it can start doing infinite tests, basically burning your budget.

[00:36:59] Max: And for these types of issues it's much harder to come up with some security solution, unless you cut off the internet completely. But yeah, that's one type of security concern. Speaking of the main approach, though: sandboxed environments are what allow you to solve most of the problems.

[00:37:28] Max: Another way of solving these types of problems is limiting the tools available to the agents. For example, the agent cannot write or edit files under specific directories. So you, as the user in the loop, can create these rules, and this will prohibit the agent from taking dangerous actions on very sensitive

[00:37:56] Max: libraries or directories where there's a lot of sensitive information. And some tools, for example Gemini CLI, come with a sandbox solution available out of the box. So if you start a Gemini session with a sandbox from your current directory, then it'll make a clone of your repo

[00:38:14] Max: in an isolated Docker container, and you can also specify which commands or which tools are available. So you can disconnect it from the internet, for example. And then it'll just start working in the isolated environment, and you don't need to do any manual spawning of containers; it will all be given to you, because Gemini CLI took care of this experience.

[00:38:40] Max: Yeah, that's one way to look at this.
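For reference, a sketch of starting such a sandboxed session non-interactively; Gemini CLI documents -s/--sandbox and -p/--prompt flags, but confirm the exact behavior for your version.

```python
import subprocess

# Start Gemini CLI against a sandboxed copy of the current directory.
# The sandbox (not your live checkout) receives the agent's edits.
subprocess.run(
    ["gemini", "--sandbox", "-p", "Refactor utils.py and run the tests"],
    check=False,  # review the sandbox results before trusting them
)
```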

[00:38:43] Simon: Yeah. Very interesting. And what about memory and memory management? Is this something where we'd have similar kinds of controls?

[00:38:52] Max: So basically, some providers, again like Gemini CLI, also work on tools which help you to

[00:39:00] Max: have fine-grained control over the memory of the agent. You can make a checkpoint, a snapshot of the context that was in the memory of the agent at some point. Once you snapshot it, you can finish the session, close the terminal, and restart it the other day, resuming from a specific context,

[00:39:21] Max: from a specific point in time where you were developing, which I think is super valuable, right?
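Conceptually, checkpointing boils down to persisting the working context and reloading it later; a toy sketch, where the JSON file format is an invented stand-in for whatever a given tool actually uses.

```python
import json
import time

def snapshot(messages: list[dict], path: str) -> None:
    # Persist the agent's working context so a later session can resume
    # from exactly this point in the development conversation.
    with open(path, "w") as f:
        json.dump({"saved_at": time.time(), "messages": messages}, f)

def resume(path: str) -> list[dict]:
    with open(path) as f:
        return json.load(f)["messages"]

snapshot([{"role": "user", "content": "Try approach A for the parser"}],
         "checkpoint-approach-a.json")
context = resume("checkpoint-approach-a.json")  # days later, same state
```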

[00:39:26] Simon: Because, particularly when you're thinking about professional development environments, yeah, you want them to work in certain ways. Correct. And it's about retaining that information.

[00:39:39] Simon: So I don't need to say again and again and again, like, you know, you're speaking to a child, "Don't do this, do this, don't do this, do this."

[00:39:46] Max: Sometimes, like, you are also, as a developer, not sure about the solution which you want to see in the end. So you are communicating and chatting with an agent to propose multiple implementations, right?

[00:39:57] Max: And then you snapshot each of these. At some point in the future, you might want to come back to the specific one which you like the most, and revert the context to that specific point in time, to the approach which you liked. So yeah, it just makes your environment clean and easy to control.

[00:40:19] Max: It is no longer messy. You don't need to remember that you committed your changes to a specific branch such that you never lose the information which the agent already generated. You just eliminate from your brain context some information which is maybe not useful, because you have access to explicitly stored snapshots of the memory.

[00:40:44] Max: And yeah, that's a super cool feature, especially for professional developers who are working with many branches and many features at scale.

[00:40:53] Simon: And working within an organization as well, where there's probably a bunch of shared knowledge, and, I guess, you know, perhaps there are organizational policies or organizational styles, ways of working, things that the platform team want you to do or don't want you to do. Can we allow this knowledge to be shared through context management as well? Organizational context management?

[00:41:20] Max: Yeah. So I think most of the tools are focusing on the context which is local, which is available in the local environment. For example, you can for sure have a GEMINI.md or CLAUDE.md or AGENTS.md file where you explicitly state things and refer to other markdown files or other documentation.

[00:41:46] Max: Such that when agents are started or triggered, they'll have all the information about your security guidelines, the code guidelines, and so on. The problem with that is that, again, if you overload the context with so many instructions, the agent can just be confused and get lost in this terabytes-like flow of information.

[00:42:09] Max: Plus, sometimes the instructions can be overwhelming and conflicting, and it's unclear what to do in this case. So I would say that's a great, not just a research topic, but a product topic. And many companies will try to solve this for sure, because how to introduce the tribal knowledge of your company and knowledge about the security guidelines efficiently, the coding guidelines, and so on efficiently, such that your agent could benefit and use it all the time when you're doing coding and never forgets about this and always respects it. That's an open problem. I don't think any of the existing providers or vendors of the agents solve this explicitly, but there are definitely approaches for that.
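One simple way to picture the mechanics: concatenate the guidance files into the agent's context with a size cap, since, as Max notes, overloading the context confuses the agent. The file names follow the conventions mentioned above; the merge order and the cap are illustrative assumptions.

```python
from pathlib import Path

GUIDE_FILES = ["AGENTS.md", "CLAUDE.md", "docs/coding-guidelines.md"]

def build_guidance(root: str, max_chars: int = 16_000) -> str:
    # Merge org/project guidance into one block for the agent's context.
    parts = []
    for name in GUIDE_FILES:
        path = Path(root) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    # Cap the total size: too many (or conflicting) instructions make
    # the agent worse, which is the open problem discussed here.
    return "\n\n".join(parts)[:max_chars]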

[00:43:00] Max: For example, Gemini CLI gives you access to directly write some memories into the agent's memory. I think it's called /memory update, and it will append your prompt to the GEMINI.md and refresh the GEMINI.md in the context memory of the agent. So that's one way to introduce super recent information directly into the context. Or Claude Code introduced the concept of skills, where you have a bunch of independent things, skills, which can be dynamically fetched into the context memory depending on the situation. So these are the approaches from the vendors to tackle this complex problem of respecting your specific choices and decisions.

[00:43:52] Simon: And, of course, Tessl are kind of leaning into this space as well, in and around context management of holding organizational data a little bit, as well as various open-source kind of usage specs, I guess, as well as proprietary. Tell us a little bit about how that can level up your agents in a more professional environment.

[00:44:14] Max: Yeah, so the concept of the usage specs is easiest to explain in the example of third-party dependencies. So imagine you are working with a library with some framework, and it has many dependencies. And if you want to implement in your library a feature related to this dependency, then you need to make sure that the agent, the code agent, has lots of capabilities and lots of knowledge about this dependency.

[00:44:48] Max: But imagine this is not a very popular library. Imagine it's something which was just released recently. It was not part of the training data for Claude Code or for any foundational LLM, so the LLM just doesn't know what the API of this dependency looks like. And the only way to solve it is either to have access to the source code, which is super inefficient, right?

[00:45:12] Max: Because it'll require the agent to clone the repo and navigate through thousands of files and so on. Or it can just try to import this dependency and test the functionality available in the repo in a dummy script, which again is also inefficient in terms of turns, right?

[00:45:33] Max: The agent will spend lots of time just understanding how to work with this library. So why would you do that? No reason. You want to be efficient, right? You can use usage specs. A usage spec is a compressed representation of the API of this library, which can be consumed by your agent from the Tessl registry or from your private registry, because Tessl is also working on registries for companies.

[00:46:05] Max: And if you consume these usage specs, it's just like a bunch of documents which show how the API is designed for this specific package. Then the agent has direct access to very clear knowledge of how to work with this specific library. It doesn't need to scan thousands of files.

[00:46:30] Max: It has everything stored in a nicely formatted, easily accessible way, directly there. So it's very token-efficient and turn-efficient, and it works, because we made a big effort to make these usage specs valuable and packed with information. Usage specs not only show the API methods and the definitions, but also how to use them:

[00:46:58] Max: what the best scenarios are, what the underlying implications of using this specific API are. So that's one way to look at usage specs: as a way to give your agent knowledge about some library which was never part of the pre-training data, or which is not popular, or which is private,

[00:47:25] Max: which the agent just doesn't know about.
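The mechanics can be pictured as prompt assembly: fetch the compact spec and put it in front of the task, instead of letting the agent crawl the dependency's source. This sketch is hypothetical; the real registry and consumption flow is Tessl's.

```python
from pathlib import Path

def prompt_with_usage_spec(task: str, spec_path: str) -> str:
    # The usage spec is a compact, curated description of the library's
    # API (e.g. fetched from a registry) for a dependency the model
    # wasn't trained on -- far cheaper than cloning and crawling source.
    spec = Path(spec_path).read_text()
    return (
        "Implement against the dependency documented below.\n"
        "--- usage spec ---\n"
        f"{spec}\n"
        "--- end usage spec ---\n\n"
        f"Task: {task}"
    )
```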

[00:47:28] Simon: Yep. Awesome. Let's wrap this section up by talking about evals, 'cause I think when we talk about professional development, we need to be able to test, we need to be able to say: okay, these agents are best because they can do these particular things really well.

[00:47:45] Simon: I'll use these other agents because they can do these other things particularly well, maybe reviews or tests or whatever. What's a good way to eval an agent's capability at, let's say, code generation?

[00:47:58] Max: So it's a huge topic, because there are so many.

[00:48:02] Simon: This is a podcast episode by itself, but let's try and cover it in five minutes.

[00:48:07] Max: So yeah, evals are just a super complex topic on their own. How to eval LLMs in general is very complex, because they can do literally so many things; how do you eval that? And it becomes ever more challenging, even if you try to limit the scope just to code generation,

[00:48:24] Max: because apparently code generation is also a super complex task, and evaluating the capabilities of the agent specifically in this domain is also challenging. So I would say the most common way to do this now, in the community, in the research environment, and in academia, is to use popular benchmarks such as SWE-bench,

[00:48:49] Max: specifically the Verified variant of it, which was popularized by OpenAI. The way these evals work, in some sense they reflect the real development process. They take your repository in the state before some feature was implemented, then ask the agent to implement this specific feature, right?

[00:49:12] Max: It gives you the repo before the feature was there. It gives you requirements about how the feature should look. It maybe gives you some information about the API and how it's intended to be implemented. Then you launch your agent, your coding assistant, which does the work to implement the missing files, the missing feature.

[00:49:34] Max: And once it has generated this patch, this update, it is evaluated against a commit which has real unit tests approved by a human, which we can treat as ground truth. And if your agent's proposed patch passes these tests, then you can think of it as a success: your agent implemented everything properly.
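A simplified sketch of that loop; the real SWE-bench harness also applies the held-out test patch and pins environments, so this only shows the shape.

```python
import subprocess

def evaluate_patch(repo: str, base_commit: str, patch_file: str) -> bool:
    # Reset the repo to its pre-feature state, apply the agent's patch,
    # then run the human-approved tests from the real fix as ground truth.
    def run(*cmd: str) -> None:
        subprocess.run(cmd, cwd=repo, check=True)

    run("git", "checkout", "--force", base_commit)
    run("git", "apply", patch_file)
    tests = subprocess.run(["pytest", "-q"], cwd=repo)
    return tests.returncode == 0  # pass == the task counts as solved
```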

[00:49:58] Max: So that's the most common way to test it. Of course, it's very hard to design such a benchmark, right? In reality you can't just go to GitHub, clone any package, pick a PR, and use this PR as a source for your evaluation, because many PRs are lacking documentation.

[00:50:27] Max: They don't declare any information about design choices and so on. So you need to filter the data very aggressively to include just those data points where your agent theoretically has a chance to solve the problem, right? Because your PR description, the issue description, should be enough.

[00:50:48] Max: It should contain enough information for the agent to solve the problem. If it misses the definitions of the API, if it misses some assumptions, the agent theoretically doesn't have a chance to come up with the right solution. So SWE-bench-like evaluations are the most common approach, but,

[00:51:08] Max: again, they're very limited. If you look into the SWE-bench Verified tasks, they usually touch only one file; rarely do they touch two files. And that's not the real software development experience, right? Usually when you ship a feature, you're shipping five or six files; you are making changes in many places.

[00:51:30] Max: And you need to be super accurate in imports, in unit tests, and so on. So real PRs are much more challenging than what we have in SWE-bench Verified. That's why we need a new benchmark. It can be of the same format, but it needs to be next level, requiring the agent to make changes not in one file but in a collection of files.

[00:51:54] Simon: So more of a guideline or a signal, versus something more complete that will absolutely relate to your exact experience.

[00:52:04] Max: Exactly. And also, this type of eval, I think, focuses mainly on software engineering backend tasks. But what if we go to agents which can produce UI, right? How do you evaluate the UI? Or what if you are developing an agent which interacts with a database and can change the state of the database? It's a different type of eval and a different type of complexity, and the community is now working on dedicated benchmarks for assessing these capabilities as well.

[00:52:38] Max: If your agent generated a UI, how do you eval it, matching it against the request in the user prompt? How can we trust that what it came up with actually matches the expectations? So that's the tip of the iceberg.

[00:52:59] Simon: Absolutely. Let's wrap up with, I guess, what's next for agents: what are the things that you're most looking forward to in whatever this next generation of AI agents looks like?

[00:53:11] Max: One of my expectations is that agents should at some point learn how to write better code, because nowadays the quality of the code they produce is sometimes just garbage. You can feel that it's a one-time script; it's really not something you will enjoy coming back to later in the timeline and working on top of.

[00:53:38] Max: A lot of the code which LLMs generate is super messy and not scalable. Of course, you can deal with it by having more guidelines about how to write the code, but still, I think it's an open challenge to improve the quality of the generated code in general for the agent. So that's one of my expectations.

[00:54:01] Max: Second is, of course, improving the UI capabilities. Most of the existing agents are struggling with creating really nice UIs following your specific instructions. So I expect the core models to improve over there. And, like we talked a little bit about the SWE-bench eval,

[00:54:27] Max: I want to see a new, more challenging version of SWE-bench, and people are working on that. For example, SWE-bench Pro: that's a much more complex and challenging variant of SWE-bench where the agent is facing harder tasks, which require it to make changes in multiple files, and accuracy falls dramatically, which makes all the claims that agents can execute long-horizon tasks

[00:54:57] Max: kind of not valid.

[00:55:01] Max: So, really having very complex PRs, making agents solve really complex tasks which require changes in multiple files: this is the way to make them able to solve long-horizon tasks. So I'm expecting more input from academia, from people, and investment in better benchmarks for how we can trust, how we can make sure, that an agent is able to generate complex code, but also, of course, improving the accuracy on these benchmarks.

[00:55:36] Simon: And that's a great way to wrap up: tell me you're a research engineer without saying you're a research engineer, by talking about evals. So, Max, this has been extremely interesting and thorough. I really, really appreciate all your knowledge and experience here

[00:55:55] Simon: in chatting about agents. I'd love to hear what our audience thinks in terms of more sessions like this, more of the 101 kind; let's go deeper into certain agents and other AI technologies. We can absolutely do that, so let us know. Max, a massive thank you. Thank you for joining us, and thank you for sharing.

[00:56:13] Max: Yeah, thank you very much for having me. And yeah, I'll be happy to come back and talk more about this.

[00:56:17] Simon: Absolutely, absolutely. This is 1 of 20 in a series that we're doing now, Max, so that's fun.

Max: Amazing.

Simon: Thanks very much everyone, really appreciate you joining. And, tune in for the next episode.

AI-Native Development
Agentic Systems
Code Generation
AI Tools & Assistants

Chapters

Trailer
[00:00:00]
Introduction
[00:00:53]
Deep Dive into Agents 101
[00:01:18]
Understanding AI Entities: Bots, Copilots, Assistants, and Agents
[00:03:56]
Understanding Context Limitations in LLMs
[00:28:17]
Capabilities and Limitations of AI Agents
[00:32:22]
Security, Memory Management, and Future of AI Agents
[00:36:02]

In this episode

In this episode of AI Native Dev, host Simon Maple and Tessl research engineer Maksim Shaposhnikov explore the evolving landscape of software "agents" and how they differ from bots, copilots, and assistants. They delve into practical strategies for building reliable agentic code generation, offering insights on using the right tools for specific tasks, balancing speed and capability, and designing environments that foster safe, autonomous execution.

In this Agents 101 episode of AI Native Dev, host Simon Maple sits down with Maksim Shaposhnikov, a research engineer at Tessl, to demystify what “agents” really are, how they differ from assistants, copilots, and bots, and how developers can use them effectively. Max brings a background in large-scale LLM pretraining and now focuses on making agentic code generation reliable—so developers can trust the output without sinking hours into manual verification. The discussion delivers a practical taxonomy, clear usage patterns (IDE vs terminal), and hard-won guidance on reliability, latency, and long-horizon task execution.

A Practical Taxonomy: Bots, Copilots, Assistants, and Agents

Max frames the landscape with four clear categories. Bots are the oldest and simplest: they automate narrow, scripted tasks using predetermined dialog flows or trees. If machine learning is involved, it’s usually lightweight—think basic classification or named entity recognition. Critically, bots don’t have a meaningful “environment” to act in; they don’t explore or make decisions beyond the scripted path, which is why they’re often underwhelming for complex workflows.

Copilots are best understood as ultra-fast autocomplete. They operate on local context (e.g., the current file, nearby functions, and sometimes future tokens in the buffer), producing small, targeted completions. Historically, speed was everything—developers tolerated only split-second latency. As capabilities have grown, developers are now more willing to wait a bit longer if the copilot can synthesize something more complex, like building out a method or filling in a rich docstring-guided class skeleton.

Assistants bring multi-turn chat into the mix and can span a wide range of tasks—coding, research, planning—but the user is firmly in the loop. They respond to instructions, await clarifications, and focus on the “now” task rather than long-horizon execution. Agents, by contrast, are designed to act with autonomy. They can plan and execute multi-step, long-horizon tasks, sometimes without a human in the loop. Practically, many “agents” are still used like assistants today (e.g., Claude Code inside an IDE) because developers want to validate outputs and intervene rapidly. The distinction is less about raw capability and more about how you let the system operate.

Building Agents You Can Trust: Reliability and Robust Codegen

Max’s current work at Tessl centers on reliability: making sure agent-generated code is robust so developers don’t spend hours validating output. Reliability is especially hard in agentic settings because the system must coordinate multiple steps, tools, and files. If you’re building or integrating an agent, treat verification as a first-class concern rather than an afterthought.

In practice, reliability comes from structured execution and feedback loops. Keep tasks bounded but multi-step: give the agent specific milestones (e.g., “create module + unit tests + run linter + run tests”) and ensure it can observe results. For codegen, always pair generation with automatic checks—formatters, linters, type-checkers, and test runs. Treat the file system and shell as your agent’s environment for verification, not just for editing. Require diff previews or commit gates where the agent proposes a change, runs checks, and you approve before commit. The objective is to minimize blind trust while preserving flow.
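
To make that concrete, here is a minimal sketch of such a commit gate in Python: the agent's proposed change only proceeds if every check passes. The specific tools (ruff, black, mypy, pytest) are assumptions, not a prescription; substitute whatever your project already runs.

```python
import subprocess

# Hypothetical commit gate: run after the agent proposes a change and
# before anything is committed. Tool choices below are assumptions -
# swap in whatever formatter/linter/type-checker/test runner you use.
CHECKS = [
    ["ruff", "check", "."],     # linter
    ["black", "--check", "."],  # formatter, check-only (no rewrites)
    ["mypy", "."],              # type-checker
    ["pytest", "-q"],           # test suite
]

def run_checks() -> bool:
    """Run each check in order; stop at the first failure so the
    feedback handed back to the agent stays focused."""
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}")
            print(result.stdout + result.stderr)
            return False
    return True

if __name__ == "__main__":
    ok = run_checks()
    print("all checks passed" if ok else "checks failed - loop back to the agent")
```

Wiring something like this in as the last step of the agent's loop (or as a pre-commit hook) keeps the generate-check-fix cycle cheap to repeat.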

Observability also matters. Even when using an agent “as an assistant,” capture logs, diffs, and command outputs so you can diagnose where things went wrong. Agents can overwhelm you with output; favor structured summaries and artifacts (e.g., a test report and a change summary) so human validation stays fast. Over time, this discipline lets you expand autonomy from “assistive” to “agentic” with confidence.
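
As a sketch of what such an artifact might look like, the snippet below bundles the current git diff and the test output into one JSON record per run. The directory name and commands are illustrative assumptions, not a prescribed layout.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def write_run_artifact(artifact_dir: str = ".agent-runs") -> Path:
    """Snapshot what the agent just did: the diff plus test results,
    stored as one JSON file per run so human review stays fast."""
    Path(artifact_dir).mkdir(exist_ok=True)
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout
    tests = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    artifact = {
        "timestamp": stamp,
        "diff": diff,
        "test_output": tests.stdout + tests.stderr,
        "tests_passed": tests.returncode == 0,
    }
    path = Path(artifact_dir) / f"run-{stamp}.json"
    path.write_text(json.dumps(artifact, indent=2))
    return path
```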

IDE or Terminal? Picking the Right Control Surface for Agentic Dev

The choice between IDE-based agents (e.g., Cursor, Windsurf, Claude Code inside your editor) and terminal-driven agents (e.g., Codex, Gemini CLI, Claude Code via CLI) comes down to interactivity and control. IDEs excel at onboarding and visibility: you see file diffs, inline annotations, and contextual suggestions. Buttons and panels translate into lower cognitive overhead. If you’re still keeping a human-in-the-loop, an IDE makes it trivial to validate, tweak, and keep track of multi-file changes.

Terminals favor power users. You don’t need UI chrome when you can combine commands, pipe outputs, and script everything. The big advantage is that you can run agents in non-interactive mode and spawn background processes that tackle long-horizon tasks while you do other work. The trade-off is that it’s easier to lose the plot among logs and output streams. If you choose the terminal route, treat readability as part of the system design: write outputs to files, use consistent prefixes/timestamps, and provide concise summaries so you can grep or tail your way into the signal quickly.
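
One way to implement that pattern, sketched in Python under the assumption of a hypothetical `my-agent` CLI with a non-interactive mode (the command and flags are placeholders, not a real tool): launch the long-horizon task in the background and pump its output into a timestamped log file that you can `tail -f` or grep later.

```python
import subprocess
import threading
from datetime import datetime

def run_agent_in_background(task: str, log_path: str = "agent.log") -> subprocess.Popen:
    """Launch the (placeholder) agent CLI non-interactively and stream
    its output to a timestamped log from a background thread, leaving
    the terminal free for other work."""
    proc = subprocess.Popen(
        ["my-agent", "run", "--non-interactive", task],  # hypothetical CLI
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )

    def pump() -> None:
        with open(log_path, "a") as log:
            for line in proc.stdout:
                log.write(f"[{datetime.now():%H:%M:%S}] {line}")
                log.flush()  # keep the log tail-able in real time

    threading.Thread(target=pump, daemon=True).start()
    return proc
```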

A practical pattern is to start interactively in an IDE to validate an agent’s approach on a small task, then graduate that same workflow to a CLI command for background execution once it proves itself. This preserves tight feedback during design and gives you scale and speed once you trust the flow.

Speed vs Capability: Designing for Latency Across Copilots and Assistants

Early copilots lived or died by latency because they were limited to single-line or small-block completions. Today’s tools can handle more complexity, and developers will tolerate a few extra seconds if the output quality jumps—especially when you provide detailed intent via docstrings or comments. The key is to align latency with task scope: use an instant “fast path” for inline completions and a deliberate “slow path” for heavier requests like scaffolding a class, authoring tests, or refactoring across files.
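
A toy router makes the idea concrete: map task scope to a latency budget and model tier. The scope labels and model names below are assumptions for illustration, not anyone's real API.

```python
from dataclasses import dataclass

@dataclass
class CompletionRequest:
    prompt: str
    scope: str  # "inline" | "block" | "multi_file" - labels are assumptions

def route(request: CompletionRequest) -> str:
    """Align model choice (and therefore latency budget) with task scope.
    Model names are placeholders."""
    if request.scope == "inline":
        return "small-fast-model"      # sub-second, single-line completions
    if request.scope == "block":
        return "mid-tier-model"        # a few seconds for a method or class
    return "large-planning-model"      # long-horizon, multi-file work
```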

Assistants and agents both benefit from explicit task scoping. Keep assistant requests focused and immediately verifiable to maintain conversational flow. When you need long-horizon behavior—such as installing dependencies, creating files, and running tests—hand that to an agent mode that can plan, execute, and self-check. Make the transition explicit in your UI or CLI so users know when to expect near-instant completions versus multi-step execution with longer runtimes.

Where possible, set expectations in the interface. Show “thinking” or “executing” phases with summaries of planned steps. For terminal users, emit a plan header (steps, tools to use), stream concise progress, and conclude with a result summary and next actions. These small touches keep trust high even when tasks take longer.
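
For terminal users, that can be as simple as the sketch below: a plan header up front, a progress line per step, and a closing summary with next actions. This is a minimal illustration, not a framework's output format.

```python
def run_with_plan(steps: list[str]) -> None:
    """Minimal expectation-setting for a CLI agent: plan header first,
    per-step progress while executing, result summary at the end."""
    print(f"PLAN ({len(steps)} steps):")
    for i, step in enumerate(steps, 1):
        print(f"  {i}. {step}")
    for i, step in enumerate(steps, 1):
        print(f"[{i}/{len(steps)}] executing: {step}")
        # ... the real tool call for this step would run here ...
    print("DONE - next actions: review the diff, check the test report, approve.")

run_with_plan(["install deps", "create module", "write tests", "run tests"])
```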

Environments and Autonomy: What Elevates an Agent

“Environment” is the pivotal concept that separates bots from agents. Bots follow scripts; agents operate inside an environment where they can observe, act, and evaluate. For developer workflows, the environment often includes the file system, shell, package manager, test runner, VCS, and sometimes internet-accessible APIs. Giving the agent these tools is what enables long-horizon, multi-step work.
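
In code, an "environment" can be as small as a registry of callable tools. The sketch below is a deliberately minimal assumption-laden version, not any particular framework's API; real agent loops add argument schemas, sandboxing, and error handling.

```python
import subprocess
from pathlib import Path

# A minimal "environment": tools the agent may invoke by name.
TOOLS = {
    "read_file": lambda path: Path(path).read_text(),
    "write_file": lambda path, content: Path(path).write_text(content),
    "run_shell": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True
    ).stdout,
    "run_tests": lambda: subprocess.run(
        ["pytest", "-q"], capture_output=True, text=True
    ).stdout,
}

def call_tool(name: str, *args) -> str:
    """Dispatch one tool call; in a real loop, the model picks name+args
    and the result is fed back as the agent's observation."""
    return str(TOOLS[name](*args))
```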

But autonomy without boundaries invites risk. Start with guardrails: read-only exploration before writes; diff previews for changes; approval checkpoints before running destructive commands; and controlled network access if applicable. Implement graduated autonomy—begin as an assistant, then allow the agent to execute a small subset of actions automatically, and expand from there. Maintain the option to pause, inspect, and roll back (e.g., via git). This balances velocity with safety.
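
A sketch of graduated autonomy as a gate function: the current level decides which proposed actions pause for a human. The levels, action strings, and "destructive" prefixes here are illustrative assumptions to tune for your own stack.

```python
from enum import Enum, auto

class Autonomy(Enum):
    READ_ONLY = auto()   # observe only: reads and test runs, no writes
    PROPOSE = auto()     # writes allowed, but every diff awaits approval
    AUTO_SAFE = auto()   # safe actions auto-run; destructive ones gated

# Illustrative "destructive" command prefixes - adjust for your stack.
DESTRUCTIVE = ("rm ", "git push --force", "pip uninstall")

def requires_approval(action: str, level: Autonomy) -> bool:
    """True if this proposed action must pause for human sign-off."""
    if level is Autonomy.READ_ONLY:
        # Anything that mutates state is out of scope at this level.
        return not action.startswith(("read:", "pytest", "git diff"))
    if level is Autonomy.PROPOSE:
        return action.startswith(("write:", "git commit", *DESTRUCTIVE))
    return action.startswith(DESTRUCTIVE)
```

Pairing a gate like this with git checkpoints means any auto-run action stays cheap to inspect and roll back.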

Finally, match the interaction model to the job. Use assistants for quick, high-precision tasks where the human decides next steps. Switch to agents for proactive, background execution that can plan, act, and verify without constant supervision. With thoughtful environment design, clear checkpoints, and robust verification, you can let agents do more while staying confident in the results.

Key Takeaways

  • Use the right tool class: bots for scripted flows, copilots for instant local completions, assistants for chat-based, user-driven tasks, and agents for autonomous, long-horizon execution.
  • Design for reliability: pair code generation with automatic checks (formatters, linters, type-checkers, tests), require diff previews, and keep observability artifacts (logs, reports) for easy validation.
  • Choose your interface deliberately: IDEs maximize visibility and onboarding; terminals maximize power and background execution. Start interactively, then graduate stable flows to CLI automation.
  • Align latency with task scope: instant completions for inline edits; accept seconds for richer outputs like docstring-driven classes; use explicit agent modes for multi-step execution.
  • Treat environment as a first-class concept: give agents the tools they need (FS, shell, tests, VCS), but enforce guardrails and graduated autonomy so you can scale trust safely.

Resources

Maksim Shaposhnikov
Simon Maple
Tessl

Related episodes

Why 95% of Agents Fail

Reuven Cohen
Founder, Agentics Foundation

Can Agentic Engineering Really Deliver Enterprise-Grade Code?

23 Sept 2025

with Reuven Cohen

TEST TO APP IN MINUTES

Maor Shlomo
Founder, Base44

Can AI Really Build Enterprise-Grade Software?

26 Aug 2025

with Maor Shlomo

AI TO HELP DEVS 10X?

Patrick Debois
AI Product Engineer, Tessl

Why AI Coding Agents Are Here To Stay

17 Jul 2025

with Patrick Debois
