
[00:00:00] Simon Maple: Before we jump into this episode, I wanted to let you know that this podcast is for developers building with AI at the core. So whether that's exploring the latest tools, the workflows, or the best practices, this podcast is for you. A really quick ask: 90% of people who are listening to this haven't yet subscribed.
[00:00:24] Simon Maple: So if this content has helped you build smarter, hit that subscribe button and maybe a like. Alright, back to the episode. Hello and welcome, not just to another episode of the AI Native Dev, but also to 2026. Amazing. Insane. Still a little bit hard to get your head around.
[00:00:40] Simon Maple: I'm Simon Maple. I'm one of your hosts of the AI Native Dev.
[00:00:43] Guy Podjarny: And I'm Guy Podjarny. I am the co-host here and CEO and founder of Tessl.
[00:00:48] Simon Maple: Yeah. Amazing. So this is one of the episodes that we do, well, I say every year, we've done-
[00:00:53] Simon Maple: We're actually only 18 months old.
[00:00:54] Guy Podjarny: So it's every year. It's already a tradition. We're officially making it a tradition by making it the second time. It's routine now.
[00:00:59] Simon Maple: So we are going to do a look back on 2025 and a look forward. Let's predict some of the things that are going to happen. We can easily predict AI changes and paths.
[00:01:13] Guy Podjarny: It's like taking candy from a baby. It's so easy to know what's going to come.
[00:01:18] Simon Maple: And we'll prove it just by looking back at some of our 2025 predictions and showing how nailed on those predictions were. A lot happened in 2025. A lot. Do you want some stats about the podcast? We had 53 podcast episodes, which is, well, we do one a week, so that's expected, but it's nice to know that we are hitting that cadence.
[00:01:38] Guy Podjarny: Some proof of the numbers. It's a lot of content; that's many hours of conversations and each of those things has a whole bunch of work ahead of time and then, of course, after for editing and posting.
[00:01:49] Simon Maple: Absolutely. Absolutely. And how many views do you think?
[00:01:52] Simon Maple: On the AI Native Dev YouTube, of course we have the podcast. It's mostly podcast. We have a few other videos, but how many views do you think we had? I know you know, so let's see your acting skills here, Guy. How many views do you think we had? I think we got 87 views. 87? 87. Yeah. I think it's all of your friends, right?
[00:02:10] Simon Maple: Slightly lower. Actually, we had over 1 million views across the year on the AI Native Dev YouTube alone. Which is amazing.
[00:02:24] Guy Podjarny: It's insane. And I think it's a testament to just how engaging this domain is. What we try to do here is try to capture and try to distill and debate the action that is going on.
[00:02:36] Guy Podjarny: So much is happening in the world of AI dev, it's very hard to keep up. Hopefully we do it a little bit of a service, but I think the amount of interest, the amount of views, the amount of great feedback we get is a good demonstration that this is a domain where we all need to work together to maintain a grip on the changes.
[00:02:59] Simon Maple: And we'll try and push out as much great content as we absolutely can. We have just over 45,000 subscribers across all platforms. 190 shorts. And then of course, we release these episodes in a number of different ways. Some shorts, some cut-down versions where I think they're about 10 minutes or so.
[00:03:18] Simon Maple: And we get some of the best bits. Our most listened-to episode: first of all, who do you think the host was? Guy, you or me? Well, you're far more entertaining. So I would imagine that it's you. But the people didn't go for that, Guy. Your episode with Olivier Pomel, CEO of Datadog.
[00:03:38] Guy Podjarny: It's amazing. It's got so many deep insights. He's so driven, running what is it, like a $40 billion company? That's the right number right now. Insane company. Still so close to the metal and has such great insights.
[00:03:51] Simon Maple: Yeah. Over 50,000 views on that post.
[00:03:54] Simon Maple: Let's go into a few of them. Let's look at some anecdotes and let's dig into some of our favorite episodes, favorite learnings, interesting thought processes that we had. Way back when, I was looking at the very first episode we did in 2025, it was Macy Baker. Macy's here at Tessl, working across AI engineering and now the DevRel group.
[00:04:13] Simon Maple: She's a community engineer. It was all around prompt engineering, and that was something which felt like such a buzzword back in the day, with official prompt engineering jobs being posted. Now we don't talk about it that much.
[00:04:26] Guy Podjarny: First of all, I think it was a really fun episode. I don't know if it was our first episode that was really purely playful, just enjoying a little bit of the wonder that is LLMs.
[00:04:35] Guy Podjarny: We did these deception games. Macy's incredible. But it does feel like in general, you look at 2025 and you look at the beginning of the year and you think about LLMs and prompts.
[00:04:56] Guy Podjarny: It really feels like at least a decade ago in sort of normal time, sort of mental time. What were we thinking a year ago?
[00:05:03] Simon Maple: Yeah. And of course leaning more into agents. Now we lean more into context versus prompts with pure LLMs. We had a couple of really interesting sessions. I had one with Max on the AI engineering team giving us a nice 101 on agents, but also one with Uni.
[00:05:25] Simon Maple: Uni was doing a bunch of work looking at the interactions between an agent and an LLM, more like a man-in-the-middle looking at the discussions that were going on. Very interesting breakdowns of the different tool usage across the Claude, OpenAI, and Gemini CLI agents.
[00:05:50] Simon Maple: Recognising there are significant differences between the number of tools, the tool usage, and also the style of system prompts that each has. A super interesting session, essentially contrasting those different approaches.
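For a flavor of how that man-in-the-middle setup can work, here is a minimal sketch, assuming the agent talks to an OpenAI-style chat completions API whose base URL you can point at a local address. The upstream URL is a placeholder, and streaming and error handling are left out.

```python
# A tiny logging proxy: forward the agent's LLM calls upstream, and print the
# model, tool list, and system prompt the agent actually sends on each call.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://api.openai.com"  # placeholder: your OpenAI-compatible API

class LoggingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            payload = json.loads(body)
            tools = [t["function"]["name"] for t in payload.get("tools", [])]
            system = [m["content"] for m in payload.get("messages", [])
                      if m.get("role") == "system"]
            print(f"model={payload.get('model')} tools={tools}")
            print(f"system prompt: {str(system)[:200]}...")
        except (json.JSONDecodeError, KeyError, TypeError):
            pass  # non-JSON or unexpected shapes: just forward them
        req = urllib.request.Request(
            UPSTREAM + self.path, data=body,
            headers={"Content-Type": "application/json",
                     "Authorization": self.headers.get("Authorization", "")})
        with urllib.request.urlopen(req) as resp:  # no streaming in this sketch
            data = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

# Point the agent's API base URL at http://localhost:8080 and watch the logs.
HTTPServer(("localhost", 8080), LoggingProxy).serve_forever()
```

Running two or three different agents through the same proxy and diffing the logged system prompts and tool lists is essentially the comparison described above.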
[00:06:02] Guy Podjarny: Yeah, and I think that one is a recent one, so it's easy to relate to the current interest level, right?
[00:06:09] Guy Podjarny: But I think, to an extent, when you look at both the Macy episode way back when and the 101 on agents, it's interesting: there's basically a whole exercise of reverse engineering how these creatures behave. It's weird, when you stop to think about AI and development:
[00:06:28] Guy Podjarny: suddenly we need to infer behavior, try to understand the behavior of the tools that we are using, which is new. I think we're getting used to it, but that thread actually continues all the way from understanding how they behave in prompts to how they behave now in agent mode and how they interact with tools. It's still about how do you guide these creatures?
[00:06:50] Simon Maple: It's so interesting because now we've essentially got two black boxes, right? You've got the black box of the LLM: how is it coming to this answer? How is it providing this response? And you've got this black box of the agent: how is it actually going about trying to get this response?
[00:07:03] Simon Maple: What tools is it going to use? How is it going to use the context? What context should it pull in? When should it do web searches? When should it read the file system? You've got these two layers now, which are quite hard to navigate.
[00:07:16] Guy Podjarny: It's different degrees of freedom. And I think it's important to remember that it's actually always the model. Even the agents, the agents have no intelligence in them; they are just facilitators of combinations. But I think, as Nivo was pointing out in the episode, it's interesting to think about how often the models are tuned to the agents versus how much they are just generic models that the agent is simply an interface to, right?
[00:07:40] Guy Podjarny: As Boris says with building Claude Code. In all of those cases, it's just the degrees of freedom. As we try to do more with these agents and with AI, we encounter more of these degrees of freedom.
[00:07:56] Guy Podjarny: With this variability, figuring out how to wrangle that is hard. Before, we were trying to find the magic incantations; now we have more tools at our disposal, the literal tools that we give the agents, but also means of managing context.
[00:08:15] Guy Podjarny: That's why it went from prompts to context. Prompts are one form of context; there's so much more.
[00:08:22] Simon Maple: Yeah. And we'll jump into agents a little bit more. But let's look at some of the news that happened in and around the topics that we were talking about over the time.
[00:08:33] Guy Podjarny: I think what I loved is a lot of the conversations that we've had over the year have to do with product scope. We're in AI dev; part of it is these models and their brilliance and their not-so-brilliance, how do we deal with that?
[00:08:47] Guy Podjarny: But a lot of it is: what does a dev stack look like? What are the tools? I feel like there were a bunch of things that we foreshadowed a little bit. We had good conversations with some thought leaders. Again, we're here to bring the smart folks over.
[00:09:01] Guy Podjarny: A lot of the conversations and their views have ended up panning out, right? One that really stuck with me is the conversation we had late in the year with Merrill from Graphite. Towards the end of the year, they got acquired by Cursor, and a lot of my conversation with Merrill was on: if the review agent can automatically identify issues, why wouldn't the agent earlier on just find those problems?
[00:09:24] Guy Podjarny: And if you can modify and fix problems at the review agent right there without going back, isn't that an agent now? And should these two products really be distinct products?
[00:09:40] Guy Podjarny: Merrill had good responses at the time, but it was interesting to see how that conversation clearly panned out. Cursor has now acquired Graphite.
[00:09:53] Simon Maple: Oh, I'm sure the folks at Anysphere listen to the podcast. You probably gave them the idea. "What Guy's saying makes sense. Why aren't we doing that?"
[00:09:59] Guy Podjarny: Yeah. It's like "Let's go!" But congrats to Merrill on it. We also had, you had a great conversation with the Slack AI lead.
[00:10:07] Simon Maple: Yes. Yeah, absolutely.
[00:10:08] Simon Maple: Yes, with Samuel Messing. It's super interesting, because we were talking a little bit about how agents are being used these days. With developers mostly interacting with coding agents through the terminal, it's very interesting that, even as chat is becoming a far more natural means of communication, we are not leaning on it as much.
[00:10:24] Simon Maple: If you look at the terminal UIs in IDEs, there are alternatives like Zed and others. When we start anchoring ourselves in this more chat-based system, we have better chat environments than a terminal. Why not Slack?
[00:10:44] Simon Maple: So we discussed the opportunity or the value in, rather than engaging with an AI coding agent in a terminal, why not do it in a place that already has the best environment for discussion, collaborative discussion with channels, et cetera, to happen?
[00:11:02] Simon Maple: Absolutely, why not Slack? We talked a little bit about Devin and how this is really where Devin started; of course, they added their integrations into Slack. Claude Code now has a Slack integration that was released a few months after we talked about it. Once again,
[00:11:17] Simon Maple: what's happening here, Guy?
[00:11:19] Guy Podjarny: I think you can officially say that this podcast is a mover and shaker at this stage. It's the only reasonable explanation for why all of these moves happen so shortly after we discuss them.
[00:11:30] Simon Maple: The alternative is we are just stating the obvious, Guy, and I don't even want to entertain that suggestion.
[00:11:34] Simon Maple: Now you're just stating the obvious.
[00:11:38] Guy Podjarny: I do think it's really interesting. Stevie has a mental model to talk about the maturity of development, an evolution. It was interesting to think about that Slack conversation in that context. You go from a focus on the code, where you need to be in an environment like the IDE in which you are editing the code, to kind of increasing delegation.
[00:12:01] Guy Podjarny: I liked his view of how the chat window is small and then it gets bigger, so that the chat pane in your IDE becomes the dominant one. You just look at the code, and then you go to the terminal. It's interesting to think about Slack as a continuation of that.
[00:12:15] Guy Podjarny: It's interesting because that really flips when you think about product scoping. In some IDEs, you see Antigravity coming along, introducing more chat and leaning into a new interface, and others are indeed a chat interface.
[00:12:33] Guy Podjarny: I thought that was really interesting and, again, a little bit of foresight here. And then I guess where we did state the obvious a little bit was with Base44. We had Maor on the show; he had already built something amazing, a vibe-coding platform, Lovable- and Bolt-style.
[00:12:52] Guy Podjarny: In about six months, they got acquired for some silly amount by Wix. We had great conversations. I wanted to test out, he was saying that in six months you'd be able to build Base44 with Base44. I didn't have time to test that out.
[00:13:11] Guy Podjarny: It doesn't sound farfetched at the moment. What we did see in Wix's Q3 earnings is that definitely hasn't slowed down the growth of it. They now have 2 million users served per that announcement, which is about a seven-times increase from the acquisition five months prior.
[00:13:35] Guy Podjarny: They talked about over a thousand paying subscribers added every day. They have numbers that are ridiculous, not quite as ridiculous as numbers you see from, for instance, Lovable, right?
[00:13:49] Guy Podjarny: Going to 200 million ARR. But still, while it was a quick acquisition and an expensive one for Wix, clearly it is paying dividends, and it's amazing when you think about it.
[00:14:01] Simon Maple: Congrats to Maor. When you think, Base44 was, I think there were six people?
[00:14:07] Simon Maple: No, no.
[00:14:07] Guy Podjarny: Just him. That was the whole notion with Base44; it was basically him. I think about a month before the acquisition they hired one person. Fully bootstrapped and he just built that. I think with today's agents, they're also getting better and better. You might start imagining that being plausible.
[00:14:23] Guy Podjarny: We had another episode, my conversation with Tom Hume, around debunking the myth of the billion-dollar single-person organization. That's probably the closest anyone has got to that. I stand behind what I said in my conversation with Tom: I think you'd be able to have one person produce value that a few years ago would've been worth a billion dollars.
[00:14:46] Guy Podjarny: But because one person can do that, many individuals can do that now.
[00:14:51] Simon Maple: A new baseline, I guess.
[00:14:52] Guy Podjarny: We'll kind of get to this a little bit in the predictions, but I think one of the questions is today there's a lot of value arbitrage by the people that are the fast movers.
[00:15:02] Guy Podjarny: And they come along and they create a Lovable-like system, and now they're building on the brand. And the capabilities themselves that they are building no longer feel quite as differentiated. If you can build a Base44 with Base44, it means it's easy to create a Base44.
[00:15:17] Guy Podjarny: And so it's really a lot more about that time to market, and then can you conquer it with some form of network effect and brand and just sheer economies of scale? Can you build those out? It's interesting to see how these kind of built up; we don't hear about the fast movers that failed.
[00:15:37] Guy Podjarny: But fast movers that succeed managed to reap so much value very quickly. But I still don't think there's going to be a single-person billion-dollar company, because others will just mimic that.
[00:15:47] Simon Maple: For our guests as well, it's been exciting. You've created a list here.
[00:15:56] Guy Podjarny: It's not just foresight that we had; it's also good luck. You should come on the podcast, because good things will happen to you. I just created a quick list of it. We had Quinn Slack, the Sourcegraph CEO, come along. We had a lot of great conversations.
[00:16:09] Guy Podjarny: They've of course launched Amp as a dev agent, and in a world in which it's very hard to stand out, they really managed to innovate and build a bunch of things that are unique. They've since split off the company; Quinn is now the CEO of Amp, and another team member is running Sourcegraph itself.
[00:16:29] Guy Podjarny: So they definitely leaned into it. Mati from ElevenLabs was here. That was a really fun conversation. Love Mati. Love what they're building. Interestingly, they've raised twice since that round. I think the last one was $6 billion or maybe a little bit over. Funny thing is, in the conversation, and actually in many conversations I've had with Mati, they had this clarity around being an audio AI company.
[00:16:53] Guy Podjarny: In their latest announcements, they now did some video and all that. Again, product scoping; it's all blending. Also a bit of a fun anecdote: we talked a lot about their "no titles" message that at the time felt controversial. We've since embraced that here at Tessl as well. We have no titles.
[00:17:08] Simon Maple: Yeah. Now, now we're just Guy and the minions. We don't say that officially. Oh, sorry, we don't say that officially. Oh, yeah. That was the private doc. That was the DM that you sent me. That's right, that's right. There's no value to confidentiality here.
[00:17:20] Guy Podjarny: But no, we had other great ones. We had Victor from Synthesia.
[00:17:23] Guy Podjarny: They've been rumored to be raising a round at $4 billion. We had Mateo from Lakera, building an AI security solution; they got acquired by Check Point. And of course Merrill from Graphite, who we mentioned. So, all in all, I think if you're a startup founder and you're building something exciting, you manage to get on the podcast.
[00:17:39] Guy Podjarny: Your odds of being acquired or raising a round are clearly statistically significant.
[00:17:44] Simon Maple: We should add that as a tagline.
[00:17:47] Guy Podjarny: We should charge a fee for it. You can come on; we're going to get 2% carry.
[00:17:52] Simon Maple: Well, it's testament to some of the amazing guests that we've had on board, and that's
[00:17:56] Guy Podjarny: Been very humbling.
[00:17:57] Guy Podjarny: And I think, clearly, a lot of this is the back channel of it. But first of all, we're very thankful to all of our listeners. You come along, you ask us for topics, and we try to debate those, and we love the interactions afterwards. But also, as our audience grows, we've had a lot of really great inbounds, very smart and capable people wanting to come on, wanting to share their views.
[00:18:20] Guy Podjarny: And that of course creates a bit of a flywheel for the podcast itself. So basically the more you listen, the more you advocate and spread the podcast, the better the guests that we can bring you and have them share their views.
[00:18:34] Simon Maple: Absolutely. And we're always open to suggestions as well. So do please, if there's someone you feel you'd love for us to chat to and interview and talk through their thoughts, let us know at podcast@tessl.io and we'll work through that. So let's talk a little bit about some of the things that we observed in the market. I guess something happened in the market in 2025.
[00:18:56] Simon Maple: I mean, it was a newsy year. There were a couple of things, I saw one thing on The Register that we should maybe talk about, but I think the big thing, the big shaker, and I guess we noticed this at Tessl when we were adjusting to market changes, was agents.
[00:19:11] Simon Maple: And how much agents really grabbed hold of the interest of every company that is thinking about AI, using AI, building with AI. And obviously it also made a change in the way we at Tessl think about our products, think about our positioning in the market. A hundred percent.
[00:19:29] Simon Maple: In terms of the market, let's start with the market.
[00:19:32] Guy Podjarny: Agents are clearly the biggest amidst the many things that happened this year. This was, as predicted a little bit at the beginning of the year, the year of the agent, specifically in the world of development.
[00:19:44] Guy Podjarny: The advent of Claude Code and Sonnet 3.7: I think Claude Code was around for a bit before, but it really started shining when Sonnet 3.7 came out, with its capabilities around coding.
[00:20:01] Guy Podjarny: And so I'd say in that April-May timeframe, there was really a corner turned and significant progress made in just how usable development agents are. They existed before; agentic frameworks existed. Devin existed. There were a variety of agents, but it just made it practical and you can really see the turning point in terms of adoption.
[00:20:27] Guy Podjarny: In terms of usage, and in terms of the ability of the likes of us, Tessl, and many companies to now bet on agents. There is a very clear delineation of the before and after. There was a lot of progress leading up to that April-ish timeframe, but from a development perspective, the world has become agentic roughly in that April-May timeframe.
[00:20:55] Simon Maple: And of course, 3.7 was probably one of the first more public, well-known reasoning models or models that provided a reasoning capability. Do you feel like that played a part in the success of agents, or do you think that would've just kind of-
[00:21:09] Guy Podjarny: I think it's a combination of many things.
[00:21:11] Guy Podjarny: OpenAI clearly introduced reasoning with their o1 model. I think Sonnet 3.7 was the first one that had the broader, gradual thinking, how much is your thinking budget? It played a role, but more than that, Anthropic just managed to build a model that was better on the coding front.
[00:21:33] Guy Podjarny: And then to build this interface with Claude Code to really interact with it. One element was just the sheer success of it, and they've been using Claude Code internally before, so they knew that it was successful. Boris, the creator of Claude Code, talks a lot about how for him it was almost by accident that it started writing code; it was just about creating an interface to the model.
[00:21:56] Guy Podjarny: Creating that interface in which you are not focusing on the code, you are in the terminal, so the code is secondary, and then having the agentic search, that leveled it up: being able to use the local tools in the terminal and being able to search.
[00:22:15] Guy Podjarny: Both of those, alongside that core model, have combined for something great. It's hard to really pinpoint one thing, but what you can say very clearly is that the interface of Claude Code has tapped into a bunch of those strengths in a very powerful way that just made it more likely to succeed, made you feel the magic.
[00:22:41] Simon Maple: And the UX, as the year's gone on, with more and more that their team has added, whether it's hooks, whether it's Claude skills, or integrations, like the very interesting browser integrations that were released at the end of last year as well.
[00:22:56] Guy Podjarny: Around that, the two other interesting phenomena that you can say about agents this year is the adoption of MCP and agent skills as standards. MCP technically launched the year before, 2024. But it was really very widely adopted, including by Anthropic's competitors throughout 2025 and fairly early in 2025.
[00:23:16] Guy Podjarny: That was an interesting, non-obvious move by the ecosystem to embrace the protocol. The protocol remains amazing and horrible at the same time. It's a very immature protocol in various ways, but it's also a really great boost to create a standardized way of connecting any software and capabilities into the world.
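As a rough illustration of what the standard buys you, here is a minimal MCP server sketch using the official Python SDK (the `mcp` package); the tool itself is invented for illustration, but any MCP-aware agent can connect to a server like this the same way.

```python
# Minimal MCP server sketch: expose one tool over the standard protocol.
from mcp.server.fastmcp import FastMCP

server = FastMCP("release-info")

@server.tool()
def latest_release(package: str) -> str:
    """Return the latest release tag for a package (stubbed for this sketch)."""
    return f"{package}: v1.2.3"  # a real server would query a registry

if __name__ == "__main__":
    server.run()  # defaults to stdio transport, which agents spawn directly
```

The point of the protocol is that this one server works unchanged across Claude Code, Gemini CLI, and any other MCP client.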
[00:23:40] Guy Podjarny: And then, as that evolved and people used MCPs and our knowledge of agents grew, Anthropic launched Claude skills and then created an open standard version of that as agent skills. And agent skills look to be supported out of the gate by many other competing agents.
[00:24:04] Guy Podjarny: That's interesting. Kudos to Anthropic for releasing things that are useful enough that even the competitors understand that it is useful to boost the ecosystem with them. Seeing them do it again and again-
[00:24:21] Simon Maple: Wanting to see the advancements through the standards as well, which is wonderful.
[00:24:24] Simon Maple: Because not everyone does that; most would probably go, "Well, actually let's create proprietary Claude skills and leave it there. Let's not worry about what others are doing." But this is a big useful step for the market.
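To make the skills idea concrete, here is a minimal sketch of a skill file in the Claude/agent skills style: a SKILL.md whose YAML frontmatter tells the agent when the skill applies, with the full instructions in the body, loaded only when relevant. The skill name and rules are invented for illustration.

```markdown
---
name: release-checklist
description: Use when preparing or reviewing a release PR in this repository.
---

# Release checklist

When asked to cut a release:

1. Bump the version in `package.json` only; CI derives the rest.
2. Update `CHANGELOG.md` with user-facing changes, newest first.
3. Never edit files under `dist/`; they are generated.
```

Because it is just a file with a small amount of structure, the same skill can be picked up by any agent that supports the standard, which is exactly why competitors adopting it matters.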
[00:24:37] Guy Podjarny: I agree, and it's interesting to see them be the trendsetter. In general, as you track agents over time, in most cases Claude Code was the leader, setting the trend for how products would work.
[00:24:49] Guy Podjarny: Hooks were initially introduced into Claude Code and then became prevalent elsewhere. Commands were initially Claude Code and became prevalent. I do want to give a shout-out to the Cursor team, because Cursor rules have existed for a very long time, and in many ways skills are very much like rules.
[00:25:12] Guy Podjarny: They're a bit more native to the model, so they're different in a variety of ways. But I do think that the idea of structured instructions, or reusable context, is very much there. So it's been great to see the leadership, see the collaboration, culminating late in the year in the creation of a Linux Foundation sub-foundation dedicated to open AI infrastructure.
[00:25:35] Guy Podjarny: I'm very curious to see how a Linux Foundation, or an open source foundation, can deal with the pace of change. Pace has not historically been the strongest suit of foundations and open source. But I love the contribution; I think skills have been donated,
[00:26:02] Guy Podjarny: Agents.md has been donated over there, the Goose agent is there. So it'll be interesting to see; I hope to see a lot of great things in 2026 around the open agent ecosystem. Yeah, we already had open weight models.
[00:26:23] Guy Podjarny: I think we increasingly have substantial open source software. How do those interact with the commercial players who are raising ridiculous amounts and ridiculous valuations? They do need to build a strong business around them.
[00:26:35] Simon Maple: Collectively, they're competitors to Claude Code.
[00:26:40] Simon Maple: We should also say Qodo is making big strides as well. Gemini CLI, I hear a ton of people saying how much the competitors have got closer and have improved. People really preferring those different agents as well. So, it's going to be an interesting year and we can maybe talk predictions in a bit.
[00:27:00] Simon Maple: But it's an interesting year to see how agents compete for market share.
[00:27:05] Guy Podjarny: Absolutely. We've seen a lot of progress over there. And I do want to point out that some independent agents came along; you have Kilo Code and Goose.
[00:27:14] Guy Podjarny: OpenHands dropped a bit off the radar, maybe it just blended with those others. But you see a lot of open agents that come along. Kilo Code really caught my attention because it's built very much to be a cross-model agent. It's a newer one, so it feels very modern.
[00:27:30] Guy Podjarny: In many of these different tools, each of them learns from the others. And so while Claude set the stage, they're not the only ones that innovated. Codex, for instance, launched initially in the cloud and then went to the CLI.
[00:27:49] Guy Podjarny: And of course Claude now has cloud versions of it. So people explore different domains and different paths. I do think it's a non-controversial statement to say that Anthropic has focused a lot more on the enterprise and is probably the dominant player when it comes to development.
[00:28:07] Guy Podjarny: But in the latest iterations, and we'll talk about that in predictions, the sheer capability of the models in development, there's a lot to say for the Gemini models and areas where they do better. Definitely the gap on OpenAI and other fronts is smaller.
[00:28:25] Simon Maple: More and more as the year went on. When models got dropped, do you feel like that made as big a splash as it did a year ago when a model improvement caused such a big difference with agents? Now do we care more about other things that improve the experience, whether that's tools, the awareness of the project, the context, and things like that?
[00:28:51] Simon Maple: Do we care more about that versus the intelligence of the underlying model?
[00:28:55] Guy Podjarny: It's a good question. Again, one year feels like a lifetime. If you harken back to earlier in the year, there was a lot of conversation about the rumored failed training runs of, at the time, I think it was Llama 4 and whatever the next big model versions were. And the growing perspective in the world of AI research became that the value is no longer in pre-training.
[00:29:24] Guy Podjarny: It is in post-training. It's in a lot of the refinements that come after and then later on in agentic workflows. Google rocked that boat a little bit with their latest models. In November they released a whole slew of models that were substantially better.
[00:29:42] Guy Podjarny: OpenAI announced a code red in response to them. You also see a lot of the dynamics of existing distribution play out: Meta AI with their reach, Google with their reach, Grok and Twitter with their reach. The world of research has moved more into post-training and refinement than pre-training.
[00:30:09] Guy Podjarny: But maybe that got a little bit challenged by the Google releases. If you listen to Ilya Sutskever in the latest episode he had with Andrej Karpathy, he talks about how this is going to be the year of research, coming back to thinking about different approaches. So I'd say, this year, it's probably true that model improvements in the second half of the year were not as dominant.
[00:30:33] Guy Podjarny: OpenAI had a bad year in that sense. They had a lot of incremental improvements, but they didn't have a big splash moment with a model. Maybe Sora is the counter-example. But in the dev world, Anthropic and Google got more of the headlines while OpenAI kept up; they didn't lead those charts as much.
[00:30:58] Guy Podjarny: I wouldn't rule it out in the long term. That's one thing that is hard for me to predict as research is very hit and miss.
[00:31:08] Simon Maple: Of course, on the flip side, context and those types of things have made a massive improvement in terms of the way we use agents.
[00:31:15] Guy Podjarny: And I think that, to me, is a very big takeaway. It's close to heart here at Tessl. What has very much happened post-agents, maybe in the last four or five months, is the acceptance that knowledge and intelligence are not the same. All these models are becoming very intelligent.
[00:31:33] Guy Podjarny: The agents capitalize on that to give them arms and legs, to allow them to go off and invoke tools and gather more information and more knowledge. But what we also see is what it takes to be a great and powerful developer who is truly AI-native and 100x productive.
[00:31:53] Guy Podjarny: A lot of it comes down to context. A lot of it comes down to: you wrote the right instructions, the right summaries, you kept the docs in a certain place. You can pull down the right information at the right time and do context management so you don't overflow the agent. When you give it information, you name things correctly so the agent can find them where relevant.
[00:32:15] Guy Podjarny: You see this in the practices and you see it in the platforms themselves. Gemini has their CLI extensions, but more notably, Claude launched skills. GitHub has Copilot Spaces. Of course Cursor has always had rules, and they launched team rules.
[00:32:33] Guy Podjarny: There are a lot of investments in the platform in reusable context. Maybe the agent is very intelligent and it is becoming more and more intelligent. But when you want it to use a certain library, you don't want it to need to analyze the code of the library. It's better if it has docs that teach it.
[00:32:52] Guy Podjarny: IfIf you want it to code the way you want to develop in a certain organization, it needs to know what the best practices are. It shouldn't just need to infer them dynamically from historical code.
[00:33:16] Guy Podjarny: If you want it to code the way you want to develop in a certain organization, it needs to know what the best practices are. It shouldn't just need to infer them dynamically from historical code. If you hearken back to my conversation with the guy from Augment, a lot of their core competency was around learning from the code base. I think their initial premise was that they will train on the code base and they will understand the code base for them. They built it into RAG, you real-time augment it, but it was all within that intelligence.
[00:33:36] Guy Podjarny: It would be fetched dynamically and intelligently and magically by the system. You had Poolside last year; they were more about training on the code base. Today there's more acceptance that it'll be more like a support knowledge base: learn from these systems, extract your instructions, and then keep those and monitor and manage those.
[00:33:59] Guy Podjarny: That is very aligned with our view of the world. At Tessl, we talk about spec-centric software, which is increasingly about capturing intent, capturing knowledge, and then providing that in a smart way to the agent and having the agent build on it.
[00:34:21] Guy Podjarny: So that was a positive and maturing step forward in getting agents and LLMs to help us build software.
[00:34:31] Simon Maple: Yeah. And I think, from a usage point of view, a lot of frustrations for a developer using agents occur around anything from something as simple as style,
[00:34:41] Simon Maple: the way a developer wants to write code, all the way through to organizational policies or the way code has been written in my code base today. An LLM can very often provide a correct answer back to a user. But if it's not the correct answer that the user wants to see, in the way that they want to see it, it creates frustration.
[00:35:04] Simon Maple: The value of that agent becomes far less, because even if it provides me the right response, now I have to go in and make changes to the answer to get to something I can actually check in as is. That's where context, the ability to look at not just how something could be done but how something should be done, really unlocks the way in which a developer can reduce their frustration and get what they actually want to see.
[00:35:36] Guy Podjarny: I agree with that. I'll further emphasize that the fact that you need to review in the first place is a big limiting factor in how productive you can be. The more you can say upfront and trust that it will be applied, the more you can delegate and the more successfully you can do that.
[00:35:55] Guy Podjarny: We see this context manifest in many ways through the system, and maybe it's worth relating it to Tessl. As a quick journey: we said spec-centric software would be the future, and over time the importance of the code would diminish in favor of captured intent.
[00:36:14] Guy Podjarny: I think that's happening; the world is moving towards that destination. What we see is that there are maybe two versions of that that manifested. One is the advent of spec-driven development.
[00:36:30] Guy Podjarny: That's more about, when the agent approaches a task, having it stop and create a plan before it writes the code. And that has now, fairly quickly, as these things happen in this world, become a core part of the way the agents build. You see Claude oftentimes defaulting to plan mode and starting by creating a plan. You see Antigravity, which is the Windsurf rebranding, or created on the back, shall we say, of Windsurf after the Google acquisition.
[00:37:00] Guy Podjarny: You see that creating plans and implementation plans ahead of time. And then you see external frameworks like Spec Kit from GitHub, and you see of course Qodo, which we've had here.
[00:37:15] Guy Podjarny: So you see all of those creating spec-driven development. That's more about the mode of interaction with the agent being, "Hey, let's form a plan together. Take everything you understand, here's what I want. Take everything you understood about what I told you and what I want to achieve.
[00:37:33] Guy Podjarny: Take everything you understand about the current reality and then put together a plan and let us review it, and then execute the plan and draw within the lines; stay roughly there." And so I think that has gone from, "Are you kidding me? This is overhead. I don't want to slow down" to "Yeah, of course. This is the way that you build" in the span of three months, something like that.
[00:37:49] Guy Podjarny: And we've very much felt that with the Tessl framework that we released, a lot of whose goal was to say, "Hey, agent, hold off. Wait, write the spec. Create them," and only
[00:38:08] Simon Maple: Then kind of go off and update. And to remind people, the Tessl framework that you're referring to there:
[00:38:14] Simon Maple: this was not the spec registry that we are very much focused on today. This is more the spec-centric way of building software, writing the spec that you then generate the code from. Right?
[00:38:25] Guy Podjarny: Exactly. And that ended up being kind of the SDD. And so we had a lot of learning on that.
[00:38:30] Guy Podjarny: And I think in many ways this has precisely happened, but it turned out that it really needs to be embedded into the agent. And then the second version of context that came along is more the long-lived specs that talk about how you want to build:
[00:38:47] Guy Podjarny: what is my environment, what is the correct way for the agent to produce software, what does good software look like in my environment? Sometimes including product functionality, how things work, whether it's how to use them, or being able to leave a trail behind you of what functionality is there.
[00:39:02] Guy Podjarny: And so all of those now fall not under SDD but rather under context engineering or context management,
[00:39:12] Simon Maple: Which is a massive term that's been repeated again and again and again by the whole market. The importance of context management and context engineering. Absolutely.
[00:39:20] Guy Podjarny: And over there, context management has its own layer. For Tessl right now, that's a lot of our focus, to say: well, how do you create the right context in the first place? How do you not require the agent to have all the info internalized in its brain ahead of time, which is clearly not workable when things are dynamic,
[00:39:41] Guy Podjarny: but also not need it to go and read massive code bases, get them right and not make mistakes, and read documents about practices? You have to create those documents. Be intentional about what it is that you want. We have a lot of data; we've discussed some of that with these evaluations.
[00:39:59] Guy Podjarny: For instance, in the episode with Yaniv, talking about how to evaluate those. So how do you create evaluation scenarios? And we should probably come back a little bit to the benchmarking. But how do you evaluate? You say words, and, you know, I tell you words all the time and you never listen.
[00:40:15] Guy Podjarny: Agents are sometimes the same. So how do we evaluate, and how do you know whether the agents work? How do you know if the new, fancy, expensive model will be worth your dollars? Versus, how do you know if this cheaper open model or optimized model maybe does the job, and it's a lot cheaper, a lot faster?
[00:40:41] Guy Podjarny: So how do you generate that context? How do you evaluate it to know that? How do you distribute it, how do you get it to the agents? And of course, how do you get them to load the right context at the right time? And then how do you observe things over time? There's so much to say. So all of that world of context engineering that we've grown to think of as not just context engineering, but rather agent enablement.
[00:41:02] Guy Podjarny: You know, how do you enable the agent with the right information at the right time? And then how do you own this knowledge, this layer of intent, this layer of knowledge, of context? How do you manage that over time? Keep it up to date, change it as models change.
[00:41:23] Guy Podjarny: And so I think this is a brave new world of context that is very, very aligned with our original vision at Tessl of it becoming spec-centric. Swap the word "spec" for "spec and knowledge and context"; these are all synonyms really. And how do you develop in that world?
[00:41:42] Simon Maple: And it's very interesting, the way we used to talk about prompts. I say a year ago like we're talking about a decade ago, but it feels like that. When we talked about prompts, we talked about using the correct language, and people were thinking about prompt engineers, where you need to craft a prompt in a specific way, and then not just holding that to one developer. First of all, understanding what is the right thing,
[00:42:04] Simon Maple: the evaluations are super important for that when we think more on the context side. But then secondly, that distribution, when a developer gets that prompt correct. There was the thought of, "Okay, can we," and I know some tools did this as well, "can we share these prompts across, so that if I wanted to create a bunch of test cases or do some code reviews, I'm using the correct prompt to guide the LLM in the right way?"
[00:42:25] Simon Maple: Now with context, we're seeing all those echoes in the same way. I create context. How do I know that context is well formed, is correctly written, so that an agent can best consume it?
[00:42:41] Simon Maple: Evals: the ability to actually eval that context, to say "Yes, this is good context" or "No, this context isn't written well; you need to structure it in a different way to get the most out of an agent." And then of course that distribution, when we get that context right. If it's a policy thing, well, it's not just me that has to adhere to that policy in an organization, particularly an enterprise.
[00:42:58] Simon Maple: Organizations, and enterprises in particular, have huge numbers of developers, huge teams that it needs to be distributed to, right? You can't just have one developer or a small subset of developers contributing code which is either against policy or against the way the rest of the organization wants to build.
[00:43:14] Simon Maple: So all of these challenges just magnify as soon as we think about it in the larger organization and the team development.
[00:43:22] Guy Podjarny: First of all, I love the analogy to the LLMs. We felt it over here; we built a whole, might I say, impressive evaluation system for prompts.
[00:43:33] Guy Podjarny: And a lot of that needed to be thrown by the wayside in favor of an evaluation system that is agent-based, because that's the right way to interact. A lot of this reverse engineering of "How does it work? How do you affect it?" I think what has changed is the scope of the task.
[00:43:49] Guy Podjarny: With a prompt, you expect a certain level of accomplishment; with reasoning, maybe a little bit more. And then of course you get to the agents of today. So I think we definitely need to build that. But a lot of the analogy here is that as the task grows, it becomes like a larger organization.
[00:44:04] Guy Podjarny: When you have a lot of people, they can all be amazing hires, very intelligent. But as the scope of the task grows, it becomes a lot more like managing organizations.
[00:44:18] Guy Podjarny: You have a whole pile of people, a group of intelligent people building out, and they still need to be aligned. It's still hard to say, "Okay, we sat down and we discussed the thing and we agreed that we will do it." Do we remember? Do we know to go and read the right document at the right time to perform an action?
[00:44:37] Guy Podjarny: Do we remember our own selves from a year ago, the decision that we made so we can keep to it? And so a lot of these analogies apply now to agents. Sort of like "garbage in, garbage out." If you gave them the wrong instructions, if you gave them irrelevant context, that takes away tokens, that takes away attention.
[00:44:58] Guy Podjarny: And so how do you shrink it? How can you be an efficient communicator? A lot of those human analogies now apply, and some organizational philosophy comes into how you make agents effective.
[00:45:10] Simon Maple: Yeah. So from the Tessl point of view, and obviously context is a big part of what we are building today and the platform that we're building.
[00:45:19] Simon Maple: One of the terms that I know I've heard every single day pretty much is agent enablement. And this is super important, super powerful. Talk us through, first of all, what is agent enablement and what's Tessel's role in that?
[00:45:32] Guy Podjarny: I feel we should start discussing it in outcome-based terms versus the technical element. Context, and the management of it, is a very important tool or means. I think the eventual goal you want is to enable the agents. And I find sales enablement to be an easier example, even though in development, clearly, you think about onboarding.
[00:45:57] Guy Podjarny: So for developers, you can think about needing to onboard a developer: a brilliant developer comes along and joins the team. You still don't expect them to know how to build well within your organization the very first moment they land. You need them to be onboarded, so what are the materials they need to read?
[00:46:14] Guy Podjarny: And then over time there's some elements of continuous training. I think in development it's a little bit harder to think about systemising it because we rely a lot on individual skills and individual talents and we rely on this institutional knowledge.
[00:46:28] Guy Podjarny: Sales enablement is an interesting analogy because oftentimes as organizations grow, you want your sales organization to be creative only to a degree. You want them to repeat the playbook. And so sales is a much more structured place in which you say, "Okay, here are the enablement materials. Hey, any new salesperson, you need to read this."
[00:46:43] Guy Podjarny: But also, every existing salesperson, every year at kickoff, we're going to teach you: this is the sales process, these are the stages, these are the decks that you need to do, these are the materials, the emphasis, the messaging, all the things that you need to provide.
[00:46:58] Guy Podjarny: And then those get measured and monitored, and we identify the cases where some random behavior was successful, or where something falls through. And then you redefine a bunch of those practices and train them again to build those. I think that's maybe a slightly closer analogy to what we do with agent enablement.
[00:47:17] Guy Podjarny: Your eventual goal is to say, "Hey, I want agents to work in my organization, and specifically in my development and in my code base. I want them to build well for me." So how do I make them successful? What is the knowledge I need to equip them with? How do I evaluate, role-play, and test and see that that works, which is something we can't do as well with people.
[00:47:39] Guy Podjarny: We can be a lot more demanding with agents. How do I distribute that knowledge? How do I then observe behavior and see when did it actually happen, and then how do you take that and optimize it? And when we apply it to agents, we see all these super interesting things that we'll continue to share over the next couple of months, all these really interesting observations.
[00:48:00] Guy Podjarny: When we look at agent logs, you can see how some instructions you put in your context they just ignore; they just don't do them. And you see how, for other instructions, when we do the A/B test, when we evaluate with and without those instructions, they do the same thing anyway.
[00:48:18] Guy Podjarny: So if you try to get it to use Git, it would use Git anyway. But if you try to get it to use gh as the CLI instead, you really need to say it in a very specific way to counter the intuitive way. And so if you said Git, you wasted context. If you said gh in the wrong way, you're not going to get your results.
[00:48:36] Guy Podjarny: And eventually you have to think about: how do I solve it? I've observed a problem, and that comes back to modifying the context. That's your tool. So we think a lot about this: if you are an organization and you want agents to work for you, you need to enable those agents, and context, the management of that context, the optimization of that context, is the means through which you do it.
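As a toy sketch of that A/B exercise: run the same task with and without an instruction and compare how often the agent complies. The `my-agent` CLI below is a hypothetical stand-in for however you invoke your coding agent, and a real harness would sandbox each run and grade more robustly.

```python
# Compare compliance rates for a context instruction, with vs. without.
import statistics
import subprocess

def run_agent(task: str, context: str) -> str:
    result = subprocess.run(
        ["my-agent", "--context", context],  # hypothetical agent invocation
        input=task, capture_output=True, text=True, timeout=600)
    return result.stdout

def used_gh_cli(transcript: str) -> bool:
    # The "grader": did the agent reach for the gh CLI to open the PR?
    return "gh pr create" in transcript

def pass_rate(context: str, task: str, n: int = 20) -> float:
    # Agents are nondeterministic, so run the scenario n times and average.
    return statistics.mean(
        used_gh_cli(run_agent(task, context)) for _ in range(n))

task = "Open a pull request for the current branch."
with_rule = pass_rate("Always use the gh CLI for GitHub operations.", task)
without_rule = pass_rate("", task)
print(f"with instruction: {with_rule:.0%}, without: {without_rule:.0%}")
```

If the two rates are the same, the instruction is wasted context; if the "with" rate is still low, the instruction needs rewording.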
[00:48:58] Guy Podjarny: And then I want to say maybe one thing, the other term that I harp on a little bit recently is agent experience. I think from an organization perspective, what you're doing is your agent enablement in your organization. But say you're an open source maintainer and you have your library; you want agents to be able to successfully use it.
[00:49:16] Guy Podjarny: And you built some library or some framework on it. You want developers to use it. And so you invest in developer experience. You care about your command line hygiene. You care about ease of installation.
[00:49:32] Guy Podjarny: You care about self-healing when there's a problem. You care about all these traits, composability, all these great developer experience traits. Why do you care? It's because you want people to use this stuff you've built, right? They're amazing. You want them to be accessible, you want them to be used correctly so they don't come bother you with silly mistakes.
[00:49:50] Guy Podjarny: So with agents, you have the same need. You want agents to use your software. You want them to use your tools, you want them to use them correctly. And context is oftentimes the means that you have; it is both your UX and your docs at the same time. And so the other lens of enabling agents is a bit more focused on: for my piece of software, for my API, whether I'm an organization or an open source maintainer, whatever it is, for my open source library or a closed source library, I want to provide a great agent experience, which is basically a form of: I want to enable agents to be able to consume my work.
[00:50:32] Guy Podjarny: All of that combines, in my, granted, deep-in-the-Kool-Aid view, into this new reality of software development that is spec-centric or intent-centric versus code-centric, in which what you're capturing and what you are communicating is how to do things and what to do, not the implementation and the code.
[00:51:05] Simon Maple: It's going to be a big year for context. It's going to be a big year for enablement and a big year for Tessl.
[00:51:12] Guy Podjarny: Indeed.
[00:51:12] Simon Maple: Tessl will be super fun to build and.
[00:51:15] Guy Podjarny: Super fun to
[00:51:15] Simon Maple: Use.
[00:51:16] Simon Maple: Yeah. And hopefully a massive success. Amazing. Amazing. Okay, so to wrap up this episode, we are going to finish with not just the predictions for 2026, but a look at how well our predictions for 2025 went. How poorly did they do? How poorly did they age in 2025? So we'll look at three categories, and we'll jump through these quickly.
[00:51:36] Simon Maple: The first category was actually my prediction, which is: we'll look beyond Day 1 development in 2025, whereby in 2024 we were very much focused on code generation and spitting out code left, right, and center.
[00:51:53] Simon Maple: I predicted we're going to actually think much more about the part beyond Day 0, beyond Day 1, and so forth. Can we look more into the testing? Can we look more into the maintenance and stuff like that? Did that happen?
[00:52:06] Guy Podjarny: Not so much. Not so much. Well, I think two things happened, right?
[00:52:10] Guy Podjarny: I'll give you partial marks on that, because I feel most of what grew was more about what we can do with the agents. So can I do initial things bigger? I do think that there's this advent of an AI-native developer that works in these context-rich ways and focuses on delegation.
[00:52:29] Guy Podjarny: And for them, they do think a bit about the longevity, because they would store artifacts. So I think they get some of those points, but it hasn't reached teams; it's still single-player. And it hasn't reached quality per se; it's purely functionality.
[00:52:45] Simon Maple: Yeah, I would agree with that. And I think there's a lot of work being done in the reviewing of code that's being generated, but it's still very much Day 0. It's still in the pull request of the piece of code that's being delivered, rather than the Day 1 "this already exists, how do we maintain it?" A hundred percent.
[00:52:58] Simon Maple: The second category was, I guess we were looking at models back then. So we were looking at model leaders: would there be a sole model leader, or would it still be a group leapfrogging each other for brief periods?
[00:53:18] Guy Podjarny: Yeah. I think I said at the time that I don't think there's going to be a leader, and we talked a lot about fragmentation. Would an organization pick a model at the time and stick to it? Would one of them break out? I said no single one was going to come out and pass the others for good.
[00:53:34] Guy Podjarny: There would be more of that continuation that we'd seen in 2024. And I think that happened.
[00:53:39] Simon Maple: I'd say it has. I think people are maybe picking between models, which is another thing that we predicted, in terms of what their need is. Do they need more of a reasoning model for this type of task, or whatever? But in terms of the vendors, people are jumping between them and, depending on which month you're in, you'll see a different vendor at the top.
[00:53:59] Simon Maple: And I think they're keeping up with each other, which is a good thing. It's a great thing for the-
[00:54:03] Guy Podjarny: I think a lot of the beating one another is happening. I'll just give a few concrete examples: Gemini, clearly, when it came out with its latest models in November, suddenly it was at the top of all the different benchmarks.
[00:54:23] Guy Podjarny: I think the other interesting play was Cursor launching its own models in Composer, which are faster, generally perceived as maybe not quite as good at some things, but at others they might be very good, and faster than the rest.
[00:54:42] Guy Podjarny: So I think that played out, and they're still edging past one another. And I think the gaps have shrunk. There's almost a taste element; people start developing preferences: "I kind of like this model, I like their style."
[00:55:01] Simon Maple: Befriending the model.
[00:55:02] Guy Podjarny: Befriending the model. But I think loyalty remains low, and most organizations remain multi-tool and multi-model, generally, when you talk to them. We talk to a lot of enterprises; they don't see that changing.
[00:55:15] Simon Maple: I agree. The third category, and here's the obvious part before the gem in this one, was adoption of AI in 2025: tools, but also enterprise adoption. Cursor grew, and other widely adopted tools continued to build popularity generally.
[00:55:36] Simon Maple: We predicted that enterprise adoption would keep growing, with enterprises continuing to invest. Absolutely, we see that. And actually, at a number of conferences this year, I was amazed at how many enterprises are really betting on AI, which isn't normal for technologies: usually the startups and smaller companies bet big while the enterprises are slow to come in. We're not seeing that; adoption hasn't hit a strong hiccup.
[00:56:04] Guy Podjarny: There was that study that came out saying most AI projects in enterprises fail, and there was a moment of concern that everybody would pull back. And there are definitely still the financial market concerns about the AI bubble. But all in all, I'd say we made two somewhat obvious predictions on it.
[00:56:27] Simon Maple: But the gem was the more crowdsourced benchmarks by mid-2025. And that was very close to what actually happened.
[00:56:34] Simon Maple: Of course, SWE-bench is still very much used, but TBE came around, along with a number of other evaluations: I know Spring had their own benchmark for Spring applications, and of course Vercel came out with their own benchmarks for Next.js applications.
[00:56:51] Simon Maple: This is something which people are realising there is a need for, and it is being crowdsourced.
[00:56:58] Guy Podjarny: I agree. Hopefully that one was a little bit less obvious, but it comes back to: we have to have a measuring stick, you know?
[00:57:05] Guy Podjarny: We have to be able to measure these things, and as we do more and more, we need more and more benchmarks. We mentioned a handful here, but this year the idea of releasing a benchmark became a very fashionable thing. There are so many benchmarks around.
[00:57:18] Guy Podjarny: And I think that's great. So much so that there's a need for tooling like TBE, tooling to create benchmarks. We're big fans of TBE, and of Harbor, which we use; on top of that, there's the newer version of it. So evals are growing, and I saw one post from someone arguing that developers will become data scientists.
[00:57:39] Guy Podjarny: I don't love that analogy, but there's a grain of truth in it, which is that you have to embrace this world: you're producing software through instructions to the agent, and in that world you don't have determinism. You cannot say, "This works and this doesn't work." What you need is statistics.
[00:57:58] Guy Podjarny: And evals give you that means. So I'll say that will continue, and the competency of defining, running, and executing evals will grow, both in individual developers and in development organizations.
[00:58:27] Simon Maple: And of course, that's one of the reasons why we are building evals as a first-class citizen into the context management platform that we're building.
[00:58:36] Guy Podjarny: I think eventually every developing organization needs to have benchmarks to say, "Can this agent work in my environment?"
[00:58:42] Simon Maple: Yep. And of course, when there are changes, or when you want to compare agents or context, you have the ability and the means to do it. Indeed. So let's look forward to 2026, where we'll continue our fool's errand over here-
[00:58:58] Guy Podjarny: I think we weren't overly audacious in our predictions in 2025.
[00:59:03] Simon Maple: Should we see if there are any audacious predictions for 2026? We'll see.
[00:59:06] Guy Podjarny: We're pretty reasonable people.
[00:59:07] Simon Maple: I think from my point of view, I'm going to stay away from the Day 1 development.
[00:59:12] Simon Maple: That's a “me” problem, I think. So let's talk about agent usage. How people are using agents in 2025. Of course, everyone's jumping on the agent train. And it's a case of: can we pull this into our organizations? Can we get adoption? And people are very much trying to work out what can we do with agents, how can we pull agents in?
[00:59:32] Simon Maple: I think in 2026 we're going to go to that next level, which is: okay, we want to be using agents, we're trying to adopt them, that's a given. How do we actually make sure we're effective at using agents? How do we tune the way we use them? A lot of this will be about the means by which you can provide an agent things like context, but it's also about how we learn to use agents, and there are ways in which we can improve that.
[00:59:58] Simon Maple: Maybe it's through the ways we parallelize agents; maybe we break down tasks and have different agents running them. So there's going to be a need to learn how to use agents more effectively. And together with agent enablement and the drive to use context better to get the most out of agents, all of that will hopefully establish a way to measure how effective and productive we are with agents, and to start talking with real data, real numbers, about the ROI of using agents, not just individually but as an organization as well.
[01:00:36] Simon Maple: The pragmatism. I don't know if we'll get there or not in 2026, but our mindset will shift to asking those questions and running tests that try to give us that data.
[01:00:49] Guy Podjarny: Yeah, I think that's a sensible prediction. Do you think organizations that are large enough will focus on getting a portion of the team to be five-star in how they use agents? Or would they focus more on broadening, on having everybody in the team use agents?
[01:01:08] Simon Maple: It's a really good question.
[01:01:09] Simon Maple: I think there's always this myth when we think about things like maturity, that when we look at a large organization, we think, "What's the maturity of this large organization?" And it doesn't work like that because every team is at a different maturity. Every team is at a different level of adoption.
[01:01:24] Simon Maple: And I think what we would need to do is recognize what level each of these teams is at. Some will still be in that 2025 mode of "we just need to get people using it, we need to get it into people's hands." Others will be much further ahead: teams where we can really understand what they're doing and how much it's affecting them, and then take those lessons and spread them across the various groups.
[01:01:48] Simon Maple: So I think there'll be a mix, to be honest.
[01:01:50] Guy Podjarny: It's interesting. And I think that's very cloud-adoption style, or DevOps-adoption style: you have to have some forerunners, and once you demonstrate something is successful, you start getting excited about rolling it out to the team.
[01:02:05] Simon Maple: Yeah. And I think it's equally what we saw at Snyk when we think about developer adoption of security, of secure development. It's something individuals have a passion for, or individual teams will do more successfully, and those learnings can be shared with other teams through the value they show, I guess.
[01:02:22] Simon Maple: But here's my other one. Last year we were talking about model winners; this year we're actually talking about agents, and whether there will be a clear standout agent winner. I think Claude Code today has been one of the most successful, possibly one of the leading agentic coding tools for a developer to use.
[01:02:42] Simon Maple: I think we'll see other agents get closer, and you'll probably start seeing more of that leapfrogging, one agent jumping past another. And hopefully, because it's important for competition, we'll see that swapping and changing: "this new vendor, or existing vendor, has just added these new features and they're actually more performant than the others."
[01:03:05] Simon Maple: I'd like to think there are going to be some new vendors that we haven't seen today, but we'll see.
[01:03:11] Guy Podjarny: I had a similar one in my queue, which was: I think there's going to be a bit of a separation between the agent and the model. Kind of like how you use the same IDE for developing in different languages. What would be the dominant version of that?
[01:03:30] Guy Podjarny: Would you use Claude Code? Today many of the agents support many models; you can use Claude Code with a non-Anthropic model. But my intuition would be that Claude Code is best with the Claude models, with the Anthropic models.
[01:03:46] Guy Podjarny: And I think that's the assumption for all of them. Would there be independent agents that work well? It's interesting. I would predict that in 2026 we'll see increased adoption, the rise of agents that are separate from the model companies.
[01:04:08] Guy Podjarny: Or maybe an open-sourced agent that gets contributions; I guess Goose is one example of that. So I would still predict that the majority of users will use things that come bundled, model and agent all in one, but that we will have some sweetheart agents that get adoption and are separate.
[01:04:30] Simon Maple: We'll see. We'll see how it goes. I think that's absolutely right. Today we look at an agent and we just think the agent is my AI tool, very similar to an LLM. For a lot of people I've spoken with, the lines are a little bit blurry between an agent and-
[01:04:45] Guy Podjarny: And we should note that there are a bunch of tools today that are kind of agent orchestrators.
[01:04:50] Guy Podjarny: And they work at higher levels. You have companies like Warp Dev in the terminal, and there are a bunch of agents today that are not from the model companies, so we'll see how they fare. I would add to that: your comment on becoming effective resonates with me.
[01:05:10] Guy Podjarny: I'd be more specific and say there will be a whole wave of best practices and tools and such to go from single player to multiplayer. Because today so much development still happens solo; the AI-native developers are solo developers, maybe at most pairs.
[01:05:35] Guy Podjarny: They work with a lot of alignment, and I think in part that's the magic. Think of them as managers now, because they have teams of agents that they operate, so they can manage multiples. But we still need organizational alignment, and there are still dependencies between one system and another.
[01:05:53] Guy Podjarny: And so this notion of single player to multiplayer: thinking about dependencies, thinking about commitments, how we interact with those and still maintain the AI-native future. I think we're going to see a lot of evolution there. And some of that will come to the models.
[01:06:08] Guy Podjarny: I think the models might introduce things like committed behavior, maybe harder, built-in tests. Which, again, you can technically do today, but oftentimes the agents just modify the tests. So, practices around that. I think we're going to see big improvement there.
[01:06:28] Guy Podjarny: We talked about the agent separating from the model; clearly, I think agent enablement capabilities will come along with that as well. And I guess the one that's a bit more out on a limb: I think we'll see much more adoption of open models.
[01:06:43] Guy Podjarny: I feel like what happened during this year, towards the end of it, is that there are state-of-the-art models, but the pace of change is slowing; the gap is shrinking. And there have been more and more really good open models, and as enterprises and organizations as a whole start being more successful with agents, they will start spending a lot of money.
[01:07:03] Guy Podjarny: The immediate spend might be very worthwhile because it comes at the expense of human labor, which is even more expensive, but it's still big numbers. And so I think people will start to really care: "Do I need Opus for this, or even Sonnet, or can I just use this much, much cheaper open model?"
[01:07:25] Guy Podjarny: I think NVIDIA's acquisition of Grok is slightly in that domain as well; you can see that trend happening. And so I would predict that in 2026 we'll see much more substantial use of open models, especially in development and in recurring activities.
[01:07:50] Guy Podjarny: When you're at your desktop with a model, you want maximum power because you want to work it hard. But for repeated tasks that you can evaluate, you can get a sense of how good an open model would be at them.
[01:08:04] Guy Podjarny: That comes back a little bit to that sense that benchmarks will continue. And people will have their evaluations.
[01:08:09] Simon Maple: And also the focus on effectiveness and the ROI of what we are doing and what we're using.
[01:08:14] Guy Podjarny: And I think we will see a whole stream, probably staying the minority, of people who are fully open. They're using an open agent with an open model, and they're running it maybe on hosted servers, but on servers they control. And they'll run with those, because I think at this point a lot of the accomplishment, a lot of the means of success, is the methodology: how you use the agent, how you give it the right context at the right time, how you make all those interactions work.
[01:08:42] Guy Podjarny: And I think a lot of that know-how, plus the intelligence that's in the open models, will be enough to do a lot of that work.
[01:08:54] Simon Maple: Yeah. Very interesting. And we'll chat about this in 2027, which sounds weird to say.
[01:09:01] Simon Maple: So we have to get used to that.
[01:09:02] Guy Podjarny: Barely 2026, hold on there Simon.
[01:09:06] Simon Maple: Yeah, sorry. One year at a time. One year at a time. We'll see how inaccurate, sorry, accurate, we are with those predictions in a year's time. Cool. We'll still be here with hopefully some amazing speakers and content every week.
[01:09:19] Guy Podjarny: We have a few we already know of.
[01:09:20] Simon Maple: We have, you've already started with a couple.
[01:09:23] Guy Podjarny: The beauty of a recorded podcast. Yes, we actually have a couple of episodes coming your way. One with the former CEO of GitHub, Thomas Dohmke; we had a great conversation about building AI dev tools and his perspectives on it.
[01:09:37] Guy Podjarny: And why in the world he would go off and build another AI dev startup. So that was a fun conversation. And then Mikko, who is the CEO and founder of Dash0, a really interesting company in the observability space, talking a lot about agents in that world.
[01:09:55] Guy Podjarny: What is AI native for observability and how do we bring in that power of intelligence as well as what context is needed to the world of operations, which is often a bit more risk aware, shall we say.
[01:10:12] Guy Podjarny: So, super interesting conversations, and just the harbingers of a super interesting year. Which, I know, is probably going to be a fairly tame year, a quiet year.
[01:10:27] Simon Maple: I expect so. Quiet, slow moving. I think it's going to make 2025 feel quiet.
[01:10:31] Simon Maple: So, amazing. Well, all the best for 2026 to everyone. A happy New Year.
[01:10:39] Guy Podjarny: And thank you for listening and following this. And again, keep the feedback flowing.
[01:10:44] Simon Maple: Absolutely. Thanks for listening. And tune into the next episode.
In this New Year kickoff, host Simon Maple is joined by co-host Guy Podjarny (CEO/founder of Tessl) to look back on the biggest developer-facing shifts of 2025 and what they mean for building with AI in 2026. A whirlwind year turned “prompt engineering” from a party trick into a discipline of context design and agent orchestration: the industry moved from prompt-centric approaches to agentic systems, and toward AI-native dev experiences that manage entire pipelines with observability, structured context, and chat-first workflows. From end-to-end instrumentation of agents and specialised, well-specified tool graphs to the merging of code generation and review into a cohesive feedback loop, this episode distills patterns, pitfalls, and product bets developers can apply right now.
The show itself mirrored the industry’s acceleration: 53 episodes, over 1 million YouTube views, 45,000+ subscribers across platforms, and 190 shorts. The most-watched episode featured Datadog’s Olivier Pomel—proof that leaders at scale are both shaping and responding to AI-driven development. But the bigger story is the thematic shift: early 2025 was still prompt-centric; by year’s end, teams were shipping agentic workflows, tool-augmented models, and context pipelines as first-class infrastructure.
An early episode with Macy Baker captured the wonder and fragility of prompts—deception games, clever incantations, and model-specific quirks. By contrast, late-year discussions focused on structured agent behaviors, context hygiene, and tool use. That evolution marks a maturation: developers stopped optimising for “the right words” and began designing systems that consistently guide models toward the right decisions.
The takeaway: prompts now play a supporting role in a broader system. Building an AI-native dev experience means owning the end-to-end pipeline—retrieval, tools, memory, and evaluation—and accepting you’re managing probabilities and policies, not just strings.
A central theme this year: developers are now wrangling two black boxes—the LLM and the agent. The LLM’s reasoning is opaque, and the agent’s plan/act loops add another layer of uncertainty: when to search, what files to read, which tools to call, in what order, and how aggressively to iterate. Episodes with Uni (man-in-the-middle tracing) and Max (agents 101) stressed that developers need to reverse-engineer behavior using telemetry, not intuition.
Actionable practices:
- Instrument every turn. Persist a full trace: system prompts, user messages, tool calls, tool results, and model deltas. Label each step with timestamps, token counts, costs, and statuses (success/failure/retry). You can’t improve what you don’t see. (A sketch of this and the next practice follows the list.)
- Constrain degrees of freedom. Use a whitelisted tool registry with typed, schema-rich signatures; define preconditions and cost/time budgets per tool; set retry limits and backoff policies. Fewer, higher-quality tools beat a sprawling toolbox.
- Shape context deliberately. Build code-aware context (symbol graphs, repo maps, embeddings) and prioritise relevant chunks. Order matters: source-of-truth docs before user chat, current branch diffs before historic files, etc.
- Make system prompts policy-heavy, not hacky. Instead of “be helpful,” specify objective functions (e.g., reduce diff size, minimise tool calls), selection heuristics (when to search vs. read local files), and escalation policies (when to ask the user).
- Evaluate like a product team. Track task success rate, average steps-to-success, tool-call accuracy, hallucinated tool invocations, and cost-per-solved task. Maintain regression suites of realistic tasks to compare providers, prompts, and tool sets.
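To make the first two practices concrete, here is a minimal Python sketch of a whitelisted, typed tool registry with retry limits and per-call budgets, where every attempt appends a trace entry. All names here (ToolSpec, register, run_tool) are illustrative, not from any particular framework.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolSpec:
    name: str
    description: str
    schema: dict                    # JSON Schema for the tool's arguments
    handler: Callable[..., Any]
    max_retries: int = 2
    budget_seconds: float = 30.0    # per-call time budget

REGISTRY: dict[str, ToolSpec] = {}  # whitelist: only registered tools are callable

def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec

def run_tool(name: str, args: dict, trace: list) -> Any:
    """Execute a whitelisted tool, appending a trace entry for every attempt."""
    if name not in REGISTRY:
        # A hallucinated tool invocation: record it rather than guessing.
        trace.append({"step": "tool_call", "tool": name, "status": "rejected_unregistered"})
        raise KeyError(f"unregistered tool: {name}")
    spec = REGISTRY[name]
    for attempt in range(spec.max_retries + 1):
        start = time.time()
        try:
            result = spec.handler(**args)
            entry = {"step": "tool_call", "tool": name, "args": args,
                     "status": "success", "attempt": attempt,
                     "seconds": round(time.time() - start, 3)}
            if entry["seconds"] > spec.budget_seconds:
                entry["over_budget"] = True  # surfaced later in eval metrics
            trace.append(entry)
            return result
        except Exception as exc:
            trace.append({"step": "tool_call", "tool": name, "args": args,
                          "status": "failure", "attempt": attempt, "error": str(exc)})
    raise RuntimeError(f"{name} failed after {spec.max_retries + 1} attempts")

# Example registration; the schema doubles as documentation for the model.
register(ToolSpec(
    name="read_file",
    description="Read a file from the working repo",
    schema={"type": "object", "properties": {"path": {"type": "string"}}},
    handler=lambda path: open(path).read(),
))
```

Persisting the `trace` list alongside prompts and model deltas gives you the raw material for the metrics in the last bullet above.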
Uni’s comparative analysis of Claude, OpenAI, and Gemini CLI showed that tool usage frequency, system prompt styles, and planning aggressiveness vary materially across providers. Benchmark your workflows against multiple models; your ideal provider often depends on your tool graph and repo structure, not brand halo.
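In that spirit, a hedged sketch of a cross-provider regression harness: Task, Outcome, and run_agent are stand-ins for however you actually invoke each provider's agent (run_agent is stubbed here so the loop runs end to end).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]   # deterministic pass/fail on the agent's answer

@dataclass
class Outcome:
    answer: str
    steps: int
    cost_usd: float

def run_agent(provider: str, prompt: str) -> Outcome:
    # Stand-in for your real invocation of each provider's agent or CLI.
    # This stub just echoes, so the harness is runnable as-is.
    return Outcome(answer=f"{provider}: {prompt}", steps=3, cost_usd=0.01)

TASKS = [
    Task(prompt="rename function foo to bar", check=lambda a: "bar" in a),
    Task(prompt="add a null check to parse()", check=lambda a: "null" in a.lower()),
]

def evaluate(provider: str, tasks: list) -> dict:
    results = [(t, run_agent(provider, t.prompt)) for t in tasks]
    solved = [o for t, o in results if t.check(o.answer)]
    return {
        "provider": provider,
        "success_rate": len(solved) / len(results),
        "avg_steps_to_success": mean(o.steps for o in solved) if solved else None,
        "cost_per_solved_task": sum(o.cost_usd for _, o in results) / max(len(solved), 1),
    }

for provider in ("claude", "openai", "gemini"):
    print(evaluate(provider, TASKS))
```

Swap the stub for real invocations and the same loop compares providers, prompts, and tool sets on your own repo-specific tasks.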
A standout conversation with Slack’s Samuel Messing underscored a usability truth: if development is increasingly collaborative and multi-turn, chat is a natural interface. The industry started to lean into that—Devin launched with Slack integrations; Claude Code added a Slack app; and many teams now prefer rich chat UIs over terminal-bound “agent shells.”
Design patterns for Slack-native agents:
- Threads as tasks. Each thread represents a scoped mission with its own context and memory, making it easy to revisit, summarise, or hand off.
- Slash commands as typed tools. Expose durable capabilities (/plan, /diff, /test, /deploy) that map 1:1 to your agent tools and accept structured JSON arguments. (See the sketch after this list.)
- Code-aware attachments. Let users drop files, gists, or PR links; your agent ingests them into a task-specific context index and calls code intelligence tools (symbol graphs, LSPs) behind the scenes.
- Ephemeral sandboxes. For safety and speed, spawn short-lived environments where the agent can clone, build, run tests, and produce artifacts. Attach logs and previews back into the thread.
- Identity and permissions. Map Slack users and channels to repo permissions, secrets scopes, and deployment rights. Every tool call should include the acting identity and an auditable reason.
- Bridge to IDEs. Offer “Apply patch in repo” deep-links that open a PR or a local workspace change. Chat is for intent and iteration; the IDE remains best for inspection and fine edits.
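A minimal sketch of the slash-commands-as-typed-tools pattern using the slack_bolt library; the /plan command and the plan_tool function are hypothetical stand-ins for your own tool registry and agent pipeline.

```python
import json, os
from slack_bolt import App

app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)

def plan_tool(goal: str, repo: str, acting_user: str) -> str:
    # Hypothetical: hand the structured request to your agent pipeline,
    # tagged with the acting identity for auditability.
    return f"Plan for {goal!r} in {repo} (requested by {acting_user})"

@app.command("/plan")
def handle_plan(ack, command, respond):
    ack()  # Slack requires an ack within 3 seconds
    try:
        # Structured JSON arguments, e.g. /plan {"goal": "...", "repo": "org/app"}
        args = json.loads(command["text"])
    except (json.JSONDecodeError, KeyError):
        respond('Usage: /plan {"goal": "...", "repo": "org/app"}')
        return
    result = plan_tool(
        goal=args.get("goal", ""),
        repo=args.get("repo", ""),
        acting_user=command["user_id"],  # map to repo permissions before acting
    )
    respond(result)  # replies where the command was issued

if __name__ == "__main__":
    app.start(port=3000)
```

The same handler shape extends naturally to /diff, /test, and /deploy, each mapped to one registered agent tool.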
Stevie’s “UI evolution” framing—chat panes growing from side panels to primary canvases—suggests a boundary shift: code remains source-of-truth, but the main control plane is moving into collaborative chat. If your agent UX still assumes a solo terminal user, you’re likely leaving adoption on the table.
Podjarny called it early: if a review agent can reliably find issues and propose fixes, why wait until review? The logic—and later, Cursor’s acquisition of Graphite—signals a convergence: code generation, review, and fix-application are becoming a single feedback loop.
How to build for this convergence:
- Review-in-place autofix. When the review agent flags issues, let it propose minimal diffs, run lints/tests locally, and push commits back to the PR with an “Explain your fix” note.
- Policy-driven gates. Use risk scoring (blast radius, dependency changes, security flags) to decide which fixes can auto-apply vs. require human approval. Start with lint/doc/test-only fixes. (A scoring sketch follows the list.)
- Test Impact Analysis. When agents propose diffs, run only impacted tests first to reduce latency. Escalate to full suites on success or high-risk changes.
- Continuous critique. Let the same static analysis, security scanning, and style checks that power review also inform earlier code-gen steps. Fewer surprises at review time.
- Metrics that matter. Track review cycle time, diff acceptance rate, rework rate, flaky test incidence, and “agent-added defects.” Use these to tune when the agent asks for help vs. pushes ahead.
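As a sketch of what such a gate might look like; the fields, weights, and thresholds here are illustrative, not a vetted policy.

```python
from dataclasses import dataclass

@dataclass
class ProposedFix:
    files_changed: int
    touches_dependencies: bool     # lockfiles, manifests
    security_flags: int            # findings from your scanner
    only_lint_doc_test: bool       # scoped to lint/doc/test changes

def risk_score(fix: ProposedFix) -> int:
    score = fix.files_changed                    # rough proxy for blast radius
    score += 5 if fix.touches_dependencies else 0
    score += 3 * fix.security_flags
    return score

def gate(fix: ProposedFix) -> str:
    if fix.only_lint_doc_test and risk_score(fix) <= 3:
        return "auto-apply"        # push the commit back to the PR with an explanation
    return "human-approval"        # surface the diff and the score to a reviewer

print(gate(ProposedFix(files_changed=1, touches_dependencies=False,
                       security_flags=0, only_lint_doc_test=True)))   # auto-apply
print(gate(ProposedFix(files_changed=8, touches_dependencies=True,
                       security_flags=1, only_lint_doc_test=False)))  # human-approval
```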
Expect 2026 tooling to ship “closed-loop PRs” where an agent plans, codes, self-reviews, self-fixes, and presents a ready-to-merge change with a crisp audit trail. Teams that wire CI/CD and review automation now will be first to benefit.
The Base44 story—largely built solo by Maor Shlomo, quickly acquired by Wix, then scaled to millions of users and thousands of daily payers—showed what’s now possible with agentic platforms. Meanwhile, Lovable’s hypergrowth and Tom Hume’s caution about the “single-person unicorn” myth framed a nuanced reality: agents can compress headcount, but not eliminate product scope decisions, quality bars, or go-to-market work.
What developers can emulate:
- Build-with-your-tool. Dogfood relentlessly. If your agents can spec, scaffold, code, and iterate your own product, you’ll converge on the right tool graph and observability fast.
- Choose a narrow, high-context vertical. General agents sprawl and drift; vertical agents exploit structured context (domain schemas, workflows, compliance rules) to deliver reliability.
- Treat agents as a workforce. Model your pipeline explicitly: clarify > plan > retrieve > act > validate > summarise. Use typed tools, budgets, and SLAs for each step. (A pipeline sketch follows this list.)
- Engineer for cost and speed. Track cost-per-success, cold/warm start times, and tool-call hit rates. Introduce caching (embeddings, search results), and prefer local analysis over web search when possible.
- Design upgrade agility. Keep model/provider abstraction layers thin and swappable. Different tasks (planning vs. refactoring vs. doc generation) may benefit from different providers.
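A sketch of that explicit pipeline with per-step time budgets; the step functions are placeholders for your own implementations, and the budget handling is deliberately simplistic.

```python
import time
from typing import Callable

Step = tuple[str, Callable[[dict], dict], float]  # (name, fn, budget_seconds)

def run_pipeline(steps: list, state: dict) -> dict:
    for name, fn, budget in steps:
        start = time.time()
        state = fn(state)
        elapsed = time.time() - start
        state.setdefault("trace", []).append({"step": name, "seconds": round(elapsed, 3)})
        if elapsed > budget:
            # Budget overruns are recorded, not silently ignored; an SLA
            # breach is a signal to re-scope the task or swap providers.
            state["trace"][-1]["over_budget"] = True
    return state

# Placeholder steps: each takes and returns the shared state dict.
pipeline: list = [
    ("clarify",   lambda s: {**s, "goal": s["request"].strip()},   5.0),
    ("plan",      lambda s: {**s, "plan": ["step 1", "step 2"]},  10.0),
    ("retrieve",  lambda s: {**s, "context": "relevant chunks"},  15.0),
    ("act",       lambda s: {**s, "diff": "patch contents"},      60.0),
    ("validate",  lambda s: {**s, "tests_passed": True},          30.0),
    ("summarise", lambda s: {**s, "summary": "done"},              5.0),
]

print(run_pipeline(pipeline, {"request": "  add retry logic  "}))
```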
The headline isn’t “solo forever,” it’s “ship with a tiny team and an agent workforce.” The teams that scope well, instrument deeply, and iterate fast will outpace bigger orgs still arguing over prompt styles.
- Instrument your agents end-to-end. Persist turn-by-turn traces, tool calls, costs, and outcomes. Observability is how you tame both black boxes.
- Constrain and type your tools. A smaller, well-specified tool graph beats a sprawling, ambiguous one. Encode preconditions, budgets, and fallback policies.
- Make chat the control plane. Slack-native workflows (threads-as-tasks, slash-command tools, ephemeral sandboxes) unlock collaboration and adoption far beyond terminal UIs.
- Unify code-gen and review. Let the review agent auto-fix the issues it finds under clear policies. Measure diff acceptance, time-to-merge, and agent-added defects to guide automation levels.
- Dogfood and specialise. Build your product with your own agent stack, pick a tight vertical, and focus on reliability via domain context—not generic breadth.
- Optimise for provider fit, not brand. Claude, OpenAI, and Gemini differ materially in tool behavior and system prompt patterns. Benchmark your workflows and mix providers by task.