Podcast

February Roundup: AI Model Wars, and the Future of AI Dev Tools

With

Guy Podjarny & Simon Maple

10 Mar 2025

February AI Insights with Guy Podjarny and Simon Maple

In February's edition of the AI Native Dev Monthly Update, Guy Podjarny and Simon Maple engage in a thought-provoking discussion about the latest AI developments, focusing on significant model updates and their real-world applications. The episode delves into the competitive pressures driving frequent AI model releases, highlighting the need for companies to stay ahead in the ever-evolving AI landscape. Special guest Mati from ElevenLabs shares valuable insights on entrepreneurship in AI, emphasizing the importance of transparency and innovative organizational structures. Whether you're a developer or an AI enthusiast, this episode offers a comprehensive view of the current AI trends and their impact on the industry.

Overview

Overview of Model Updates: GPT-4.5 and Sonnet 3.7

The month of February was marked by significant updates in the AI model landscape, with the release of GPT-4.5 and Sonnet 3.7. Guy Podjarny and Simon Maple introduced these updates, highlighting the notable features and improvements. GPT-4.5 has garnered attention for its enhanced emotional intelligence (EQ), which aims to provide more compassionate interactions, as illustrated by its improved response to empathetic prompts. On the other hand, Sonnet 3.7 introduced dynamic reasoning capabilities, making it the first hybrid reasoning model on the market. This development allows the model to adjust its depth of reasoning based on the complexity of the task.

The Competitive Pressure in AI Model Releases

The competitive dynamics among AI labs have been intensifying, as discussed by Guy and Simon. The frequent model updates are often driven by the need to capture media attention and maintain user interest. This environment creates immense pressure on companies to release updates in tandem with their competitors. Guy noted, "you don't want to let any other player, if you're one of these big players, dominate the news for any window of time." This competitive atmosphere influences not only the timing but also the nature of model releases, as labs strive to outpace each other in innovation and market share.

Deep Dive into GPT-4.5

GPT-4.5 has been a focal point due to its emphasis on emotional intelligence. This update offers a more human-like interaction style, addressing user feedback that favored models with higher EQ. OpenAI's internal tests suggested a user preference for GPT-4.5 over its predecessor, GPT-4o, particularly in categories like everyday queries and professional interactions. However, external surveys, such as those mentioned by Guy, presented mixed results, with some users favoring the older model. Guy described this as "a little bit underwhelming," suggesting that while improvements were made, they did not meet the high expectations set by the community.

Evaluating AI Models at Tessl

Tessl employs a rigorous evaluation process for integrating new AI models into its products. Guy shared their experience with GPT-4.5, noting its underperformance in code generation tasks compared to GPT-4o. He explained, "the results were actually not exciting at all... it actually did worse than 4o." This highlights the importance of thorough testing in real-world application scenarios, as advancements in certain features may not translate to all use cases.
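
Tessl's internal evals aren't public, but the general shape of such a harness is straightforward: run every candidate model against the same suite of tasks with executable pass/fail checks, and compare pass rates. Below is a minimal sketch under that assumption; `generate` is a placeholder for whichever provider SDK is in use, and none of these names come from Tessl's actual tooling.

```python
# Minimal sketch of a model-comparison eval harness (illustrative only;
# Tessl's actual evals are not public). `generate` is a placeholder for
# whichever provider SDK you use.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # returns True if the model's output passes

def generate(model: str, prompt: str) -> str:
    """Placeholder: call your provider's API and return the completion."""
    raise NotImplementedError

def run_evals(models: list[str], tasks: list[Task]) -> dict[str, float]:
    """Return each model's pass rate across the same task suite."""
    scores = {}
    for model in models:
        passed = sum(task.check(generate(model, task.prompt)) for task in tasks)
        scores[model] = passed / len(tasks)
    return scores

# Re-run the same suite whenever a new model ships, e.g.:
# run_evals(["gpt-4o", "gpt-4.5-preview"], code_gen_tasks)
```

Re-running an identical suite is what makes a statement like "it actually did worse than 4o" meaningful: the only variable that changes between runs is the model.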

Exploring Sonnet 3.7’s Dynamic Reasoning

Sonnet 3.7's introduction of dynamic reasoning represents a significant advancement in AI capabilities. This hybrid reasoning model allows users to dictate the level of reasoning required for a task, offering flexibility in processing. Guy compared it to OpenAI’s models, noting the user-centric approach of Sonnet 3.7, which could potentially streamline workflows for developers. The ability to adjust reasoning depth provides a tailored experience, ensuring efficiency and precision in complex problem-solving.
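
Anthropic exposes this control in its API as an extended-thinking budget set per request. Here is a minimal sketch using the Anthropic Python SDK; the model alias and budget numbers are illustrative, so check the current API documentation for exact parameter names and limits.

```python
# Minimal sketch: per-request "thinking budget" with Claude Sonnet 3.7.
# Model alias and budget values are illustrative; consult Anthropic's
# current docs for exact parameters and minimums.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(prompt: str, budget_tokens: int = 0) -> str:
    """Send a prompt, optionally enabling extended thinking with a token budget."""
    kwargs = {}
    max_tokens = 1024
    if budget_tokens:
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
        max_tokens = budget_tokens + 1024  # must exceed the thinking budget
    response = client.messages.create(
        model="claude-3-7-sonnet-latest",
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    # With thinking enabled, the response interleaves "thinking" and "text"
    # blocks; return only the final answer text.
    return "".join(block.text for block in response.content if block.type == "text")

answer = ask("What is 3 + 3?")                        # quick, no extended thinking
plan = ask("Propose a migration plan for this schema...", budget_tokens=8000)
```

The same model serves both calls; only the requested depth of thinking changes, which is the hybrid aspect discussed in this episode.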

Practical Applications of AI Tools

The practical application of AI tools was a key theme in Simon's conversation with Farhath. They explored how developers can integrate multiple AI tools into their workflow to optimize productivity. Farhath emphasized the importance of selecting the right tool for each stage of development, using a blend of options like Perplexity, Claude, and Cursor. This approach mirrors the evolving landscape of AI, where versatility and adaptability are crucial for success.

Entrepreneurship in AI with Mati from ElevenLabs

Guy’s interview with Mati from ElevenLabs offered valuable insights into the entrepreneurial journey within the AI sector. Mati discussed balancing product development with platform offerings, highlighting the importance of transparency. He stated, "here's our plan... if you're building... you should anticipate that we will compete with you." This candid approach aids developers in understanding potential market dynamics and planning accordingly.

Organizational Insights from ElevenLabs

ElevenLabs’ organizational structure is characterized by its small, nimble teams, as emphasized by Mati. With a team of under 150 people, they manage to lead in AI audio innovation effectively. Their unique approach to titles and management hierarchy, which discourages traditional seniority labels, fosters a collaborative and flexible work environment. This strategy aligns with their rapid innovation and adaptability in a fast-paced industry.

Summary

February's updates underscore the rapidly evolving landscape of AI models and tools. With significant releases like GPT-4.5 and Sonnet 3.7, developers and industry leaders are continually adapting to leverage these advancements. The discussions with industry experts highlight the importance of strategic tool selection and organizational agility. As Tessl continues to explore and integrate these technologies, the emphasis remains on staying informed and engaged with ongoing developments in the AI domain. Listeners are encouraged to participate in upcoming Tessl events to further their understanding and engagement with these transformative technologies.

Chapters

[00:00:00] - Introduction to February's Key AI Developments

[00:01:00] - The Competitive Dynamics of AI Model Releases

[00:04:00] - Exploring the Emotional Intelligence of GPT-4.5

[00:10:00] - Tessl's Evaluation of GPT-4.5 in Code Generation

[00:13:00] - Understanding Sonnet 3.7's Dynamic Reasoning Capabilities

[00:18:00] - Practical Applications and Tool Selection in AI Development

[00:29:00] - Interview with Mati from ElevenLabs: Entrepreneurship in AI

[00:37:00] - Organizational Insights: Small Teams, Big Impact at ElevenLabs

Full Script

Guy Podjarny: [00:00:00] To me, the interesting bit here is to think about them stepping into this complexity that all of these AI labs will have now, which is: where is it that I want to be a platform that people build on, and I want them to be able to trust that they can build on me and I'm not going to go and compete with them directly after and start limiting their access, et cetera.

And where is it that they actually want to capitalize on the solutions? And I think in the software engineering space there's more and more appetite from these labs to be that solution. Anthropic specifically is in an interesting spot: they have a lot of fans in the AI dev space that are building on top of it.

And so I think if you're them, you want to be very sensitive.

Simon Maple: You're listening to the AI Native Dev, brought to you by Tessl.

Hello and welcome to another monthly roundup of the AI Native Dev. My name is Simon, [00:01:00] joining me once again, Guy Podjarny.

Guy Podjarny: Hey Simon, it's fun to do another.

Simon Maple: Absolutely. And this time around, we're going to take a look at what's been happening in the wonderful month of February. We had some amazing news, we had some wonderful sessions, as normal on the podcast. Guy, another month goes by.

Another model or more models.

Guy Podjarny: Yeah, it's like 17 of them, or maybe a little bit less. Yeah, but it definitely was a big news day, or news month, with both sort of GPT-4.5 and Sonnet 3.7, and actually, earlier in the month, Grok 3 in the process. Yeah, like a lot of numbers and a lot of funky names.

Simon Maple: It is interesting, and we'll go through those. I think the two that people are most drawn to here are GPT-4.5 and Sonnet 3.7. We'll go through them.

Guy Podjarny: That sounds like you're biased, a little bit affected by the liberal media.

Simon Maple: My training data has been slightly poisoned. So yeah, I'm leaning towards them too.

Guy Podjarny: As long as you don't say, like, anything negative about Elon, you're okay.

Simon Maple: Yeah, absolutely. So let's go ahead and talk. Actually, no, first of all, before we deep dive into the models, do you feel like [00:02:00] there seems to be this snowball effect where very often one model vendor releases an update, and all of a sudden we see another one and another one? It's weird, it's like one of those car races where everyone's nudging in front every five minutes. Do you feel like there is huge outside pressure from businesses, investors, marketing teams for companies to release because others are equally releasing that month or week? It's very convenient, right?

Guy Podjarny: Yeah, without a doubt, there is massive pressure. Part of it is just dominating the news cycle. This whole world is really hitting mainstream media and everybody talks about it.

And so I think you don't want to let any other player, if you're one of these big players, dominate the news for any window of time. So you want to pierce in and break it out. And I think that's true for all of them. But I also think that the pace of experimentation and trials and all that is very high.

And so there is also an element of, literally, amidst users: when you get people that are excited to try a new thing, [00:03:00] if you let that sort of sit for longer, those users might get a little bit set in their ways. And so it's interesting. There's an open question about how much would people actually end up sticking to one model over time; right now they feel very interchangeable. You can switch from whatever, your Claude chat, to your OpenAI chat, and we'll talk about that a bit more. But I guess some, especially the big investors, like in the latest Anthropic round that also happened just now, believe that there are some tracks that are being set.

There are some patterns, and that may be similar to what happened in the cloud, which is the sort of initial impression is like, fine, I can run it on whatever, EC2 or whatever other cloud's equivalent. But in practice, over time, as the fine details have expanded, moving clouds is actually not at all easy.

Maybe this will happen here as well. I don't think we're seeing it now. So for now, I think it's more like news cycle domination and getting users.

Simon Maple: Yeah. Yeah. Very interesting.

Let's dig down into a couple of them. We'll start with the GPT-4.5, which I think was released first.

Was that right? No. It was a long time ago,

Guy Podjarny: No, but I [00:04:00] think Sonnet 3.7 actually.

Simon Maple: 3.7 actually, so the reaction was 4.5. Interesting. So we'll start with, we'll start with the most recent going back with, let's put it that way. So GPT 4

Guy Podjarny: The recency bias, detecting a lot of bias, a lot of bias.

Simon Maple: One of the interesting things, and this came out in a lot of the OpenAI marketing was this mention of it being, having higher EQ than previous models, the 4.5 release and some of the examples there, some, someone prompting saying I'm going through a tough time after failing a test and you look 4o which tries to solve the problem, but 4.5 is actually far more compassionate and trying to understand what the, what the user really actually wants is it, do they just want to chat about it? They want to know someone's there and it's a far more, dare I say, Claude like personable response. That's an interesting one. And I feel like people, I don't know, I'm going to, I'm going to give maybe my personal bias now, but people I find connect with Claude because it is more human than [00:05:00] some of the other GPTs. This is a very interesting move to have this more personal one.

Guy Podjarny: I think it's an interesting choice. There's a little bit of an acknowledgement that it's not just about your sort of benchmark numbers and how you're handling that, but also about just sentiment. And so I think that is interesting.

It's been identified as a strength of Claude, and OpenAI is trying to counter that. I think there's also an element of just styles that people prefer, right? There's the whole kind of, what is it, women are from Venus, men are from Mars; maybe 4o is from Mars and 4.5 is from Venus. But yeah, it is quite noticeable.

There's been some interesting and I think valid pushback that you might not really want a model that is one style and a model that is of another, but rather to be able to express a little bit more of what type of conversation you want at the moment. And actually in Claude, you see that in the dropdowns: what style do you prefer? Do you want a more formal conversation? Do you want more personal? And it defaults to personal. But it's interesting. I think it's especially interesting that there isn't, again, a benchmark that you [00:06:00] can win on this EQ. So it will basically be judged by the population.

I don't think, like, maybe people are trying, but I don't think you can really measure how human, almost counterintuitive, right? Like, how human are you in your responses?

Simon Maple: Yeah. Yeah.

It's interesting. Like you say, I'm really trying to stop myself saying Grok is from Uranus, but I feel like I can't just leave that sitting there.

So I'll finish with that. But I think some of the tests that OpenAI have done themselves are very interesting as well. They talk about essentially whether humans prefer the responses from 4.5 versus 4o, with the big caveat that these are OpenAI's own tests: for everyday queries, 57 percent of people preferred 4.5 versus 4o; 63 percent for professional queries; and 56 percent for creative intelligence. So there is data from OpenAI that says people do lean towards those, although I know there are some others. I think [00:07:00] you were mentioning you saw something on Twitter.

Guy Podjarny: There were surveys that counter that a little bit.

I think in general, I'd say we really have to be careful. This is such a massive domain of actions and preferences, with so much room for subjective analysis, that I think it's a little bit hard to really take any data that comes out of these labs at face value. Indeed, Grok 3 published a bunch of numbers that were cast in doubt.

Now, with these OpenAI ones, I don't think it's even a wrongdoing by them; clearly they will take a rosy hue and won't publish stats that are negative about themselves.

Simon Maple: Right?

Guy Podjarny: I think Karpathy put out a survey on Twitter where he did some blind testing, put up some A and B responses that alternated between 4o and 4.5, expecting everybody to then prefer the 4.5 responses.

Actually the survey ended up favoring 4o.

Yeah

And I think that's a little bit underwhelming. I think it's important to put this in context of the 4.5 release, right? So 4.5 is the first increment of the number since GPT-4 got released two years ago, March of '23.

And so the expectations [00:08:00] people have for this model are really quite substantial: it's estimated to have been sort of 10 times more expensive to train, and it is between 15 and 30 times more expensive in terms of tokens, depending on whether you look at input or output.

And so I think you come into it and you expect something that would blow you away. So I would say that even the OpenAI stats that say people slightly favor 4.5 don't quite match the expectations. People were expecting maybe something more like the jump from GPT-3 to GPT-4.

They only incremented half a number. Maybe there's a statement in that. So unlike the reasoning models, which feel like they've really leveled up what LLMs can do, 4.5 was a little underwhelming.

Simon Maple: Yeah, if I'm reading those numbers right, it's only a slight leaning in sentiment towards 4.5,

Guy Podjarny: Exactly.

Simon Maple: even with the OpenAI numbers.

Guy Podjarny: And in the blind testing, it was even on the other side of it. Karpathy was wondering whether the connoisseurs, I forget, he had a slightly more humble term for it, [00:09:00] people with a fine taste for LLM output, might prefer the 4.5, as he does.

So maybe there's elements of that, but again, it's not about whether people prefer 4.5; let's say 4.5 would consistently get slightly better than average preference. Coming in, you just have to say, hey, they're going to release this thing at this sort of big price differential, now I would expect it to be kind of a landslide, right? Like to win by a mile.

Simon Maple: Yeah. Yeah.

Guy Podjarny: And it's not doing that.

Simon Maple: And OpenAI, interestingly, do also share some accuracy figures, where they show an improvement on their SimpleQA accuracy tests. It's a reasonable improvement: you're looking at, what, 10, 20 percent on top of 4o and o1, and a significant, maybe 40, 45 percent increase over o3-mini. Hallucinations are down as well across the other three models, 4o, o1 and o3-mini. So they are showing some stats that I would more readily believe, in the sense that these are tests that can be run across multiple models.

Guy Podjarny: Right.

Simon Maple: 'Cause the rest is sentiment.

Guy Podjarny: And people [00:10:00] will replicate those. So it doesn't really serve them to publish anything that would thereafter be debunked.

Simon Maple: Rumor has it, rumor has it, Guy, Tessl are building a product as well. It's not a model, of course.

Guy Podjarny: It's not a model.

Simon Maple: So just to clarify after talking about models, it's not a model, but we use other models in our code generation and throughout the development workflow.

So we've built evals of our own, and we've tested the various new models that have come out; every eval team, I'm sure, is on red alert in a week of multiple models coming out.

Guy Podjarny: Maybe it's very curious. Like, it's work, but I think people are excited.

Simon Maple: Yeah, absolutely. It reminds me of when a security vulnerability comes out and everyone's all hands on deck; it reminds me of that when a new model comes out, and all eval teams and development teams are hard at work trying to test.

And I'm sure. Yeah.

Guy Podjarny: See what it can do.

Simon Maple: Yeah, product teams.

Guy Podjarny: I think there's a little bit more excitement than when you get that vulnerability. So yeah, in terms of 4.5, for us the results were actually not exciting at all, on the code gen side. And actually it's not [00:11:00] just code gen, because we talk about creating specifications and understanding them and such.

But a lot of it is around systems understanding. It actually did worse than 4o, and not by a small amount. I want to caveat that a little bit: with every model, there's a level of just, hey, throw the previous model's prompts and processes at it and see how it fares.

And then there are levels of optimization. And we have seen cases in which, when you optimize, you actually surpass what you could do with what you'd optimized for before. But the gap is not small. The numbers are not stable enough to really share precise figures here.

We're still evaluating them, but it really is substantially below 4o. And I think it comes down a little bit also to this comment on hallucinations: what we've seen over the past sort of six months is that even within the same model, within the 4o releases, they're not always progressing on a straight line.

And so some of them might become more creative, some of them might become more, whatever, analytical, and so we've even seen regressions within the same model over time in terms of specific code gen. And, [00:12:00] working with OpenAI, it's actually legit, right? Different capabilities are highlighted in different versions of the model.

I think what's a little bit odd here is that we're trying to do something that is more analytical, and in that it did not fare as well. And if they're highlighting the reduced hallucinations and things like that, it should be better over there.

But it does maybe align with their statement on the emphasis on EQ and human interactions, which for our purposes might not be as important. And so, like I said, I would say at a high level the jury's still out.

We need to try it out. It's a first model. Maybe there will be big improvements on it. It'll be interesting to see the reasoning models that are built on top of 4.5, or on top of 4o. So I think there's a lot still to see, but its arrival hasn't blown us away. It has actually been a bit of a setback.

And I think it's hard right now to say, yeah, I will pay an order of magnitude more for switching to it. And that's our experience, but also [00:13:00] generally, I've yet to see someone really swear by it. So we'll see. It hasn't arrived with a big wave of success.

Simon Maple: Yeah. And actually, let's switch across now to Sonnet 3.5. You talked a little bit about reason,

Guy Podjarny: 3.7,

Simon Maple: I keep doing that. It's been 3.5 so long.

Guy Podjarny: A lot of numbers.

Simon Maple: It's been 3.5 and so long. I automatically default to that. Sonnet 3.7. You mentioned reasoning. One of the big things that dropped in Sonnet 3.7, of course, was dynamic reasoning.

It's the first hybrid reasoning model on the market, as they state, a combined model.

Yeah.

It does, yeah, step-by-step style thinking, similar to DeepSeek, right? The way DeepSeek very verbosely stated what it was doing. What are your thoughts on, first of all, the dynamic reasoning aspect?

And yeah, secondly, I guess, comparisons to other models, to OpenAI models.

Guy Podjarny: Yeah, I think there were a few things to note. So first of all, exciting to see a reasoning model from Anthropic. I think it's their first reasoning model, or sort of thinking model. And that was definitely pioneered by OpenAI, but it's now being accepted as the right approach.

Maybe I'd [00:14:00] say three things. One is, indeed, they've chosen to be open with its thinking, like DeepSeek. It doesn't ramble on and on as much as R1, though. DeepSeek, yeah, it's chatty.

Simon Maple: You were worried about your tokens when you saw it chatting back.

Guy Podjarny: It really takes a while to make up its mind, which is, by the way, fascinating, especially when you see how it thinks about what it should answer to appease the human.

Simon Maple: There's a level of trust in that, though: when you see it, you believe it's doing the right thing, and it feels charming for a moment, and then it just gets in the way and you can collapse it.

Guy Podjarny: I think the newer versions actually now allow you to hide that a bit more. So it seems like Anthropic have toned that down, but it is more open than OpenAI. And I think maybe, to an extent, OpenAI's decision to hide the reasoning logic didn't work as much as they might have planned, in the sense of delaying others from creating reasoning models.

Just based on the pace of others releasing them. But definitely it's a different choice. It's more transparent, a bit more trust building, and just interesting. We do see that when we look at the reasoning logic; for instance, it guides us as [00:15:00] to where it is that we might have gotten a prompt correct or not. So I think that's interesting. That's maybe a bit more on the sort of aesthetics or choices of it. I think the second aspect is indeed this dynamic reasoning. That's actually novel; it's been mentioned by OpenAI, it's been mentioned by Google before.

So I don't think it's a new concept, but I think it's the first one that actually provides it. I'm not entirely sure about the Gemini one. And what it means is you can tell the model how hard to think. And that's a difference from OpenAI's world, in which you choose whether to go to a non-thinking or fast-thinking model like 4o or 4.5,

or whether you go to these thinking models, o1 and o3, o3-mini, and have them think deeply. They're saying, hey, we don't want you to switch models; it is the same model, and you just say how many cycles, how many iterations of digestion you have around it.

I don't exactly know the math behind the scenes, but I think it makes more sense from a user perspective, right? I don't want to choose, does this require [00:16:00] thought or not? I don't want the binary choice of think a little bit or think a ton, and just max it out.

So I think it's really interesting, and I think people will play with it. I do think the hope is that it would figure it out itself. If I ask you how much three plus three is, or what's the latest version of Sonnet,

Simon Maple: I'll give you an answer. Yes. On the spot, I'll give you an answer.

Guy Podjarny: I might expect you to not think very hard. But if I gave you a multiplication of some seven-digit numbers, I might expect a pen and paper, and so you'd start writing. And so I think there might be another layer coming that decides how much to think.

Simon Maple: Yeah.

Guy Podjarny: And I think that's pretty interesting.

Like, I guess I relate to that from a product sensibility perspective. I don't have a strong opinion around its implications for the model, right? Yes, you can imagine how it really should be a very different model, one that is just snappy and gives you the result, versus one that is a deep thinker.

Simon Maple: And I think, talking about it from the [00:17:00] product point of view, one of the first reactions I saw to it was from Bolt.new. And of course, I'm a massive fan of Bolt, I must admit. I've used Bolt.new numerous times now, creating fun prototypes as a Java backend developer.

Yes, hands up, I'm happy and proud to admit it. I'm useless at front end development. So Bolt will actually do that for me, and I'm not a good design person either. I don't know why you hired me. I'm not a good design person. Not good.

Guy Podjarny: I was like, you're just sort of opening up all these other thoughts on it, but I think you're doing a good job.

Simon Maple: So yeah, from my point of view, Bolt just fills in the gap; we talked about it filling in the gaps. It fills in my gaps, in terms of the areas that I don't necessarily find natural to build. But one of the interesting things is they introduced dynamic reasoning as a toggle.

It's a beta for them. They introduced that into their product, and it allows them, based on the request that a user sends, to decide: does it need to think deeply or not? And to your point, if I'm changing the color of something, or doing something where it effectively just needs to [00:18:00] do something similar that I could do in the code myself, just as a copy-replace or something, I don't have to dig into the code.

It can just do that very quickly. But if I'm asking it for a major architectural reformatting or an architectural decision, yes, it needs to think more deeply. So it's really interesting to add that in, and for it to be able to say, yes, I'm...

Guy Podjarny: And I think that makes a lot of sense. And I think a part of the Claude bet is that people want to be able to control which actions require that, which, again, OpenAI also makes possible by basically making a call to one model or the other. I think what's interesting is how people relate to the "how much do you want to think" question. It's easy to say, here you don't need to think, just give me a quick answer, and here, think deeply before you answer. When is it that I would want to say, think 40 percent hard, right?

Or think 60 percent hard. And so I think it depends on what the reality of the output is, but that's, no pun [00:19:00] intended, a little bit harder to reason about: what are the cases in which you want it to think hard, but not hardest, and where is it that you're willing to take a latency hit, but not a complete latency hit? You do see, despite all those sensibilities, that at least at the API level Claude has maybe a bit more of an enterprise use case bias, and so there's a lot more flexibility. The same with how they've approached MCP versus the more simplified tool use that OpenAI touts.

So there's, again, maybe a little bit of those different flavors, different tracks that the different models are choosing. Yeah. I do think there's a third bit that's interesting around the Anthropic reasoning release, which is, first of all, there is a difference. I want to note that there is a difference between what they do in the API and the simplicity that they have in the user products.

I want to note that there is a difference between what they do in the API and the more simply simplicity that they have in the user products. So when you use Claude, you do just have a binary, some more similar to o3 and o1. So I think that the third thing to say about [00:20:00] about Anthropic is is there a sort of recent choice also to launch Claude Code?

And I think Claude Code is, once again, a little bit of that flexibility sensibility: unlike maybe some of the others, they have chosen to release a more CLI-esque type of tool for code generation. And I find that to be interesting; it's maybe the other, less discussed aspect of the Claude release with 3.7.

I'm sure it builds on the capabilities within, but I think you're seeing maybe two things in action here. One is, of course, their leaning into an application-type capability. We've seen that with the chat product, but I think it's the first we're seeing from Anthropic trying to offer something that's a bit more of a complete use case.

Simon Maple: Yes.

Guy Podjarny: And the other is is maybe just. I don't know if you're seeing it, but I thought that comes up is how do they relate to that to people that are building on top of them, like Cursor, like Bolt.new and to me, the interesting bit here is to think about them stepping into this complexity that all of these AI labs will have now, [00:21:00] which is where is it that I want to be a platform that people build on?

And I want them to be able to trust that they can build on me, and I'm not going to go and try to compete with them directly after, and start limiting their access, etc. And where is it that they actually want to capitalize on the solutions? And I think in the software engineering space there's more and more appetite from these labs to be that solution.

Anthropic specifically is in an interesting spot: they have a lot of fans in the AI dev space that are building on top of it. And so I think if you're them, you want to be very sensitive to the statements you're making around when you should use Anthropic solutions directly.

Simon Maple: And I think you're spot on there, with the CLI ultimately being an extensibility point for others to build upon. Because if you think about what they have done in the past, with Claude Artifacts and things like that, you're able to actually build a little React app and stand the front end up.

And that's the closest we've got [00:22:00] to them building something specifically for developers to build with. Claude Code is that bigger step, where it's a separate release, and I think it's in beta only right now. But yeah, it's calling out for people to build on top of it, 'cause I don't think a developer going in wants to use a CLI. I think Aider was, in the early days, I say early days, like...

Guy Podjarny: Yeah, and I think Aider is still around, but definitely it was innovating at the time.

Simon Maple: Yeah. And no one's really followed that path. So I think Claude is looking at this not as a full UI for a developer to use, but more for builders and tinkerers to use.

Guy Podjarny: Yeah, it's a good theory, but I think it's a tricky balance. And I'm actually just alluding to a great episode we had this month with a guest: I did actually discuss this very topic with Mati from ElevenLabs.

Simon Maple: Yeah. So let's jump over to the podcast. There are a few episodes that we had this month. Do you remember when we were at State of Open Con? Way back when, I think it was 19, 1990-something, I can't remember. It feels like a [00:23:00] long time ago.

Guy Podjarny: A few weeks ago.

Simon Maple: It feels like a long time ago, yeah, early, early February, we did an Ask Me Anything, live from the State of Open Con. Wonderful open source conference and yeah, had some great questions, great discussions with a full panel. Do you know what Guy, I think it feels so long ago and so much is happening. I don't, I think we've covered that.

We actually had a session just before, with the January monthly, where we talked about it in the news as well. I don't think

Guy Podjarny: I think we've covered enough DeepSeek to do that.

Simon Maple: I don't think we need to talk too much about it

Guy Podjarny: to do that. We'll talk about it again when Hugging Face eventually launches. It's whatever it's about U.S. U.S. Core a good panel worth doing.

If you still have questions about DeepSeek, I think we covered quite a lot of topics there, the legal side of it, just many aspects of it.

Simon Maple: Actually, I lied, Guy, I'm going to ask one question about it. Just because we said that: we talked about how DeepSeek distilled from other models. With things like dynamic reasoning, can you distill dynamic reasoning from a model, or do you feel like you [00:24:00] actually have to have some almost dynamic-reasoning plug-in kind of thing? It's a tricky one, right, because...

Guy Podjarny: I don't know if distillable is a word, but I think basically recent history has shown that it's all mimicable, fundamentally. As I understand it, if you were to simplify, you would say fundamentally what you need is data.

So you need the ability to say, hey, if I pass these sets of inputs, tokens or whatever they may hold, I expect this type of response, or I can evaluate this response. And so as long as you are able to extract that out of others, you're able to create the synthetic data.

And you're able to train your system. I'm sure there are also algorithmic aspects of how you interpret that, and there's more complexity to it, but at the core of it, if you have access to the model and you can see how it responds, you can generate more and more data that helps you mimic that behavior.

And so I do think dynamic reasoning is a setting. If it becomes useful, and if there are, [00:25:00] I don't know, maybe the world treats it like temperature, right? It's still a bit of a voodoo parameter. People find value in it, but do you really know the difference between 0.6 and 0.5? Probably not that much. So maybe there are some three or four voodoo dynamic reasoning levels that people get used to, and then people will probably distill that and copy it over. So yeah, I do think, if anything, 4.5 and 3.7 both demonstrate the continued parade of just everybody moving together.

Like what we just discussed, right, in 4.5: the conversation is about OpenAI mimicking Claude with its sort of EQ and sensibilities, and in Claude and the Anthropic models we talk about them basically mimicking things that OpenAI's done around reasoning, and maybe a little bit of the equivalent to their Canvas with the code creation.

And so there are subtle differences around how they build it, but fundamentally, I think they're all copying one another. [00:26:00] And again, Grok has done that. And I think there are bigger differences, like you see with Grok and DeepSeek, here we are talking about this again, which are polar opposites in terms of how you train the underlying models.

And so you have DeepSeek working at whatever low level, squeezing the most out of old hardware, versus Grok, xAI, throwing tons and tons of money at getting the latest Nvidias. And somehow they both end up producing things that are comparable.

Similarly, I think Anthropic has made a big bet on using Amazon's GPUs. So there are sort of implementation choices, but I've yet to really see something that feels like a sustainable difference between them.
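
To make the distillation mechanism Guy describes concrete, here is a minimal sketch: collect a teacher model's responses as synthetic (input, target) pairs, then fine-tune a student on them. The function names are placeholders rather than any specific vendor's API.

```python
# Minimal sketch of distillation-style synthetic data generation:
# query a teacher model, record (input, output) pairs, and use them
# as supervised training data for a student. Placeholders throughout.
import json

def query_teacher(prompt: str) -> str:
    """Placeholder: call the teacher model's API and return its response."""
    raise NotImplementedError

def build_synthetic_dataset(prompts: list[str], path: str) -> None:
    """Write teacher responses as JSONL training pairs for the student."""
    with open(path, "w") as f:
        for prompt in prompts:
            pair = {"input": prompt, "target": query_teacher(prompt)}
            f.write(json.dumps(pair) + "\n")

# A student fine-tuned on this data learns to mimic the teacher, including
# settings-like behaviors such as reasoning depth, if the prompts vary that
# setting and the recorded targets reflect it.
```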

Simon Maple: Yeah, yeah. Let's jump into the next two. But first, I want to do a quick shout out for people who are interested in these kinds of model comparisons.

Yeah, I recorded an amazing session with Macey Baker, who talked about prompt engineering on a previous podcast. She's a community engineer at Tessl. It was so much fun talking about how she created a couple of games, one the [00:27:00] Werewolves game, one a Split or Steal kind of game, and we paired models against each other, and it was hilarious. If you want to talk about EQ, just wait till you hear what happens with Llama, and also with DeepSeek. Also some other news from the AI Native Dev: do you remember the conference that we ran last year in November, the AI Native Dev conference?

Guy Podjarny: Indeed.

Simon Maple: It was amazing. We actually decided, a year is too long to wait for another one. Things move so fast. We have a 2025 spring edition of the AI Native Dev.

Guy Podjarny: Not to be confused with the Java Spring of it.

Simon Maple: Absolutely. Believe it or not, it's going to be better and bigger. So yeah, for people who are interested in how you can use AI dev tools today, the future of AI native dev in tooling and practices and programs.

And also a third track, a new track that we've got, which is AI tools in action: people actually using some of the best tools around today, so that you can get practical hands-on [00:28:00] experience. Yeah. So yeah, the CFP is...

Guy Podjarny: It would be a great event. Yeah. And the CFP is secretly open.

Simon Maple: Secretly open. It's open. We'll share the links in the show notes. Head over to tessl.io and you'll be able to find the links from next week, so you can register for the event. You can submit your papers as well. We'd love to see you there again.

Guy Podjarny: Yeah, we're really looking forward to the event.

I think the whole purpose of the AI Native Dev platform here, right, the community, is to have people share their learnings. And I think the conference is a great way to get top thinkers around this, whether you're a tool builder, a user of those tools, or an AI philosopher with some worthy things to share around how you think it will affect software development. Creating a stage for that, so people can share their learnings, is really the purpose of what we're doing over here. So yeah, we're really looking forward to it, and we would love to have you all both attend and submit your thoughts to present.

Simon Maple: Absolutely. And talking about a multitude of tools, I had a very practical session. We'll talk about two sessions: one with Farhath, who I [00:29:00] spoke with, and then Mati, the founder of ElevenLabs, who you spoke with. So, Farhath, I've known him from the community for a little while now.

Super smart guy, also really on the pulse with all the AI dev tools around. And there are a number of takeaways from that session, but one of the biggest takeaways I had was that, as a user of AI dev tools, he doesn't just stick with one. He uses the right tool at the right time, depending on what he's trying to get out of it.

And in multiple cases, he actually uses multiple tools. And it's very interesting; I think there's probably a number of them that he uses on freemium, and a number of them that he pays for. But that's critical, right? We're not just using a single tool. We're actually creating a tool belt for ourselves, where we use the right tool at the right time.

This is going to get into a little bit of buzzword bingo now as I go through some of these tools.

Guy Podjarny: So yeah, no, I loved the session. Very practical. You get into specifics, and I would say it's worth a listen. It's an individual's picks; one person might not be precisely right for you, but it is a more [00:30:00] complete picture of: this is how I use these tools across many aspects of my software development.

Simon Maple: Yeah, absolutely. I'll mention a couple, but then we'll jump on. I think some of the interesting ones: for research and ideation, using Perplexity to understand what tech stacks, what the best libraries, what APIs are there for various projects; using Claude for prototyping, with rules to adjust to his style.

For development, Cursor, Cline, Windsurf, Roo Code, which give him that agentic workflow with web search, essentially; there's a number, there really is. Swimm was another one that he's quite interested in playing with, and Eraser from a DevOps point of view and a deployment point of view.

There was a ton. If you're interested in understanding each of these tools, how they can be used and where others use them, that's a must listen.

Guy Podjarny: Yeah, I think it's a really good listen. And probably the thing that struck me the most is the cases in which he's actually using different tools for the same job.

And so [00:31:00] he uses both Windsurf and Cursor. If you asked people a year ago, three years ago, say, hey, which IDE do you use, there would be a singular answer, right? It was probably going to be VS Code at the time. And how many people were using multiple IDEs? A tiny number.

And I think in AI we're seeing more. My bet is that it's still because this is all an experimentation phase, and they all kind of one-up each other in some specific domain for a brief period of time. And I think in, whatever, a couple of years' time, or some period of time, people will get back to: I have my tool.

It's 90 percent good at all of these things, and it's not worthwhile for me to switch to another. Also, I think they're all benefiting, at least the IDEs, from being VS Code underneath, and so they have some familiarity. But still, it is interesting to see the kind of open-mindedness of people towards them.

Simon Maple: Yeah, no, I loved your session with Mati. Yeah, what were your highlights?

Guy Podjarny: So, a really fun conversation with Mati from ElevenLabs. In case you're not familiar, ElevenLabs is the leader in AI audio today. They're now a [00:32:00] 3 billion dollar company, they've got millions of users, and they're the sort of de facto standard today: if you want to do text to speech, text to audio, then you use their product.

And so it was a really interesting conversation; I felt privileged a little bit, both because I know Mati from the London scene, but also because he doesn't actually do that many podcasts and such. So it was great to get a chance to talk about his journey.

It's a little bit less devish, a bit more about AI entrepreneurship and building on top of it. But a lot of the conversation was fun, and I highly recommend listening to it. I'd highlight three takeaways. One is, it always felt like this sort of textbook story for how to do an AI startup from a focus perspective.

They started with this big vision of dubbing, which requires a lot of moving parts: you're listening to someone speak in one language, and it needs to understand and capture that, understand the voices and who said what when, and then translate it. That's just a lot of moving parts.

You can outline those. And then they honed in on text to speech as a specific use case that is valuable in its own [00:33:00] right, figured out good tech innovations to break through there, and really honed in on that and voice cloning as the two core capabilities, and really succeeded with them.

And now they're expanding back into the broader dubbing vision, and they actually just launched, we recorded a little bit before they launched, their speech to text model, which gets a bunch of those things right, like the ability to know who said what at what points, and getting precise timestamps and things like that.

Simon Maple: And that's called Scribe, right?

Guy Podjarny: Yeah, it's called Scribe, and it's a really cool tool. I tried it out. It's easier to find use cases off the fly for text to speech than it is for speech to text. But I'm really keen to have it in meetings, when there are in-person meetings.

Simon Maple: Yeah.

Guy Podjarny: So if you're in a Zoom today, everybody with their own computer, it's easy to know who said what when. But when you're in a room and multiple people are talking, I'm keen to try it out for that.

Simon Maple: And fun fact, the room that we are recording in is called Champagne. Champagne is [00:34:00] an artist, I believe a tessellation artist; all of our meeting rooms are named after artists associated with tessellation.

And whenever we're in a meeting room like this and we're on a Zoom, I look at the transcript sometimes, and it's basically a conversation between me and Champagne. I never realized my life would get to the stage where I've just sat talking to Champagne, but it happens now.

Guy Podjarny: Every now and then, sometimes that works. So I think that was really fun, to hear the entrepreneurial story of it. And I think if you're building an AI company, or just interested in entrepreneurship, it was a good sort of textbook example. I think the second thing that really stayed with me is the approach to developers, and indeed tackling this problem we talked about with the AI labs around product versus platform. And so OpenAI, sorry, ElevenLabs has always been both a product and kind of an API for developers. You could go to their UI, plop in some text, pick a voice [00:35:00] and produce audio without being a developer, as any creator. And I think half their users basically use the product that way. And then you can use their API to integrate text to speech into your application.

And over time, both audiences grew, which is interesting. And in the API, they've grown to provide bigger building blocks, conversational agents. And I found that to be really interesting: building out, understanding the use cases. Again, some of it is just textbook company building, but understanding, from the use cases people were using their APIs for, what more complete, programmable use cases they want to make available, and then making those easy.

And so we had a really good conversation about that, and mostly what I was trying to push him on a bit is: if you're a developer and you're building on ElevenLabs, how do you know to anticipate when ElevenLabs would potentially release a product and compete with you? And I think that is a question on a lot of [00:36:00] startups' minds, because they're building on these AI labs, and these AI labs are a bit declaratively going into the application layer on certain fronts. And I liked his answer, which really anchored on transparency. He said, here's our plan, here's our vision, which he explained more broadly. It talks a little bit about call centers.

It talks about dubbing and this world. And it says, if you're in that world, if you're using ElevenLabs to create a dubbing solution, you should anticipate that we will compete with you. We'll have a product. But this declaration is also a bit of a statement of what they're not planning to do.

So I liked the approach, but also again, I found that conversation to be really interesting. Yeah. That's the second bucket there.
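
For reference, the developer API Guy describes is a plain HTTP call: post text to a voice-specific endpoint and receive audio bytes back. A minimal sketch with the `requests` library follows; the voice ID and model ID below are placeholders, so check ElevenLabs' current documentation for real values.

```python
# Minimal sketch of an ElevenLabs text-to-speech call. The voice_id and
# model_id values are placeholders; see ElevenLabs' docs for real ones.
import os
import requests

def text_to_speech(text: str, voice_id: str = "YOUR_VOICE_ID") -> bytes:
    """Return synthesized audio (MP3 bytes) for the given text."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    response = requests.post(
        url,
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
        timeout=60,
    )
    response.raise_for_status()
    return response.content  # audio bytes, ready to write to an .mp3 file

with open("hello.mp3", "wb") as f:
    f.write(text_to_speech("Hello from the AI Native Dev!"))
```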

Simon Maple: I don't know if I'm preempting your third bucket now or not, but from an organization-building point of view, there are a couple of interesting things he mentioned.

Yeah. One was,

Guy Podjarny: can I interrupt you over there?

Simon Maple: Oh, absolutely. Yeah. Yeah.

Guy Podjarny: If we had a sort of Scribe here, it would note the interruption. We had a very interruptive conversation, generally. Yeah. I'm sorry, let me cut you off there.

Simon Maple: Yeah, [00:37:00] the, yeah.

Guy Podjarny: The organization conversation was,

Simon Maple: Now you're naturally interested.

Yeah. The organization, because a lot of people think, okay, for this we need huge amounts of money, and we build a big, solid, very strong team to build this out, and the way we build faster is through the team. But that's not so much the case at ElevenLabs, right?

Guy Podjarny: So in general, I think the math in these big AI labs is a little bit different, because they spend buckets of money on GPUs and all that sort of training exercise.

But yeah, the organization piece was indeed my third bullet on ElevenLabs. The team is still under 150 people. It is generally quite small and nimble for a company that is, I think, north of 100 million ARR, with, again, millions of users, and valued at 3 billion dollars.

So that's already quite impressive. But I tried to probe a little bit for Mati's views on long-term research versus incremental research, and asked him how he thinks about investing in breakthrough research, [00:38:00] actually building a new model, versus things that are just a little bit better.

And that was a good conversation, I think worth hearing. But then he kept referring to the teams being small. And so I'm asking, how small? And it turns out the team that is building these sort of big models, the breakthrough next model, is a team of five people.

And that's tiny when you think about the impact that each of these individuals has. And even the other team, I think it's some five to 10 people, or seven, eight people, in the follow-on conversations, the people who work on incremental improvements. That's also still a very tiny team serving millions of users that are using these models. And so I found that to be, one, interesting, and two, almost inspiring, right? Think about the breadth of impact that an individual could have within those companies, and how impressive it is as they build it. And there was another conversation on how you then run that organization, and the choice that they've made around eliminating titles across the organization, which I also found to be, I don't know, definitely [00:39:00] daring. So they've taken away all titles. As you probe in, they do have 'you are the team lead', but it's a bit like a good parenting philosophy: when you talk to your kids, don't tell them that they are stupid.

Tell them they've done something stupid, right? Don't tell them they are smart; tell them they've done something smart. They get to develop their identity a little bit, but you judge their actions as actions, not embed all of that into 'I will always do that'. And so it's a little bit like that.

They say 'you are managing this team' versus 'you are the manager of this team'. So they have functional kind of semi-titles, but they've chosen to really embrace, company-wide, not having seniority reflected in the title.

Simon Maple: Which is quite a different vibe. It's quite a different vibe to us at Tessl, isn't it? 'Cause of course...

Guy Podjarny: It's just you who we give like a low kind of...

Simon Maple: For those who have never worked for Guy, he always insists: never look him in the eyes. And of course...

Guy Podjarny: Your Highness sometimes is also

Simon Maple: And of course if you're referring to him, it's always Grandmaster Podjarny.

Yeah. [00:40:00] It must be interesting to work in that.

Guy Podjarny: So it was kind of a fun conversation, clearly. At Tessl we also, talking about 'head of' this and such, avoid the sort of whatever CXO-type titles, to be nimble.

It's interesting to think about how that scales, and whether that works better when you're a more research-centered organization versus product. So, a really fun conversation. I highly recommend that you listen to it.

Simon Maple: 100%

Guy Podjarny: And that you try out ElevenLabs, frankly, like it's a fun product.

But these are probably my three big takeaways.

Simon Maple: Amazing. Amazing. We're wrapping up here now, Guy. Yeah, really fun conversations. I definitely think you should listen back to those last couple for sure. Yeah, we've got some fun coming up next month as well. I teased the Macey session, which is coming out next week.

Of course, if you don't want to miss an episode, make sure you subscribe. We're almost at 10,000 subscribers across our platforms. So yeah, help us grow that, which is great for a small, or not so small, early-stage podcast.

Guy Podjarny: Fairly new.

Simon Maple: Yeah, absolutely. So help us grow that.

And yeah, of [00:41:00] course, finally, the AI Native Dev Con: feel free to respond to the CFP, and make sure you register for that. But till next time, Guypo, it's been a pleasure.

Guy Podjarny: It has been. Yeah. And I hope you join us for the next one.

Simon Maple: Tune in soon. Thanks for tuning in. Join us next time on the AI Native Dev, brought to you by Tessl.

Subscribe to our podcasts here

Welcome to the AI Native Dev Podcast, hosted by Guy Podjarny and Simon Maple. If you're a developer or dev leader, join us as we explore and help shape the future of software development in the AI era.
