Podcast

Navigating AI Security with Gandalf's Creator

With

Mateo Rojas-Carulla

18 Mar 2025

Episode Description

As artificial intelligence becomes integral to daily life, ensuring its security has never been more critical. In this episode, Guy Podjarny hosts Mateo Rojas-Carulla, co-founder of Lakera and creator of Gandalf, to explore the pressing security threats confronting AI systems today. They delve deeply into vulnerabilities like prompt injections, jailbreaks, data poisoning, and manipulation of autonomous AI agents. This conversation provides valuable strategies and considerations for developers, security professionals, and organizations seeking to navigate the evolving landscape of AI security.

Subscribe to our podcasts here

Overview

Introduction

The latest episode of the podcast features a compelling discussion with Mateo Rojas-Carulla, an expert in AI and security. Mateo delves into the evolving challenges and strategies in securing AI systems, providing invaluable insights for developers and security enthusiasts. As AI systems become increasingly integral to various applications, understanding the intersection of AI functionality and security measures is crucial.

Security Challenges in AI Systems

Mateo kicks off the conversation by addressing the concept of over-permission in AI systems. In traditional software environments, permissions are managed to ensure users have access only to necessary resources. However, in AI systems, this challenge is intensified due to the complex nature of AI functionalities. Over-permission can lead to vulnerabilities where users or attackers might gain unintended access to sensitive data or functionalities. Mateo emphasizes the need for robust permission frameworks tailored for AI environments. As Guy Podjarny aptly puts it, "the security aspect is over permission is like, make sure that people don't get access that they should not have."

Moreover, traditional security threats like SQL injection and cross-site scripting find new expressions in AI contexts. These threats aim to manipulate systems into executing unintended commands, making it crucial for AI systems to incorporate advanced security measures. By understanding these traditional threats, developers can better anticipate and mitigate similar vulnerabilities in AI systems.

The Complexity of AI Functionality

One of the significant challenges in AI security is the nebulous nature of AI system functionalities compared to traditional software. Mateo explains that AI systems often have less defined ends, making it harder to anticipate how they will interact with various inputs. This ambiguity necessitates a thorough understanding of both the means and the ends in AI security. As Guy notes, "you can make the case that the end in AI or LLM powered system is probably like less defined than it is in any sort of traditional software applications."

AI systems are designed to learn and adapt, which can lead to unpredictable behaviors. Developers must ensure that AI models are robust against manipulations that could exploit these behaviors. By focusing on specific use cases and outputs, security teams can better identify potential vulnerabilities and address them proactively.

Jailbreaks and Prompt Injection

A critical topic discussed is the concept of jailbreaks in AI, where users manipulate models to bypass alignment mechanisms. These mechanisms are designed to prevent harmful outputs, but sophisticated users may attempt to override them. According to Guy, "for the sake of defining them, a jailbreak is one where the user is directly trying to manipulate the model that is interacting with to bypass some of its alignment mechanisms."

Prompt injection attacks occur when input data is crafted to manipulate the model's behavior subtly. This type of attack is particularly challenging due to the low explainability of AI systems, making it difficult to detect and mitigate. Developers must continuously update and train models to recognize and resist such manipulations.

Dynamic Security Utility Framework (DSEC)

To address these challenges, Mateo introduces the Dynamic Security Utility Framework (DSEC). This framework rethinks traditional AI security evaluations by balancing security with user utility. Instead of solely focusing on blocking attacks, DSEC emphasizes maintaining system functionality while enhancing security. Guy describes this shift as a move towards "a broader lens how we think about security, how we evaluate security and the kind of investments that need to be made."

The framework encourages a broader perspective on security evaluations, considering the dynamic nature of AI systems. By integrating DSEC into security practices, developers can improve their ability to protect AI systems without compromising user experience.

Agentic Systems and Security

Agentic systems, which operate with a degree of autonomy, present unique security challenges. Mateo discusses how security measures can impact user experience, particularly when integrated into the execution flow of AI programs. While external defenses offer advantages, they must be seamlessly integrated to avoid disrupting the system's functionality. Guy highlights this by stating, "the security solution is very deeply intertwined in the execution flow of the program, no matter what."

The discussion highlights the importance of balancing security with user experience. As AI systems become more agentic, developers must ensure that security solutions are deeply intertwined with the program's execution flow, providing protection without hindering performance.

Red Teaming and Security Testing

The importance of red teaming—simulating attacks to identify vulnerabilities—is emphasized as a critical component of AI security. Mateo shares experiences collaborating with leading agent builders to enhance security through rigorous testing. By adopting red teaming practices, developers can uncover and address potential weaknesses before they are exploited. Guy notes, "we have found that it's very surprising what you can found and what you can achieve via these types of attacks."

Red teaming provides valuable insights into the security posture of AI systems, allowing developers to refine their defenses and improve overall system resilience. This proactive approach is essential for maintaining robust security in dynamic AI environments.

Summary/Conclusion

The podcast underscores the evolving landscape of AI security, highlighting new challenges and strategies for developers to consider. Key takeaways include the necessity of redefining security frameworks like DSEC, understanding the complexities of agentic systems, and implementing rigorous testing through red teaming. Developers are encouraged to adopt a proactive security approach, balancing system protection with user functionality, to ensure the safe and effective use of AI technologies.

Resources

Dynamic Security Utility Framework (DSEC): arxiv.org/abs/2501.07927

LangChain Library for Agentic Workflows: langchain.com

Gandalf: gandalf.lakera.ai

Lakera: lakera.ai

Chapters

[00:02:00] Over-Permission in AI Systems

[00:07:00] Nebulous AI Functionality

[00:10:00] Jailbreaks and Prompt Injection Attacks

[00:18:00] Introducing the Dynamic Security Utility Framework (DSEC)

[00:23:00] Security in Agentic Systems

[00:28:00] Red Teaming for AI Security Testing

[00:35:00] The Future of Agentic Systems

[00:42:00] LangChain and Real-World Vulnerabilities

[00:48:00] Proactive Security Strategies

Full Script

Mateo Rojas-Carulla: [00:00:00] First of all, we already have seen like fully authentic systems. I don't know if you've followed OpenAI's recent deep research, for example.

Guy Podjarny: I've made a very good use of it to recently to thoroughly research and find the top 100 best dad jokes out there and sort out the 50 that are least known within them and rank them by humor.

It took it eight minutes and it's the first time that I actually managed to get AI to it doesn't even create, but find new and pretty good dad jokes as dad jokes go, I'm a fan.

Simon Maple: You're listening to the AI Native Dev brought to you by Tessl.

Guy Podjarny: Hello everyone. Welcome back to the AI Native Dev. Today I am excited to talk again about security. It is a passion of mine on it. And to do that, we have Mateo Rojas, who is the co founder and chief scientist at Lakera, which is a great kind of AI security company. We'll talk about not so much their products, but Gandalf they've [00:01:00] created.

And he's here to talk about kind of both LLM security, AI security as a whole, and specifically agent security. So Mateo, welcome to the show and thanks for coming on.

Mateo Rojas-Carulla: Great to be here.

Guy Podjarny: So Mateo let's, let's start by setting the stage a little bit about LLM security. We've had a bunch of episodes about it, but I don't maybe not everybody has seen it.

And also, I don't know, it's a fast shaping world. So maybe definitions are a little bit off. Maybe let's start by saying what is to you? What should we know about, or how should we think about LLM security, AI security, or maybe LLM security specifically?

Mateo Rojas-Carulla: Yeah, thanks. Great question. I think there is a lot of, terminology in the air, when you hear about security, there is prompt injections that are jailbreaks that are like data poisoning all these different ideas. And I think it's helpful to take a step back and try to look at the whole picture. And maybe I would say there are two types of vulnerability in LLM based applications.

The first are traditional vulnerabilities. When you're designing a software system, you want permissions to be set right. So people talk a lot about RAG for example [00:02:00] and whether the right users will have access to the right data in that RAG, to a large extent, that really is a question of, permissioning, who has access to what documents and those are very important problems, but they're not particularly new, when you're launching these applications, you should make sure that this is in place.

Guy Podjarny: Right, and it's the security aspect is over permission is like, make sure that people don't get access that they should not have.

Mateo Rojas-Carulla: Yes. When you're thinking about agents, like we'll talk about later. You should just give permissions to the tools that you're using to be, to be the right one.

So if you want read email only, don't give it send access. For example, these are just good habits to have when designing these systems and this is something that is really not new. Now when you're thinking about what is new and there is quite a bit in a nutshell, I think the LLMs are vulnerable because they introduce a really fundamentally new type of vulnerability.

And that is that they really struggle to separate developer instructions from [00:03:00] external input or data, and this allows the attackers to manipulate the behavior of the system via data. So you know let's try to unpack that for a second.

Guy Podjarny: Yeah.

Mateo Rojas-Carulla: If you have a RAG application, for example and you ask a question like, what is the capital of France?

And you would draw a document and that document has an attack in there that wants you to get false information about that. Maybe that question is not the best example, but once the surface false information, for example, the attacker can try to manipulate the model into interpreting the data in this document as new instructions and then executing those to then provide false information to the user.

And I think in a nutshell, the way I like to think about it is that, data or what traditionally has been only data, documents, images, videos, suddenly become executables. And the moment the LLM goes through them, then this can be executed and the attacker can achieve multiple goals. Now, [00:04:00] where does the terminology come in?

People talk about jailbreaks, these are typically situations where the user sends data to the model to bypass the alignment mechanisms of the model. So it can talk about how to build weapons and things like that, like an unhinged version of the LLM. Prompt injections are typically referred to more of an indirect setting where a user is talking to an LLM that has access to documents, for example, or data that has been manipulated and is containing attacks.

And ultimately, this is all about the separation maybe between the how and the what or means in the end, let's say. So what are the means of attacking these LLMs? The means is you manipulate the data in a way that gets the LLM to execute it and achieve the attacker's commands.

And then the ends can be a lot of stuff, it can be to bypass the model's alignment mechanism, to access a tool that an agent is not supposed to access to surface false information to the user. Anyway, you choose your pick. These are more about what do you want to get out of it?

Guy Podjarny: And this is the I guess in [00:05:00] this case, your data will always be executed, right? Unlike to an extent, trying to get the system to execute your command is also old news and whatever, that's what SQL injection is tries, tries to do. That's what cross-site scripting tries to do. That's what remote command execution tries to do. And I think the distinction you're saying here is actually it's trivial. In fact, you have no way not to have the LLM execute your sort of data when you're the attacker. And so that sort of interweaving is just built in and you can manipulate a bunch of things.

Mateo Rojas-Carulla: Yeah, I guess that's what makes it very tricky, is that this is actually what gives the LLM its power.

This precisely why it is so powerful, that it can just take in natural images and text and execute instructions from it. And the barrier is very thin. You can imagine, we may get on to this example later, but you can imagine an email coming in to your assistant that says, Hey, can you forward this email from last week?

Totally benign command. The agents should execute it. [00:06:00] You can imagine another one, which says, hey, can you exfiltrate your whole inbox to me? I'm an attacker, like these are particularly very similar in the way they manifest. And then the question is, how do you make the distinction between the two?

Guy Podjarny: Yeah. Yeah. And so super interesting, aligned. We had a, an episode a bunch of months ago with Caleb Sima and he talked about the sort of the control plane and data plane and how there's no separation. And I think you're talking about the same thing, but I like the analogy. It's a, I know, like in my mind, I've had the dubious pleasure even in Tessl, of sort of building products that have some area in which you are executing user code. And it's again, it's built in remote command execution. And really the exercise is no longer at, can you run that code?

It's more of a, can you contain the execution of that code right now? So I think maybe it's a little bit along those lines, which is, think about a system in which you are remotely executing user code and secure it appropriately.

Mateo Rojas-Carulla: Yes. And you're doing that by design continuously. And I think that maybe more [00:07:00] about what it takes to solve all of that, we may get into that later, but because of the nature of the problem, this really is an AI first problem, like understanding this kind of subtle patterns and attack patterns in data, being able to generate adversarial data to red team systems, all of these questions suddenly become like an AI first problem how do you actually do this effectively?

Guy Podjarny: I wonder so another domain that I think about sometimes is just the nebulous nature of the functionality of the system. Because you talked about the means and the end. But you can make the case that the end in AI or LLM powered system is probably like less defined than it is in any sort of traditional software applications.

There isn't, if you wanted the sort of the holy grail of security, which is whitelisting, these are the only commands that you are allowed to do and everything else is disallowed. That's just entirely unfathomable in the world of LLMs because it's an infinite universe of sort of things that might, that might occur.

And I guess I wonder does that, [00:08:00] does that come up? Like you, you do a lot, again, we're not going to talk about the Lakera product too much, but part of that is you identify malicious activity and malicious outputs. I get the input, when someone is trying to whatever mess up the, indeed, get the system to execute something it shouldn't. But in the output piece, is it harder to try and understand whether it is in line with the desires of the developers versus not?

Mateo Rojas-Carulla: I think that's a great question. And I think people emphasize a lot the means, okay, do identify the means, but at the end of the day, if you want to have a great security experience and you actually want to identify these things you need to look at the output and you need to focus on specific use cases often.

A very canonical use case is someone trying to jailbreak a model to get it to say offensive things that are against the alignment mechanisms of the model. That's what hits the news and it's very popular and people love to jailbreak systems. That's an example where, you know, to identify the vulnerability, you may, of course, try to see if the input has been tampered with, but additionally, you do want [00:09:00] to see if the output contains the kind of language and content that would be against the alignment of the model or the kind of content that the enterprise or the customer wants to use and only by combining these two can you really leverage the powers of, of defense in depth and identify it. And that extends way beyond that. For most use cases, and most even if you go to fully agentic use cases, at any individual moment, the agent is taking data from some tool and making a decision about what tool to call next and with which data and you can identify that and focus on that transaction and try to validate via the input and the output whether that transaction should actually go through or not.

Guy Podjarny: Come through. Yeah. Yeah. So maybe let's hold that thought and come back to this like meta level a little bit towards the end after we get a little bit more specific. But just, we want to talk about newer things including a paper you've written, but for starters, let's see, you rattled off a few of these kind of attacks.

just to make sure everybody is, in mentally in the same place of it. What would you say are the primary types of [00:10:00] attacks? Like we talked about prompt injection and jailbreaking data poisoning?

Yeah. I think I want to say like a word about each of those just a level set us that we're using the same interpretation of these terminologies.

Mateo Rojas-Carulla: So maybe let's say for the sake of defining them, even though definitions vary a little bit, a jailbreak is one where the user is directly trying to manipulate the model that is interacting with to bypass some of its alignment mechanisms. The model was trained with reinforcement learning to never output harmful content for example, you want to provide data inputs that override in some sense, those mechanisms. That's a jailbreak.

Guy Podjarny: That's against the the system prompt or like the, mostly it is against whatever it is that the, the platform builder, the tool builder is trying to contain people to to some lane.

And you're trying to break out of that as the attacker.

Mateo Rojas-Carulla: Yes but it goes even beyond that. So that's the first level at which you want to stay aligned. But most of these models after pre training, they go through something called like RLHF, where [00:11:00] they are trained with reinforcement learning with human feedback.

And the goal there is to, partially to make models that are helpful. And also models that are harmless and so the company needs to define that notion of harm. And so that is deeply encoded via training in the weights of the model, like the weights actually contain that behavior and so the model by design, regardless of system prompt should never say, tell you how to build a bomb.

Guy Podjarny: Right, yeah, good correction. So not just system prompt, but still conceptually, it is the hey, I am offering you. I've got an AI based tool or maybe literally like a touch up. It is not like the tool that is AI and I'm offering it to you, the user and you're trying to undo or get through the protections that might be built into the system. Like maybe a little bit like breaking through the firewall in this case.

Mateo Rojas-Carulla: Absolutely. That's exactly right. Then, at the first level, I find talking to people that this is what they care most about or what they're most aware about. But then we go into the second one, which is prompt injections.

And, for the sake of this [00:12:00] definition, let's say that's more when you have an indirect attacker. So you, as a user, are interacting with an AI system, the AI system has access to data, can be searching the web, it can be searching documents, and the prompt injection is an attack to the model that is contained in those data sources and that as the model reads the data sources gets executed and then affects the user in some way. And one very interesting flip side there is that the, it is the unknowing user that is a victim here as opposed to the jailbreak where the user is the one perpetrating the exploit.

And that's primarily why I personally am way more concerned about these prompt injection attacks. We have talked to a lot of people that have internal AI tools and therefore believe they don't need security because they're internal to their company. But as soon as you have access to any form of untrusted data, then now you can have third party attackers modifying a lot of stuff within your company without your awareness.

Guy Podjarny: So [00:13:00] that's prompt injection.

I'm trying to think about the equivalent to real world. And this one is a bit harder, right? It's like a side channel attack but there isn't really an equivalent in the regular software world to this prompt injection. So before you're breaking through defenses here, it's.

Guy Podjarny: Yeah, it's it's pretty brand new.

Mateo Rojas-Carulla: Yeah, it's like you have someone internally always clicking executables that come in via email, and you just execute it like crazy. And then never ask if you should do that in the first place. That happens inherently via the language model.

Guy Podjarny: Hopefully you don't have one of those in your organization.

Mateo Rojas-Carulla: You never know.

Guy Podjarny: So you've got jailbreaking, you've got

Mateo Rojas-Carulla: Let's say you mentioned data poisoning, again, I think traditionally, I would say when people talk about data poisoning, even though all of these is data poisoning in some sense, we've got a prompt injection to a document, that's data poisoning, but I think people refer to that when they think about fine tuning or training the model.

There have been some examples where, as you fine tune the model, so now you're not just prompting it or giving it data in [00:14:00] context, but you're actually modifying the weights of the network with some of your data any adversarial content that gets ingested, can potentially now provide backdoors to attackers in operation like weeks, months later, and I think what's very difficult about that is that once that data makes it to the weights of the model, it becomes very difficult to identify that this exploit took place. And that is probably one of the most concerning ones out there for sure.

Guy Podjarny: And that's the one really, again, the equivalent I would say is like supply chain security attacks, because it's something that's upstream from the software that you're actually running, that get manipulated, but it's just paired together with a low explainability that you have around AI systems.

And so you don't really know what they do or why they did the thing when they did it correctly. Let alone knowing what they, why they did it when they when they abused something. So if you manage to manipulate it, it's a pretty hidden it's, it's probably harder to execute because you need some sort of volume of data or very specific sort of tuning [00:15:00] access to be able to like pull it off, but if you do, it's a pretty this is the equivalent of a SolarWinds type problem that embeds into your system, right?

Mateo Rojas-Carulla: Yeah. And to your point, the obscurity of it is what makes it very difficult. Once it's in, it's very difficult to identify. And even though it may be more difficult to pull off, these model providers are continuously training their models, pre training on data online.

And, it is clear that data is now highly adversarial. In a lot of parts of the web, and, I'm to the extent that this model providers put the right safeguards in place when ingesting this data, some of these may actually go through and the very big capacity models are able to remember a lot of stuff, even if they saw it very scarcely.

Guy Podjarny: Yeah. Yeah. So I think those are like probably the, the big terms. Maybe we'll move on a little bit to the new stuff. Yeah. So you have beyond the sort of the core product of it, you have this super fun tool or portal, how to call it called Gandalf, which is this sort of fun game in which you basically practice prompt injection, you try to [00:16:00] red team the LLM try to get Gandalf to, to give you the password. I've spent too much time on it. No, I passed it in five minutes. Maybe a little bit more. Super fun, quite addictive, gandalf.lakera.ai, recommend it. And you've turned that into sort of a good data trove of, of understanding how people attack.

And recently released an interesting paper around learnings from it. So with all that tea up a little bit, tell us a little bit about whatever it is that you feel you still need to say about Gandalf but also about the, about this paper and learnings.

Mateo Rojas-Carulla: Yeah, absolutely. Yeah, Gandalf has been really amazing.

And maybe one thing to add is that by now we've had over a couple million people playing and over 60 million attacks to all sorts of LLMs. And that of course gives us very unique insights into, just what kind of techniques do people come up with and how they leverage their creativity to break the systems, regardless of language of location of, all these different things.

We have people from all over the world playing, playing Gandalf. We have people building, automated [00:17:00] attackers playing Gandalf and so because of that, it really allows us to draw a lot of conclusions in terms of where things are and where things are going. And last year we decided that it was time to take that to the next level and actually run some randomized experiments in there to try to make some statements about the state of the field.

And what's easy and what's hard and share them with the community. And that's a paper that we put out there about a month ago and yeah, happy to tell you a little bit about highest level and we can dive into whatever you find.

Guy Podjarny: Yeah. To the specific, so that'll be great. Yeah.

And I love the idea of leveraging, there's the whole startup story around Gandalf coming out and you folks leaning into it and, and making a really turning it, identifying the opportunity and leaning into it. And now experiments being, a later element of it of trying out.

You basically have a bunch of attackers, even though they're doing it for fun, they're actually putting a lot of effort into it trying to do that. So anyway, that's fun. This is not a startup podcast. We'll move on, but the, but yeah, it's also, so in the paper, in the learnings a bunch of those learnings, I'll let you, how do you, how [00:18:00] would you categorize the top learnings from that paper?

Which by the way, we'll put a link to in the show notes.

Mateo Rojas-Carulla: Awesome. Yeah. Yeah. Yeah. Happy to chat about the startup story another day. That's a very fun one. I think at the highest level, one thing that we identified in this paper is that the way that the field is currently thinking about AI security is fundamentally flawed or lacking in quite a few aspects. And, the paper uses data at scale to try to make the point that it's time to take a bit of a broader lens how we think about security, how we evaluate security and the kind of investments that need to be made to work in security and maybe at the highest level, the, maybe one of the biggest learnings or the starting points is the way people evaluate the defense of a system today, one LLM system, is that there are some data sets out there of attacks or prompt injections, whatever, you pass them through the model, [00:19:00] you have some kind of label that you attributed to this data, and then you measure how many did I catch and that's about it, in terms of how people think about security, so the paper tries to challenge that idea or at least the findings of the paper challenge that idea. And at the highest level, we introduce what we call DSEC, which is the Dynamic Security Utility Framework for thinking about security in LLMs and more specifically, about security and at the same time user utility. I think maybe that's one of the most important things is that like anything in security, one thing one can do to defend an LLM is to block every piece of traffic going through it.

Guy Podjarny: Yeah, block it from the internet.

That's the best way to to be secure.

Mateo Rojas-Carulla: Exactly. Unplug everything, close it

Guy Podjarny: You'll be secure all the way to bankruptcy.

Mateo Rojas-Carulla: Yeah, exactly. Yeah. And of course that will work, you will hit, a hundred percent on all those benchmarks, but it will of course fail to to keep users happy and to make a usable product.

And [00:20:00] again, that's not particularly new. I think if you think about a firewall or any other security product, you don't want to block every single piece of traffic. You want to be very selective on that.

Guy Podjarny: False positive, false negative type measure how many but you're like using that lingo right now.

You're talking about how existing detection measurement focus on the sort of true positives on the cases in which it was indeed an attack. And you've caught it. But, and naturally the false positives but that's not enough.

Mateo Rojas-Carulla: That's not enough. Exactly. I think the, the core finding is that while we do have those two aspects and a developer needs to find ways to have trade offs between the two, depending on what matters to them in LLM applications and agents.

The trade off goes way deeper than it has in the past. And so maybe to give you a quick example of what we found in the experiments we run, where we ask people to write in different LLMs with different defenses on them. We separate two types of defenses. One are external defenses that outside of the LLM, take a look at [00:21:00] the payloads that are coming in and out, like we discussed earlier, and try to make an assessment of whether they should block that.

Then if they block it, the player failed at extracting the password from the Gandalf agent. And second type of defense is in the system prompt. In the system prompt, I'm going to tell the LLM to not reveal the password ever. And then you can be very verbose and just tell it to really never, ever do it.

And then we measure how those two compare. And, in in many respects and especially because we didn't play the game of optimizing, the defenses too much.

Guy Podjarny: Yeah, there a limit.

Mateo Rojas-Carulla: Yes, we have limited time there. The system from defenses work quite well often in terms of security. So it turns out that they they often block the attacks.

And by TLDR, by the way, spoiler alert, humans break everything. And so that's an important part of the data that we saw. But the important thing is that the system from defenses work well, but then we looked at other measures of utility. For example, we started to look at the answers that these models give, the ones that are better defended in the system [00:22:00] prompt how do they look like?

And we actually found that they are, for example, way shorter than the reference, so if a unprotected LLM will give you like a detailed answer like this, then maybe the defended one will give you a shorter answer, more to the point. And this is a byproduct of just adding these defenses. It just so happens to affect the behavior of the model so that the, even the content of these responses is quite different from what you were getting from the undefended model. So suddenly you start to see how, because these defenses are so deeply intertwined with the model, you start to see utility hits that are not as trivial as the false positives themselves.

Guy Podjarny: They become functional kind of in the in the sense.

Mateo Rojas-Carulla: Exactly.

Guy Podjarny: In the external one, I think is easy to relate to, for anyone who has that, any sort of security measure, which is just like you would in a web app firewall or even a regular firewall of it,

there's a, there's a leniency that you have to offer because otherwise you're going to start breaking the application. And also the defenses tend to not be as dynamic. If you try to explain the application [00:23:00] to the external defense the application always changes, so especially as we got into the era of DevOps and fast moving applications, it's very hard to consistently have the defense be tuned at your system.

And so it's a dance that nobody really loves, but unfortunately, everybody's familiar with, which is, what do you allow in and what do you disallow when you'll have some attacks that pass through that you hope the system deals with. You're used to that, but the protections of the system up until now tended to be very in line and fine tune it like, they would rarely break the application because they were literally in the code of the applications produced and provided by the very developers that provided the functionality. And so now you're saying like those developers, the tool they have at their disposal is the system prompt. And lo and behold, sometimes when they create the right, the defensive system prompt, they actually break their own functionality.

Mateo Rojas-Carulla: Exactly. And that has an impact on the user experience that they're crafting and one could argue then, okay, then let's prefer external [00:24:00] defenses, which have a lot of advantages, of course. But the kind of extension of that is that when you go to agents. And when the security solution, even external, is verifying every single transaction that goes in the system as the agent makes sense of the world and tries to make decisions, you realize that the security solution is very deeply intertwined in the execution flow of the program, no matter what, even if it goes beyond system prompt.

And so that's where I find it very interesting that any discussion around security for LLMs and the more agentic it gets, the more it should be the case really goes hand in hand with the utility and what is what it means for utility. And there is really no free lunch. And having that discussion or at least bringing that discussion to the public was something that we felt very strongly about after looking at the results in the paper.

Guy Podjarny: Yeah, I think it's a super valuable discussion triggered this conversation here on the podcast. It's interesting because when you think about security when you explain security to a non security person, then you [00:25:00] talk about how security is about the unknown functionality.

So like functional testing is really about, hey, the system is designed to do ABC. Functional testing should make sure that, it performs ABC correctly. When it gets a three, it returns a seven, right? As it should. And security is this infinite world of unknown behavior, of unintended behavior.

And that's why the kind of the holy grail is the whitelist, because if you could really define all the correct behaviors in a very neat box, then you can say anything else is disallowed. But that's just rarely the case. And so you need to define examples of things that are unintended. And it's just like almost unfathomable to think about like the LLMs we've had Des on the Intercom, we had, Tamar here from Glean, we've had all sorts of guests on the show to talk about how in the world of LLMs, it's really hard to know if your product works.

Like even the functional testing piece is hard to define, like everything, your product surprises you, all the time. So how can you ever aspire to, to contain it when you don't even know what the boundaries of its [00:26:00] capabilities are? Like, I don't know, is that, what's your sense at this point? It's probably not fact, but opinion, but is it a kind of a false hope in this domain of it?

Is it a, how do you suggest that people think about, first of all, like what is in reason? And two is given these learnings, like. How should they approach it?

Mateo Rojas-Carulla: Yeah, it's a great question. I think it is very challenging. Especially as you go into agents and tools that are making more and more decisions, then testing becomes harder and harder, of course.

And I think that's, we will talk about agents later, but as a jumping a little bit ahead, I think one of the key challenges there is that the agent will be always making decisions based on data on the fly. And so if you want an analogy for that, it means that the agent is writing software on the go, the code is, the source code is being written as data comes in, there is no like shift left, necessarily on the code level and test it well before deployment you're [00:27:00] writing the code in prod, in some sense, and of course, that begs the question of okay, what do we do here?

And I think that's where I would connect it back to what we discussed earlier, keep it, keeping it simple for now, just inputs and outputs. I would make the claim that a human that is looking at an agent snapshot in time, you have a tool sending you an email and then, you're the agent decides I'm going to call that, the calendar tool right there with this other payload.

A human could freeze that and take a look at the payload can take a look at all the context of the application and everything that you know, the user wants to achieve with this application, maybe it has metadata on who the sender is. I don't know. Just imagine as much metadata, as much context as possible on what this is, the human should be able to make a thumbs up, thumbs down decision on that in a large majority of cases.

And so I think from the way I think this should be addressed and solved is exactly like that. Like [00:28:00] you need to ensure that you have the right kind of live testing on the go that validates behavior in some sense and I think again, that's where security and the usability, the kind of more development level side of the projects become very intertwined in the LLM world, because yes, you can frame it as is an attacker trying to get my system off track, for example, but you can also frame it as.

Even if no one is trying to get it off track, is it going off track nonetheless?

Guy Podjarny: Yeah, even without intending, yeah. People are accidentally wired them some money. They didn't even try to do that. I think, I love that. To an extent, that statement couples both of my sort of recent journeys, right?

Because at Snyk it was all about merging developer and security. Your saying that actually needs to be truly continuous, as in, even in live time. And two is, at Tessl, we talk a lot about how software is adaptable. It's no longer, in general, I think, in AI and LLM world anything static will become dynamic, that would be for whatever the music you listen to or the the [00:29:00] stories you, you read and all of those things will become dynamic and adaptable and adapt to your context.

Software is no different. Like it is, we don't think of it as static, but it is static. We deploy the software and then it remains that software until it's deployed again. No longer the case with LLMs. They create software on the fly. And your security needs to be similarly adaptable which exactly is a cool and interesting and not a little bit scary.

And instead of thinking about how do you, we'll switch to agent, we've been foreshadowing agent security for a while now, like we'll get to that in a sec, but how do you, how would you measure whether you're any good at on the fly security creation? There's a lot of mental layers over here, right?

Mateo Rojas-Carulla: Yeah, I think it's a very good question. I think at the end of the day, you need the right observability and to make sure, especially, the more you go towards agents, you need to be able to keep track of every single decision that's being made. And, retroactively look at that.

Of course, one can imagine rollouts that are, increasing in [00:30:00] scope, there are ways to restrict the systems, right? You can make the system prompt from very narrow. You can make the tool access very narrow, and you can make the action space and then you can learn as you go, what kind of content is going in, what kind of mistakes are being made and so on.

I think that's an obvious one that to me makes sense. But another one, and we see huge demand for this today is just shifting left on the red teaming side, like, how can you have a red teaming playbook that actually adapts to the nature of the systems, and it's not only, executing a bunch of different tests, but it's actually very aggressively trying to get it to go off track in different ways.

And how do you make that highly contextual, meaning that you try to make it go off the rails in the ways that matter for the application, and then how do you then provide insights into the developer of just exactly how the system is going off rails so that it can put stuff in place. And maybe the last one is that, of course, in production in real time, one of the ways to make [00:31:00] things a little bit narrower is to always, especially at the beginning add a human in the loop, for example, when send commands are being used or anything like that, you can just mitigate some of the bad outcomes again, talking about means and ends, right? Ultimately you want to prevent bad actions or inappropriate actions and so you want to add mechanisms for these actions to surface before they're executed.

Guy Podjarny: Yeah. And that's about giving less power to a system that you don't yet fully trust. So I guess let's finally get to that sort of next chapter of our conversation here and talk about agent security. This is all of these problems we've been talking about so far.

True even if you're like a relatively naive user of the LLMs, but they get messier and messier as you get bigger. And agents definitely are, have higher aspirations around what they achieve. So maybe for starters, again, let's start with terminology a little bit because agent is a very overloaded term.

So define for us, like what do you mean when you think about an agent?

Mateo Rojas-Carulla: There are probably as many definitions as there are people in the world. [00:32:00] So let's give it a go. I think to me, the way I think about a fully agentic system is a system that's where an LLM automatically determines the program's execution flow based on user input and available data sources and tools.

Let's unpack that for a second. I think the most important thing there is that it autonomously determines the execution flow of the program. If you want to give an example of, say, a ChatGPT like RAG application, you say, hey, answer this question, and you get an answer, right? When you go to the world of more agentic systems, then you say, manage my inbox, and then the system is going to say, okay, I'll come back to you when I'm done, right?

And then he's going to decide okay, I'm going to just ping the inbox. I'm going to look at the email. I'm going to take an action based on that. And eventually it will decide when to stop and he will tell you, I'm done. I did all these 20 things. And so the important thing there is that the agent was fully in control of what tools [00:33:00] it was calling and what decisions it was making with the data it was getting from the tools. And again that's a little bit of a ideal definition. Of course, in practice, you have all sorts of hybrids, right? Like you have a lot of LLM based workflows where actually it's like agentic in the sense that you have different tools being called different stages, but the tool is, the LLM is not in control of those flows.

It's like the flow is always the same and the LLM is performing a predetermined action at each stage. All of those have their challenges. It starts from conversational all the way to fully agentic, but I think maybe my opinion is that the jump from LLM workflows to fully agentic, it's like much larger than conversational to workflows or almost like traditional software to LLMs.

Just because that is what introduces that novel fact that you're no longer in control of the steps of what's happening. You're really giving control to a machine that decides what to do [00:34:00] and when to stop. And I think from a security and utility and usability perspective this introduces all sorts of novel challenges.

Guy Podjarny: Yeah. And I think I like that definition. I think if I echo it back, I'm saying, at the meta level, it's just a scope of task. So that's one, an agent loosely said is every time you're trying to do something bigger, you're probably need an agent because there's more steps to that.

I guess you mentioned in passing is the tool use element of it, which and maybe we come back a little bit to say, is it or isn't agentic if it's just inline tool use? And maybe that's a bit of a moot conversation, but, directionally more tool use. So you're giving it more power.

So you've asked to do something bigger, you've given it more power. And then you're pointing out that you don't know the path it would pursue to be able to accomplish that. And so back to our comment on what is intentional or like intended behaviors and unintended behavior, there might be 10,000 different ways that are all legitimate ways for it to try and achieve the results.

And in fact, it [00:35:00] doesn't even plan them upfront. It sort of produces them on the fly. And seperating those from the 10,001 behavior that is unintended, that is actually a malicious behavior. That's a beast. That's a hard thing to detect.

Mateo Rojas-Carulla: And you know what? One thing I would add there, that's 100 percent right. I'll say two things to just to complement that one is that's maybe the biggest discussion point I've I've felt in the AI security community and beyond, which is, will we ever give such kind of that kind of agency to a software program to do all these things?

Or can we just keep it like a narrow system? Like we just, keep the action space narrow. We don't give right access and so on. Why would we even open up that gate? I think that's a fair question, but I think first of all, we already have seen like fully agentic systems.

I don't know if you've followed Open AI's recent deep research, for example.

Guy Podjarny: I've made a very good use of it to recently to discover thoroughly research and find the top [00:36:00] 100 best dad jokes out there and sort out the 50 that are least known within them and rank them by humor.

It took an eight minutes and it's the first time that I actually managed to get AI to, it doesn't even create, but find new and pretty good dad jokes as dad jokes go, I'm a fan and doing it. Very much applied to the best use of the state of the art AI at the moment.

Yeah sorry, a little bit of a detour on it,

Mateo Rojas-Carulla: No, but please send them to me because like I can make huge use of this.

Guy Podjarny: And I was so happy about it that I posted about it on LinkedIn.

Mateo Rojas-Carulla: So I think what I'm going to do immediately after the call

Guy Podjarny: But yes, sorry, back to definitely deep research is substantial.

Mateo Rojas-Carulla: Yeah, it's substantial because basically, it's trained to end to access all these tools and then the system is going to go look for data sources and based on what it finds, it's going to look for other data, and it's going to do that until it believes it's done. It has a Python interpreter to plot graphs,

it can just really do so much. And it's not taking action per se, it's just ultimately giving a report. But you can start to see how these tools [00:37:00] are becoming a reality, and second, there's a lot of talk about how agents are going to transform the economy eventually.

If you can imagine, you work at Tessl, right? And I don't know, any years ahead. Probably not so many where software is just written end to end for very complex tasks that would take humans today, like a very long time, in very short times. The kind of productivity gains that you get from this are just insane.

It's hard for me to imagine we're gonna say no to that. If the technology gets good enough, I was listening to Ezra Klein on the New York Times podcast, talking about this deep research. And he said something that I think is going to be more and more true across the economy, which is that he has worked with the best analysts at the New York Times for 20 years or something.

And this tool can produce in minutes what the median analyst does in days or weeks. And to me, anyway that to me is very representative of why we do want and we will want and we will give access to these tools and [00:38:00] that truly agentic systems are not a if, more of a when.

Guy Podjarny: So the so I think, it's a good definition of agents.

And I think myself and probably a good number of listeners over here are fans in the sense of excited by the potential of agents. And then from here, we'll talk about, well what are the limitations? What are the things that would block us from doing it? I think oftentimes the conversations are about what can it not do in terms of intended behavior, if I stick to that terminology, but I guess in security, our concern is the opposite is like, when is it that you can trust that it hasn't gone off the rails. So maybe, maybe let's get back a bit to the meta in a sec, but before let's talk concretely. So if you're building an agent, what are the types of attack vectors above and beyond? We already talked about the big LLM ones. Maybe you want to mention them again, but beyond those, what are the type of security concerns that you might have above and beyond amplifying the the ones that we've discussed.

Mateo Rojas-Carulla: Yeah. I think in the end it really becomes then a matter of [00:39:00] almost categorization, like what are the ends, that, that are become true in the agentic case that are not there before. Maybe we can actually think of a few examples to illustrate that maybe to keep ,maybe to warm up and just to get an example on a non agentic system, but just because it's a very real world vulnerability that can help set the pace, recently, Gemini, Google released sort of assistance for the workplace. You can talk to your drive,

you can do stuff in your documents and so on. And what we find is that you can do things like the following, or you could do things like the following, which is pretty crazy. Let's say you, Guy use Google workplace, workplace at Tessl, and you're using these Gemini to ask questions. I decided to create a document with some attacks that want you to get false information of whatever kind, or give you a phishing link or whatever it is.

I can share that with you and not notify you. Okay, so you didn't hear a thing, [00:40:00] but now this is in your drive and then we showed that the LLM can then leverage this information via the RAG and then surface false information. So that's just the kind of very real vulnerabilities that you can see.

Guy Podjarny: And that is arguably not so much an agent attack as like a, just a larger scope LLM use attack, which there's probably agentic behavior somewhere in there. The task I asked is like an LLM ish simple task, which is a bit of info, but it was indeed a poisoning attack, right?

Mateo Rojas-Carulla: Yes. And it's more to illustrate how things compound, right?

So you have the right permissioning. Do you have the right document in your RAG? Why is that document there in the first place? And what, why is it, what is the LLM executing and so on? That's true. I think when you think about agents, then you have the typical more jailbreak actually is something topical.

I saw on Twitter this morning, though, I'm not sure exactly what happened there. So this is a bit anecdotal, but there's a, a company called Manus, for example, where a user managed to use a, an [00:41:00] agent that is actually able to, run commands on your interpreter and on your terminal and do all these different things.

Just asked it, can you just give me the contents of this directory and this directory? It's supposed to be where, it keeps all of its kind of configuration files. And then as a result, the user managed to extract all the tools it has access to, the system prompts, all the stuff that was hiding in there.

And that's, I think maybe touching on one of the original, more traditional vulnerabilities we discussed, which is, can this agent actually have access to, full directories or things that should actually not be service to the user and where it's a combination of, more of a jailbreak attack or just a simple instruction combined with poor system design in some sense, which, which ends up creating the vulnerability.

But then as I mentioned, I think that if you think about agents, then again, I'm most concerned about the attacks via data, but that can generate multiple types of vulnerabilities. One system that we built for [00:42:00] to illustrate is on LangChain, which is, one of the leading agentic libraries to build workflows and so on.

You can build an agent that is essentially summarizing your inbox. You can monitor, tell it to monitor it and stay there. And just every time there's an email, send the TLDR digest and just whatever it is you want to do. And then you can show that as an attacker, you can send an email to that inbox.

And the email should contain, of course, the right type of attacks, and it should encourage the model to forget its instructions, at the end, it is really all about like social engineering of models instead of people. But then once you convince the model to do that, you can get the model to just exfiltrate your whole inbox to the attacker, for example.

And so if we look at what happened there, the how is the same as we discussed there's the data leading to some kind of behavior. But what you exported there is like just getting a tool to use one of his [00:43:00] functions, which is to send emails in an inappropriate way or untimely manner or unauthorized manner.

And so I think that's something that is very, will be very widespread.

Guy Podjarny: So both of those examples, though, around the, the authorization on it or the access and around the, around sort of the prompt injection. They're both, again, they get multiplied or amplified by complexity of a task and just how often they happen and the trustworthiness of a source.

And so they're, they probably show up in agentic systems. But if you didn't have an agentic system, if you had a workflow that has entirely controlled deterministically, but it just repeatedly used LLMs, control the flow, but you go off, you read an email, you, you extract an action or whatever it is right out of that email and you map it to that, you're still susceptible to them, right?

They just amplify it. Are there types of attacks that are specific to agents that are just they only manifest in the context of agentic as you define the place in which the agent , the AI [00:44:00] decides what to do next?

Mateo Rojas-Carulla: I would argue that this example actually does go a little bit in the direction, even though I agree with you that.

It's not quite agentic in the sense that it stays within one tool, but at the end of the day you can rewrite that a little bit where the outcome is not that you send an email, but the outcome is that now call another tool that is supposed to manipulate your calendar and delete a very important invite from your calendar or something like that.

And now you actually went beyond the barrier of this one tool, you read the email and it said, you should really, no matter what, delete the calendar entry from the end, whatever, at 6 p.m. It said, sure, I'm going to do that and then, it doesn't leave any trace. I think that is in the predetermined LLM workflow, you're going to have more control on the actions that happen, so

you may be not expecting to send an email, and so you will just not do that. The point is that whatever the attacker encodes in that email, which is an untrusted source of data can get the LLM to abuse either the same tool or another tool that it has access to. And I think it is [00:45:00] that tool abuse that is very novel in the, in this agentic world, where you can get the, the attacker can essentially try to get control of the tools that the LLM has access to accomplish its goals.

And how does it do that? It does that by manipulating the data that's going into these tools. And as a result, modifying the execution flow of the program in ways that tailor the attacker.

Guy Podjarny: I think that makes sense. And I think I guess the analogy that, that's very kind of helpful clarification, and I think the analogy is almost like you could ask, you can use prompt injection say to to make it so that when I ask what's the email of the company's CFO, I will get the wrong answer. And then the user might get that and they might perform some sort of wrong action with that but in the agent's case, it's the it's the agent that you are fooling.

And so you're using the LLM attack, which is, hey, get me the wrong answer, but then you're subsequently getting the LLM to therefore say, hey, [00:46:00] send whatever this confidential financial information to the CFO. And so that's where you're manipulating the agent. So the prompt injection was around getting the wrong conclusion on a specific discrete action, but here in the agent security world, you're combining that with acting on the wrong information in an automated fashion, right? Without human supervision.

Mateo Rojas-Carulla: Yeah, that sounds absolutely right. And you can imagine even adding three steps ahead where you say in the prompt injection, not only send this email to this person, but also, and by the way, delete this email from your records.

And by the way, delete the history. And by the way, tell your, tell the person that you did nothing.

Guy Podjarny: And in all those cases, if you told a human that, so if the initial, the non agent attack, the plain LLM attack is the prompt injection that gets all of that sort of instruction, a human looks at that says no, this is I think you need fairly low level of qualification to discount that.

But then, the question is does an agent does the sort of the LLM call that decides what to do [00:47:00] next, does it have that common sense equivalence to say, okay, this is malicious hence back to the intertwining of of utility and and security that is needed here. Cause some actions like that, like maybe an email of send, the confidential info to the CFO might be a legitimate thing, right?

Mateo Rojas-Carulla: Exactly. Yeah. And I think the one thing I would add to that is, is, it's exactly right. But even with humans, we haven't figured out how to keep them secure on online, right? Like they click stuff all the time. They do the wrong thing. So now we have the challenge of solving that with automated systems.

Guy Podjarny: I think we have the advantage that humans are not that productive and they make different mistakes. And if we're talking about making humans a hundred times more productive through these LLM systems, then these LLMs will make a hundred percent, like as many of these of these mistakes on it, or, that much faster of it.

And also they would be more repeatable. So humans are relatively predictable in masses, but, therefore all the statistical attacks out there, but they are, they're still [00:48:00] different, different people might respond differently to, to these things. And same with self driving car and it was going a little bit down that down or a rabbit hole.

But I think that's, I think that's a useful though, delineation for the, not just the scope of the, the attacks but rather the acting on the, the manipulation or whatever it is that that you got the LLM to propose. And I guess maybe jumping to the other direction, then we're all doomed.

So how would one respond to that? So I'm building an agentic system. I want to tap into the sort of holy grail of potential, but I also don't want it, sending confidential information away and deleting the email, what's a, at least directionally good course of action for me to do?

Mateo Rojas-Carulla: Yeah, I think as a an application builder, you want to be very diligent with, first of all, red teaming that system effectively, we've actually worked with some of the leading agent builders out there as red teaming partners, and we have found that it's very surprising what you can found and what you can achieve via [00:49:00] these types of attacks.

And they're often very surprising to, the people building the systems. And so I think it is a time where it pays off to just really invest in that in that pre deployment side and understanding what's there.

Guy Podjarny: Start by digging your head out of the sand and run some red teams and appreciate the risk here and find some flaws.

Mateo Rojas-Carulla: Yeah, especially because, while there will be the assistant that you can tell, organize my kids party and it can make calls and it can write emails, you can do all this stuff, there's a future, probably not too far away when that will happen, but we're not there today.

So agents are going to be rolled out in phases and we're not going to give full actionability to them on day one. That's maybe the first advice, don't just unleash an agent that can do, access your bank account on day one, then make transfers. Then one can have just a very incremental but fast and rapid rollout where, you know, you, you do try to narrow down, the core functionality and then you expand access and you expand things from there and [00:50:00] you add the right, runtime defense as part of that to, like we said earlier, there's a huge, look at the transactions happening there and try to validate them, and try to understand whether you can actually, you should actually process that data.

And you should actually make the decision that you're about to make in the context that you're at, and then, slowly extend from there. I think one thing we didn't get to discuss on the Gandalf paper, but I mentioned it now is that a clear finding of the paper is that more narrow applications definitely help security, so we found that the more you restrict just even via system prompt, the application scope the more you can resist attacks. And so it is a certainty that just having a rollout that follows a bit that the principle, let's get to the fully agentic thing, but let's, give access as needed as we go. I think that can be a very responsible way forward.

Guy Podjarny: Yeah, and that's very aligned with what we've been talking about so far, which is if you constrain the ends, if you make the intended behaviors [00:51:00] world of your application a bit more manageable, then you're better able to block off the unintended consequences. The broader the sort of sphere of possibilities that is included in the legitimate behavior of your application, the harder it is to say what is illegitimate.

And do something about it, so

Mateo Rojas-Carulla: We'll get there very fast.

Guy Podjarny: Yeah, you're right. Then, and I think so, so one of my big takeaways from this conversation is just to is all of that sort of relationship between utility and security and it, while it seems obvious when you say it,

I think it's often forgotten that if you want to secure a system, you have to be able to define what it is that the system does because you have to contain it into that world. And if you don't have that definition well captured, then you're going to struggle to secure it. And similarly, when you, when your tools today per your paper.

And your tools today [00:52:00] to contain its security, are the, so the system prompt and you're interweaving that with functionality that will like it or not unintentionally constrain the possibilities of your application. And so I think the key guidance probably to people is be intentional. Don't think about these two things as I have a dev team that is building my application and the engineering team that's testing their utility.

And then I have someone later on, I'm just going to slap on some, some text to my system prompt and and make it secure. You have to think about them together because those two worlds are the same. So that sounds like a good, that's at least like what I'm understanding a lot from all this conversation, which has a lot more sort of specific details is just these things have to come together.

Mateo Rojas-Carulla: 100%. I couldn't have said it better myself.

Guy Podjarny: Very cool. I think Mateo, thanks for sharing all of this sort of great insights on it, the definitions of it, I've really enjoyed the conversation on it. The Gandalf paper is great, it's actually like relatively short, and easy to understand.

So we'll put a link to it in the show notes and I recommend to folks to try it out [00:53:00] and also to try out Gandalf if you, if you haven't yet, that would also give you like a visceral kind of appreciation for it. I often say that security is boring, but hacking is fun. And trying it out as a realization moment, that you probably experienced.

Mateo Rojas-Carulla: Yup, I think that's absolutely right. And that's why it is actually used by many companies as like training around the security because it gives people a very deep feeling for what it's about. And I think that's something that very few other methods can achieve.

Yeah, really encourage the audience to go play and if they can beat level eight, they should let me know.

Guy Podjarny: Perfect. Cool. Thanks Mateo again for coming onto the, onto the show here.

Mateo Rojas-Carulla: Awesome. It was a lot of fun

Guy Podjarny: And thanks everybody for tuning and I hope you join us for the next one.

‹ Introducing the 4 AI Native Dev Patterns – with Patrick Debois

LLMs in Social Deception Games ›

Subscribe to our podcasts here

Welcome to the AI Native Dev Podcast, hosted by Guy Podjarny and Simon Maple. If you're a developer or dev leader, join us as we explore and help shape the future of software development in the AI era.