
THE END OF
LEGACY APPS?
Is Your Team Ready for AI-Driven Modernization?
Transcript
[00:00:00] Simon: Hello and welcome to another episode of the AI Native Dev. My name's Simon Maple, I'm your host for the day, and joining me today we have Birgitta Böckeler, who is a distinguished engineer at ThoughtWorks. Today we're going to be talking all about how AI can be used, not just for the new features we're building in our applications,
[00:00:29] Simon: but also looking at legacy applications and how we can unlock forward engineering with code that has potentially lost its source, lost the engineers who originally worked on it, all those classic problems we have with legacy code. Birgitta, welcome to the AI Native Dev.
[00:00:50] Simon: How are you?
[00:00:52] Guest: Hi Simon. Thanks for having me, and I'm well, thank you.
[00:00:54] Simon: Absolute pleasure. So you're a distinguished engineer at ThoughtWorks, and over the last few years you've become an expert on how teams use AI, essentially exploiting AI in software teams and how they can get the most out of it.
[00:01:07] Simon: You also, from what I understand, do some work on the Tech Radar; you're one of the people who put it together. It's a super cool way for people to look at tools and understand which level of maturity they're at, and whether or not teams should assess them or be adopting them, those kinds of things.
[00:01:27] Simon: Talk us through a little bit about your role and how much fun it is, or isn't, to put that Tech Radar together.
[00:01:37] Guest: Yeah. So ThoughtWorks is a global software consultancy, and in my 20-plus-year full-time professional career I've only worked in software consulting.
[00:01:48] Guest: So I've seen lots of different organizations build software and been part of a lot of organizations building software, always hands-on; I'm not a PowerPoint consultant. I've been at ThoughtWorks for almost 13 years now, but for the last two years I've had a full-time role to just
[00:02:08] Guest: dive into the fire hose of generative AI and large language models and how to use them for software teams. It's been a lot of fun, but also quite intense. It's very interesting to be in a pressure cooker like this, in the hype, and try to stay up to date.
[00:02:24] Guest: One thing that keeps me a little bit grounded is the work on the Technology Radar as well, which we publish twice a year at ThoughtWorks. It's just a snapshot of what we see at our clients at the moment; compared to things like big market reports it's just a snapshot of what we see, which can be useful.
[00:02:43] Guest: And actually, in the last edition, we usually put around 100 to 110 entries of technologies and techniques that we're seeing on it, and over 50% of the entries were generative AI related.
[00:03:02] Simon: Wow. And did one of them make it into Adopt this year? Or... I don't think so. Maybe.
[00:03:04] Guest: You know what we have in Adopt is actually retrieval-augmented generation.
[00:03:09] Simon: Ah, okay.
[00:03:10] Guest: Because when generative AI first started, it started in Assess, because it was a new technique. But it's also high-level and abstract enough that, even at this point, still relatively early in the technology, it's something that can go into Adopt, because it's just a general technique that everybody's using right now.
[00:03:30] Simon: Absolutely.
[00:03:31] Guest: But that's the only thing. No tool.
[00:03:33] Simon: Yeah, absolutely. Let's talk a little bit about how we work with AI agents in our existing workflows today.
[00:03:43] Simon: I'm talking here about active projects and how we build with agents in that typical software development flow; then we'll lean into legacy applications and legacy modernization. You've obviously got a lot of experience from the many different customers you work with.
[00:04:04] Simon: How do you see people using AI effectively in their day-to-day workflows today?
[00:04:13] Guest: Yeah. Before I answer that question, maybe I should emphasize once more, because it maybe didn't quite get across yet, that my focus is really on using AI for building software, to improve how you build software,
[00:04:26] Guest: not on using AI in products, building it into your products. It's always a little bit clunky to describe that difference, but I think it's a very important one to make, because using AI for how we build software affects everybody involved in software development,
[00:04:43] Guest: whereas putting AI into your products only affects the subset of people who actually work on a product like that. So I see myself as a domain expert in effective software delivery, and I try to apply AI to this domain. That's just to focus it in a little bit.
[00:04:58] Simon: It's such a subtle difference, but really important. And at the AI Native Dev, that's what we care about. We always think about how you use AI in your workflow. If it goes to production, then it's not the workflow, it's part of the ingredients of your app.
[00:05:14] Simon: But if it's actually helping the way you build and deliver to production, then it's all about the tools you use in the workflow. It's a subtle difference, but very important.
[00:05:24] Guest: Yeah, and there's of course overlap, right? If you think of a big organization that has products and wants to build AI into them, and for whatever their context it's actually worth it for them to self-host models, to have guardrails,
[00:05:36] Guest: to have evals, then you might also need those to build your tools for using AI in the software process. So there's overlap, but I think it's a good distinction to understand, because it really helps focus sometimes. So in that space, when we talk about agents, I would say
[00:05:57] Guest: we have agentic tools, agentic agents, big words that often mean very different things. But it only really started exploding around February, March, I would say. A lot of the coding assistants started releasing more agentic tooling support in autumn last year, October, November, and there were some open source tools that were already doing it quite early, earlier last year.
[00:06:24] Guest: But Cursor, Windsurf, GitHub Copilot kind of started at the end of last year, and then it started getting into everybody's consciousness with the vibe coding meme. The Karpathy tweet was actually the first week of February; can you imagine that was only six months ago? It feels like about five years ago.
[00:06:43] Guest: So that's when this agentic idea, using that more in coding, became more of a step change, I would say. And what it means in coding assistants is basically that the model gets access to even more tools than it had before. Previously, in the coding assistants, it could basically edit files.
[00:07:07] Guest: I think it was mostly editing files, and often just one file at a time. Now it can edit multiple files, it can access the terminal, most importantly quite autonomously, and execute commands there, execute your tests and so on, and immediately react to the responses it gets back.
[00:07:27] Guest: So it really expanded the size of the task that a coding assistant can work on. That can be in a session with a developer, going back and forth, or, a little bit later, more of these products started releasing autonomous agents that you can run in the background.
[00:07:47] Guest: That was OpenAI Codex, or Cursor started introducing these background agents. Very early on, Devin of course was one of the first ones that did that. It's becoming more and more common in more and more products now, where you actually send it off by itself, without intervening in between.
[00:08:04] Guest: So that's also a very important distinction: the autonomous ones versus working with it myself in the workflow, as a developer, which I think is the much more common one at the moment. And then, when you combine that with MCP servers, they become even more powerful.
[00:08:21] Guest: They have even more tools available: they can access my test database, or they can browse a website in the browser, and so on. And with this expansion in the size of the task we're working on, everything becomes different again. Previously I was thinking of it much more as going maybe function by function, line by line,
[00:08:43] Guest: and AI is there to help me be faster. But now AI is doing so much stuff that I actually often fall behind. So there are all of these much more sophisticated new processes and ways of working that we have to come up with as developers, to make sure that something sensible comes out of this.
[00:09:03] Simon: Yeah, absolutely. And we've essentially become the managers of those processes. Patrick Debois wrote a really interesting piece around parallelization of these agent processes as well, particularly the more asynchronous ones.
[00:09:20] Simon: You fire five of these asynchronous agentic flows off, and then they come back and you can effectively review them. Maybe they're doing different things, maybe they're doing the same thing, and you want to pick through which one you actually want to take on. But it's really interesting, the new role that we're taking on, which we'll talk about a little bit later.
[00:09:39] Simon: So I guess, when we have agents writing a bunch of code for us, it's very hard for us to feel connected with that code, in terms of intimately knowing how it's been built, because we tend to click accept, accept, accept, and we don't necessarily give it the level of granularity of checking that we'd want to.
[00:10:07] Simon: It almost immediately turns into legacy code, right? I always say code that was written a day ago is legacy code. Any code that you aren't actively changing, in my opinion, can be considered legacy, because there are tasks you need to do to truly understand what that code is doing.
[00:10:29] Simon: Particularly if someone else wrote it: what is this code doing? What was in the developer's head when it was written? And then there's true legacy code, where you may not even have access to the source anymore.
[00:10:47] Simon: You have some application that's running, we might want to migrate frameworks, we might want to change the platform it's running on. These are massively different challenges that AI has to tackle.
[00:11:08] Simon: Should we use the same tools to make changes, or even to understand what legacy code is doing, compared to forward development? Or should we look at a completely different set of agentic processes as well as tools?
[00:11:21] Guest: Yeah. Legacy migration and modernization, I think, is a great example of the kind of initiative or larger project where we can really think through how we use agents, or how we use AI in general,
[00:11:35] Guest: in comparison to the day-to-day usage of a coding agent, where we have all of these typical practices now: we plan, we break things down, and so on. There are opportunities when we have larger initiatives on our teams, like a legacy migration, which I think is the best example.
[00:11:54] Guest: It can also be other workflows, where we can actually sit down and think: okay, what is the workflow here? Do we want to write prompts that we can reuse again and again, because we'll have to do this for 50 components? So I think that's where those two areas we talked about before start overlapping again.
[00:12:14] Guest: Now you actually want to build an agentic system that helps you with your initiative, so you have to know more about how you build an agent yourself, or what tools you have available. And I've recently worked on a few things related to this legacy migration.
[00:12:31] Guest: It's as if we prepared this conversation to get to this topic. The first thing was: we did indeed have a client, like you were saying, who said, oh, we actually don't have the source code for this application anymore. Imagine a bad vendor breakup or something like that.
[00:12:53] Guest: And that's apparently not that uncommon, I've since found out. They said, okay, we haven't been able to upgrade this application in quite a while, we need to do something, can AI help us?
[00:13:05] Simon: Even if it's a security concern, say there are vulnerabilities all over the place that have been found.
[00:13:09] Simon: Or it could be just general maintenance.
[00:13:14] Guest: It's a huge operational risk. Also, they couldn't upgrade the database, because the newer version of the database had some major changes in dialect and so on, and they couldn't change the SQL queries that were in the code.
[00:13:25] Guest: And what we have seen time and again in areas where you do legacy migration, also with those of our clients who still have large COBOL code bases, is that you usually don't feed the large language model just the legacy code and turn it into new code in one step.
[00:13:51] Guest: You don't do that anyway, because they're just too far away from each other. Usually what you do is look at how we normally get from A to Z, from the COBOL to the Java, and how we can use AI on each of those steps along the way. And so we took the same approach here.
[00:14:09] Guest: We looked first at how we can reverse engineer, in the sense of: how can we create a comprehensive description of what this application does right now? And then, once we have that description, how can we use it to do the forward engineering?
[00:15:10] Guest: Not just with AI; we can look at dynamic data sources as well, data that we capture from the running application. There are things that have been around for quite a long time, typical data capture approaches that we used in legacy migrations before AI, like change data capture.
[00:15:28] Guest: In one of our experiments we just put triggers on the database tables, so every update and insert was recorded into an audit table, and we could see what happens in the running application. Or maybe, if you have an event-based system, you could capture events in the running application. What you can now also do with AI is have AI browse through the application, especially when it's a web application.
[00:15:57] Guest: It doesn't always work as easily as with a web application. You can have AI describe what it sees as it navigates through the application, have it explore all the different click paths, and then tell it: every time you click, please check our audit log in the database and see what changed when you clicked that button.
[00:16:19] Guest: So you can try to create all of these descriptions of what is happening.
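To make that trigger-based change data capture idea concrete, here is a minimal sketch in Python using SQLite; the "orders" table and column names are hypothetical, and a real legacy database would need the equivalent trigger syntax for its own dialect.

```python
import sqlite3

# Sketch of trigger-based change data capture: every insert/update on a
# hypothetical "orders" table is recorded into an audit table, so the
# reverse-engineering step can see what the running application changed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL);

CREATE TABLE audit_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT,
    operation  TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
    row_data   TEXT
);

CREATE TRIGGER orders_insert_audit AFTER INSERT ON orders
BEGIN
    INSERT INTO audit_log (table_name, operation, row_data)
    VALUES ('orders', 'INSERT',
            'id=' || NEW.id || ' status=' || NEW.status || ' total=' || NEW.total);
END;

CREATE TRIGGER orders_update_audit AFTER UPDATE ON orders
BEGIN
    INSERT INTO audit_log (table_name, operation, row_data)
    VALUES ('orders', 'UPDATE',
            'id=' || NEW.id || ' status=' || NEW.status || ' total=' || NEW.total);
END;
""")

# Simulate what the running application does while someone clicks through a form.
conn.execute("INSERT INTO orders (status, total) VALUES ('NEW', 42.0)")
conn.execute("UPDATE orders SET status = 'SHIPPED' WHERE id = 1")
conn.commit()

# Later, the analysis step reads the audit trail to see what actually happened.
for operation, changed_at, row_data in conn.execute(
        "SELECT operation, changed_at, row_data FROM audit_log ORDER BY id"):
    print(operation, changed_at, row_data)
```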
[00:16:26] Simon: It almost sounds like a forensic exercise, really, doesn't it? It's incredible. First of all, I have a question about the tools: are there specific tools that you're using for this,
[00:16:40] Simon: or is this something you'd expect needs to be handcrafted? And my second question from that is: when you gather all this data, how do you even think about which context is most relevant, so that when you actually do the analysis you're not overwhelming the LLM with a huge amount of context, but providing it with what it needs to make a good decision?
[00:17:07] Simon: But I'd love to start with the tools question. How would someone even think about doing this?
[00:17:12] Guest: Yeah, we found that you actually don't need a fancy agent platform to get started with this and do something useful. And my suspicion is also that, because of the nature of the types of systems where you have to do all this forensics, systems that are really old,
[00:17:28] Guest: it's probably going to be different every time. Just think about SQL or even COBOL and the like, there are so many dialects there as well. That's already where it starts.
[00:17:48] Guest: When it comes to COBOL, for example, we have an accelerator that we bring to our clients at ThoughtWorks that we customize every single time we use it. We ask: do you have COBOL code, do you have C code, do you have a particular dialect of some other language?
[00:18:08] Guest: And we load that into a knowledge graph to get a much richer data set about what's happening. The advantage of this customization is not just that you might have different languages, different dialects and so on, but also that you can decide, based on the goal you have, what else you want to load into the knowledge graph.
[00:18:27] Guest: Just imagine you enrich the knowledge graph with functional descriptions of the code, written by an LLM. Then, when you ask the knowledge graph things like "what is happening in the code base?" or "what validations are we using in X, Y, Z?", you get more functional answers back.
[00:18:47] Guest: You can also provide this to a lot more users, because there are users who don't want to know the name of the function or the module, they just want to know what's happening functionally. Or you can find ways to explore the different capabilities, to find the seams to break things apart.
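A rough sketch of that kind of enriched knowledge graph, using Python's ast module and networkx over ordinary Python source as a stand-in for the client-specific COBOL tooling; the summarize function is a stub marking where the LLM call would go, and the toy "code base" is invented for illustration.

```python
import ast
import networkx as nx

def summarize(source: str) -> str:
    """Placeholder for an LLM call that writes a functional description
    of what this piece of code does (e.g. 'validates the order total')."""
    first_line = source.strip().splitlines()[0]
    return f"TODO: LLM summary of `{first_line}`"

def build_knowledge_graph(files: dict[str, str]) -> nx.DiGraph:
    """Nodes carry an LLM-written functional summary; edges record which
    function calls which, so functional questions become graph queries."""
    graph = nx.DiGraph()
    for path, source in files.items():
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                graph.add_node(
                    node.name,
                    file=path,
                    summary=summarize(ast.get_source_segment(source, node)),
                )
                for call in ast.walk(node):
                    if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                        graph.add_edge(node.name, call.func.id)
    return graph

# Toy "code base" standing in for the real legacy sources.
files = {
    "orders.py": (
        "def validate_order(order):\n"
        "    return order['total'] > 0\n"
        "\n"
        "def submit_order(order):\n"
        "    if validate_order(order):\n"
        "        return 'submitted'\n"
        "    return 'rejected'\n"
    )
}

g = build_knowledge_graph(files)
print("Callers of validate_order:", list(g.predecessors("validate_order")))
print("Summary of submit_order:", g.nodes["submit_order"]["summary"])
```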
[00:19:08] Guest: So that would be a more customized approach, for when you have really large code bases or a specific goal. But to get back to the example I was talking about before, where you've lost the backend code but maybe have some other things: what we did there was just work with an existing coding assistant.
[00:19:31] Guest: We pointed the coding assistant at the things we had available and also used a few MCP servers. For the web browsing you can use something like the Playwright MCP server, which also knows the DOM structure and doesn't just go off visuals. And then I also usually like to create small MCP servers on the fly, because they're actually really easy to create.
[00:19:58] Guest: Especially if it's a little tool that you might throw away later when you don't need it anymore. That's what we did for this data capture thing, for example: the MCP server was just providing a way to get a current timestamp and to query that audit table. So the agent could get the current timestamp, click on a button, and then ask the audit table what changed since that timestamp.
[00:20:23] Guest: That was the MCP server. And I think that's really great, because with this combination of coding assistants and MCP servers you can actually build the workflow that you need. And because that is so flexible, it helps us in this situation where every legacy situation is slightly different.
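A minimal sketch of that kind of throwaway MCP server, assuming the official MCP Python SDK's FastMCP helper; the server name, tool names, database path, and the audit_log table from the earlier sketch are all illustrative assumptions, not the tool actually built on the project.

```python
import sqlite3
from datetime import datetime, timezone

# Assumes the official MCP Python SDK (pip install "mcp"); FastMCP exposes
# plain functions as tools that a coding agent can call over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("legacy-audit-capture")
DB_PATH = "legacy_app.db"  # hypothetical path to the legacy database

@mcp.tool()
def current_timestamp() -> str:
    """Return the current UTC timestamp, so the agent can mark 'before I click'."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

@mcp.tool()
def changes_since(timestamp: str) -> list[str]:
    """Return audit_log rows written after the given timestamp, so the agent
    can ask 'what changed in the database since I clicked that button?'."""
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute(
        "SELECT operation, table_name, row_data FROM audit_log WHERE changed_at > ?",
        (timestamp,),
    ).fetchall()
    conn.close()
    return [f"{op} on {table}: {data}" for op, table, data in rows]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; register it in the coding assistant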
[00:20:46] Simon: Very interesting. And I guess the question is, with all of this, is the goal to create specifications, or just to give the AI enough information to be able to develop new features and make changes on top of that?
[00:21:06] Guest: How we approached it is that the first step, which we call the reverse engineering step, just creates the description.
[00:21:15] Guest: A very comprehensive description. The reason for that is that once you have that description, you can do more things with it. For example, you can go to subject matter experts and sanity check whether it's actually correct, because you might not even have all of the information. In that case, with the client that didn't have the backend code anymore, we didn't know what else was hidden in that backend code.
[00:21:37] Guest: Is it calling any other services? We also tried our hand at network data capture a little bit, but not between services. And when we decompiled the code, we found some signs of mainframe calls. So we had all of this forensics, but now we needed somebody who lived through the times back when all this was built,
[00:22:03] Guest: somebody who can double-check with us whether it's actually correct. What you can also do with that specification is enhance it with other things that you want to modernize or change. We have a team at ThoughtWorks working on an open source medical record system, an application that has been around for a long time.
[00:22:27] Guest: The front end is still Angular 1, so they really urgently needed to upgrade the tech stack, but they also wanted to include more compliance with a standard in the medical records domain, and they wanted to improve how the frontend looks. So they would go through the frontend components, create descriptions of what they do today, and then enrich those with the standard's requirements,
[00:22:54] Guest: and with how they maybe wanted to change the UI, and then use that to generate the new code, again under the supervision of a developer, but the developer would be very accelerated.
[00:23:05] Simon: Absolutely. And there's an interesting piece there. Sometimes it's easy to pull someone in and just say, can you look at this?
[00:23:17] Simon: And they'll understand it, they'll know what's right and wrong. Sometimes we just don't have that luxury; like you say, we have to go back in time and hope there's a developer or someone who understands it. Of course, with legacy code, the what and the how are combined together in the code.
[00:23:35] Simon: And it's really hard to identify whether something was written like this for a reason. Was this part of the intended behavior, or is it just the implementation? One thing that, to some extent, allows us to understand that expected behavior is tests.
[00:23:55] Simon: Tests are ultimately a great way of being able to say, right, my change hasn't regressed anything. Now, with legacy code, of course, we're very lucky if we have the source code, let alone tests that come along with it.
[00:24:18] Simon: In your opinion, or in your experience, how common is it to get a well-tested legacy application, with a good set of tests that actually allow you to perform a reliable change and get that validation? And lacking that, how do you create that safety net, essentially, so that whatever changes we make aren't going to adversely affect parts of the application we didn't even think about?
[00:24:42] Guest: In this real case I was talking about, there were no tests. And even if there had been unit tests somewhere for the backend, that wouldn't have helped us much, because it's unit level. The main thing that would have helped us is a big end-to-end
[00:24:58] Guest: test suite or something like that. In fact, what we also looked into, though it hasn't progressed to the point where I can say whether it was really successful or not, is whether AI can help us accelerate building almost like a fidelity fitness function, to use some fancy words.
[00:25:21] Guest: An end-to-end test suite that can help us test parity, that we can point at both applications. On the one hand, frameworks like Playwright, web browser testing frameworks, have become a lot more tolerant of different selectors in the DOM. But with AI you can, in theory, make that even more flexible.
[00:25:43] Guest: If you have end-to-end tests that actually run on AI, you can say "find the plus button in the column" and it will try to look for that, so you don't even have to use a selector. It gives you a little bit of extra flexibility. Our idea was also that maybe we can even use the specification that we generated with AI as an input to write that end-to-end test suite faster,
[00:26:07] Guest: with the goal of ultimately having a test suite that we can point at the old application and at the new application. It turns out browser tests are already notoriously flaky, and it doesn't necessarily help when you introduce more non-determinism into them. So we haven't pushed it far enough yet to say whether this is really viable or not.
[00:26:31] Guest: I have a little bit of doubt, but I think there's some potential there. That's one way to do it. And then, overall, it's a constant risk assessment. When you do this, you constantly think: okay, what's the probability AI got this wrong? How can I mitigate that? How can I find more information?
[00:26:49] Guest: For example, we decompiled the binary to see if we had missed anything. How complex is the application? What's the business impact when something is wrong? What else can you mitigate? Again, classic legacy migration rollout strategies: can you do a staggered rollout, or, when the new application is running for users, compare results of the new one with results of the old one?
[00:27:13] Guest: This is all classic legacy migration stuff, so it is a constant risk assessment. This particular application wasn't particularly complex, which is the good news; it's a bunch of forms. If you have a more complex application, then you have to put a lot of work into the risk assessment and mitigation: how do we double-check, and what else can we do to test?
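A sketch of the parity idea, pointing the same Playwright flow at the old and the new application; the URLs, roles, and labels are hypothetical, and the natural-language, AI-driven locator step described above would sit on top of something like this rather than replace it.

```python
# Parity check sketch: run the same user flow against the old and the new
# application and compare what comes out. Requires `pip install playwright`
# and `playwright install`. URLs and selectors below are illustrative only.
from playwright.sync_api import sync_playwright

OLD_APP = "http://localhost:8080"   # hypothetical legacy deployment
NEW_APP = "http://localhost:3000"   # hypothetical rewritten frontend

def submit_order_flow(page, base_url: str) -> str:
    """Drive one form submission and return the confirmation text."""
    page.goto(f"{base_url}/orders/new")
    # Role/label based locators are already fairly tolerant of markup changes.
    page.get_by_label("Quantity").fill("3")
    page.get_by_role("button", name="Submit").click()
    return page.get_by_role("status").inner_text()

def test_order_submission_parity():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        old_result = submit_order_flow(browser.new_page(), OLD_APP)
        new_result = submit_order_flow(browser.new_page(), NEW_APP)
        browser.close()
    # The fitness function: old and new behaviour should match.
    assert old_result == new_result, f"parity broken: {old_result!r} != {new_result!r}"
```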
[00:27:41] Simon: And what would you say from a success rate point of view? With simple applications you can really get an understanding of what they should do, and I'd expect that once you have all that forensic information, an AI bot or agent can come in and fairly easily make certain changes that are low risk. But with a more complex application there are impacts if you make certain changes in certain places.
[00:28:14] Simon: What's your success expectation? Obviously not just YOLO one-shot prompting it, but does it take a lot of iteration to get there? Do you get to a state where you're actually really confident that there are no regressions and this can be fully pushed out?
[00:28:35] Simon: What's the end game?
[00:28:38] Guest: With this simple application, we only went through the development and evaluation of an approach, we didn't actually start building it, but I was really impressed with the results. Like I said, it was a bunch of forms. We had the frontend code, the schema, stored procedures, and the binary,
[00:28:57] Guest: and screenshots. It was really impressive and satisfying how quickly you could get a really comprehensive description that also made sense. It was also really cool to see two of my colleagues go deep into the decompiled binary, feed the assembly code to AI, and navigate the assembly code together to try to find the function that is called when you click this button.
[00:29:28] Simon: Hold on, did AI navigate the assembly code there?
[00:29:31] Guest: There was a lot of manual preparation work they had to do at first, to split everything into multiple pieces, because you couldn't feed it everything at once. Assembly is also super verbose.
[00:29:43] Guest: And then there were some hints about where things were, because you could see string names of stored procedures and so on. My colleagues basically paired with AI to reason about the assembly code, come up with hypotheses and theories about where things are, and then actually find the code where they were pretty confident that's where it is.
[00:30:09] Guest: Ultimately it was more of a confidence booster for the results we already had from before, where we had AI infer what is happening. We would later also have been able to do some change data capture, which would have given us even more confidence. But like I said, they found things like, oh, there seems to be a mainframe call here,
[00:30:26] Guest: we should find out what's happening there. It was just super interesting to see how well a large language model could reason about the assembly code, how it could turn it into pseudocode so that we could read it as humans.
[00:30:42] Simon: It's amazing and kind of scary at the same time.
[00:30:45] Simon: "Oh look, there's a mainframe call there" frightens the heck out of me. All these things you wouldn't expect and just find out. It's incredible.
[00:30:54] Guest: And also, something we saw in terms of confidence:
[00:31:00] Guest: we sometimes saw that the large language model we were using wasn't following our instructions very well. So what you do is prepare your prompts for what you want: prompts that say, here's the frontend code, here's the screenshot, here's the schema.
[00:31:19] Guest: Try to infer what's happening, or what the validations are; those are the kinds of questions the prompts would ask. And then we would run those multiple times for the same form, to see how different the results were, so we could check whether we get basically the same result every time.
[00:31:37] Guest: If so, we were relatively confident that it was okay, together with the human sanity check. That's how you can see the fidelity of your tool: is it totally off the rails, or does it give you something that seems to make sense?
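A sketch of that consistency check, assuming the OpenAI Python SDK and an API key in the environment; the model name, prompt wording, sample inputs, and similarity threshold are all placeholders, and in practice you would compare structured output rather than raw prose.

```python
# Run the same reverse-engineering prompt several times and flag forms where
# the answers diverge too much. Assumes `pip install openai` and OPENAI_API_KEY.
from difflib import SequenceMatcher
from openai import OpenAI

client = OpenAI()
RUNS = 3
SIMILARITY_THRESHOLD = 0.8  # arbitrary; tune against human-reviewed samples

def describe_form(frontend_code: str, schema: str) -> str:
    prompt = (
        "Here is the frontend code and the database schema for one form.\n"
        "Infer what the form does and list every validation you can find.\n\n"
        f"FRONTEND:\n{frontend_code}\n\nSCHEMA:\n{schema}\n"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def consistency(frontend_code: str, schema: str) -> float:
    """Return the lowest pairwise similarity against the first answer."""
    answers = [describe_form(frontend_code, schema) for _ in range(RUNS)]
    return min(SequenceMatcher(None, answers[0], other).ratio() for other in answers[1:])

# Illustrative inputs standing in for the real form artifacts.
frontend_code = "<form><input name='quantity' required min='1'></form>"
schema = "CREATE TABLE orders (id INT PRIMARY KEY, quantity INT NOT NULL);"

score = consistency(frontend_code, schema)
if score < SIMILARITY_THRESHOLD:
    print(f"Low consistency ({score:.2f}): send this form to a human for review")
```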
[00:31:53] Simon: And what are examples of some of the more major changes you've made in terms of legacy modernization?
[00:32:01] Simon: You mentioned COBOL to Java, which is obviously pretty major, but also fairly common; when you think about modernization, COBOL to Java isn't the most unusual path, I suspect. Would you also do things like framework updates?
[00:32:22] Simon: Do you have complete changes of architecture? Anything more major in terms of your forward engineering?
[00:32:31] Guest: Yeah, the framework upgrade was kind of the example I talked about before, with Angular 1 to React, and then there was also additional modernization that needed to happen.
[00:32:40] Guest: I think there are probably a lot of different things you can do when you think about this as "what are my data sources that give me a description?", which can be a lot of different things. But when you talk about framework upgrades, there's a slightly different category as well, I think.
[00:33:02] Guest: When you think of an upgrade in the sense of just a technical upgrade, like a Java language upgrade or a framework version upgrade, that's maybe not quite as jarring as Angular 1 to the latest Angular version. There are actually tools that help you do that as well.
[00:33:21] Guest: Deterministic software: code mods, like OpenRewrite, or Sourcegraph can do a lot of that stuff. There are lots of tools that can help you write recipes and advanced patterns to do this. And there's also really interesting potential in combining large language models with those code mod tools,
[00:33:41] Guest: either letting them help you write the patterns that you need to do the upgrade, or, where the deterministic software and the patterns just aren't enough, maybe you try to fill that in with LLMs.
[00:34:01] Guest: I don't know if you remember, sometime last year there was a big headline from Amazon that they had saved thousands of developer years by using AI to do a Java upgrade, but under the hood they were using OpenRewrite, which is one of those code mod frameworks. And my suspicion is that OpenRewrite was doing a lot of the heavy lifting,
[00:34:20] Guest: probably together with AI. I'm not saying that whole tool by Amazon was a bad tool, but I just found it a bit misleading to say that AI saved all of those developer years. The pairing up of AI with the good old software tools that we have, that's where the magic happens.
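One way to picture that pairing, as a hedged sketch rather than anyone's actual pipeline: let a deterministic OpenRewrite recipe do the bulk of the migration, then hand whatever still fails compilation to an LLM for suggestions that a developer reviews. The Maven goal and recipe id below are taken from OpenRewrite's public documentation to the best of my knowledge, but treat them, along with the model name, as assumptions to verify for your own project.

```python
# Sketch: deterministic code mod first, LLM only for the leftovers.
import subprocess
from openai import OpenAI

RECIPE = "org.openrewrite.java.migrate.UpgradeToJava17"  # assumed recipe id

def run_openrewrite() -> None:
    """Let the deterministic tool do the heavy lifting first."""
    subprocess.run(
        ["mvn", "org.openrewrite.maven:rewrite-maven-plugin:run",
         f"-Drewrite.activeRecipes={RECIPE}"],
        check=True,
    )

def remaining_compile_errors() -> str:
    """Whatever the recipes did not cover shows up as compiler errors."""
    result = subprocess.run(["mvn", "-q", "compile"], capture_output=True, text=True)
    return "" if result.returncode == 0 else result.stdout + result.stderr

def suggest_fixes(errors: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content":
                   "These compile errors remain after an OpenRewrite Java upgrade. "
                   "Suggest fixes for a human to review:\n" + errors}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    run_openrewrite()
    errors = remaining_compile_errors()
    if errors:
        print(suggest_fixes(errors))  # suggestions only; a developer applies them
```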
[00:34:39] Simon: Right. And I think that's where you mix the creativity alongside the determinism, which is really important. In fact, we had the CEO of Moderne, who do a lot of work on OpenRewrite, on the podcast just the other day.
[00:34:56] Simon: He was talking about how they essentially use the AST, the abstract syntax tree of the application, and how they're able to truly understand the deterministic flows through the app. Particularly when we look at something like this, when there's a big change that needs to be made, I want to make sure the flows are going to stay the same, or perhaps change in a slightly different, more abstract way.
[00:35:24] Simon: It effectively needs to be a deterministic "yes, this is working the same" or "no, it's not", and AI can't give you that. I love that you mentioned mixing this deterministic AST, and the existing tools we've been using for so long, with the creativity of an LLM.
[00:35:45] Simon: Making those suggested fixes and then testing against more deterministic methods, I think, is a really strong plan. We've seen it time and time again; Snyk does that as well in Snyk Code, and many other tools have that combination of determinism in their AI solutions.
[00:36:02] Guest: Yeah. And we have to think a lot more in probabilities now, on the meta level, as software developers, because of this. If you think about that situation where you cannot upgrade an application anymore, which is a huge risk, it's better to replace it and risk some bugs than to not do anything.
[00:36:20] Guest: But now you can employ different things to accelerate that replacement and increase the probability that you're getting it right. And then you constantly have to think about those trade-offs and risks, and how you fill the gaps, and so on.
[00:36:35] Simon: That's a really good point. Obviously AI will, more and more, be able to generate reasonable applications at scale.
[00:36:47] Simon: Do you feel we'll get to a stage where a legacy app won't ever exist anymore? Particularly for some of the smaller ones, why would I try to do all of this? Why wouldn't I just rebuild it from scratch?
[00:37:03] Simon: Do you think that's going to be a realistic approach in the near term? You can build it on a more modern stack much more quickly, and rather than pulling something from a legacy, old code base, why not just rebuild it from scratch and understand exactly what's happening under the covers?
[00:37:26] Guest: Yeah, I think that might happen. It's an effect we can already see with organizations that have hundreds or thousands of microservices. What I often see happening with developers is that they're working on a service, and when it gets to a certain size, and often that's not even that big,
[00:37:47] Guest: they're like: oh, this is too big, we need to split this up, we need to create a new one, we don't understand it anymore, we can't reason about it anymore. And then you end up with two entities in every service, which is also not the right way. Maybe we should just get a little bit better at modularizing inside our services.
[00:38:03] Guest: But this psychological reflex shows you that this might actually now happen, that people's reflex will be: ah, we don't understand this anymore, let's just quickly rewrite it with AI. When sometimes sitting down and actually seeing how we can make this better might be the more effective way to do it.
[00:38:30] Simon: Let's see. Well, let's talk about humans, about how we as developers in the loop, humans in the loop as we call it, need to change to be better in this process and to continue to run it. There's a ton of stuff we need to be good at in order to actually own the process and run it.
[00:38:56] Simon: You mentioned a lot there in terms of gathering those metrics and actually working with AI, almost pairing with AI, to get all the correct information. It's still about understanding what information we need, what information is important, and really understanding the application.
[00:39:14] Simon: But what else do we need to be better at, or what new skills do we need to learn, to be efficient and effective in this new loop?
[00:39:25] Guest: Mm, yeah, that's a big topic. Everybody's worried about the new people coming into the profession, who don't have those scars from the past, and how they
[00:39:39] Guest: build up new skills. I don't have the answers to all of those things either; it'll definitely be very interesting. And I think for experienced senior developers and software engineers it's now our responsibility to figure out how we can do that for the next generation.
[00:39:57] Guest: Because experienced developers don't just grow on trees. We all develop by running into problems, by being called up at 2:00 AM and then thinking, oh, I'm not going to do that again. So yeah, what are the skills?
[00:40:15] Guest: One big challenge, and we were talking about this before, will be this: when we don't understand what's going on in our code base anymore, will that be a problem or not? Some people are saying that's not a problem, because AI will also maintain it. But I've also run into situations where I've used a prompt and, because of my lack of knowledge of what was going on in the code base, it was, let's say, a subpar prompt, and AI actually deleted important code, because I wasn't aware that I should have told it "don't touch that".
[00:40:46] Guest: So that's going to be a challenge. I think we should still roughly understand what the components in our code are and what the dependencies are; if I change this, how many other places are calling it,
[00:41:06] Guest: and so how risky is it to change, and so on. And then I hope we will get better tools to help us review large change sets. We're used to just reviewing text diffs, but maybe that's not scalable anymore, and maybe there are even new ways for AI to help us visualize more, to help us pull together lots of automated information that we didn't put together before, because we thought: I went through all this code, I wrote it myself,
[00:41:40] Guest: I don't need that static code analysis or that coverage information, I know what I did. But now, if I go for lunch and send off an agent and come back and it has changed 50 files, maybe I want a dashboard that shows me everything, like an impact analysis.
[00:42:02] Guest: So I'm wondering about that. I think we need more tools like that, and to not just rely on the text diff.
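A very small sketch of the kind of "what did the agent just change?" summary described above, built from plain git output; the comparison branch name is an assumption, and a real tool would fold in coverage, static analysis, and dependency impact as well.

```python
# Summarize churn per top-level directory from git, instead of reading one
# big text diff. Assumes it runs inside a git repo; "main" is an assumed base branch.
import subprocess
from collections import defaultdict
from pathlib import PurePosixPath

def numstat(base: str = "main") -> list[tuple[int, int, str]]:
    """Return (lines added, lines deleted, path) for each changed file."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines():
        added, deleted, path = line.split("\t")
        if added != "-":  # "-" marks binary files; skip them here
            rows.append((int(added), int(deleted), path))
    return rows

def summarize(rows):
    per_dir = defaultdict(lambda: [0, 0, 0])  # files, added, deleted
    for added, deleted, path in rows:
        top = PurePosixPath(path).parts[0]
        per_dir[top][0] += 1
        per_dir[top][1] += added
        per_dir[top][2] += deleted
    return per_dir

if __name__ == "__main__":
    for directory, (files, added, deleted) in sorted(summarize(numstat()).items()):
        print(f"{directory:20} {files:3} files  +{added:<5} -{deleted}")
```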
[00:42:08] Simon: It's really interesting, because I guess it's the question of what we're reviewing. Are we trying to review code, which testing would actually be a better validator for, or are we trying to review decisions?
[00:42:21] Simon: We almost want to level up our reviewing to that decision level, and have the code itself validated more through tests. I think there's probably a journey we need to go on for that.
[00:42:39] Guest: You have to review the tests.
[00:42:41] Guest: A hundred percent. Because the AI generates the tests, and I've seen AI make so many mistakes when building tests: tests that don't do anything, too much mocking, maybe making the wrong assumptions in the test data. And then you think the tests are green, but the tests are actually testing something that is wrong.
[00:43:03] Guest: So that's the next level: okay, it generated all these tests, but how do I actually know that it did what I wanted? And especially when it changed a lot of things, how do I find out quickly?
[00:43:17] Simon: I was going to say biases are another area in and around there. If we're allowing AI to generate tests, generate code, and make decisions, we need to be very aware of the biases in the decisions being made within the AI.
[00:43:30] Simon: Or rather, it is something we should absolutely be aware of. But how can we train ourselves to really look out for when biases within the AI are having adverse effects on what is being generated?
[00:43:58] Guest: Yeah. There are the biases in the AI, and our biases as well.
[00:44:03] Simon: Very true.
[00:44:03] Guest: In the AI, when it comes to coding, I don't know if I would call it biases, but I've seen it do things like fill in gaps in what I prompted it with,
[00:44:12] Guest: fill those gaps with assumptions; that's a thing. Or, as we all know, they really want to please us and fulfil the task we ask them to do. I don't know if there's something about biases when it comes to domain logic as well; I haven't really thought about that yet.
[00:44:35] Guest: Might it lead us to implement something biased in the application? I hadn't thought about that yet. But on the other side are our biases. For example, because AI always frames everything so confidently, even though we know it's a machine, a stochastic parrot and all of that, it still influences us.
[00:44:54] Guest: "Oh, it's so confidently saying that's a best practice, so it must be a best practice." And often we also just skim over those words, "best practice, blah blah", and go, oh yeah, sounds good. Or once we get a solution from AI, we might be anchored to it,
[00:45:15] Guest: and then it makes it harder for us to imagine other solutions when things go wrong. Or there's a strange sunk cost fallacy at play as well: AI generates a lot of code and it doesn't really work, and then sometimes we spend two hours trying to fix it instead of throwing it away, because it's already there.
[00:45:37] Guest: "Ah, but it's almost working." It's similar to fixing forward when production is broken versus reverting: just reverting is so much faster than trying to fix it, and that's what you have to do with AI code a lot as well. So yeah, critical thinking, going back to your question about the skills.
[00:45:57] Guest: Critical thinking, understanding all these probabilities, and constantly getting over ourselves with all of these psychological effects of this manipulative technology. It's going to be interesting.
[00:46:15] Simon: Absolutely. And of course it's not just going to be us as the human in the loop who need to be better, but the team: the delivery is going to change, the speed at which we deliver, the quality.
[00:46:31] Simon: From a delivery manager's point of view, when everything's happening so much faster around us, what do we need to be more conscious of? Are there any new best practices or things we need to be better at as a team in order to deliver more effectively?
[00:46:53] Guest: All of the good engineering practices that we've developed over the decades still very much matter. And GenAI amplifies indiscriminately: if you have a bad setup, you might just amplify that bad setup, and if you have a good one, everything might go well. So this speed that you're talking about is going to be the linchpin.
[00:47:15] Guest: Can you actually, in your organization, with your pipelines and your processes, support an increase in throughput? Can you test faster, can you fill the backlog faster? Can you actually deal with that when you're starting from a continuous integration setup that is flaky,
[00:47:38] Guest: or a test setup that is flaky? I think a lot of organizations are still underestimating this foundation that you need, and also the good, repeatable automation that makes a safety net to ensure things are still going well. And me, as a developer, when I see AI go off the rails, I often roll back, like I was just saying.
[00:48:01] Guest: I try to let go and roll back. But when you're a delivery manager or a product owner, and suddenly your team is constantly churning out more stuff and all the moving pieces are moving even faster, you can't just roll back. So I have a feeling we're still underestimating this, the control that we're losing.
[00:48:25] Guest: As humans we want to be in control, and maybe we also have to let go in some places and just let AI do its thing. But I have a suspicion that we're going to run into trouble when we just lose control over what's going on, because so many agents are running around doing stuff for us.
[00:48:44] Simon: Yeah, absolutely. Let's finish up with the age-old question, the hardest question to answer: what happens to me as a developer? What do I do? Do I need to go be a farmer somewhere? How does my role look in the future?
[00:49:05] Guest: At least at the moment, I still see a lot of need for the classic experience and skills. When I was talking about the legacy migration, you called it forensics: for the forensics, you need to know where to look, you need to know where the corpses are in the basement,
[00:49:24] Guest: and include them in your data sources.
[00:49:27] Simon: And you mentioned the junior versus the senior dev. Who was it, not Martin Fowler, but Martin Thompson, who used the phrase "mechanical sympathy"? It's about having that sympathy for how things are built underneath, to actually have the ability to understand how to use your tools or the system.
[00:49:53] Simon: And I think if you were to look at a junior versus a senior in that forensics-style exercise, you know the senior developer is going to go down the right path, to the right pieces, because they know what they're looking for. The junior will probably ask many good questions, but have so many omissions as well.
[00:50:13] Simon: It's interesting that, when we talk about that future developer role, the senior developer of today will be well equipped for it. How does the junior developer learn some of those skills? Do they just follow the forensics of the senior developer?
[00:50:35] Simon: It's a different level of learning.
[00:50:35] Guest: Yeah. Legacy migration is a specific case where you've always needed people who are maybe still familiar with things like what an application server is. But in the day-to-day of building a current application, you need those skills when you're doing the risk assessments I talked about before.
[00:50:54] Guest: I constantly have to ask myself: can I trust this code? How much review should I do? Do I have a safety net? What can go wrong? What is the business impact? How critical is this piece of code? Is it complex or not? Those are also very traditional developer skills.
[00:51:11] Guest: So you're less reckless, let's say. But at the same time, we as more experienced developers still have to build up that experience with AI, to understand when I can trust it and when I can't. And my hope is that the people coming into the profession now will learn just the way we did.
[00:51:32] Guest: They will make mistakes and learn from them. I just hope that the period in which maybe everybody's a bit more reckless, and we just increase the throughput of those mistakes, is going to be as short as possible, and then we all develop a lot of critical thinking about this.
[00:51:50] Guest: It's going to be better. I mean, when I was an intern, even before I had gone to university, I was an intern at a startup and I deleted the production database. It happens. It's ultimately their fault for letting a beginner do that; there was just one environment at the time,
[00:52:10] Guest: there weren't even multiple environments. But I learned from that. I learned that I cannot just run a random script that might overwrite all the tables.
[00:52:19] Simon: The first thing you said was, I bet the first thing you said wasn't, why did you let me do that? It was, oh crap.
[00:52:24] Simon: I've deleted the production database. How do I get this back? And then you, you learn.
[00:52:29] Guest: Oh, okay.
[00:52:29] Simon: Yeah, yeah, exactly. The first thing, the first thing
[00:52:32] Guest: that happened is I had the reddest face I ever had in my life. No, but that's an extreme case of course. But that's how I learned, right? Yeah. Like I've done things wrong or I've seen other people make mistakes in the past and then go like, and then that kind of trained my intuition, my reflexes, my like, oh, maybe you should look over there.
[00:52:49] Guest: Look over there. Right? So I hope that that will still happen and that teams will still have that environment where we can help others not make those mistakes. [00:53:00] That's my like, positive view of how we will still learn this. Right. But yeah. Let's see.
[00:53:05] Simon: In a way, in a way, kinda like a lot of the things that you were saying about what we as developers will need to still do.
[00:53:11] Simon: Um, remaining consistent is great because those are the hard things. The thing that will change is how we go about doing that, and instead of us writing the code to achieve those things or building the infrastructure or the, the architecture to do those things, a lot of those things can be done similarly with AI.
[00:53:30] Simon: I think, you know, part of me kinda like thinks, well, will we get to those, will we get to those pain points and problems quicker? Um, you know, if I was a, if I was a junior developer and I just allow AI to do whatever, I'll still hit those problems. I'll still learn about those kind of things, and I'll maybe even do it quicker.
[00:53:48] Simon: Yeah. But I won't, yeah. It might actually
[00:53:49] Guest: bring the pain forward. Yeah. Yeah.
[00:53:51] Simon: I, I, I just, I just won't be, I can blame AI versus me actually typing that command to do whatever in the production database.
[00:53:58] Guest: Yeah, yeah, yeah. But,
[00:53:59] Simon: but it's, yeah, it's, [00:54:00] it'll be interesting to see how that, how that appears, but great to, I think, great.
[00:54:05] Simon: Great advice and great forethought in kinda like seeing that the harder things that we need to do as developers today, irrespective of the implementation, are still the hardest things that we will need to do in future. And I think, I agree that those same needs will exist and, and be there for us for many years to come.
[00:54:25] Simon: Certainly. Um, Birgitta, this has been, this has been very, very enlightening and it's good, I think it's really good to talk about what is a real problem for so many, particularly in, you know, the larger enterprises, larger, uh, um, organizations as well that have to deal with legacy, legacy code on a day-to-day basis.
[00:54:45] Simon: And, and it's really wonderful to kind of like, hear. Particularly in the, the way you were describing how to, how to get that information and really give AI as much information as you possibly can, using AI as a, as a [00:55:00] tool as well to peer, peer program and peer learn, um, from that. Wonderful, wonderful to hear, to hear your, your tactics and how you, how you go about that.
[00:55:08] Simon: So really appreciate having you onto the, to the episode this, uh, this week, uh, Birgitta.
[00:55:13] Guest: Yeah, it was fun. Thanks for having me, Simon.
[00:55:15] Simon: Oh, absolutely. No problem. Anytime. And, uh, thanks very much, uh, for all, uh, listening. Um, looking forward to another topic next week. So, uh, tune in then. See you soon.
In this episode
In this episode, host Simon Maple and Birgitta Böckeler, Distinguished Engineer at ThoughtWorks, explore the transformative role of AI in the software development process. They delve into how AI can accelerate workflows, aid in legacy modernization, and shift developers from authors to orchestrators. Key takeaways include adopting techniques like retrieval-augmented generation (RAG) over specific tools, implementing guardrails, and leveraging AI to build disciplined, reliable workflows across the software lifecycle.
AI isn’t just a feature you ship—it’s fast becoming the fabric of how software itself gets built. In this episode, host Simon Maple sits down with Birgitta Böckeler, Distinguished Engineer at ThoughtWorks and long-time contributor to the ThoughtWorks Technology Radar, to unpack how developers can use generative AI and agentic tooling to accelerate day-to-day workflows—and even tackle hard legacy modernization problems where the source code or original context is missing.
From “AI in Products” to “AI for Building Software”
Birgitta draws a clear boundary that frames the entire conversation: there’s AI inside your product, and there’s AI inside your software development process. While the former affects teams building AI-powered features, the latter touches every engineer in the organization. Her domain expertise is the latter—infusing the software delivery lifecycle itself with AI capabilities.
This distinction matters for governance and investment. If you’re embedding AI in your workflow (as opposed to shipping it to end users), your priorities are developer experience, velocity, reliability, and fit with existing SDLC controls. Interestingly, the organizational plumbing for one often benefits the other: if your product org needs self-hosted models, guardrails, and evals, those same capabilities can harden internal AI tooling for your engineers.
Through the ThoughtWorks Technology Radar—published twice a year as a snapshot of what practitioners are actually using—Birgitta sees a dramatic shift: over half of recent entries were generative-AI-related. Crucially, while no single vendor tool earned an “Adopt,” one technique did: retrieval-augmented generation (RAG). It’s a telling signal—technique over tool, and a push to operationalize patterns that reduce hallucinations and improve relevance.
The Rise of Agentic Coding: Bigger Tasks, New Capabilities
Coding assistants leveled up in late 2023 and early 2024. Beyond inline completions and single-file edits, modern tools can now:
- Edit multiple files cohesively
- Execute terminal commands autonomously
- Run and react to tests in tight loops
- Use external tools and services via protocols like MCP (Model Context Protocol), from browsing to hitting a test database
That expansion transforms the size and shape of work you can safely hand off. It’s no longer “finish this function”; it’s “create an endpoint, update the schema, adjust the tests, and fix the linter,” and the agent can iteratively act on feedback from the build and test environment.
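To make the shape of that loop concrete, here is a minimal Python sketch of an edit-run-react cycle. Everything agent-specific is hypothetical: `propose_patch` stands in for whatever coding agent you call, and the sketch assumes a pytest test suite and a git working tree. Real agentic tools implement this loop internally; the point is only to show how test feedback flows back into the next proposal.

```python
# Minimal sketch of an agentic edit-run-react loop (illustrative only).
# `propose_patch` is a hypothetical callable: (task, feedback) -> unified diff.
import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the test suite and return (passed, combined output)."""
    result = subprocess.run(
        ["python", "-m", "pytest", "-q"],
        capture_output=True, text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

def apply_patch(diff: str) -> None:
    """Apply a unified diff to the working tree via `git apply`."""
    subprocess.run(["git", "apply", "-"], input=diff, text=True, check=True)

def agent_loop(task: str, propose_patch, max_iterations: int = 5) -> bool:
    """Ask the agent for a patch, apply it, run tests, and feed failures back."""
    feedback = ""
    for _ in range(max_iterations):
        apply_patch(propose_patch(task, feedback))  # hypothetical agent call
        passed, output = run_tests()
        if passed:
            return True      # green build: hand the diff to a human reviewer
        feedback = output    # the agent reacts to the failing test output
    return False             # give up and escalate after too many attempts
```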
Two modes emerged. First, interactive “pair” workflows where developers steer the agent step-by-step inside the IDE. Second, background/autonomous runs kicked off to tackle larger tasks with minimal intervention. The “vibe coding” moment spotlighted the latter—the idea that you can delegate a high-level intent and let the agent chase it down through multiple steps while you supervise.
Developer as Orchestrator: New Practices for Control and Quality
As agents become truly agentic, the developer’s role shifts from author to orchestrator. Paradoxically, the AI can move so fast that humans fall behind—unless they establish new practices to control scope and ensure quality.
Practical strategies discussed include:
- Intent-first planning: Start sessions with crisp acceptance criteria and explicit constraints. Define the “done” state, the test surface area, the allowable tech stack, and anything the agent must not change.
- Task decomposition for agents: Even with autonomous runs, break the work into bounded milestones (e.g., schema migration, endpoint addition, integration test repair). Smaller scopes produce more reliable loops and clearer diffs.
- Parallelization with review gates: For asynchronous agent runs, fire off parallel candidates and then review outcomes side-by-side. Pick the best result or merge the strongest pieces. This pattern shines for exploratory refactors or multi-file changes where different approaches might be viable (see the sketch below).
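The parallelization pattern in the last item might look roughly like the sketch below. `run_agent_candidate` is a hypothetical function that runs one autonomous attempt in an isolated workspace and returns a small result record; the structure of the review gate, not the specific API, is what matters.

```python
# Rough sketch: fire off parallel agent candidates, then gate them before human review.
# `run_agent_candidate(task, approach)` is hypothetical and assumed to return a dict
# with keys "branch", "diff", and "tests_passed".
from concurrent.futures import ThreadPoolExecutor

def parallel_candidates(task: str, run_agent_candidate, approaches: list[str]) -> list[dict]:
    """Run one agent attempt per approach in parallel and collect the outcomes."""
    with ThreadPoolExecutor(max_workers=max(1, len(approaches))) as pool:
        futures = [pool.submit(run_agent_candidate, task, a) for a in approaches]
        results = [f.result() for f in futures]

    # Review gate: only candidates whose checks pass reach a human reviewer.
    reviewable = [r for r in results if r["tests_passed"]]
    for candidate in reviewable:
        changed = len(candidate["diff"].splitlines())
        print(f"{candidate['branch']}: {changed} diff lines ready for side-by-side review")
    return reviewable  # the developer picks the best result or merges the strongest pieces
```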
Guardrails should be practical and observable: run tests in a sandbox; enforce lint and type checks; mandate explicit diffs; and treat terminal access as a power tool with a chaperone. If your organization already uses evals and policy checks for product-facing AI, extend them to agent outputs: define evals for correctness, latency, and safety in the developer workflow, and add routine spot checks to maintain trust.
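In practice such a gate can be a short script. The sketch below assumes a Python project that happens to use ruff, mypy, and pytest; substitute whichever lint, type, and test commands your stack actually runs, and execute the whole thing inside a sandboxed workspace.

```python
# Illustrative guardrail gate for agent output (tool choices are assumptions).
import subprocess

CHECKS = [
    ["ruff", "check", "."],            # lint
    ["mypy", "."],                     # type check
    ["python", "-m", "pytest", "-q"],  # tests, ideally run in a sandbox
]

def diff_is_explicit() -> bool:
    """Reject runs that leave no reviewable diff behind."""
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout
    return bool(diff.strip())

def gate_agent_output() -> bool:
    """Accept an agent change only if it produced a diff and every check passes."""
    if not diff_is_explicit():
        return False
    return all(subprocess.run(cmd).returncode == 0 for cmd in CHECKS)
```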
Legacy Modernization with AI: When You Don’t Own the Code You’re Changing
Modernization is a perfect use case to design purposeful agentic workflows. It’s not a one-off; it’s a repeatable pipeline. Birgitta describes scenarios where teams face “black box” systems—sometimes even without access to source code due to past vendor arrangements—and need to migrate frameworks, update platforms, or remediate security gaps.
Treat the effort like a structured, AI-enabled program:
- Standardize a workflow: codify a repeatable series of steps—inventory components, infer behavior from interfaces, map dependencies, propose migration plans, and generate changes incrementally.
- Build reusable prompts and playbooks: the same prompt templates can guide agents through 50+ components, enforcing consistency in analysis, code edits, and documentation (a rough sketch follows this list).
- Equip agents with the right tools: via MCP or equivalent, give controlled access to build commands, test runners, HTTP clients, and documentation sources. Let the agent discover behavior by running safe probes or reading artifacts, then propose changes you can review.
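As a rough illustration of the reusable-playbook idea above, the sketch below applies one prompt template across an inventory of components and writes each proposed plan to a versionable artifact. `run_agent` is a placeholder for your actual agent invocation, and the template wording is invented for the example.

```python
# Sketch of a repeatable migration playbook run across many components (illustrative).
from pathlib import Path

PROMPT_TEMPLATE = """You are migrating {component} from {source_stack} to {target_stack}.
1. Summarise the component's observable behaviour from its interfaces and tests.
2. List its dependencies and which of them must change.
3. Propose an incremental migration plan with verification steps.
"""

def run_playbook(components: list[str], run_agent, source_stack: str, target_stack: str) -> None:
    """Apply the same prompt template to every component and keep the outputs as artifacts."""
    out = Path("migration-artifacts")
    out.mkdir(exist_ok=True)
    for component in components:
        prompt = PROMPT_TEMPLATE.format(
            component=component, source_stack=source_stack, target_stack=target_stack
        )
        plan = run_agent(prompt)                          # hypothetical agent call
        (out / f"{component}-plan.md").write_text(plan)   # versionable, auditable output
```

Because every component goes through the same steps and leaves the same artifacts behind, the run is easier to audit and to re-execute when a component's plan needs revisiting.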
The key difference from day-to-day coding is repeatability at scale. You aren’t just “using an AI assistant”; you’re assembling an agentic system tailored to your migration pattern. That mindset helps you design for traceability, idempotency, and auditability—critical when touching legacy systems that nobody fully understands anymore.
Techniques Over Tools: What to Adopt Now (and How)
The Radar’s “Adopt” signal for RAG is a pragmatic north star. For developer workflows, RAG can ground agents in your codebase, architecture docs, ADRs, and runbooks, dramatically reducing hallucinations. Start by curating authoritative corpora, tagging content for relevance, and enforcing retrieval in prompts so the model cites and uses the right sources.
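The sketch below is a deliberately toy version of that pattern: it retrieves the most relevant curated documents and builds a prompt that tells the model to answer only from those sources and cite them. A production setup would use embeddings and a vector store rather than keyword overlap; keyword scoring is used here only to keep the example self-contained.

```python
# Toy RAG sketch: keyword retrieval over a curated docs folder, then a grounded prompt.
from pathlib import Path

def retrieve(query: str, corpus_dir: str = "docs", top_k: int = 3) -> list[str]:
    """Score each Markdown document by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = []
    for path in Path(corpus_dir).glob("**/*.md"):
        text = path.read_text()
        score = sum(text.lower().count(term) for term in terms)
        scored.append((score, path.name, text))
    scored.sort(reverse=True)
    return [f"[{name}]\n{text}" for _, name, text in scored[:top_k]]

def grounded_prompt(question: str) -> str:
    """Build a prompt that forces the model to answer from the retrieved sources."""
    context = "\n\n".join(retrieve(question))
    return (
        "Answer using only the sources below and cite them by name.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```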
On the tooling front, avoid betting the farm on a single vendor. Look for tools that:
- Support multi-file edits with traceable diffs
- Execute tests and shell commands with logs you can audit
- Integrate via MCP or similar to add capabilities safely
- Allow configuration of model choice, temperature, and policies (illustrated in the sketch after this list)
- Produce artifacts you can version (plans, change sets, explanations)
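To make that checklist tangible, a run configuration capturing those knobs might look something like the sketch below. The field names are illustrative only; no particular tool's schema is implied.

```python
# Hypothetical agent-run configuration (illustrative field names, not a real tool's schema).
from dataclasses import dataclass, field

@dataclass
class AgentRunConfig:
    model: str = "your-approved-model"   # pin the model choice explicitly
    temperature: float = 0.2             # lower temperature for more deterministic edits
    allow_shell: bool = True             # terminal access stays on, but is logged and audited
    mcp_servers: list[str] = field(default_factory=lambda: ["test-db", "http-client"])
    policies: list[str] = field(default_factory=lambda: [
        "no-secrets-in-prompts",
        "diff-required-before-merge",
    ])
    artifact_dir: str = "agent-artifacts"  # where plans, change sets, and explanations are versioned
```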
Operationally, put evaluation and feedback loops in place. Define a few golden tasks per repo to measure agent reliability. Track metrics like percent changes passing CI on first try, review effort per agent PR, average rework, and defect escape rate. Use those signals to tune prompts, adjust scopes, or swap models. Finally, train developers on the new craft—how to frame intents, supervise runs, and keep outputs aligned with coding standards.
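One way to keep those signals honest is to record them per agent run and aggregate them over a repo's golden tasks. The sketch below shows the shape of that data; the field names are assumptions, and how you collect the values (CI webhooks, review tooling, and so on) will depend on your setup.

```python
# Minimal sketch for aggregating agent-reliability metrics over golden tasks (illustrative).
from dataclasses import dataclass

@dataclass
class AgentRunRecord:
    task_id: str
    passed_ci_first_try: bool   # did the change pass CI without human fixes?
    review_minutes: float       # effort spent reviewing the agent's PR
    rework_commits: int         # follow-up commits needed after merge

def summarise(runs: list[AgentRunRecord]) -> dict[str, float]:
    """Aggregate the signals used to tune prompts, scopes, or model choice."""
    n = len(runs) or 1
    return {
        "first_pass_ci_rate": sum(r.passed_ci_first_try for r in runs) / n,
        "avg_review_minutes": sum(r.review_minutes for r in runs) / n,
        "avg_rework_commits": sum(r.rework_commits for r in runs) / n,
    }
```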
Key Takeaways
- Separate concerns: distinguish AI-in-product from AI-for-building-software. Invest in guardrails, evals, and model hosting decisions that serve both where it makes sense.
- Prefer techniques over tools: RAG is ready to adopt for developer workflows—ground agents in the docs and code that matter.
- Use agentic power safely: enable multi-file edits, terminal access, and tests—but in sandboxed, observable loops with clear acceptance criteria and diffs.
- Orchestrate, don’t abdicate: decompose tasks, parallelize agent runs when useful, and put review gates in place. Developers are the managers of these processes.
- Scale modernization with systems: for legacy migrations—especially when code context is thin—build reusable prompts, standardized workflows, and agent toolchains (via MCP) that you can run across many components.
- Measure and iterate: define success metrics (first-pass CI, review time, rework) and use evals to continuously improve prompts, models, and workflows.
This episode is a field guide for engineering leaders and hands-on developers who want to use AI not as a flashy add-on, but as a disciplined, reliable co-worker across the software lifecycle—from greenfield coding to the gnarliest legacy modernization.
Related episodes

- Can AI Really Build Enterprise-Grade Software? (26 Aug 2025, with Maor Shlomo)
- Why AI Coding Agents Are Here To Stay (17 Jul 2025, with Patrick Debois)
- Can LLMs replace structured systems to scale enterprises? (24 Jun 2025, with Jason Ganz)