WHY YOUR AI

ISN'T BOOSTING

PRODUCTIVITY

Justin Reock

Deputy CTO, DX

Back to podcasts

Why Tracking AI Usage Drives Better Results

5 Aug 2025with Justin Reock

Also available on

Transcript

[00:00:12] Simon: Hello, it's Simon Maple here, and welcome to another episode of the AI Native Dev. And today we're gonna be talking all about, um, well how good is AI for us and how can we measure it, and how can we actually make changes based on, uh, what we are measuring? Joining me today is Justin Reock, uh, and Justin is a Deputy CTO at DX.

[00:00:33] Simon: Justin, welcome, how are you?

[00:00:35] Justin: Hey Simon. Thanks for having me.

[00:00:36] Simon: Oh, absolute pleasure. And whereabouts you calling in from today, Justin?

[00:00:40] Justin: I'm in Columbia, South Carolina. My wife and I moved here about seven months ago.

Simon: Oh wow. How are you finding it?

Justin: You know what, it's a really underrated town. It's a capital city, it's a university city.

[00:00:51] Justin: There's a lot here. We have the largest zoo in the US because we combined it with a botanical garden, uh, really good bars and restaurants and, and, and culture. So we're, uh, between both of our kids, we have grown kids in Atlanta and Raleigh, North Carolina. So we're kind of three hours from either one.

[00:01:07] Simon: So Amazing. I'm already learning something. The big, it's got the, so it's the biggest zoo in the US.

[00:01:12] Justin: Biggest zoo in the US which is crazy. Yeah. You wouldn't think that it's here, but yeah, it's in terms of footprint.

[00:01:18] Simon: Alright. And tell us a little bit about DX.

[00:01:20] Justin: Yeah, so DX um, was started by, uh, Abi Noda. Uh, who is a researcher, uh, and who sold a company called Pull Panda to, uh, GitHub, ultimately Microsoft, uh, a few years ago and has been, you know, really an expert, foremost researcher on developer productivity and, and developer experience.

[00:01:39] Justin: Uh, and he teamed up with Nicole Forsgren, uh, who published the DORA Metrics and, and, and, uh, sold DORA to Google Cloud, a number of years ago, as well as Margaret-Anne Storey, who was the primary author of the SPACE Framework. And you can kind of think of it as like all of the research that's been done through like the Microsoft productivity research and..

[00:02:00] Justin: University of British Columbia, uh, in, in, uh, Victoria, all kind of combined into a product that then gathers data to try to measure developer experience, and then the knock-on effects of better developer productivity.

[00:02:12] Simon: Mm. Interesting. And, and so the company has obviously been hugely based in this space for a fair while.

[00:02:19] Simon: I think we, last time we talked, it was, was it nine/ten years old?

[00:02:22] Justin: Uh, no, no, no. DX is going on four years.

Simon Oh, four years. Pardon me.

Justin: Yeah. Still a pretty new company. Yeah.

[00:02:27] Simon: So four years. So, but it, but it was pre the, kinda like the big AI boom.

[00:02:31] Justin: Oh, for sure. Yeah. Yeah. All of this, you know, the theory and our, our measurement frameworks like the Core 4 and the Developer Experience Index, all yeah,

[00:02:40] Justin: predate, um, kind of what's, what's happened over the last year or so with, with AI

[00:02:44] Simon: And so and so kind of, you know, obviously, talking about developer experience and productivity and things like that in and around the developer space, and then all of a sudden something like the, the big wave, I think we mentioned it before on, uh, offline, the big wave of AI hits.

[00:02:59] Simon: Did you get the oh shit moment like everyone else, where it's like, hang on, is this a, is this a, a wind in our back, or you know, a wind in our face? How does this affect us?

[00:03:07] Justin: No, absolutely. I mean, I think, you know, it's fair to say that we're in that heavy disruption phase now where we have to kind of rethink a lot of, you know, truths that we've been able to take for granted in the industry for a while.

[00:03:19] Justin: Um, I think that, you know, the good news for DX is that we've always been measuring developer productivity and developer experience, and that is still the point of rolling out, AI and AI assistance and agents and things like that, at least from a software engineering perspective, I think, um, you know, it's easy to lose sight of that in all the hype.

[00:03:42] Justin: Like, why are we even doing this? Why are, you know, why is this, uh, so critical for organizations? And it's still about, you know, improving, augmenting what developers can do and, and, and buy that sort of the developer experience for the, for the better knock-on effect of, of, of productivity. So everything that we've been doing in terms of measuring,

[00:04:02] Justin: the impact of developer experience on an organization measuring the various drivers that, that, that sort of comprise what we would define as developer experience, those are still completely relevant, right? I think that we just have to start, you know, rethinking some of the, the methods that, that are gonna be used.

[00:04:18] Justin: We need to start rethinking the role of a developer a little bit. Um, but in terms of, of how to measure the productivity of an organization, I don't think that that's changed fundamentally.

[00:04:28] Simon: Yeah, and I guess thinking about the, um, with AI in particular, I feel like, um, that there's this kinda like more of a long term and a short term productivity I guess.

[00:04:37] Simon: Uh, so if we reflect with another tool, maybe an IDE or something like that, the an IDE will provide me with the ability to speed up how I'm going to do a task and I will get to an end state quicker, you know, with an IDE than without an IDE, typically talking for myself. And I, you know, I think that's largely the case with

[00:04:54] Simon: most people as well with AI of course, I guess there's two different states. One is the, can I get to a state whereby I have some code that I could check in, or some code that will, that will run and, and, and perform some certain feature. But is that then of a good enough quality that longer term. I'm not actually spending more time in the maintenance and the other things.

[00:05:16] Simon: Um, so I guess with AI, there's probably a couple of measures. One being that short term, can I, how fast can I do this task? And then secondly, more of that maintenance, the longer, the longer term, uh, aspect of is this actually helping me, uh, longer term? What, what's your kinda like take on, on, on that kind of assessment?

[00:05:34] Justin: No, I mean, I think you're, you're spot on. I think that, you know, we have a, a number of metrics that we've put together in our, our AI measurement framework, which we can, of course, you know, get into in, in this conversation. But it really does boil down to those two levers of like velocity. Quality and, and, and, and maintainability, however you want to call it, right?

[00:05:50] Justin: We, we do have to be very careful that these velocity gains that we're clearly making now aren't just trading technical debt down the road, right? As the code base becomes less maintainable or as the code quality fails, right? So we do have to really pay attention to both of those things, like, are we, you know, getting what we wanted out of this investment?

[00:06:09] Justin: As an organization, in terms of being able to augment, crank up our capabilities, and are we hopefully not just going to break things a year from now? And I think that for a lot of organizations, it is a longitudinal impact that we're thinking about. Like, you do have to consider that we may not really know for like a year.

[00:06:31] Justin: You know, with, with with, depending on how, you know, quick we're cracking up changes to the code base, which is something that, you know, it has become sort of a new bottleneck is now sort of human oversight in the process. We know that with this additional velocity comes, generally a higher PR throughput, more, more PRs being asked to merge, which means more oversight.

[00:06:49] Justin: So I, I think it is gonna take time for us to really understand organization by organization, these, these, these long-term impacts. Um, but it's clearly speeding things up now, you know, it's, it's not the 100x improvements, I, I think that's a lot of, uh, uh, information that's very good for creating YouTube subscribers and up votes and stuff, but not necessarily, uh, increased productivity for organizations, but more realistically, looking at numbers like 2025, 30% increases in, in velocity, I think is, is, is a, is a good number and, and it's certainly making an impact.

[00:07:24] Simon: Yeah, I, I, I'd love to jump into the framework in just a second.

Justin: Sure.

Simon: Very, very keen on that. Uh, I, I'd love to almost like bring it back a little bit because I think you mentioned something just a minute ago, which was super, super interesting to me, which is the fact that people are almost like forgetting to, to, to measure, forgetting to even think about measuring.

[00:07:42] Simon: And, and I feel like a lot of that is because, you know, there's this big wave of, oh, every, everything must be AI or every company's needs to be, needs to have something. AI, every product needs to have an AI feature and. We've almost like lost sight of it. Lost sight of the ‘Should we?’, and, and, and jumped into the, ‘We must do this!’

[00:08:01] Simon: Um, given that. What's the, why do we need to measure what's the, what's the value in, in measuring something which we, you know, potentially are naturally just wanting to lean into anyway, whether or not it's perfect, uh, 2x or 3x, uh, or whatever, uh, percentage. What's the, what's the true value if we take it back to fundamentals of, of doing this measurement?

[00:08:24] Justin: I think, I mean, it's such a, such a good question, right? I mean, I think that there's multiple dimensions that we really need to pay attention to as we're rolling these things out. I mean, you're right. I think that the hype and the fact that I think maybe even non-technical executives and decision makers in certain organizations like do sort of understand this technology, like they're using Claude or they're using Chat GPT or something like that.

[00:08:49] Justin: It's very different than the last major hype cycle, which was of course Cloud. And you know, with Cloud, you didn't have a lot of non-technical leaders talking about. Multi-cloud migration of container-based workloads. I mean, that, that wasn't, you know, vernacular that was necessarily understood. Um, but now you do have, you know, more non-technical leaders and decision makers who, who do have a, an inkling of what this technology can, can do.

[00:09:13] Justin: Um, and so because of that, we have this rush where it's like, there's this FOMO. It's like, oh my gosh, if we're not mandating AI in the organization, or if all of our developers aren't using AI, we're gonna fall behind, we're gonna lose our competitive edge. And, you know, there may be, uh, some truth to that, but you don't know really unless you're measuring.

[00:09:31] Justin: Right? We, we know that there are some organizations who, when they actually start, um, you know, putting the sensors in place and, and, and running the data that they're doing worse in, in some, in some cases. Right? We saw that METR study, which was, you know, I think we, we, we need to take the results of that study with a grain of salt, uh, given that it was a relatively low sample size and that the tasks wasn't, weren't, you know, necessarily indicative of what people are doing through the full SDLC.

[00:09:58] Justin: But that study provided some data that said that a lot of developers are actually slowed down, right? And, and without measuring, we don't really know. Uh, so these dimensions that we need to look at, you know, I think first of all, we need to understand utilization. Right. Okay. We roll this out, especially large scaled engineering organizations who's using it, right?

[00:10:17] Justin: Uh, can we find differences in patterns between, uh, engineers that are heavy users of AI versus ones who aren't? Uh, do we know if this is, uh, more helpful for junior developers or more helpful for senior developers and right without, you know, looking at the basic utilization metrics. We don't really have an understanding of that.

[00:10:37] Justin: And then we already talked about quality. We need to understand the overall impact to the engineering organization. Quality is, is one of those impacts, right? Is, is the code main, you know, is it still maintainable? Are we moving semantically away from, from the, the norms that we usually have in an organization, which is gonna make the code harder to read or, you know, not match what we like to do with our, with our culture.

[00:11:01] Justin: Uh, and also, are we speeding up? Like are we seeing more PRs coming through? Are we, you know, um, actually is our rate of delivery going up as a result of, of these investments that we're making in these tools? And ultimately, we will eventually need to look at cost too, right? I mean, in an organization, how much are we investing and are we actually seeing the right return on investment?

[00:11:23] Justin: Um, we may get to a point a year from now where. You know, it's, it's almost like a marketing cost. It's like you want to, to spend money on AI because that spend will provably lead to better outcomes for the, for, for the organization as a whole. But I don't think that we know that universally yet. Right. I think that that's still yet to be seen.

[00:11:43] Justin: And without data, without metrics, we really don't know which direction we're heading in.

[00:11:48] Simon: Yeah. Yeah. And it's absolutely those metrics that determine, to what extent we, we bet, and to how much we invest in those, in those types of things. Um, yeah, love the answer. Um, let's talk about the AI measurement framework then.

[00:12:02] Simon: Sure. Which is a framework that, that, that you and some other folks kind of like created in and around. Um, I guess everything from, whether it's as far as agents all the way back down to kind of the, the, the original code completion that, that swept us, uh, a number of years ago. Um, first of all, I guess who, who came up with it and, and how did you come out with the, uh, with the measurement framework?

[00:12:23] Justin: So it's primarily our CTO in Europe, Laura Tacho, who I'm sure a lot of you are familiar with, she's very prolific and, and publishes a lot of great research as well as Abi. Um, our CEO, co-founder, uh, I, I put my 2 cents in as well. It was kind of a feverishly developed, I would say, over about a 96 hour period, and then we validated it,

[00:12:45] Justin: first against data that we already had in the platform, uh, from, from, uh, metrics that were already coming in, uh, around AI tool usage. Uh, and now we've had, uh, you know, a number of companies, uh, start, start using this framework like Booking.com and Intercom. Um, but we realized that, you know, we have obviously, I think the biggest right to play in terms of measuring this impact on productivity because it's been the full focus of our company.

[00:13:13] Justin: Uh, and we've, you know, validated our existing measurement frameworks like the DX Core 4 and the DXI across millions of data points and hundreds of companies at this point. So I think we were already sort of best suited to do this. It was just, to your point earlier, we had an Oh shit moment. We're like, oh my gosh, we, we need to put something out there.

[00:13:31] Justin: Companies need to be able to measure this stuff. So let's take what we've learned, uh, from our existing frameworks and, and, and the metrics that actually can be used to drive decision making in an organization and can be used to improve, you know, the, the efficacy of an engineering organization. Let's create something complementary.

[00:13:50] Justin: Uh, which can be used specifically to look at the, uh, the impact that AI is having, uh, within an organization. Um, so we, we took the same, uh, sort of approach that we took with the Core 4, which is, uh, for, for those who may not be familiar with Core 4, it's sort of this distillation of the three top, uh, productivity metric frameworks out there.

[00:14:11] Justin: So, DORA, SPACE framework and then the DevEx framework. All three of those are sort of rolled into one, uh, metric framework. And we take individual metrics and we map them to specific dimensions. Uh, so that's like effectiveness and quality. Um, and, and, and impact in, in the case of the Core 4. We use the dimensions, uh, for our AI measurement framework of utilization, uh, impact and ultimately cost.

[00:14:40] Justin: And, and those are in order, right? Uh, those kind of the order that we recommend that you start looking at things within the organization. First of all, just understand who's using the tool. You know, daily active users, weekly active users within the organization, and then start measuring, you know, which, which contributions to the code base are actually AI assisted.

[00:15:01] Justin: And there's a number of different ways that you can do that. Probably the easiest way is just to do some experience sampling while pull requests are being entered. Just have a little box that says, yes, I used AI to help me with this. Um, ultimately we wanna look at percentage of committed code. That's AI generated.

[00:15:16] Justin: Um, that's a tougher metric to grab. Uh, but there's different ways that it can be done, like looking at the file system, for instance. Um, but we want to understand, you know, is 20% of the code that's actually making it into production, uh, is, was it AI generated? And ultimately now that everything's sort of moved past just code assistance and auto complete to more agentic stuff throughout the SDLC, we want to understand how many actual tasks have been assigned to agents.

[00:15:46] Justin: Uh, versus humans, right? So this is the first step in, in, in sort of maturity around measurement. Just understanding that aspect of who's using it.

[00:15:56] Simon: And I, that's very, very interesting and I think with those three that you just mentioned as well. So one, one is kind of like, I guess more, more around the adoption or usage.

[00:16:05] Simon: You can get some absolute around that, I guess. Um, probably it's a little bit harder to kind of work out when a developer is not using versus using and how much of their, their, their development is done with AI. And like you say, how much of your development, uh, for this pull request was AI assisted or not?

[00:16:22] Simon: Um, but it's, it's kinda like more concrete. I think the second, the third one that you mentioned as well, the cost, again, it's another concrete, uh, thing. We have an absolute cost at the end of it. I think the impact one though is, is really, really tough to be able to, to kinda like decide what, what do you, um.

[00:16:46] Simon: What, what do you measure in terms of impact? Or, or, or what, what things do you measure that, you know, categorize within the impact scope?

[00:16:56] Justin: Yeah, so the, again, the good news for DX is that this is still all about developer productivity. So our overarching metrics that have been part of the Core 4 are still essential to the way that we look at impact.

[00:17:08] Justin: Now, the DXI has 14 different drivers, uh, that we look at. Uh, but that's just one part of the Core 4, which is our effectiveness metric. Um, so the Core 4 metrics that still matter. That are still part of impact that we wanna look at are things like PR throughput, measuring our velocity, as well as our perceived rate of delivery, uh, the Developer Experience Index.

[00:17:27] Justin: Although I, I, I think and predict that we will see some of those drivers, uh, realign to how much AI is being used in the organization or used per individual developer. Um. The quality metrics code maintainability change confidence, right? Do we think, do we have this feeling that we're gonna be breaking more stuff, uh, when we accept code that was created by an agent or by some, you know, auto complete and then change fail percentage?

[00:17:52] Justin: Look, that's a great metric for quality. It's been a, a reliable metric that that has served us well through DORA, and we still look at that as our primary quality metric, but we also have to look at like developer satisfaction with these tools, right? We should be looking at, you know, uh, is this something that developers feel is actually making their experience better?

[00:18:11] Justin: Because we've seen big differences there, right? I don't want pick on any particular vendor, but like we've seen overwhelming, uh, um, uh, positive satisfaction with tools like Cursor, for instance. Um, that developers just generally like it, right? And so we need to understand is this something that's actually making their experience better?

[00:18:30] Justin: Or are we in some organizational culture where we're mandating use of something that developers actually don't like, right? That's very important, uh, to, um, you know, to, to providing better throughput for the organization by creating a better developer experience. Um, we wanna look at time savings, right?

[00:18:47] Justin: So, you know, how, how many hours a week, you know, whatever are being returned to, to developers. But we also need to look at what they're working on. Right. Just because you've returned time, uh, doesn't mean that that time is necessarily gonna be used to maybe work on new features or things that actually augment the overall productivity of the organization.

[00:19:06] Justin: And we wanna understand, you know, human equivalent hours of work that's been completed by agents because that's something that's going to inform whether we're getting the right type of return, whether, uh, whether we're enabling our engineers to use the technology the right way, right? 'cause you can't just expect to just turn it on and everyone will just magically know the best prompting practices and the highest value use cases and things.

[00:19:29] Justin: That's something that has to be enabled in the organization as well. So, you know, to, to kind of sum that up, I mean, we're looking at some of the same metrics that we've always been looking at, uh, because this still is about better productivity, better experience, but we also need to look at, uh, some metrics that are specific to the way that engineers feel about the tool, like overall satisfaction as well as how that's actually impacting their time savings.

[00:19:49] Justin: And, and we know, you know, that that software engineering and the process of creating software goes well beyond just writing code, right? I mean, even the, the most recent Dora study that came out. Uh, talked about, you know, how much time engineers actually have in large scaled organizations to sit down and write code.

[00:20:07] Justin: It's like five or six hours a week, right? So we have to be thinking beyond just code generation in terms of overall impact.

[00:20:15] Simon: Yeah. And, and I really love the, the mix really between the, as you say, like the, what's the speed at which someone can go, what's the productivity gains? That's something like AI can provide a, a, a developer, but it doesn't, it shouldn't stop there.

[00:20:29] Simon: Right. It should, it should include that sentiment that you, that you strive for as well, which I, I guess, leans then into the experience, the developer experience piece, which isn't just, can I achieve this velocity increase with using AI? It's actually, is this a pleasurable experience?

[00:20:49] Simon: Is this an experience that I am enjoying coming to my desk and, and, and, and building? Because this is ultimately the new, the new department, the new, the new role that we are creating, that we are, that we are evolving to. And it does need to be one of those roles that is enjoyable for development. We want people to, to continue to, to, to, to have that career in software development and it's, it's core, uh, to that experience.

[00:21:12] Simon: Um, so I love, I love the fact that you kind of mix those two, uh, which is, which is super, super important. Um, for, for people who are listening where they think, do you know what? I agree with everything that's being said. I love, I love what you're, what you're saying, and actually this is something that I would love to do myself.

[00:21:28] Simon: Um, it does seem kinda like a, a, a pretty, uh, a pretty deep integration with, with, I guess not just developers, but with the system as well. How easy is it to, to be able to, um, come in from scratch and say, look, I wanna start measuring all these things. Um, what, how long will it take me to actually get these integrations, get everything installed such that I can start pulling this data.

[00:21:53] Simon: And then I guess secondly, how long do I need to wait before I can actually get some kind of meaningful, uh, measurement based on how well my teams are performing with and without AI?

[00:22:06] Justin: Yeah, it's a great question. I mean, and, and it depends on which of these metrics we're, we're looking at, right? I mean, I, I think, uh.

[00:22:13] Justin: We're very intentional about these dimensions kind of being done in order as almost like a maturity scale of how well you're measuring. So looking at utilization first, then impact, and then ultimately cost. And that mirrors kind of what we saw with Cloud as well. I mean, it's amazing that you still see brand new companies, brand new companies starting up,

[00:22:34] Justin: that will help you measure your cloud costs. And we're how long into, you know, the cloud at this point? And I think you'll, you'll see probably a faster hype cycle with AI, but you'll see the same thing, right? We're gonna just, who's using this? We just gotta roll this out. We gotta get this in front of all of our engineers.

[00:22:50] Justin: And now we have to start thinking about, okay, what have we done and what is the impact of that? And then cost will be something that we need to look at later. In terms of how we measure and how long it takes to start getting, uh, good data. Again, it depends, right? We're we want to think about telemetry metrics, but those are imperfect metrics, right?

[00:23:10] Justin: Those system metrics that we can pull from, you know, Copilot API or Cursors API or, or any of these, you know, that's, that's been, you know, these tools are now working on being able to expose and provide better metrics. Um, but those metrics are, are often sort of incomplete. They rely on things that don't necessarily match reality.

[00:23:30] Justin: So, for instance, you know, the, the big popular one with Copilot is like suggested lines of code versus accepted lines of code. The problem there is that the engineer actually has to click accept in the IDE for the API to know, uh, that those numbers are, are what they are, which doesn't always happen, right?

[00:23:48] Justin: Sometimes the suggestion will show up and, oh, that's a good idea, and then we'll just type, you know, keep typing the code. Or maybe we'll even copy paste without clicking the accept button. So, you know, we, we've sort of always Said that that system metrics are, are important, uh, and they are some of the easiest metrics to gather, but they lack a full context, right?

[00:24:08] Justin: So a lot of DXs philosophy is blending what we would call qualitative or self-reported metrics with the more quantitative metrics that can come out of systems, and we measure that using surveys. Uh, we measure that using self-reported data. Now, when we think about a survey and we think about an effective survey, we're thinking about surveys that have 90, 95 plus participation rates throughout the organization, which is difficult to achieve.

[00:24:36] Justin: We spent plenty….

[00:24:38] Simon: And are those, are those survey questions in line? In those actions? Or is that a separate, kinda like survey that gets sent out to, to folks within the organization?

[00:24:46] Justin: Yeah, so the, the broader surveys are separate, right? They're sent out, uh, they take, you know, five minutes or so for an engineer to complete.

[00:24:54] Justin: Um, and you know, we look at this sort of three legged stool. Like, first of all, you have to design really effective surveys, and part of that starts with understanding that you're not really asking about. Um, individual aspects, you're actually asking more about the system, right? We think about developer experience as solving a systems problem, not a people problem, right?

[00:25:16] Justin: Because I mean, look at Edwards Deming and statistical process control, or everything that we learned from Eliyahu Goldratt in the theory of constraints, right? Deming is famous for saying that 90 to 95% of the productivity output of an organization will be determined by the system. And not the worker, right?

[00:25:31] Justin: So we need to like design survey questions that ask really good questions about how the system is performing, how the platform is performing, how the culture is performing, how the surrounded processes are, are are performing. When we think about, you know, doing that type of surveying, but experience sampling, which is different.

[00:25:49] Justin: That can be done sort of more in the moment, right? This is where, okay, we're adding a new, um, you know, a new question to our PR form. Did you use AI to help perform this task? Right? Then we can get some utilization stuff that's coming in sort of real time. Um, so it's really hard to design a highly effective survey.

[00:26:10] Justin: And, and because of that, I think a lot of engineering organizations have a hard time getting the type of, uh, participation rates that can provide, you know, really actionable data. But that doesn't mean you should quit. Like, you should try harder, you should try to design surveys that, uh, engineers actually get excited to take part in because they understand that it's providing a voice.

[00:26:29] Justin: They understand that the data coming out of that survey is actually gonna be used, uh, to make decisions about what to improve in the platform and what to improve in the overall developer experience. And if you do that, if you go into it with that mindset and you're diligent. You can run highly effective surveys to the point where we actually recommend, you know, if the, if you see some conflict, right?

[00:26:50] Justin: If a system metric coming in seems to conflict with what you're seeing from the survey metrics, you should trust the survey metrics. And figure out why the system metrics, uh, aren't aligning. There's probably something wrong in the way you're scraping the data or something stale about the data.

[00:27:03] Justin: Yeah. Um, so, so you want to measure from, from multiple angles, right? You definitely want to pull system metrics where you can, but you need to understand that those system metrics are not gonna be your silver bullet metrics. They're not gonna tell you a complete story. You also want to find a really good way to get self-reported data in and then correlate those things together.

[00:27:23] Justin: Um. You know, utilization metrics can help you on day one, right? You'll be able to see, okay, you know, which users are, are, are just straight up using these tools. Like are they, you know, are they prompting? And regardless of the number of accepted versus suggested or any of that, just the raw data on who's using these tools is something that you can get actionable information on almost immediately.

[00:27:46] Justin: These more longitudinal metrics of code maintainability change confidence change fail percentage. It's gonna take a little while, uh, to start aggregating and looking at the right trends to understand, okay, are things getting better or are things getting worse? Um, but there's not, you know, one single answer there.

[00:28:03] Justin: There's stuff that you're gonna be able to, you know, get useful information on from day one. Other things that you're gonna have to look at, uh, trends over time before you really understand. And those, those are more of the impact metrics.

[00:28:14] Simon: And, and whenever, whenever we think about metrics, I think we all fairly well know, or I think it's worth pointing out that it's, it's always good to kind of like really only

[00:28:23] Simon: largely compare against yourself to some extent, uh, whereby, you know, whatever it is, we should look at ourselves and go, okay, this is where we are right now as a baseline. Is there areas we can improve? Is there areas that are that, you know, surprise us and we need to kinda like, dig in and, and try and work out?

[00:28:40] Simon: Because I think every organization is different. And so as a result. It's hard to make those comparisons. That said, uh, I know you, that there is an amount of benchmarking data that is available whereby you can almost compare what you, what your, uh, what your data you know, is showing you against, I guess, more.

[00:28:59] Simon: Uh, industry standard industry averages. Um, and as well as that, of course, there is some, um, various companies and organizations that share some of the data that they've, that they've extracted from this, uh, this framework. Why don't we, why don't we start with the benchmarks and then kinda like jump, uh, to, to maybe some more of the specifics, uh, that folks have, uh, have shared.

[00:29:22] Simon: Um, what, what are some of the benchmarks, uh, telling us in and around? I guess for me adoption's kind of interesting, but I'd love to jump into, into the, the, the impact straight away. What is, what, what is some of the data sharing about maybe developer impact with us being able to, uh, use AI within our software process for us to be able to, you know, ship and deliver, uh, and create code that we can kind of like, you know, deliver features and things like that faster.

[00:29:52] Simon: What is the data telling us?

[00:29:53] Justin: Yeah, it's a great question. So we, we've always, uh, provided sort of industry benchmarks alongside, uh, the productivity metrics that we have to try to give, you know, um, uh, goals for continuous improvement, but also to provide a lever. You know, for an organization to build a culture, uh, at the level of maturity that they, that they want to build it, right?

[00:30:13] Justin: I mean, depending on the organization, to your point, all organizations are very different. Um, maybe the, the P50, so 50th percentile in the industry is good for you. That's, that's what you want to hit. You feel comfortable there. If you want to build. Like the highest performing engineering teams in the world.

[00:30:31] Justin: You're gonna start looking at the P90 metrics. You want to figure out what the 90th percentile looks like in terms of, um, how well they're performing with these metrics. Um, I mean, we all want to be the best, but like at, at what level do we need to really put in that effort, um, to be able to create software effectively, again, depending on the culture of our organization.

[00:30:50] Justin: So we look at P50, P75, and P90. And we break that across, you know, tech companies versus non-tech companies. And then it can even get more granular, like, okay, financial services companies and, and things like that. So that we're not just providing a platform where, uh, people can measure, uh, uh, the impact of their, you know, developer experience on, on productivity.

[00:31:12] Justin: But they can actually say, okay, well, you know, maybe with this particular driver in the DXI, so maybe like documentation or code maintainability. It turns out that we're 20 points behind, you know, the industry P50 or we're 10 points behind the, the industry P75. And so that, that gives people a goal to say, okay, well.

[00:31:30] Justin: What can we do? What should we prioritize, uh, in terms of making improvements to our process, making improvements to our platform that can drive us to, um, that kind of next benchmark. With AI in particular though, things are all over the place. I mean, uh, the DORA study that came out and, um, uh, just a few months ago now, they actually, if you go to the Dora Community website, you can download their AI impact reports, very good data, and it shows, you know, things like with a 25%.

[00:31:59] Justin: increase in overall adoption of these tools. We see things like, um, you know, 7.5%, uh, improvement in overall code documentation, but that was the highest number in that study. 7.5% improvement, uh, with a 25% overall adoption. Um, all of the results were positive in that study. But there were low percentages for some of these, like two or 3% improvements industry-wide.

[00:32:26] Justin: Now things are changing rapidly, right? I mean, a year ago these tools didn't work nearly as well as they do now, and that's why I think you've seen this sort of seismic shift around a lot of skepticism to now a lot of people being very, very bullish. Um, we have public data that Intercom, um, one of our customers achieved a 41% increase in AI driven

[00:32:50] Justin: developer time savings, um, which these are the, these are the right numbers to sort of expect right now. You know, again, the, the hype around these like 10x, 100x improvements are, are just that the data's not bearing that out. Um, but a 40% increase, I mean, sure, we'll take that, you know, that's great.

[00:33:08] Simon: And of course, and of course Intercom. We've had Des Traynor, for example, the co-founder of Intercom on, on the show, and also presenting at, uh, at, at at, at the, uh, the DevCon, uh, AI Native DevCon. And Intercom is such an AI first company in and around, not just the way that they're building, Fin their, their, um, yeah, their, their, their tool, but just the way they're using AI internally, I think it, it that it's, they're so driven to, to make the most of AI, those numbers almost don't shock me given the, given the, the rate at which they're, they're, they're adapting to use AI in their, in their methodologies.

[00:33:47] Justin: I, I think it's really, you, you raise a, a very important point that. And, uh, organizations need to think beyond code generation, right? There are so many aspects of the SDLC that do not necessarily have to do with just creating code. And the organizations that are gonna see the biggest impact are the ones who are getting really creative about

[00:34:06] Justin: you know, maybe AI code reviews, right? That's a great pattern, right? Uh, get an always on, you know, agent that can give immediate feedback about the code, uh, whether it's one of these off the shelf, uh, solutions, uh, like CodeRabbit or something like that, or whether you're just building the stuff into your own platform, which other organizations are definitely doing.

[00:34:25] Justin: They're just self-hosting a model. Or they're using a, a, an existing GPT API or something like that. But then building that into, um, the actual SDLC process, so developer goes and commits some code agent notices that immediately, you know, analyzes the diff uses maybe something like MCP to understand the context of the change, and then provides immediate feedback, whether directly in the PR comments or even providing code comments or something like that, right?

[00:34:52] Justin: Um, that can have a profound impact. We know that. You know, code review and waiting on review is a huge source of context switching, uh, for engineers. It's something that can really slow them down. Uh, so to be able to get that sort of immediate feedback is great. Same with documentation, right? I think it's very interesting that the DORA study showed us that the biggest gain was actually in overall improvement of documentation.

[00:35:14] Justin: But it also shouldn't be too surprising, right? I mean, most certainly not all, but most engineers are not crazy about having to document what they've done. It's, it's a step that is maybe not the type of work that they really enjoy doing. So being able to have something where an agent provides a draft, uh, of, of, of documentation for the change, or again, even better inline comments to make the code more readable down the line, um, is something that can absolutely be a time saver and increase the over overall throughput of the organization.

[00:35:44] Justin: Um, we're even seeing, you know. Uh, traditional, non-technical, or I'd say maybe, maybe non builders in the organization now empowered to, uh, create, you know, low risk code applications. So technical product managers or folks that are close to the project itself, but maybe haven't been able to, you know, write code in the past, they, they can now do this.

[00:36:06] Justin: So we've seen some organizations that are, you know, allowing that to occur for like API layer stuff plugins, you know, things that are, that are relatively sort of low risk and that's, that's increasing the overall throughput of the organization as well. Um, so, and we've seen PRDs generated from customer feedback where, you know, that first stage of, of planning now is actually being offloaded to an agent.

[00:36:28] Justin: So I think that, that, that is where you'll see the most impact is really being thoughtful and even creative about the various aspects of the SDLC that could be improved. Um, and to just, you know, really sit down and, and critically think about that. Like, where are our actual bottlenecks? You know? And can an agent or some, uh, AI solution actually help us there?

[00:36:48] Justin: Uh, I think we, we should just never forget these early principles of, you know, Eliyahu Goldratt, if you're not improving something that's related to the constraint, then you're not actually gonna see any downstream improvements, right? I mean, if you can, if you identify your bottlenecks, uh, in your overall throughput, and then your SDLC and you're thoughtful about, you know, how an agentic or some other type, type of solution can help with that bottleneck, you'll see very positive results if you're over-indexing on something like Code Generation, which for a lot of organizations is not the primary bottleneck.

[00:37:20] Justin: Then you need to expect, you know, lower results.

[00:37:24] Simon: Yeah. Yeah. And I think that's interesting when you talk about, I mean, I think you mentioned maturity a couple of times, and I always hate talking about maturity when it comes to AI just because I guess it's been such a short runway anyway, that we, we, we are

[00:37:36] Simon: grabbing, uh, AI and I dunno if anyone's truly mature right now, but it, but it is, it is, um, really important to kind of like identify which areas other people are, are looking into that we are perhaps, you know, less mature or haven't, haven't really started looking into those kind of areas. And I think you're spot on because, um, a lot of the aspects of the SDLC,

[00:37:59] Simon: when we think about what are we doing with AI today, a lot of people will say, well, we're using Cursor or we're using something which is a very predictable, uh, tool or, or, or space to actually use AI. And, and, and it is interesting that actually a lot of the processes that can either be automated or, or using AI to assist us, not necessarily

[00:38:20] Simon: you know, creating huge numbers of pull requests or, or, or, or helping us write code line by line. But it's actually all that, all those other things like you mentioned. I'm gonna pull some data for you here, some relevant data so that you don't actually need to look through all these things. I've, I've pulled all the information into the pull request for you so you can make a decision and you can validate, uh, what I've, what I've suggested. In terms of what people are doing to automate those various areas.

[00:38:44] Simon: I guess first of all, what are the typical places that you see most ignored or, or, or areas that you feel most people can get, you know, quicker gains that maybe they're not thinking about straight away. And, and what are the things that people are typically doing to make the biggest productivity or, or, or developer experience gains in those particular areas?

[00:39:12] Simon: And maybe even thinking things like whether it's coding or documentation or reviewing that you've kinda like alluded to. What, what are the specifics that people can actually do?

[00:39:20] Justin: Yeah, so another great question. I, I think, um, the, the, the one word that I can use to, to sum all that up is enablement, right?

[00:39:27] Justin: That, that's really being missed by a lot of organizations. We, um, we set out to kind of study this back in April, uh, where we were seeing, uh, again, I mentioned before, these results that are like all over the place, right? Just unevenly, distributed, whatever you wanna call it. Some organizations clearly feeling like they're seeing positive outcomes from this.

[00:39:45] Justin: Other organizations not really sure, or, or even feeling like they might be, uh, seeing negative outcomes. And so, um, I, I put a study together where we published into this guide that we have published now on our website, the Guide to AI Assisted Engineering, where I interviewed a lot of like S level, so senior level, uh, leadership, um, as well as broad surveys of, of engineers who were reporting, uh, like good positive time savings.

[00:40:10] Justin: I, I wanted to see what was different in these cultures, you know, and it really came down to how engineers were being enabled to use, to use the, uh, the tools from both a prompting best practices standpoint. So understanding things like meta prompting and multi-shot prompting, understanding how to take advantage of, of being able to manipulate a system prompt or Cursor calls them Cursor rules, understanding how temperature is going to affect, uh, the outcome of the way that you're doing your prompting, um, and understanding how you're gonna

[00:40:39] Justin: feed the right context, uh, to these models. And then with a broad survey of engineers, it was okay, please stack rank what you think, you know, your top five, uh, use cases, most time-saving use cases are, uh, for using some of these tools. And it was really interesting that, that the top one that came out of that, uh, broad study was, um, a stack trace analysis, which is not even a generative use case.

[00:41:03] Justin: Um, that was eye-opening for me as, as somebody who's an OG developer of writing code, uh, professionally since the, the late nineties, um, to, to understand like, and, and Java code for, for that matter. You just get used to, okay, stack trace, build fails. I'm gonna go line by line. I'm gonna figure out what went wrong with this.

[00:41:21] Justin: But if you allow the, uh, the, the assistant or whatever to do that explanation for you, you really do save a lot of time. Right. So, um, enablement is, is kind of the first thing that I would, would say there. Um, I think that, uh, that, that then brainstorming and planning, uh, so you've probably seen some version of this workflow now where it's, uh, I'm a project manager and you're a senior React architect, and you ask me

[00:41:50] Justin: one question at a time about this problem that I'm trying to solve, and the, the assistant or the agent will do a really good comprehensive job of making sure that you're pre-planning before you've even started to write code is, is pretty bulletproof that you've considered all those aspects of planning.

[00:42:06] Justin: I mean, I don't know about, I. I always forget something, you know, when I'm thinking in that early pre-planning, you know, I, I don't think I, I miss some aspect of what I should be doing with the data schema or high availability or fault tolerance or whatever it is. And, and, and these things will remind you.

[00:42:21] Justin: Uh, and then you can literally turn around and like copy that whole chat, chat transcript into like a reasoning model. And say, okay, now take this conversation that we just had and break this into individual units of work. Even with prompting examples that I could use later. You don't have to stop there.

[00:42:36] Justin: You can take that specification and feed it to a code generating model to scaffold your entire project for you. So there's a lot of time savings that can be gained in that early brainstorming and planning process. It's actually one of my favorite workflows that I've tried to be a little bit more reflexive about in terms of my, my own use, uh, and then certainly documentation.

[00:42:55] Justin: Um, pre-code review, right? These are really good use cases, uh, that can, uh, help unstick other bottlenecks in your process. But I'd go back to what I said, said before, like, understand your individual engineering culture first. Understand your value stream, right? Do that. I think, uh, you made a really good point about maturity, you know, and that it's, it's that yes, things are moving at, at speed of light right now.

[00:43:20] Justin: Um, but. It's still relatively nascent. We're still figuring out the best ways to do that, but I would actually take that one step further, and I would say we still haven't matured around developer productivity and developer experience, right? We've worked really, really hard to get a lot of thought leadership out there and a lot of research out there to help organizations understand what developer experience really means.

[00:43:44] Justin: And how that impacts the overall productivity of an organization. But we have not matured around it. There's not a common language for it. You ask 10 different leaders and you probably still will get 10 different answers about this. And then we just fast forward it to AI without really solving that initial problem.

[00:44:01] Justin: Right. And so I think like this is why, you know, again, the measurement is still so important. This is why we, we, we don't need, we, we can't lose sight of those conversations around what developer experience means, what a good developer experience looks like, and how that impacts our overall value stream, right?

[00:44:19] Justin: So, um, so I think, you know, there are these aspects that we're learning, we're getting creative about, and there's definitely, you know, common use cases, best practices, best prompting techniques, uh, high value use cases that everyone should align around as part of that maturity, but let's not forget that we still haven't solved for the greater problem of measuring and understanding and improving developer productivity with or without AI.

[00:44:42] Simon: Yeah. Yeah. And I think there's levels of AI as well of how we're gonna be using AI in our software development whereby, you know, today the SDLC looks kind of like a still fairly traditional SDLC, um, whereby AI is working more as an assistant to us. How's that SDLC gonna change perhaps as we lean more into specs or, uh, AI,

[00:45:03] Simon: Agentic AI takes more of the, of the heavy lifting. Um, there are so many bigger changes. Uh, it's gonna affect productivity and the, the experience of our development, uh, so much. I think there are so many steps to go. We're really looking forward to see how this evolves. Um, Justin, for those folks who were, uh, were, are interested in.

[00:45:27] Simon: Testing this out, uh, learning more. Where, where would you say is the best place for someone to start?

[00:45:33] Justin: Well, of course I'm biased, but I would say start with the research that, that we've already done. Uh, we do publish a lot of public research on this. Our, our impact framework is, is if you go to our website, getdx.com, you right now, the banner header on the website is

[00:45:48] Justin: check out the AI measurement framework. So we have a, a long white paper, uh, where we detail these, these metrics that I mentioned before, the different dimensions that we place them into. Uh, we also have a survey template that's available, uh, for measuring AI impact, uh, alongside, uh, this measurement framework.

[00:46:04] Justin: Which you can pull. Um, so I would of course say, uh, start with the research that we've produced. Uh, it's all there for you. It's all free. Um, and you can, uh, start, you know, putting some of these survey questions, some of these experience sampling techniques in place today and start gathering, um, some of this data.

[00:46:20] Justin: Um. I would think, you know, again, you, you really want to make sure that you understand what you're measuring, uh, and more importantly, who's gonna use this data, right? Um, that's a question that, that often people skip past, whether we're talking about AI measurement or we're talking about just overall measurement of developer productivity.

[00:46:39] Justin: My good, uh, friend Max Kanat-Alexander, who, uh, ran developer experience at Google and then published the Developer Productivity and Happiness Framework at LinkedIn and uh, now is at Capital One, helping them with the developer experience. He taught me that in many ways, the audience is more important than the data, right?

[00:46:56] Justin: Yeah. If you can't comfortably say, who's gonna be upset if this data's not available or even paralyzed because they're using this to drive their own decision making in the organization. If you can't comfortably answer who that role is, then you may as well not even capture the data in the first place.

[00:47:11] Justin: Then you're just gonna have this dashboard that nobody uses, right? So I would say start, you know, by thinking about, okay, who's gonna be the recipient of this data? What decisions, uh, is that going to inform? Then that will help guide you to the data points that are actually going to matter for your culture and your organization.

[00:47:30] Justin: Then it's a matter of do we get this from the system metric? Do we get this from experience sampling? Do we get this from a, from a survey? Figure out what's gonna be kind of most effective. Uh, don't overindex on a single metric ever. Right? That, that hasn't changed. Um, it's so easy when we are only looking at one metric like PR throughput or something like that, or, or, uh, you know, code, percentage of code generated by AI if we start really, uh, incentivizing, um, based on those metrics or weaponizing based on those metrics for that matter.

[00:48:03] Justin: Then our old friend Goodhart's Law comes into effect immediately. We start figuring out ways to game the system, but we're not actually improving our overall productivity. The textbook one with PR throughput has always been like, oh, I have to get 10, 10 PRs a week. Great. I'll update the ReadMe file 10 times on Monday, and I'm done.

[00:48:20] Justin: Right. You know, and, but I haven't actually contributed anything of value. Right. So identify the audience. And then, uh, think about the decisions that they need to make that will help you move towards which metrics plural are important to start looking at, uh, so that we can understand the tensions, uh, between those metrics.

[00:48:39] Justin: Um. And then we'll, we'll ultimately use those metrics to drive better decision making. Uh, but so yeah, we have lots of research. Uh, we have a whole white paper around this metric framework that we've been talking about. We have survey templates ready to go. Uh, you just load 'em up in Google forms and start, you know, uh, start, start running them within your organization.

[00:48:56] Justin: And that's, that's a great start. Um, but, you know, continue to be vigilant, you know, continue to be thoughtful about which metrics are going to matter the most for your culture. Uh, and, and don't, uh, cut corners. Um, you know, make sure that you're pulling not just system metrics, not just telemetry, but also the qualitative data that provides context for that, uh, telemetry.

[00:49:18] Simon: Yeah. Such top advice. 'cause I feel like that, you know, completely changes the way you go about this entire measurements journey. So, so important to, to, to learn that from the start and then build everything, uh, from those, from those decisions. So, um. So, yeah, so getdx.com was it, did you say…

[00:49:36] Justin: getdx.com

[00:49:37] Simon: Awesome Justin. Absolutely a pleasure. We could probably talk for another hour or so, quite

Justin: Oh, certainly.

Simon: Really, really appreciate your insights. Um, and, uh, and really looking forward to hearing what our listeners think of, uh, of, of, of the framework. And it'll be great to hear, you know, how, how people get on.

[00:49:55] Simon: Uh, if people wanna reach out to you, what's the best way of, uh, reaching Justin?

[00:49:58] Justin: LinkedIn probably. That's where I do most of my stuff now. I was on that old, uh, X for a while and things changed.

[00:50:07] Simon: Oh yeah, I've heard of it. I've heard of it, yeah. Yeah. Awesome. So, so yeah, catch Justin on, uh, on LinkedIn.

[00:50:12] Simon: Then, uh, really, really pleasure, uh, chatting with you on, uh, on, on today's episode. Um, thanks very much for the insights, Justin.

[00:50:19] Justin: Thanks so much Simon. Had a great time.

[00:50:21] Simon: Wonderful. And thanks everyone for, uh, for tuning in. Um, hope you enjoyed that session and looking forward to, uh, to, to, to another episode.

[00:50:29] Simon: Speak soon.

Developer Experience

AI Tools & Assistants

Industry Insights

Chapters

Trailer

[00:00:00]

Introduction

[00:01:11]

Pre-AI and the big wave

[00:02:15]

IDEs: short-term speed vs. long-term quality

[00:05:25]

Why measure?

[00:08:24]

AI Framework

[00:12:52]

Developer impact and experience

[00:17:31]

Setup time and impact metrics

[00:21:59]

Industry benchmarks

[00:29:52]

Beyond codegen

[00:34:24]

Dev productivity wins

[00:39:19]

Best place to start

[00:45:53]

Outro

[00:50:11]

In this episode

With AI adoption surging across organizations, Justin Reock, CTO at DX, joins Simon Maple to break down the difference between meaningful integration and simply chasing trends. They also explore the two key levers of velocity—quality and maintainability—discuss why measuring real AI impact begins with understanding who’s using it and how, and examine how AI supercharges developer productivity throughout the software development lifecycle.

Context and Purpose

AI tooling is now pervasive in software engineering, yet many organisations adopt it on faith rather than proof. In this episode of AI Native Dev, host Simon Maple speaks with Justin Reock, Deputy CTO at DX, about turning that enthusiasm into evidence-based practice. DX—founded by productivity researchers Abi Noda, Nicole Forsgren and Margaret-Anne Storey—builds on the DORA, SPACE and DevEx bodies of work to quantify developer experience and productivity. Their new AI Measurement Framework extends these foundations to reveal whether AI is truly accelerating delivery or merely adding cost and risk.

Why Measurement Matters

Non-technical executives can experiment with ChatGPT or Claude, fuelling pressure to “do something AI.” Without data, that pressure can drive teams toward expensive tools that slow them down or erode quality. Borrowing Deming’s maxim that systems, not individuals, dictate most output, Reock argues that reliable metrics are the only safeguard against misguided investments. Early studies—including the 2024 DORA AI Impact Report—show modest but real gains (for example a 7.5 % improvement in documentation quality at 25% adoption), while success stories such as Intercom’s 41% time savings reveal what is possible when AI is used deliberately.

The DX AI Measurement Framework

The framework tracks three dimensions in ascending order of maturity.

Utilisation measures daily and weekly active users, the proportion of committed code that is AI-generated, and the number of tasks assigned to agents rather than humans. Experience-sampling questions—such as “Did AI assist this pull request?”—can fill telemetry gaps in IDEs or APIs.

Impact builds on the established Core 4 metrics: pull-request throughput, change-fail rate, maintainability, perceived delivery speed, and the 14-driver Developer Experience Index. These are combined with AI-specific indicators like perceived time savings, stack-trace resolution, and developer satisfaction with AI tools. DX emphasises the importance of survey data here; when survey results and system metrics diverge, the former should be trusted and the latter investigated.

Cost becomes relevant once usage stabilises, tracking AI-related expenses—such as licence fees and inference costs—against the human-equivalent hours returned to the organisation.

Implementation and Maturity Path

DX recommends starting small: instrument utilisation on day one, launch concise surveys with >90 % participation, and correlate findings against existing delivery metrics. Over time, broaden coverage to quality and maintainability trends, then fold in cost analysis. Crucially, every metric must have an audience—someone who will be blocked if the data disappears—otherwise dashboards become shelf-ware and Goodhart’s Law (gaming single metrics) takes hold.

High-Leverage Use-Cases

The interview highlights gains beyond raw code generation. Always-on AI code-review agents slash wait time and context-switching. Automated documentation and inline comments boost future maintainability—the single biggest improvement surfaced in DORA’s study. Stack-trace explanation, the top time-saver in DX’s April 2025 survey, turns a tedious debugging chore into a near-instant result. Early-stage planning also benefits: prompting models to challenge requirements, produce draft specifications, split work into tickets, and scaffold repositories compresses the idea-to-code cycle while reducing omissions.

Enablement over Mandate

Organisations seeing the best returns invest in developer enablement: training on prompt-engineering techniques (meta-prompting, multi-shot prompting, temperature control), surfacing high-value workflows, and embedding AI into the internal platform rather than prescribing a single vendor tool. Culture matters; developers who enjoy their assistants use them more and maintain velocity without accruing technical debt.

Conclusion

AI can raise software-delivery performance by 20–40 % today, but only if its deployment is guided by rigorous, multi-dimensional measurement and a focus on system bottlenecks. DX’s AI Measurement Framework offers a pragmatic path: instrument utilisation, verify impact through blended telemetry and surveys, and weigh benefits against cost. Organisations that treat AI as a feature checkbox will chase hype; those that measure, enable and iterate will compound genuine productivity gains.

Resources

Related Podcasts

DEVOPS TO AI:
TIP FOR INTEGRATION
& CULTURE CHANGE

AI Product Engineer, Tessl

From DevOps to AI: Strategies for Successful AI Integration + Cultural Change

22 Oct 2024

with Patrick Debois

AI SECURITY, TRUST
& FUTURE OF
OBSERVABILITY

Co-founder, CEO, Datadog

Datadog CEO Olivier Pomel on AI, Trust, and Observability

1 Apr 2025

with Olivier Pomel

DOES AI GENERATE

SECURE CODE?

Founder, Spy Dynamics

Does AI Generate Secure Code? Tackling AppSec in the Face of AI Dev Acceleration...

24 Sept 2024

with Caleb Sima

Developer Experience

AI Tools & Assistants

Industry Insights

Chapters

Trailer

[00:00:00]

Introduction

[00:01:11]

Pre-AI and the big wave

[00:02:15]

IDEs: short-term speed vs. long-term quality

[00:05:25]

Why measure?

[00:08:24]

AI Framework

[00:12:52]

Developer impact and experience

[00:17:31]

Setup time and impact metrics

[00:21:59]

Industry benchmarks

[00:29:52]

Beyond codegen

[00:34:24]

Dev productivity wins

[00:39:19]

Best place to start

[00:45:53]

Outro

[00:50:11]

Resources

Related Podcasts

DEVOPS TO AI:
TIP FOR INTEGRATION
& CULTURE CHANGE

AI Product Engineer, Tessl

From DevOps to AI: Strategies for Successful AI Integration + Cultural Change

22 Oct 2024

with Patrick Debois

AI SECURITY, TRUST
& FUTURE OF
OBSERVABILITY

Co-founder, CEO, Datadog

Datadog CEO Olivier Pomel on AI, Trust, and Observability

1 Apr 2025

with Olivier Pomel

DOES AI GENERATE

SECURE CODE?

Founder, Spy Dynamics

Does AI Generate Secure Code? Tackling AppSec in the Face of AI Dev Acceleration...

24 Sept 2024

with Caleb Sima