
Why Tracking AI Usage Drives Better Results
In this episode
With AI adoption surging across organisations, Justin Reock, Deputy CTO at DX, joins Simon Maple to break down the difference between meaningful integration and simply chasing trends. They also explore the two key levers of velocity—quality and maintainability—discuss why measuring real AI impact begins with understanding who’s using it and how, and examine how AI supercharges developer productivity throughout the software development lifecycle.
Context and Purpose
AI tooling is now pervasive in software engineering, yet many organisations adopt it on faith rather than proof. In this episode of AI Native Dev, host Simon Maple speaks with Justin Reock, Deputy CTO at DX, about turning that enthusiasm into evidence-based practice. DX, founded by Abi Noda and working with productivity researchers Nicole Forsgren and Margaret-Anne Storey, builds on the DORA, SPACE and DevEx bodies of work to quantify developer experience and productivity. Their new AI Measurement Framework extends these foundations to reveal whether AI is truly accelerating delivery or merely adding cost and risk.
Why Measurement Matters
Non-technical executives can now experiment with ChatGPT or Claude themselves, fuelling pressure on engineering teams to “do something with AI.” Without data, that pressure can drive teams toward expensive tools that slow them down or erode quality. Borrowing Deming’s maxim that systems, not individuals, dictate most output, Reock argues that reliable metrics are the only safeguard against misguided investments. Early studies—including the 2024 DORA AI Impact Report—show modest but real gains (for example, a 7.5% improvement in documentation quality for a 25% increase in AI adoption), while success stories such as Intercom’s 41% time savings reveal what is possible when AI is used deliberately.
The DX AI Measurement Framework
The framework tracks three dimensions in ascending order of maturity: utilisation, impact, and cost.
Utilisation measures daily and weekly active users, the proportion of committed code that is AI-generated, and the number of tasks assigned to agents rather than humans. Experience-sampling questions—such as “Did AI assist this pull request?”—can fill gaps where IDE or API telemetry is unavailable.
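As a rough illustration of the utilisation dimension, the sketch below computes weekly active users and the share of AI-generated committed code from exported usage events and commit records. The record shapes and field names are assumptions made for the example, not part of DX's tooling.

```python
# Minimal utilisation sketch; the data model below is hypothetical.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class UsageEvent:
    user: str
    day: date  # day on which the developer interacted with an AI assistant

@dataclass
class Commit:
    author: str
    ai_generated_lines: int  # lines attributed to AI generation by your tooling
    total_lines: int

def weekly_active_users(events: list[UsageEvent], week_start: date) -> int:
    """Distinct developers with at least one AI interaction during the week."""
    week = {week_start + timedelta(days=i) for i in range(7)}
    return len({e.user for e in events if e.day in week})

def ai_code_share(commits: list[Commit]) -> float:
    """Proportion of committed lines that were AI-generated."""
    total = sum(c.total_lines for c in commits)
    return sum(c.ai_generated_lines for c in commits) / total if total else 0.0
```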
Impact builds on the established Core 4 metrics: pull-request throughput, change failure rate, maintainability, perceived delivery speed, and the 14-driver Developer Experience Index. These are combined with AI-specific indicators such as perceived time savings, stack-trace resolution, and developer satisfaction with AI tools. DX emphasises the importance of survey data here; when survey results and system metrics diverge, the former should be trusted and the latter investigated.
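That reconciliation rule (trust the survey, investigate the telemetry) can be made explicit. The sketch below is illustrative only; the tolerance threshold and input names are assumptions rather than anything prescribed by the framework.

```python
# Hypothetical reconciliation of a survey signal with a telemetry signal.
def reconcile(perceived_hours_saved: float, throughput_delta_pct: float,
              tolerance_pct: float = 5.0) -> str:
    """Trust the survey result; flag the telemetry for investigation when they disagree."""
    survey_gain = perceived_hours_saved > 0
    telemetry_gain = throughput_delta_pct > tolerance_pct
    if survey_gain and not telemetry_gain:
        return "Developers report savings the telemetry misses: check instrumentation and downstream bottlenecks."
    if telemetry_gain and not survey_gain:
        return "Telemetry shows gains developers do not feel: check for rework, review load, or quality costs."
    return "Survey and telemetry agree."
```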
Cost becomes relevant once usage stabilises, tracking AI-related expenses—such as licence fees and inference costs—against the human-equivalent hours returned to the organisation.
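A back-of-the-envelope version of that comparison might look like the following; the loaded hourly rate and the conversion of surveyed time savings into a monetary figure are assumptions for the sketch, not DX guidance.

```python
# Hypothetical cost view: AI spend versus the value of hours returned.
def ai_cost_ratio(licence_cost: float, inference_cost: float,
                  hours_saved_per_dev_per_week: float, dev_count: int,
                  weeks: int, loaded_hourly_rate: float = 100.0) -> float:
    """Value of human-equivalent hours returned, divided by total AI spend."""
    spend = licence_cost + inference_cost
    value = hours_saved_per_dev_per_week * dev_count * weeks * loaded_hourly_rate
    return value / spend if spend else float("inf")

# Example: 200 developers saving 3 hours a week over a quarter against $150k of spend.
print(ai_cost_ratio(120_000, 30_000, 3, 200, 13))
```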
Implementation and Maturity Path
DX recommends starting small: instrument utilisation on day one, launch concise surveys with >90% participation, and correlate the findings against existing delivery metrics. Over time, broaden coverage to quality and maintainability trends, then fold in cost analysis. Crucially, every metric must have an audience—someone who will be blocked if the data disappears—otherwise dashboards become shelf-ware and Goodhart’s Law takes hold as teams start gaming individual metrics.
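Correlating adoption against an existing delivery metric needs nothing elaborate to start with. The per-team figures below are placeholders, and the standard-library correlation function requires Python 3.10 or later.

```python
# Toy correlation of per-team AI adoption against an existing delivery metric.
from statistics import correlation  # Python 3.10+

adoption_rate = [0.15, 0.30, 0.55, 0.70, 0.85]   # share of devs using AI weekly, per team
pr_throughput = [9.8, 10.4, 11.9, 12.3, 13.1]    # merged PRs per developer per month, per team

r = correlation(adoption_rate, pr_throughput)
print(f"adoption vs throughput: r = {r:.2f}")    # correlation is not causation; use surveys to interpret
```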
High-Leverage Use-Cases
The interview highlights gains beyond raw code generation. Always-on AI code-review agents slash wait time and context-switching. Automated documentation and inline comments boost future maintainability—the single biggest improvement surfaced in DORA’s study. Stack-trace explanation, the top time-saver in DX’s April 2025 survey, turns a tedious debugging chore into a near-instant result. Early-stage planning also benefits: prompting models to challenge requirements, produce draft specifications, split work into tickets, and scaffold repositories compresses the idea-to-code cycle while reducing omissions.
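As a concrete illustration of the stack-trace use-case, the sketch below captures a traceback and wraps it in an explanation prompt; `ask_llm` is a hypothetical hook for whichever assistant a team uses, not a real API.

```python
# Capture a traceback and turn it into an explanation prompt.
import traceback

def stack_trace_prompt(trace_text: str) -> str:
    """Wrap a captured traceback in a request for root cause and a fix."""
    return ("Explain the likely root cause of this stack trace and suggest a concrete fix:\n\n"
            + trace_text)

try:
    {}["missing-key"]  # deliberately raise a KeyError for illustration
except KeyError:
    prompt = stack_trace_prompt(traceback.format_exc())
    # Hand `prompt` to whichever assistant the team uses, e.g. ask_llm(prompt);
    # ask_llm is a hypothetical hook, not a real library call.
    print(prompt)
```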
Enablement over Mandate
Organisations seeing the best returns invest in developer enablement: training on prompt-engineering techniques (meta-prompting, multi-shot prompting, temperature control), surfacing high-value workflows, and embedding AI into the internal platform rather than prescribing a single vendor tool. Culture matters; developers who enjoy working with their AI assistants use them more and sustain velocity without accruing technical debt.
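Two of those techniques, multi-shot prompting and temperature control, fit in a few lines. The example below uses the OpenAI Python SDK purely as one possible client; the model name and the example messages are assumptions.

```python
# Multi-shot prompting with explicit temperature control (illustrative only).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

few_shot = [
    {"role": "system", "content": "You convert vague tickets into testable acceptance criteria."},
    {"role": "user", "content": "Ticket: 'Make login faster.'"},
    {"role": "assistant", "content": "Given a returning user, login completes in under 2 seconds at p95."},
    {"role": "user", "content": "Ticket: 'Improve search relevance.'"},
]

response = client.chat.completions.create(
    model="gpt-4o",      # assumed model name; any chat-capable model works
    messages=few_shot,
    temperature=0.2,     # low temperature for consistent, structured output
)
print(response.choices[0].message.content)
```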
Conclusion
AI can raise software-delivery performance by 20–40% today, but only if its deployment is guided by rigorous, multi-dimensional measurement and a focus on system bottlenecks. DX’s AI Measurement Framework offers a pragmatic path: instrument utilisation, verify impact through blended telemetry and surveys, and weigh benefits against cost. Organisations that treat AI as a feature checkbox will chase hype; those that measure, enable and iterate will compound genuine productivity gains.
Related Podcasts

From DevOps to AI: Strategies for Successful AI Integration + Cultural Change
22 Oct 2024
with Patrick Debois

Datadog CEO Olivier Pomel on AI, Trust, and Observability
1 Apr 2025
with Olivier Pomel

Does AI Generate Secure Code? Tackling AppSec in the Face of AI Dev Acceleration...
24 Sept 2024
with Caleb Sima