In 2024, we saw the rise of AI Coding Assistants turn code creation into something effortless and almost magical—“vibe coding.” But what about the other 70% of enterprise engineering work: the troubleshooting, alert handling, capacity tuning, and endless Day-2 operations? A new class of “AI SRE” tools is emerging — fusing observability with automation and beginning to reshape the $38B ecosystem of operations tools for observability, alerting, incident response and runbook automation.
Transforming Productivity with AI in Enterprise Engineering
In a compelling talk titled "AI For ‘The Other 70%’ Of Enterprise Engineering (SRE/PE/DevOps)", Kyle Forster, the founder of RunWhen, shared his extensive experience with implementing generative AI within enterprise engineering settings. His insights are drawn from practical applications in Site Reliability Engineering (SRE), Platform Engineering (PE), and DevOps, offering a nuanced perspective on the benefits and challenges of AI adoption in these fields.
Boosting Productivity Through AI Tools
Forster began by discussing the tangible benefits his team experienced after integrating AI coding tools into their workflows. They meticulously tracked productivity metrics, revealing a 25% increase in lines of code per person and a 30% boost in story points per sprint. These improvements were attributed to reduced time spent on routine coding tasks, allowing developers to focus on more strategic problems. As Forster explained, “less mental energy spent” on mundane tasks means more attention can be directed towards “identifying the right problems to solve and getting the design right”—a discussion he feels is often overlooked in the industry.
Navigating New Complexities
Despite the productivity gains, Forster cautioned that AI also introduces new complexity. Heavier reliance on open-source code and dependencies resulted in 55% monthly growth in production code, raising the risk of hitting "landmines": unexpected issues that can negate the efficiency gains from AI. A single such incident, Forster noted, could erase the advantages provided by AI tools, which makes effective mitigation strategies critical.
Strategic Architectural Solutions
In response to these challenges, Forster’s team explored two main architectural strategies. The first involved automating extensive “unit-test”-like checks across infrastructure and dependencies. While this approach was initially tedious, its benefits included “excellent human-readable text” when checks failed, providing actionable insights. The second strategy focused on observability tools, though Forster found these less effective for AI troubleshooting due to their lack of context. Consequently, RunWhen prioritized building a comprehensive library of automated checks and tasks, tailored to each customer environment.
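To make the first strategy concrete, here is a minimal sketch of what a “unit-test”-like infrastructure check with human-readable failure text might look like. It is not RunWhen's implementation: the CheckResult type, the two check functions, and the input dictionaries are all hypothetical names chosen for illustration, and the inputs stand in for data that a real check would fetch from a cluster or a dependency manifest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    message: str  # human-readable explanation, written for people and for an AI agent


def check_replicas(deployment: Dict) -> CheckResult:
    """Hypothetical check: are all desired replicas of a deployment ready?"""
    name = deployment["name"]
    desired, ready = deployment["desired"], deployment["ready"]
    if ready >= desired:
        return CheckResult(f"replicas:{name}", True,
                           f"{name} has {ready}/{desired} replicas ready.")
    return CheckResult(
        f"replicas:{name}", False,
        f"{name} has only {ready}/{desired} replicas ready; "
        "inspect recent pod events for crash loops or scheduling failures.")


def check_dependency_pins(deps: Dict[str, str]) -> CheckResult:
    """Hypothetical check: is every dependency pinned to an explicit version?"""
    unpinned = sorted(pkg for pkg, version in deps.items()
                      if version in ("", "*", "latest"))
    if not unpinned:
        return CheckResult("dependency-pins", True,
                           "All dependencies are pinned to explicit versions.")
    return CheckResult(
        "dependency-pins", False,
        f"Unpinned dependencies: {', '.join(unpinned)}; "
        "pin each to an explicit version so upgrades are deliberate.")


def run_checks(checks: List[Callable[[], CheckResult]]) -> List[CheckResult]:
    """Run every check and collect the results, like a small test runner."""
    return [check() for check in checks]


def failure_report(results: List[CheckResult]) -> str:
    """Join the messages of failed checks into one readable report."""
    return "\n".join(f"[FAIL] {r.name}: {r.message}"
                     for r in results if not r.passed)
```

The design point is that each failure carries a sentence describing what is wrong and what to look at next, so the output of failure_report can be read directly by an engineer or passed to an AI agent as context.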
Contextual Search and Troubleshooting
A significant challenge Forster highlighted was matching the language of problem reports to the language of automated checks, a task complicated by the variety of environments and microservices. Drawing inspiration from Google Maps’ search model, RunWhen developed an AI agent that uses “vector search” and augments semantic matching with contextual metadata. This approach helps the AI select the appropriate troubleshooting tools, a prerequisite for successful AI implementation in enterprise engineering.
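The idea of combining semantic search with contextual metadata can be sketched roughly as follows. This is an illustration under stated assumptions, not RunWhen's agent: the Tool type, the CATALOG entries, and select_tools are hypothetical, and a simple bag-of-words vector stands in for the learned embeddings a real vector search would use.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tool:
    name: str
    description: str
    metadata: Dict[str, str]  # context such as namespace or platform


def embed(text: str) -> Counter:
    # Assumption: word counts stand in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tools(report: str, context: Dict[str, str],
                 tools: List[Tool], top_k: int = 2) -> List[str]:
    """Rank tools by similarity to the report, after filtering on metadata."""
    query = embed(report)
    scored = []
    for tool in tools:
        # Skip tools whose declared metadata contradicts the caller's context.
        if any(k in tool.metadata and tool.metadata[k] != v
               for k, v in context.items()):
            continue
        scored.append((cosine(query, embed(tool.description)), tool.name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical catalog: two tools share a description but differ in context.
CATALOG = [
    Tool("check_pod_restarts_checkout",
         "check pod restarts and crash loops for deployment",
         {"namespace": "checkout"}),
    Tool("check_pod_restarts_search",
         "check pod restarts and crash loops for deployment",
         {"namespace": "search"}),
    Tool("check_db_connections",
         "check database connection pool saturation",
         {"namespace": "checkout"}),
]
```

For a report such as "checkout pods keep restarting with crash loops" with the context {"namespace": "checkout"}, text similarity alone cannot separate the two restart checks because their descriptions are identical. The metadata filter is what removes the one that belongs to a different namespace.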
Addressing Scaling Challenges
Forster concluded by discussing the scaling difficulties of managing over 10,000 tools, which strain existing Model Context Protocol (MCP) architectures. He emphasized the need for innovation in this area as AI continues to integrate more deeply into enterprise workflows.
About This Speaker
Kyle Forster
Founder, RunWhen
Kyle Forster is the Founder and CEO of RunWhen, an AI SRE startup bridging observability with automation. Prior to founding RunWhen, Kyle was the Senior Director for Product Management on Google's Kubernetes team. Earlier, he was the Founder of Big Switch Networks, a pioneer of Software Defined Networking, which he helped grow to $70m in sales before its acquisition by Arista in 2020. Before that, Kyle held product management and strategy roles at Cisco and Microsoft. He has 9 patents in Software Defined Networking and Wireless Networking, an MBA and MS in Computer Science from Stanford University and a BSE in Electrical Engineering from Princeton University.