In this session, I'll talk about what the future of DevOps could look like with self-driving infrastructure. I'll also share what we've learned from building and evaluating DevOps agents today: the current limitations of existing technologies, LLM agents, and evaluation techniques, and how we could improve them to make DevOps agents more reliable and better suited to running production infrastructure at scale. We'll cover what AI-native DevOps might look like, what LLMs are good at for DevOps today, and how to evaluate DevOps agents to build more reliable automation.
Identifying the Core Challenges in Modern Infrastructure Management
Fahmy opens his talk by identifying the persistent issues plaguing modern infrastructure management. Despite advances in tooling, many DevOps engineers find themselves overwhelmed by the complexity of their roles. He describes three prominent DevOps personas: overwhelmed DevOps engineers, platform engineers burdened by support tasks, and developers focused solely on features. He also points to the ongoing struggle with "DSL hell," where the proliferation of domain-specific languages creates barriers to efficiency. "Most abstractions are inevitably leaky," Fahmy notes, underscoring that developers still need to understand the intricacies of the underlying system.
The Struggles of AI Agents in DevOps
While large language models (LLMs) have excelled in many areas, Fahmy highlights their shortcomings in DevOps. He explains that LLMs "suck at infrastructure," particularly because DevOps contexts evolve so quickly. Fahmy cites a research paper indicating that LLMs perform poorly on less common programming languages and rapidly changing knowledge domains. This gap stems in part from slower feedback loops and the remote nature of infrastructure management, which present unique challenges for AI development.
Envisioning a Self-Driving Infrastructure
Fahmy paints a picture of a future where AI agents are akin to "minions"—reliable yet occasionally unpredictable assistants. He envisions these agents as capable of performing routine maintenance, adapting across various cloud providers, and optimizing costs dynamically. Fahmy imagines a world where agents seamlessly shift workloads between platforms like Kubernetes and AWS Lambda, enhancing efficiency and reducing human error.
Notable Use Cases and Capabilities of AI Agents
Highlighting the practical applications of AI in DevOps, Fahmy presents compelling examples of agent capabilities. From orchestrating complex Kubernetes upgrades with minimal downtime to generating and refining Dockerfiles, these agents demonstrate significant potential. Fahmy asserts, “even developers struggle to get all these details right,” emphasizing the value of AI in improving both convenience and precision in infrastructure management.
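As a loose illustration (not code from the talk) of what "generating and refining" a Dockerfile could look like, here is a minimal sketch of a feedback loop: a placeholder `generate` callable stands in for a model call, and `docker build` errors are fed back into the next attempt.

```python
import subprocess
from pathlib import Path
from typing import Callable

def refine_dockerfile(app_dir: str,
                      generate: Callable[[str], str],
                      max_attempts: int = 3) -> bool:
    """Ask a model (via the `generate` callable, assumed here) for a
    Dockerfile, try to build it, and feed build errors back into the
    prompt until the image builds or the attempts run out."""
    prompt = f"Write a production-ready Dockerfile for the app in {app_dir}"
    for _ in range(max_attempts):
        Path(app_dir, "Dockerfile").write_text(generate(prompt))
        result = subprocess.run(
            ["docker", "build", "-t", "agent-demo", app_dir],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return True  # the image built cleanly
        # Append the build error so the next attempt can correct it
        prompt += f"\n\nThe previous Dockerfile failed to build:\n{result.stderr}"
    return False
```

The same act-observe-correct shape arguably underlies most of the agent workflows described here: take an action, observe the system's response, and use that response to adjust the next step.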
Ensuring Reliability and Building Trust
Central to the deployment of self-driving infrastructure is the concept of reliability. Fahmy stresses the importance of defining and measuring reliability through metrics like "Pass@K" and the number of steps required for task completion. Acknowledging the challenges of data availability, Fahmy shares how his company, Stakpak, synthesizes data from multiple sources to train reliable agents. He advocates for transparency, urging developers to build trust by understanding the internal reasoning of AI agents. As Fahmy puts it, “the more you see agents considering angles you never thought of… the more you realize they can be more reliable than you’d expect.”
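For readers unfamiliar with Pass@K: it estimates the probability that at least one of k sampled attempts at a task succeeds. Below is a minimal sketch of the standard unbiased estimator (not code from the talk), assuming n recorded attempts of which c passed.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimator: the probability that at least one of
    k attempts, sampled from n recorded attempts (c of them passing),
    succeeds."""
    if n - c < k:
        return 1.0  # too few failures left to form an all-failing sample of size k
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: an agent completes a task in 7 of 20 recorded attempts.
print(round(pass_at_k(n=20, c=7, k=1), 3))  # 0.35
print(round(pass_at_k(n=20, c=7, k=5), 3))  # 0.917
```

Reporting Pass@K alongside the number of steps to completion, as Fahmy suggests, captures both whether an agent can finish a task and how efficiently it gets there.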
Looking Forward: The Future of AI in DevOps
Concluding his talk, Fahmy expresses optimism about the future role of AI in empowering developers. Drawing inspiration from Isaac Asimov, he envisions AI agents as "friends" to humanity, simplifying complex tasks and enabling developers to focus on innovation. Fahmy's vision is clear: while AI agents may not be perfect, their integration into DevOps processes promises to revolutionize infrastructure management, allowing developers to rest easier and innovate faster.
About The Speaker
George Fahmy
Founder & CEO, Stakpak
George Fahmy has been building software products for over 10 years. He has a couple of academic publications in computer science and was a founding engineer at Liquidity Network, a deep tech startup, as well as Thndr (Y Combinator S20). His interests span security, generative AI, and developer experience. He is currently the founder of Stakpak, a company focused on delivering specialized intelligence for DevOps and infrastructure work.