How we hacked YC Spring 2025 batch’s AI agents
We hacked 7 of the 16 publicly accessible AI agents in the YC X25 batch. This allowed us to leak user data, execute code remotely, and take over databases, each within 30 minutes. In this session, we'll walk through the common mistakes these companies made and how you can address these security issues before your agents put your business at risk. You'll also learn how to continuously mitigate emerging AI threats.
Key Takeaways
1/ The most common AI agent security issues
2/ How to address them right now
3/ What to watch out for
Rene Brandel is the Cofounder & CEO of Casco. Before Casco, he was Head of Product at AWS and the inventor of Kiro, AWS's agentic IDE. He has a long-standing passion for AI, cloud, and developer tools. In fact, he won Europe's largest hackathon in 2016 with a voice-to-code agent, before generative AI became commonplace.
Agenda


Now
Technology that is available today and ready to use for a wide audience.


Edge
Emerging and fringe ideas that need a little more polish before reaching the masses.


Tools in Action
Demos! Showcase your tool, its unique powers, and how to use it effectively.
Featured
- 09:20 (30m) | Edge | Main Stage – 52A | Talk: "Spec Driven Dev - From Single Player to Multiplayer to Ecosystem" by Guy Podjarny. Specs help a human developer drive an agent's behavior. They let you define development practices, inject missing knowledge, and capture long-term product definitions. This works great for a single developer, but how does it work for a team? How do you collaborate on product specs and the code they generate? How do you produce and distribute corporate knowledge? Going further, what does collaboration across an ecosystem look like? In this talk, Guy will explore the present and future of Spec Driven Development, and how emerging practices scale so that teams, organizations, and the entire ecosystem can build together.
- 11:10 (25m) | Now | Main Stage – 52A | Talk: "Managing fleets of coding agents with OpenHands" by Robert Brennan. We'll use OpenHands, an MIT-licensed agent orchestration platform, to drive a fleet of coding agents on a large real-world task. OpenHands can be used as a CLI or on the web, and agents can run on your workstation or in the cloud. We'll learn how to get more work done in parallel by using cloud-based, asynchronous agents instead of running agents directly on our own workstations, and we'll use OpenHands to monitor the agents and drive them forward as they build a large codebase from scratch.
- 16:30 (25m) | Edge | Main Stage – 52A | Talk: "AX⚡️DX: High voltage dev workflows" by Sean Roberts. Developer Experience (DX) has always been about clarity, speed, and keeping engineers productive. Developers are increasingly delegating to AI agents that write, test, and deploy on their behalf; some delegate very little, others a lot. That's where Agent Experience (AX) becomes critical. If you want strong DX going forward, you can't ignore AX: the experience you design for agents directly shapes the effectiveness of the developers they support. This talk explores how to shift from the practices that made DX successful in the past to the new requirements of agent-driven workflows. We'll cover who you're really serving when building tools in an era of AI delegation; why prioritizing AX keeps your DX resilient and future-proof; and how to measure and improve your repo/system AX with practical techniques such as context tools, shared systems for teams, and measuring AX efficacy. The takeaway: to keep DX strong, you have to focus on AX. It's not a question of supporting AI or agents in general; it's a question of supporting the developers who use them.
Day 1: Workshops
- 09:00 (1h) | Registration and Breakfast
- 10:00 (5m) | Event kickoff
- 10:05 (2h) | Now | Main Stage – 52A | Workshop: "Shall we play a Game? LLM Security in Practice" by Joseph Katsioloudes. Artificial Intelligence (AI) is no longer a futuristic concept; it's embedded in the systems we use daily. At the core of these innovations are Large Language Models (LLMs). LLMs can unlock new capabilities, but they also introduce novel security challenges due to their non-deterministic behavior and autonomous outputs, leading to issues such as data leakage and unintended model behavior caused by attacks like prompt injection. This workshop equips participants with the skills to build secure LLM-based applications through interactive, challenge-based exercises that gamify core security concepts. Prepare to level up your understanding of LLM security in a practical and fun way.
- 10:05 (2h) | The Landing (Ground floor) | Workshop: "Building Memory-Augmented Agents with MemoRizz" by Richmond Alake. This workshop introduces participants to the practice of memory engineering and puts the concepts into action. Using the MemoRizz library, attendees will learn how to design and build memory-augmented agents that go beyond simple prompt-and-response to become reliable, believable, and capable systems. We'll cover: a hands-on introduction to memory engineering, moving from theory to implementation; practical use of memory types, applying short-term, long-term, and shared memory in an agent; a deep research use case, building an agent that can manage complex information retrieval and synthesis across multiple interactions; reliability, believability, and capability in practice, showing what these qualities look like when agents are backed by memory; and an evaluation overview, introducing methods and frameworks for measuring agent memory performance, including how to test for precision, recall, and retention over time. By the end of the workshop, participants will have both conceptual clarity and hands-on experience in building and evaluating memory-augmented agents.
- 10:05 (2h) | The Landing (lower level) | Workshop: "Emerging Patterns for Coding with Generative AI" by Lada Kesseler. If coding with AI sometimes feels brilliant and sometimes frustrating, you're not alone. This workshop walks you through a map of the patterns that make it more powerful and predictable, from first experiments to advanced techniques. We'll explore the inherent limitations of Generative AI alongside the new possibilities it creates, examining not just what happens but why it happens. Understanding these underlying dynamics equips you to adapt and combine patterns in powerful ways and come up with your own solutions. Real-world examples will show some of these combinations in action. You'll leave with techniques that aren't tied to any single LLM or coding agent: practical approaches you can apply immediately, as well as the mental models that keep you effective in this rapidly changing landscape.
- 12:05 (1h) | Lunch
- 13:05 (2h) | Main Stage – 52A | Workshop: "Backlog.md hands-on: Markdown tasks power the 3-way review from spec to PR" by Alex Gavrilescu. Backlog.md was built 99% by AI agents, with humans setting the vision and reviewing the work. In this workshop you'll learn why specs as markdown in your repo are a game-changer, and you'll practice the simple loop I use. I'll demo the 3-way review process and the steps that took Backlog.md past 3k GitHub stars. First half: why Backlog.md works out of the box for agents, its CLI and web UI, and my 3-way review process that lets me ship with confidence. Second half: you install and initialize Backlog.md in a new or existing repo, then use your preferred agent to create and implement tasks end to end. The 3-way review you'll practice: (1) Spec review: do the why and the what match your expectations? (2) Plan review: the agent reads the code and makes a plan. Does it make sense? (3) Code review: check the code, run the app or tests, verify acceptance, and close the loop. Requirements: (1) a concrete idea that fits in 1 hour (a new project or a small feature); (2) a laptop with Git and Node.js, or a partner who has one; (3) an agent with a subscription or about $10 in credits. Good options: Claude Code, Codex CLI, Gemini CLI, Cursor (use what you know). If unsure, check compatibility at https://agents.md/#compatibility
- 13:05 (2h) | The Landing (Ground floor) | Workshop: "Testing and Securing AI generated code with Cursor" by Shrey Shah. Building with AI is exciting, but ensuring your apps are tested, secure, and efficient is where the real value lies. In this hands-on session, we'll dive into how Cursor supercharges the developer workflow for vibe-coded applications, making your process faster, safer, and smarter. What you'll learn: ☑ a quick intro to Cursor and AI-powered development; ☑ how to test AI-built applications with confidence; ☑ adding security guardrails to protect your codebase; ☑ creating custom modes to accelerate your workflows; ☑ building custom commands for repeatable tasks; ☑ new Cursor features, fresh off the Cursor roadmap at the time of the conference. You'll walk away with practical skills you can immediately apply to your own AI-powered dev workflows.
- 13:05 (2h) | The Landing (lower level) | Workshop: "Building Next Gen Agents with MCP" by Bill Maxwell. Step into the world of the Model Context Protocol (MCP) and learn how to build an interactive AI agent powered by a fully functional Chess MCP server. We will code up an MCP server, add graphical elements with MCP-UI, and play a game of chess against an AI opponent. By the end of this workshop, you'll have created a Chess MCP server that: accepts chess moves in algebraic notation; maintains game state across sessions; displays the board in ASCII and graphical format; allows AI agents (and users) to play interactive chess games; and provides an interactive chessboard UI with drag-and-drop moves. Prerequisites: Node.js 20+ installed; basic TypeScript knowledge; a text editor or IDE; terminal/command-line access; Nanobot installed (brew install nanobot-ai/tap/nanobot, or download from https://github.com/nanobot-ai/nanobot/releases).
- 15:05 (30m) | Coffee Break
- 15:35 (2h) | Main Stage – 52A | Workshop: "Hands-On: Building Agents with Spring AI, MCP, Java, and Amazon Bedrock" by James Ward and Josh Long. In this hands-on workshop you will learn how to build and deploy production-ready AI agents. You will use Spring AI, MCP, Java, and Amazon Bedrock, and learn how to deal with production concerns like observability and security. We will start with basic prompting, then expand with chat memory, RAG, and integration through MCP. By the end we will have a multi-agent system in which agents interact with other agents to accomplish high-level tasks.
- 15:35 (2h) | The Landing (Ground floor) | Workshop: "Supercharge Your Coding Agents" by Macey Baker and Simon Maple. AI agents are moving from copilots to true coding collaborators, but most developers are still flying by feel. Tools like Claude Code, Codex, Cursor, and Gemini CLI are powerful, yet few teams know how to harness them fully. Prompts alone aren't enough; effective agentic coding requires structure, context, and shared understanding. In this 2-hour hands-on workshop, we'll go beyond prompting tricks and explore how to use coding agents collaboratively and effectively. You'll learn how to use specs as context and provide agent guidance that shapes behavior and output, teaching agents not only what to build but how to build it, aligned with your organisation's needs. We'll cover techniques for: using specs and structured guidance to steer agents toward better library use, open source and private alike; integrating team and org policies, coding styles, and conventions directly into agent workflows; and managing multi-agent setups and tool integrations for larger, iterative projects. Through hands-on examples, you'll see how to supercharge your coding agents, turning them into reliable teammates who understand your stack, your rules, and your goals. If you've ever wondered whether your AI agent could be doing more, this session shows you how to make that happen.
- 15:35 (2h) | Now | The Landing (lower level) | Workshop: "AI-Powered Application Modernization using Claude-Flow" by Derek Ashmore. We'll go on a journey using Claude-Flow, which manages a team of Claude Code agents, to perform a 12-factor assessment of an open-source codebase and judge its readiness to be cloud-native, that is, how well the application can use cloud features such as dynamic scaling and high availability. We'll then plan the changes needed to resolve the cloud-native issues found by the analysis. Expect a discussion of the issues found in the assessment, as well as migration plan tactics. Attendees will leave with: knowledge of how to use Claude-Flow and Claude Code to assess legacy codebases; and knowledge of how to use agentic engineering practices to plan significant changes to legacy codebases. You're welcome to participate or just watch. If you participate, you'll need: a Claude Code account or Anthropic API key so that you can use Claude Code; and a GitHub account capable of running GitHub Codespaces.
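The Chess MCP workshop lists a server that accepts moves in algebraic notation and maintains game state. As a rough, hypothetical sketch of that input-handling core (not code from the workshop, Nanobot, or MCP-UI), the TypeScript below checks Standard Algebraic Notation (SAN) syntax with a regular expression and records accepted moves; it assumes move legality is checked elsewhere, for example by a chess engine library.

```typescript
// Hypothetical sketch: input validation and state keeping for a chess tool.
// It checks Standard Algebraic Notation (SAN) *syntax* only; it does not
// check whether a move is legal in the current position.

// Castling, piece moves (optional disambiguation and capture), or pawn moves
// (optional capture and promotion), each with an optional check/mate suffix.
const SAN = /^(?:O-O(?:-O)?|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]|[a-h](?:x[a-h])?[1-8](?:=[QRBN])?)[+#]?$/;

class GameState {
  private moves: string[] = [];

  // White moves first, so an even number of recorded moves means White to play.
  get sideToMove(): "white" | "black" {
    return this.moves.length % 2 === 0 ? "white" : "black";
  }

  // Returns an error message, or null if the move was recorded.
  play(move: string): string | null {
    const san = move.trim();
    if (!SAN.test(san)) return `not valid SAN: "${move}"`;
    this.moves.push(san);
    return null;
  }

  // Serializable move history, e.g. for persisting a game between sessions.
  toJSON(): string[] {
    return [...this.moves];
  }
}

const game = new GameState();
console.log(game.play("e4"));  // null
console.log(game.play("e5"));  // null
console.log(game.play("Nf3")); // null
console.log(game.play("Nz9")); // not valid SAN: "Nz9"
console.log(game.sideToMove);  // black
```

A regex only catches malformed input such as "Nz9"; a move like "Ke5" on the first turn passes the syntax check even though it is illegal, so a real server would still need full move validation.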
Day 2: Talks
- 08:00 (1h) | Registration and Breakfast
- 09:00 (20m) | Main Stage – 52A | Event kickoff with Simon Maple
- 09:20 (30m) | Edge | Main Stage – 52A | Talk: "Spec Driven Dev - From Single Player to Multiplayer to Ecosystem" by Guy Podjarny. Specs help a human developer drive an agent's behavior. They let you define development practices, inject missing knowledge, and capture long-term product definitions. This works great for a single developer, but how does it work for a team? How do you collaborate on product specs and the code they generate? How do you produce and distribute corporate knowledge? Going further, what does collaboration across an ecosystem look like? In this talk, Guy will explore the present and future of Spec Driven Development, and how emerging practices scale so that teams, organizations, and the entire ecosystem can build together.
- 09:50 (30m) | Main Stage – 52A | Talk: [TBC]
- 10:20 (20m) | Coffee Break
- 10:40 (25m) | Now | Main Stage – 52A | Talk: "AI as an Amplifier: State of AI-assisted Software Development" by Nathen Harvey. AI adoption in software development is nearly universal, but the results are anything but. While the industry has moved beyond simple code completion, the promised productivity gains often get lost in downstream chaos. So what separates the high performers from the rest? Drawing on insights from the 2025 DORA State of AI-assisted Software Development report, a study of nearly 5,000 technology professionals, this session reveals a critical truth: AI's primary role is that of an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones. In this talk, we will go beyond the hype and dive deep into the data to uncover: why a "legacy bottleneck" can completely negate AI performance gains, and how loosely coupled architectures are essential for success; how Value Stream Management (VSM) acts as a force multiplier for AI, ensuring that localized productivity gains translate into measurable improvements in team and product performance; the crucial role of high-quality internal platforms, and why 94% of organizations have adopted platform engineering as the foundation for AI success; and the surprising human factors at play, including why 30% of developers report little to no trust in AI-generated code, and what the data says about AI's real impact on burnout and developer satisfaction. You will leave this session with a clear, data-backed framework for understanding and improving your team's AI adoption strategy. Instead of focusing on just the tools, you'll learn how to identify and address the systemic issues holding you back, and how to create an environment where AI can truly amplify your team's success.
- 10:40 (25m) | Edge | The Landing (Ground floor) | Talk: "AI Hates Legacy Code" by Ray Myers. AI coding agents feel like magic... right up until they collide with production code. For teams maintaining legacy systems, these agents often hallucinate APIs, run off on tangents, and shatter trust faster than an unreviewed hotfix at 5pm. Ignoring the past won't save us, because new code becomes old! We can do better, and we will. In this session we'll go over emerging strategies for improving the accuracy of coding agents on real codebases, benchmarks such as SWE-bench that evaluate our progress, and their limitations. Expect to walk away with actionable techniques, a renewed respect for the code that came before us, and a clearer view of the challenges ahead. Key takeaways: why LLMs have more trouble with existing code; what we can do about it; how we measure progress.
- 10:40 (25m) | Tools in Action | The Landing (lower level) | Talk: "These Aren't the Tools You're Looking For: MCP Security Awakens" by Liran Tal. Everyone adopts MCP servers. Everyone deploys MCPs. Everyone secures their MCP servers. Oh, they don't? Who would've thought! This talk isn't about adding authentication to your MCP server; it's an invitation into the deep end of threats and risks in the MCP ecosystem. MCP servers introduce new threat vectors and security risks, from insecure MCP server code, to malicious MCP servers harboring tool-poisoning attacks, all the way to indirect prompt injection that compromises MCP-enabled IDEs such as Cursor and AI apps like Claude Desktop. In this highly technical session, I'll demonstrate active exploitation techniques against MCP deployments: how a single malicious tool description can exfiltrate credentials, and how attackers exploit insecure MCP servers to run arbitrary code. You'll walk away with a clear understanding of the moving parts in the MCP threat landscape, so you can better assess your risks and security strategies, along with key insights and security best practices you can apply when adopting and building MCP servers.
- 11:10 (25m) | Now | Main Stage – 52A | Talk: "Managing fleets of coding agents with OpenHands" by Robert Brennan. We'll use OpenHands, an MIT-licensed agent orchestration platform, to drive a fleet of coding agents on a large real-world task. OpenHands can be used as a CLI or on the web, and agents can run on your workstation or in the cloud. We'll learn how to get more work done in parallel by using cloud-based, asynchronous agents instead of running agents directly on our own workstations, and we'll use OpenHands to monitor the agents and drive them forward as they build a large codebase from scratch.
- 11:10 (25m) | The Landing (Ground floor) | Talk: [TBC]
- 11:10 (25m) | Tools in Action | The Landing (lower level) | Talk: "When Your Pair Programmer is a Swarm: Architecting UIs for Multi-Agent Coding" by Sepehr Samadi. Coding with one AI assistant is easy. Coding with ten AI agents arguing, collaborating, and fixing each other's mistakes? That's the future. As we embrace AI-native development, multi-agent systems will shape how software is built, provided we can design frontends that keep humans in control. This session explores how to architect real-time interfaces for multi-agent coding workflows using React, TypeScript, and streaming APIs. We'll cover: visualizing agent reasoning and conversation threads in real time; designing conflict-resolution UIs for when agents disagree; handling scalability and performance as agent "swarms" grow; and ensuring trust and oversight with human-in-the-loop controls. Through live coding and demos, we'll build a "swarm control panel" that makes agent collaboration transparent, debuggable, and even fun. You'll walk away ready to design the interfaces that will define AI-native pair programming.
- 11:40 (25m) | Now | Main Stage – 52A | Talk: "State of Open-Source AI Coding Models" by Niels Rogge. As a machine learning engineer at Hugging Face, I'm at the forefront of what's happening in open-source AI. There have been many advances in open coding models and their tooling, such as Qwen3-Coder, GLM-4.5 (which is compatible with Claude Code), the Hugging Face Inference Providers integration for VS Code, and more. This talk provides an overview of the landscape of openly available coding models: the different model types and training objectives, how they stack up against closed models, and the tooling around them (fine-tuning and inference).
- 11:40 (25m) | Now | The Landing (Ground floor) | Talk: "Building with MCP: How Protocol Primitives Shape Developer Experience" by Bill Maxwell. The Model Context Protocol (MCP) has unleashed a wave of innovation in the AI world by bridging large language models (LLMs) with the systems we rely on every day. You may have already used an MCP server to connect your coding agent to library docs, databases, or issue trackers. But what's happening under the hood? In this talk, we'll dig into the guts of MCP servers and explore the key protocol primitives that make everything click, from tool calls, sampling, prompts, elicitations, and resources to the subtle design choices that can make or break your developer experience. For example, a well-implemented MCP server might pause before deleting your database and ask for confirmation instead of firing off a destructive action.
- 11:40 (25m) | Tools in Action | The Landing (lower level) | Talk: "Code Security Reinvented: Navigating the era of AI" by Joseph Katsioloudes. Artificial intelligence (AI) already serves as a copilot in our daily lives, acting as a digital assistant and delivering personalized experiences. Despite progress in many areas, AI historically fell short of improving software development practices. This changed with the introduction of AI pair programmers, which distill the collective technical know-how of the world's developers, and their widespread adoption has been telling. While building software has become easier and faster, the question remains: has it also become more secure? In this session, we'll demonstrate six practical ways developers can use AI to tap into the world's security knowledge, showcased through 14 demos in GitHub Copilot. The audience will gain a deep understanding of AI capabilities for security, the pros and cons of security MCP servers, how to make informed decisions about supply chains, and other key practices, along with insights drawn from our own lessons as developers striving to ship secure code. Finally, we'll share a playground repository where attendees can safely experiment with everything demoed.
- 12:10 (25m) | Now | Main Stage – 52A | Talk: "Evolving specs in Kiro to deliver incremental features faster, safer" by Al Harris. Learn how the Kiro team evolved the spec-based dev workflow and artifact structure into a feature they wanted to use every day. Do you treat specs as point-in-time snapshots of technical decisions, or as an ever-syncing reflection of your source code? We've found that by distinguishing technical summaries and designs from requirements, we can iterate on earlier decisions in a safe, reliable manner. The Kiro team went through 7 fundamentally different spec implementations and interfaces internally, and we feel we've only just scratched the surface of an experience worth pursuing.
- 12:10 (25m) | Edge | The Landing (Ground floor) | Talk: "DevOps Agents That Can't Delete Your Database" by George Fahmy. You've seen AI coding agents delete databases and wipe codebases. Current solutions like Claude Code's whitelists/blacklists fail because LLMs are versatile: they can do the same destructive thing in 100 different ways. This session demonstrates novel deterministic security guardrails that make agents safe to use in production environments. We'll cover secure secret handling and mTLS for MCP, and introduce Warden, a new deterministic security enforcer that creates boundaries agents cannot cross, regardless of the tools they use. Real examples come from Stakpak's open-source DevOps agent. This is your first look at techniques that make coding agents production-safe, with a reference implementation in Rust.
- 12:10 (25m) | Tools in Action | The Landing (lower level) | Talk: "Building AI Agents with Spring & MCP" by Josh Long and James Ward. AI is here, but is it working for you? The name of the game is to give these AI models access to our enterprise systems and services and let 'er rip! But it's not always easy. We have a friend whose stress level trying to build production-worthy Python AI services was so high that his hairline receded TWELVE INCHES! Or that might have just been natural aging... Either way, he should've used Spring AI! Join AWS developer advocate James Ward and his trusty sidekick, Spring developer advocate Josh Long, as we look at how to build MCP-enabled, RAG-ready, vibe-free, agentic systems and services in no time at all.
- 12:35 (1h) | Lunch
- 13:35 (25m) | Now | Main Stage – 52A | Talk: "Backlog.md - From zero to success with AI Agents" by Alex Gavrilescu. Heard about Claude Code, Cursor, AGENTS.md, or MCP but still hitting walls? This talk takes you from zero to a simple, efficient way to organize work for AI agents. We move from vibe-coding slop to a Spec-Driven AI Development loop using Backlog.md, a CLI that keeps tasks as markdown in your repo with clear context and acceptance criteria. Backlog.md was built 99% by AI agents and has 3k+ GitHub stars, proof that the loop works. I'll show the flow I used to build Backlog.md, the lessons learned, the DOs and DON'Ts, and how I went from a frustrating 50% task success rate to almost 100% in about a month. We connect instruction load to agent reliability and use that to split work into tasks that land. We'll look at how agentic coding changes Agile in practice: which core values stay, what must change, and the new bottlenecks. You'll leave with a clear way to plan and parallelize work across agents, with tasks sized to what agents can follow, and an understanding of why tools like Backlog.md mark the start of a new Agile process. We finish with an open question for the room: how do we keep humans in the loop in the new AI era? P.S. You won't need any prior agent experience.
- 13:35 (25m) | Now | The Landing (Ground floor) | Talk: "Agents on (Guard)Rails: Deterministic Safety for Database Operations" by Rotem Tamir. As AI agents move from writing boilerplate code to performing complex, data-aware tasks, a critical question emerges: how do we prevent a creative but unsupervised AI agent from causing chaos in our databases, where every operation must be precise and predictable? Unchecked, agents pose a dual threat: data corruption through incorrect or malicious Data Manipulation Language (DML), and catastrophic schema changes via unintended Data Definition Language (DDL). This session provides a practical playbook for putting agents on guardrails. We'll move beyond theory and demonstrate two open-source tools that bring deterministic safety to agent-driven database operations. First, we'll show how the MCP Toolbox for Databases can serve as a DML guardrail, preventing agents from executing unsafe queries by giving them pre-approved, parameterized "tools" for data access. Second, we'll explore how Atlas can manage and verify DDL changes, ensuring that any agent-proposed schema modifications adhere to company policy and migration best practices. This is a hands-on guide to building robust, production-ready agents that you can actually trust with your data.
- 13:35 (25m) | Tools in Action | The Landing (lower level) | Talk: "AI-Powered Application Modernization: From Legacy Code to Cloud-Native with Claude-Flow" by Derek Ashmore. Application modernization projects often stall under the weight of complexity: do you refactor, re-platform, or replace with SaaS? AI-assisted development can cut through that complexity, turning weeks of manual analysis into hours. In this session, I'll share a real-world exercise in which my Claude-Flow swarm of coding agents was used extensively to modernize an existing open-source CRM application. We'll cover three stages of the journey: (1) assessing the application's fitness for cloud deployment, (2) using AI to analyze the viability of migrating users to existing SaaS alternatives, and (3) planning the modernization effort with a roadmap that accounts for technical debt, risk, and compliance gaps. This is not a theoretical discussion: I'll show concrete outputs generated by the swarm, how the orchestration handled complex decomposition, and where human judgment was still required. Attendees will walk away with a practical, field-tested framework for applying AI agents to modernization efforts and accelerating the move to cloud-native architectures.
- 14:05 (25m) | Edge | Main Stage – 52A | Talk: "The Intersection of Fitness Function-driven Architecture and Agentic AI" by Neal Ford. The path to evolutionary architectures lies with architects and developers building and maintaining architectural fitness functions, the mechanics of evolutionary architecture. The most advanced form of this is fitness function-driven architecture: building the governance safety net before building the architecture. However, architects have historically struggled with the variety and specificity required to build and maintain this safety net. With the advent of agentic AI and MCP, architects now have a way to advertise capabilities and let agentic AI glue together the data and facilities they need for governance. This session highlights how architecture will encapsulate agentic AI to enable more sophisticated architectural governance, allowing teams to build more powerful systems with confidence.
- 14:05 (25m) | Edge | The Landing (Ground floor) | Talk: "Evaluating an LLM's ability to code" by Adam Larson. In this talk I'll share the dimensions along which I like to assess LLMs, the path I've taken so far in building testing capabilities, and how much more research remains to be done in this space. Benchmarks regularly declare some new model the best at coding, but that doesn't always align with real-world usage. This frustrated me and led me to build an evaluation approach that mimics real-world usage.
- 14:05 (25m) | Tools in Action | The Landing (lower level) | Talk: "RoboCoders: Judgment Day – AI Spec-driven Tools Face Off" by Baruch Sadogursky and Viktor Gamov. Agentic AI spec-driven tools promise cleaner code, faster development, and fewer late-night debugging sessions. But do they really deliver? In this live showdown, we'll complete identical coding tasks, from initial setup to testing and debugging, live on stage using the two leading spec-driven development AI tools: Tessl and Kiro. You, the audience, decide which tool actually boosts quality and productivity and which just creates more noise than useful code. Bring your skepticism, cast your vote, and expect surprises.
- 14:35 (25m) | Now | Main Stage – 52A | Talk: "AI assisted code reviews: Quality and Security at Scale" by Sneha Tuli. Discover how AI is transforming code reviews at scale in this session led by Principal Product Manager Sneha Tuli. Drawing on Microsoft's real-world journey, you'll learn how integrating an AI-powered code review assistant into existing developer workflows has accelerated feedback, improved code quality, and fostered a culture of continuous learning. This session shares not just technical innovations, such as multi-agent orchestration and custom project guidelines, but also practical lessons on building trust, ensuring safety, and driving adoption across thousands of repositories. Attendees will gain actionable insights into deploying AI responsibly, overcoming challenges such as false positives and context limitations, and preparing for a future in which human and AI collaboration sets new standards for software quality and security.
- 14:35 · 25m · Edge · The Landing (Ground floor) · talk · "From Prompts to AGENTS.md: What Survives Across Thousands of Runs". We stress-test the coding agents Claude Code and Codex at scale and report the patterns that actually survive. Across thousands of runs on a representative golden set of agentic coding issues, we compare orchestration strategies (single vs. parallel vs. lightweight hierarchy), reasoning styles (ReAct, Reflexion, Self-Refine, Least-to-Most), and context practices (refresh, compaction, dedup). The core move is turning ephemeral prompt tweaks into durable, versioned AGENTS.md files, both central and per-component, so that improvements persist across repos and projects. We'll complement this with a GitHub study of AGENTS.md in popular projects (adoption, typical sections, section sizes), then show how we applied the findings to Claude Code and Codex to stabilize outcomes under load. Attendees will leave with defaults that improved speed, cost, size, and performance, plus a template they can adopt immediately. 20-minute run of show: (2 min) why results drift and why rules beat one-off prompts; (4 min) orchestration ladders and reasoning styles, and what held up at scale; (5 min) AGENTS.md in practice: central vs. distributed, ordering, decision criteria; (5 min) context engineering that sticks and trace-driven updates; (4 min) an "in the wild" snapshot (GitHub stats) and quickstart templates. Top 3 takeaways: (1) a reproducible template for converting traces into AGENTS.md rules that survive across runs; (2) when to use parallel runs vs. a light hierarchy, and how to stage reflection without ballooning tokens; (3) context defaults that reduce cost and latency without cratering quality. Speaker: Thomas Krier.
- 14:35 · 25m · Tools in Action · The Landing (lower level) · talk · "Context-Aware Development in Kiro". The Model Context Protocol (MCP) is transforming how AI agents access and use development context. In this live demo, you'll see how Kiro leverages MCP alongside steering files to create context-aware coding agents that understand your development environment. We'll demonstrate MCP in action (how Kiro's AI agents use MCP to access external tools in real time); steering files (creating and evolving steering files that guide AI behavior based on your project's specific patterns, conventions, and architectural decisions); and context-aware generation (agents generating code that respects existing patterns). This isn't just code completion; it's AI agents that understand your development context and can make intelligent decisions about code generation, refactoring, and feature implementation. You'll leave with practical knowledge of how MCP and steering files can transform your own development workflow with Kiro. Speakers: Jonathan Vogel, Onur Dogruoz.
- 15:05 · 25m · Now · Main Stage – 52A · talk · "How we hacked YC Spring 2025 batch’s AI agents". We hacked 7 of the 16 publicly accessible YC X25 AI agents. This allowed us to leak user data, execute code remotely, and take over databases, in each case within 30 minutes. In this session, we'll walk through the common mistakes these companies made and how you can mitigate these security concerns before your agents put your business at risk. We'll also cover how to continuously mitigate emerging AI threats. Key takeaways: (1) the most common AI agent security issues; (2) how to address them right now; (3) what to watch out for. Speaker: Rene Brandel.
- 15:05 · 25m · Edge · The Landing (Ground floor) · talk · "From Vibe Coding to Spec-Driven Development: Building AI-Native Software in the Enterprise". At Nearform, we see Spec-Driven Development (SDD) as the blueprint for progressive delivery in the AI-native era. By applying an SDD methodology to real customer projects, we've accelerated delivery with leaner teams and tested the limits of the methodology along the way. In this talk, I'll share why SDD excites us far more than "vibe coding" ever did. I'll show: how we've found SDD to deliver greater predictability, completeness, and context-aware outputs, with examples of vibe coding shortcomings that SDD has solved; how codifying the traditional roles of a software team as agentic roles produces more reliable, repeatable results, with examples of how an explicit QA step and context window overcome many traditional under-completeness reporting challenges; how multiple human-in-the-loop drivers of agents coordinate on a single project using git; and how we make SDD fit into the enterprise development cycle through MCP connectivity, giving more transparency into the work of agents. We'll also share some of the warts: the types of projects this new development paradigm proved not yet ready to tackle, the challenges to adoption we've seen across teams, and how to nurture a culture where AI-native development isn't a threat. Attendees will leave with concrete learnings from testing the bounds of what's possible with spec-driven development on larger enterprise projects. Speaker: Cian Clarke.
- 15:05 · 25m · Tools in Action · The Landing (lower level) · talk · "Spec-Driven Code Quality in Action with Tessl MCP and Qodo CLI". Specs are the backbone of reliable software, but AI tools can lose track of them once you're deep in the weeds of development. In this demo, I'll show how Tessl's spec-driven MCP server and Qodo CLI combine to keep specs alive throughout the development lifecycle. You'll see how spec files drive AI-generated code, how review agents validate against organizational standards, and how feedback loops push insights back into specs, demonstrating how continuous, spec-aligned code quality can scale with AI-assisted development. Speaker: Nnenna Ndukwe.
- 15:30 · 25m · Coffee break
- 16:00 · 25m · Edge · Main Stage – 52A · talk · "Memory Engineering: Going Beyond Context Engineering". Large Language Models (LLMs) are powerful, but their limitations become clear in multi-turn interactions: they lose track of context, repeat mistakes, and forget what matters. Lately, developers have relied on context engineering (clever prompt design, retrieval pipelines, and compression) to work around these constraints. But context alone is ephemeral. To build agents that are reliable, believable, and capable, we need to move beyond context and into memory engineering. This talk introduces memory engineering as the natural progression of context engineering, exploring how to design systems in which data is intentionally transformed into persistent, structured memory that agents can learn from, recall, and adapt with over time. We'll walk through the data→memory pipeline, the types of agent memory (short-term, long-term, shared), and practical strategies such as reflection, consolidation, and managed forgetting. Finally, we'll extend the conversation with a Context Engineering++ perspective: a holistic view of how memory, context, and attention can be engineered together to enable the next generation of agentic systems. Attendees will leave with a clear framework for evolving from prompt engineering to context engineering to memory engineering, and practical guidance on how to architect agents that don't just respond, but remember, adapt, and grow. Speaker: Richmond Alake.
- 16:00 · 25m · The Landing (Ground floor) · [TBC]
- 16:00 · 25m · Tools in Action · The Landing (lower level) · talk · "Unlocking agent beast mode with Tessl". AI coding agents are evolving from copilots into true collaborators. In this session, we'll show how to unlock their full potential using real examples from tools like Claude Code. You'll see how structured context, specs, and guidance can transform agents into reliable teammates that understand your stack, libraries, and coding style. Through live demos, we'll explore how to align agents with your workflow, organization, and intent, helping you code faster, smarter, and with more control. Speaker: Dru Knox.
- 16:30 · 25m · Edge · Main Stage – 52A · talk · "AX⚡️DX: High-voltage dev workflows". Developer Experience (DX) has always been about clarity, speed, and keeping engineers productive. Developers are increasingly delegating to AI agents that write, test, and deploy on their behalf. Some delegate very little; others delegate a lot. That's where applying skills in Agent Experience (AX) becomes critical. If you want strong DX going forward, you can't ignore AX: the experience you design for agents directly shapes the effectiveness of the developers they support. This talk will explore how to shift from the practices that made DX successful in the past to the new requirements of agent-driven workflows. We'll cover who you're really serving when building tools in an era of AI delegation; why prioritizing AX keeps your DX resilient and future-proof; and how to measure and improve the AX of your repo or system with practical techniques such as context tools, shared systems for teams, and measuring AX efficacy. The takeaway: to keep DX strong, you have to focus on AX. It's not a question of supporting AI or agents in general; it's a question of supporting the developers who use them. Speaker: Sean Roberts.
- 16:30 · 25m · Edge · The Landing (Ground floor) · talk · "Beyond Tests: What to Verify in AI-Generated Code". Agent-based coding introduces new quality risks for teams that rely primarily on traditional testing approaches. Because agents default to "best guesses" when given insufficient or underspecified context, these gaps can, if left unchecked, result in production issues related to performance, stability, or unexpected edge cases. However, when teams provide clear invariants and non-functional requirements, coupled with review cycles that ensure they're met, agents can produce significantly higher-quality code, reducing downstream maintenance costs. This talk presents an invariant-driven framework for deciding what to verify and where those checks belong in your pipeline. We'll introduce a simple invariant taxonomy that delivers immediate benefits through verification. The taxonomy classifies invariants by scope (universal across any repo, system/architecture-specific, or feature-level) and by type of check (data contract, business logic, or performance/SLA), paired with a target remediation (advisory only, block merge, or rewrite). We'll conclude with a before-and-after demo using Tessl's Specification Registry that demonstrates the benefits of incorporating invariants into your agentic coding workflows. Attendees will leave with a practical checklist they can apply immediately. Speakers: Jennifer Sand, Brandy Pielech.
- 16:30 · 25m · [TBC]
- 17:05 · 30m · Edge · Main Stage – 52A · talk · "Code as Commodity: How to Thrive with the Medium of Generative AI". As the cost of producing code approaches zero, previously uneconomic ideas become not only possible but necessary to stand out in a sea of slop. Engineering will remain a vital craft, yet developers must confront the commoditization of what was once their specialty. The savviest will shift from production to articulation and orchestration, combining taste, judgment, and the ability to amplify signal as the noise grows deafening. Chris Messina, the product strategist who invented the hashtag, built early developer platforms at Google and Uber, and now coaches AI-native founders on go-to-market and launch strategy, explores how generative models are transforming software from an exclusive craft into a popular medium. As software becomes disposable, we must rethink what industry we're in. The future developer looks less like a Big Tech "tech bro" and more like a chef, record producer, or architect, shaping experiences where taste, culture, and iteration amplify value. Speaker: Chris Messina.
- 18:00 · 4h · Conference Party
- Select your package
- Free (Online): Join from anywhere and stream the highlights.
  - Live access to all Main Stage keynotes and talks
  - Access to recorded sessions
  - In-stream giveaways

- Full Access Pass: $180 (regular price $350) until October 31. Enjoy DevCon in person.
  - Everything in Day
  - Two days of hands-on learning, conference sessions, and networking
  - Breakfast pastries, lunch, tea & coffee
  - Evening party with drinks and bites
  - Exclusive event swag