
Spec-Driven Development With Kiro
Transforming Dev Practices with Kiro’s Spec-Driven Tools
Transcript
[00:00:00] Simon: Hello, and welcome to another episode of the AI Native Dev. My name's Simon Maple, and I'm your host for today. Today’s going to be a fun episode. We're going to be talking about spec-driven development, but also looking at a pretty cool new tool on the market called Kiro.
[00:00:20] Simon: Kiro is—well, I’ll let the folks here introduce it—but it’s a spec-driven way of building software in your IDE, which is very interesting. We're going to talk about the philosophy of spec-driven development, show a demo of Kiro working, and then discuss the journey the team had at AWS to get here today.
[00:00:46] Simon: So, let me introduce Nikhil, Swaahan, and Richard. Hey folks, how are you doing?
[00:00:55] Richard: It’s going good. Thanks for having us on.
[00:00:56] Simon: Let’s jump into what your roles are at Kiro. Why don’t we start with you, Nikhil?
[00:01:02] Nikhil: Sure. I'm the product lead for Kiro. I’ve been on the team for the past six months or so, though it feels like I’ve been here for years at this point.
[00:01:12] Nikhil: My job is, first, to help build out the features we need, and second, to work with the team to make sure we’re doing the right thing.
[00:01:22] Simon: Awesome. And Kiro has been in development for around a year and a half?
[00:01:30] Nikhil: Richard started before me, probably 10 to 12 months earlier. Richard, right?
[00:01:35] Richard: Yeah, close to that. Let’s see—it’s July—about 13 months actually. I’m Richard, an engineer on the Kiro team. As Nikhil mentioned, I was either the first or the second engineer on the team when we started.
[00:01:53] Richard: I’ve been involved in all phases of Kiro development. Specifically, I’ve focused most of my time on specs, including the original research projects with a lot of the Amazon science teams.
[00:02:09] Simon: Amazing. Looking forward to getting deeper into that.
[00:02:11] Simon: We’ve said Kiro a few times. My introduction probably didn’t do it justice. Why don’t we jump into what Kiro is? Nikhil, can you walk us through—give us the definition of the problem Kiro is trying to solve?
[00:02:25] Nikhil: The elevator pitch.
[00:02:26] Nikhil: Kiro is an agentic IDE that helps you do your best work. That’s our tagline. Let me unpack that a little bit, because it’s vague. Unless you’ve been living under a rock the past year and a half, agentic IDEs have become really popular.
[00:02:47] Nikhil: They allow developers to use natural language to interact with an agent to generate code for an application. What used to take days or weeks to build as a feature or prototype can now be built in hours. Agentic IDEs popularized the term we’re all familiar with now—“vibe coding”—which allows you to use natural language and chat with an agent to generate code for your application.
[00:03:20] Nikhil: I think of vibe coding as almost like ChatGPT for coders. You go back and forth with an agent to build features. While vibe coding has been great for getting people up and running quickly, the process completely bypasses traditional software development lifecycle best practices.
[00:03:43] Nikhil: As someone who’s worked on a team here at Amazon, and before that ran a startup, I’ve experienced both worlds. When building a feature, you often spend days, if not weeks, detailing requirements and designs before you start coding.
[00:04:04] Nikhil: Even as a solo developer or in a small startup, you move faster because there’s less coordination, but you still spend time thinking about what you’re building before you start. Vibe coding changed that completely. As a solo developer, I can build something in an hour, but after one session of vibe coding, I often find myself thinking, “What did I build?”
[00:04:33] Nikhil: What decisions did I make? Because you move so fast—you get a response, you don’t quite like it, you tweak it, and the back-and-forth with the agent becomes a stream of consciousness. It takes you a moment to reflect and think, “What did I actually get done?”
[00:04:53] Nikhil: That’s the core problem space we’re trying to solve. With Kiro, we’re not trying to build just another agent editor. We believe the way people work with AI will change, and spec-driven development is the approach we think will become the standard going forward.
[00:05:19] Simon: Really interesting. I love the way you describe how, very often, you make decisions quickly but those decisions stay in your mind rather than being captured. While much of it is represented as code, when you’re vibe coding, it’s hard to look back later and separate decisions from implementations.
[00:05:38] Simon: If someone else were to come in and look at your code in the future, they might guess what was a decision, what was intended, and what was just an assumption or an LLM choice. But it’s not clear; it’s all mixed together in the code. That’s what I love about specs. They pull those decisions out and allow anyone reading the spec to understand what the developer or team actually intended.
[00:06:17] Nikhil: One analogy I use with the team is that vibe coding is almost like a Slack thread. It’s a stream of consciousness. When you work with teams, you often see 50–60 message-long Slack threads on complicated topics, and a lot of the decision log is captured there instead of in a document. With a document, you can review and align before implementing.
[00:06:51] Simon: Yeah, really interesting. Also, a Slack message is a point-in-time artifact. It’s relevant at that moment based on past context, not future context. Whereas a spec is the spec—it’s the final outcome.
[00:07:09] Simon: Similarly, when I’m prompting in a vibe coding environment, my prompt is relative to the code as it was at that time. Later, those prompts may not make sense because they only referenced that point in time.
[00:07:33] Simon: We’ve mentioned specs a few times. I’d love to go a little deeper. Richard, let me bring you in here. What do we mean by a specification?
[00:07:45] Richard: Okay, so that's a very deep question, depending on who you're talking to. Maybe I'll start with something that Nikhil was just talking about.
[00:07:55] Richard: We weren’t necessarily looking to build just another AI or IDE tool. We’re really trying to help people shape software development practices for the future. A lot of things in software engineering have existed since the 1940s, 50s, and 60s, and we’re just now starting to see them come into play.
[00:08:14] Richard: Specifications, as a term, often make people think of formal methods, especially in deeper systems. Usually that means safety properties or liveness properties of your system. For example: is it making progress toward a goal, or is it violating certain properties? That’s incredibly difficult to tell in code just by looking at it, especially in large programs or microservices.
[00:08:35] Richard: If you want to ensure a read-after-write property across a system, you can’t just glance at an if statement. In complex programs you need to lift that up to a higher-level definition so people can actually see what it means for the system. Then, as you execute the program, you need to ensure that this invariant is respected.
[00:09:06] Richard: Traditionally, this has been a backwards process. You build implementations, but people often can’t reason about their systems very well. So you bring in PhD-level experts who know esoteric modeling languages, such as TLA+, Dafny, or P at AWS, or something similar. They work backwards from the system, building models for verification or validation.
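Richard's read-after-write example can be made concrete with a small Python sketch. It is a toy model, not Kiro or AWS code: writes land on a primary and are acknowledged immediately, reads are served from a replica that only catches up when `replicate()` runs, and the invariant is stated once, over the whole system, in `read_after_write_holds`. All names are hypothetical.

```python
from typing import Dict, Optional


class ReplicatedStore:
    """Toy store: writes go to a primary, reads come from a lagging replica."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self.acked: Dict[str, str] = {}  # last value acknowledged to a writer, per key

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self.acked[key] = value  # acknowledged before replication has happened

    def replicate(self) -> None:
        self.replica.update(self.primary)

    def read(self, key: str) -> Optional[str]:
        return self.replica.get(key)


def read_after_write_holds(store: ReplicatedStore) -> bool:
    """System-level invariant: every acknowledged write is visible to readers."""
    return all(store.read(key) == value for key, value in store.acked.items())


store = ReplicatedStore()
store.write("cart:42", "3 items")
print(read_after_write_holds(store))  # False: the replica has not caught up yet
store.replicate()
print(read_after_write_holds(store))  # True
```

Neither `write` nor `read` contains an obviously wrong branch; the violation only becomes visible once the property is stated at the system level.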
[00:09:33] Richard: What we’re trying to do is make this approachable for any developer, not just specialists. That’s a top-level point. With AI tools, there’s a lot of talk about making programming available to people who don’t know how to code. That’s true, but we also want to give existing developers new tools to reason about systems in ways they couldn’t before.
[00:10:20] Richard: So what are specifications? My answer: they’re about behavior. They describe how a system should behave, not its implementation. Historically and philosophically, this has shown up in many ways. The most common is design by contract. In research, that’s usually expressed through Hoare logic: you define preconditions for a function, postconditions it must meet, and termination conditions or loop invariants that hold inductively. That enforces contracts for your functions.
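As a rough illustration of design by contract, the Python sketch below marks where a Hoare-style precondition, loop invariant, termination argument, and postcondition sit in a simple integer square root. The function name `int_sqrt` is made up for the example. Note that `assert` only checks these conditions at runtime, for the inputs you happen to run (and Python strips asserts under `-O`); Hoare logic or a tool like Dafny would prove them for every input.

```python
def int_sqrt(n: int) -> int:
    """Integer square root, annotated with Hoare-style contract checks."""
    # Precondition {P}: n is a non-negative integer.
    assert isinstance(n, int) and n >= 0, "precondition violated"

    r = 0
    # Loop invariant: r * r <= n holds on entry and after every iteration.
    # Variant: n - r stays >= 0 and strictly decreases, so the loop terminates.
    while (r + 1) * (r + 1) <= n:
        r += 1
        assert r * r <= n, "loop invariant violated"

    # Postcondition {Q}: r is the largest integer whose square does not exceed n.
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r


print(int_sqrt(17))  # 4
```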
[00:12:20] Richard: More recent work, like what Amazon is doing with Lean, is theorem-based. That gives you even lower-level fidelity because it ties types in your program directly to logic. Now, why does that matter for Kiro? Those formal languages give you powerful artifacts, but they’re inaccessible to most developers. Our approach is to surface those behaviors at a higher level so we can still meet those constraints as the system evolves.
[00:13:19] Richard: When you build with Kiro, you start with top-level requirements. These use something called EARS syntax, the Easy Approach to Requirements Syntax. It’s not just a clever trick with auto-formalization; it’s actually an extension of temporal logic, which itself is an extension of first-order logic. That gives us a lot of power: we can control model behavior, verify processes, and connect design to implementation.
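EARS (Mavin et al.) defines a small set of sentence templates for requirements. The Python sketch below only checks which of the five basic templates a requirement follows; it does not capture the temporal-logic semantics Richard mentions, and it is not how Kiro processes requirements internally, which the episode does not describe.

```python
import re
from typing import Optional

# The five basic EARS templates, checked in order (ubiquitous last, as the fallback).
EARS_TEMPLATES = [
    ("event-driven", re.compile(r"^when .+, the .+ shall .+$", re.IGNORECASE)),
    ("state-driven", re.compile(r"^while .+, the .+ shall .+$", re.IGNORECASE)),
    ("unwanted behavior", re.compile(r"^if .+, then the .+ shall .+$", re.IGNORECASE)),
    ("optional feature", re.compile(r"^where .+, the .+ shall .+$", re.IGNORECASE)),
    ("ubiquitous", re.compile(r"^the .+ shall .+$", re.IGNORECASE)),
]


def classify_ears(requirement: str) -> Optional[str]:
    """Return the EARS template a requirement follows, or None if it fits none."""
    text = requirement.strip()
    for name, template in EARS_TEMPLATES:
        if template.match(text):
            return name
    return None


print(classify_ears("When a user views a product card, the system shall display a heart icon"))
print(classify_ears("Make favorites nice"))  # None: not testable as written
```

A requirement that fits none of the templates is usually a sign it is still too vague to test.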
[00:14:24] Richard: Philosophically, two points motivated us. A professor in the UK, Lawrence Trap, described them. First is the circular specification problem: you almost need to build something fully before you can specify it. Second is the observer effect: once you see the software, it changes how you and others think it should act.
[00:15:23] Richard: That’s why Kiro has both vibe mode and spec mode. You can prototype quickly, then specify it, and move back and forth. That interplay captures how humans actually work.
[00:15:41] Simon: That’s really interesting. In software development today, even prototypes help us understand how something should be built. With iterative programming and agile methods, we’re always learning as we go, so it makes sense. If you go back 20+ years, before Agile, specs made more sense in a waterfall world.
[00:16:01] Simon: Having both vibe and spec modes in Kiro is super powerful. Maybe we should jump into a demo so viewers can see it. For those listening on audio, you can watch the video version on YouTube or Spotify.
[00:16:57] Nikhil: Alright, happy to share my screen. Before diving in, let me show a diagram to orient you as a user.
[00:17:03] Nikhil: With Kiro, you start with an idea. In other agentic tools, you’d enter a chat window and say, “I want to build X,” and it generates code. With Kiro, you enter your idea and we generate three files: requirements, design, and tasks.
[00:18:16] Nikhil: The demo app I’ll use is a simple e-commerce site—think Etsy for crafts. In my last demo, I added a review system where users can comment and vote. I built that entirely via specs.
[00:19:03] Nikhil: Kiro itself is built on a fork of Code OSS, so we could fully control the experience. You’ll see tabs like “Specs,” which we’ve been discussing; “Hooks,” which let you watch code and react to changes; and “Steering,” which lets you define rules to guide the agent during vibe or spec interactions.
[00:20:55] Nikhil: Steering files are defined at the repository level. They’re just markdown files in a .kiro folder, so you can check them into source control. That way, teams can enforce rules—for example, UI styling requirements defined by marketing.
[00:22:01] Richard: Everything is composable. Hooks can trigger spec workflow stages, and steering can augment specs. For example, if a team wants to enforce TDD, you can create a steering file with TDD rules.
[00:22:40] Nikhil: Exactly. I can just type, “Augment my workflow to ensure TDD best practices,” and Kiro generates a new steering file to guide the agent.
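Since steering files are plain markdown in the repository, an agent harness could fold them into its instructions with something as simple as the Python sketch below. It assumes the files sit in a `.kiro/steering/` subfolder and are concatenated in filename order; both are assumptions for illustration, not a description of how Kiro itself loads them.

```python
import tempfile
from pathlib import Path


def load_steering(repo_root: Path, subdir: str = ".kiro/steering") -> str:
    """Concatenate steering markdown files into one block of agent guidance.

    The default subdirectory is an assumption made for this sketch.
    """
    steering_dir = repo_root / subdir
    if not steering_dir.is_dir():
        return ""
    sections = []
    # Sorted so the same set of files always produces the same guidance text.
    for path in sorted(steering_dir.glob("*.md")):
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## Steering: {path.stem}\n\n{body}")
    return "\n\n".join(sections)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".kiro" / "steering").mkdir(parents=True)
    (root / ".kiro" / "steering" / "ui-style.md").write_text(
        "Use the marketing team's color palette for all buttons.", encoding="utf-8"
    )
    print(load_steering(root))
```

Because the files are checked into source control, every teammate's agent sees the same guidance.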
[00:23:29] Simon: Awesome. That’s powerful—like turning a brief into a spec.
[00:23:35] Nikhil: Right. Spec and steering complement each other really well. Let’s walk through a spec now. My app currently has reviews and a cart, but the heart icon for favorites doesn’t work. Let’s build a spec for adding favorites.
[00:24:22] Nikhil: I’ll create a new spec: “Add the ability to favorite items.” Kiro generates a new favorites spec with a requirements.md file. These are markdown-based, with sensible defaults but fully customizable.
[00:25:21] Simon: So this is the requirements phase?
[00:25:24] Nikhil: Exactly. My vague idea was turned into a structured requirements file. Each requirement has a user story with acceptance criteria written in EARS format. This mirrors Amazon’s PR/FAQ style but captures details and edge cases that PR/FAQs often miss.
[00:26:22] Nikhil: The best way I’ve found to do this with engineering teams is to break things into user stories and acceptance criteria. The user story is almost like the high-level definition. For example: As a customer browsing products, I want to mark items as favorites so I can find them easily without searching again.
[00:26:40] Nikhil: That’s the traditional definition of a user story that everyone can understand. But when a PM gives that to an engineering team, there are always a lot of questions. Take the favorites use case. If I add something as a favorite, can a user click and unclick to remove it? Is there a count of how many favorites I have? Once you start building, you realize there are a lot of details and nuances in every feature.
[00:27:24] Nikhil: Here we’ve got the first requirement: marking items. The acceptance criterion in EARS format says, “When a user views a product card, the system shall display a heart icon.” Then there’s one around clicking and un-clicking. Next, I want to be able to view all my favorite items in one place. That’s the second requirement.
[00:27:49] Nikhil: An edge case is when favorites are empty. Then the system should display an appropriate empty state. Here I can edit it either directly or in chat. For requirement four, I might say, Make sure this experience has some iconography. That’s how teams iterate on requirements before moving ahead.
[00:28:24] Nikhil: The third user story: I want my favorites to persist across browser sessions. That wasn’t captured in the simple initial story. There could also be a favorite count indicator.
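The requirement shape Nikhil walks through (a user story plus numbered acceptance criteria) can be modeled as a small data structure. This Python sketch renders one requirement to markdown; the heading layout and the second and third criteria are illustrative guesses based on the demo, not Kiro's actual template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    number: int
    title: str
    user_story: str
    acceptance_criteria: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render one requirement: a user story plus numbered EARS criteria."""
        lines = [
            f"### Requirement {self.number}: {self.title}",
            "",
            f"**User Story:** {self.user_story}",
            "",
            "#### Acceptance Criteria",
            "",
        ]
        lines += [f"{i}. {c}" for i, c in enumerate(self.acceptance_criteria, start=1)]
        return "\n".join(lines)


favorites = Requirement(
    number=1,
    title="Mark items as favorites",
    user_story="As a customer browsing products, I want to mark items as favorites "
    "so I can find them easily without searching again.",
    acceptance_criteria=[
        "When a user views a product card, the system shall display a heart icon",
        "When a user clicks the heart icon on an unfavorited item, the system shall add it to favorites",
        "When a user clicks the heart icon on a favorited item, the system shall remove it from favorites",
    ],
)
print(favorites.to_markdown())
```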
[00:28:53] Simon: When you just edited that, do you prefer the LLM to edit the spec, or do you do it yourself?
[00:29:03] Nikhil: Personally, I prefer the LLM to edit, but I can also add things manually. We have autocomplete, so as I type, it continues for me. My preference is to do what I just did, but sometimes I’ll use chat and say, I don’t care about the count indicator requirement. That’s almost like vibe coding the requirements.
[00:30:05] Nikhil: There’s still room for improvement here. For example, how do we make acceptance criteria easier to understand? How do we surface when two requirements have conflicting information, so users don’t miss it?
[00:30:23] Richard: In larger teams, multiple people might review 10, 15, 20 requirements as part of a bigger system. You’d want to compare them together to see if there are contradictions. That’s an area we’re exploring for the future.
[00:30:53] Nikhil: As you can see, it went ahead and removed the count requirement. This looks good. I’ll move to the design phase.
[00:31:07] Simon: So requirements are about the “what.” Is design more about the “how,” the architecture of the application?
[00:31:23] Nikhil: Exactly. Requirements are the “what,” design is the “how.” Kiro looks at the existing codebase to determine what the design needs to be. It generates a design markdown file based on the requirements.
[00:32:08] Nikhil: Internally, we dogfood Kiro heavily. When we build a new feature, we generate requirements and design directly in the codebase, then use those artifacts with the team to decide if they look good. A lot of time in teams is spent ironing out design and requirements details.
[00:32:51] Nikhil: Typically we also generate a system diagram. It shows architecture, state management patterns, core components, and interfaces. For this case, it adds a Favorites context type, add/remove favorites, new components, and a data model update. It also covers error handling and testing strategies.
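To show what the design's add/remove favorites interface and data model amount to, here is a Python stand-in for the Favorites context. The demo app is a React/TypeScript project, so this is a translation, not the generated code: a plain dict plays the role of browser `localStorage`, and the class and method names are hypothetical. It covers the toggle, view-all, empty-state, and persistence requirements from the requirements phase.

```python
import json
from typing import Dict, List


class FavoritesStore:
    """Python stand-in for the demo's Favorites context (hypothetical names).

    `storage` plays the role of browser localStorage: a string-to-string map that
    outlives any single store instance, which is what makes favorites persist.
    """

    KEY = "favorites"

    def __init__(self, storage: Dict[str, str]) -> None:
        self._storage = storage
        self._ids: List[str] = json.loads(storage.get(self.KEY, "[]"))

    def _save(self) -> None:
        self._storage[self.KEY] = json.dumps(self._ids)

    def is_favorite(self, item_id: str) -> bool:
        return item_id in self._ids

    def toggle(self, item_id: str) -> bool:
        """Add or remove an item (click / un-click); returns True if it is now a favorite."""
        if item_id in self._ids:
            self._ids.remove(item_id)
        else:
            self._ids.append(item_id)
        self._save()
        return item_id in self._ids

    def list_favorites(self) -> List[str]:
        return list(self._ids)

    def empty_state_message(self) -> str:
        return "" if self._ids else "You have no favorites yet."


storage: Dict[str, str] = {}
session1 = FavoritesStore(storage)
session1.toggle("mug-7")
session1.toggle("scarf-2")
session1.toggle("mug-7")  # un-favorite
session2 = FavoritesStore(storage)  # a new "browser session" reading the same storage
print(session2.list_favorites())  # ['scarf-2']
```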
[00:33:34] Nikhil: I can edit these. For example, if I want to focus on unit testing instead of integration testing, I can adjust it. The idea is this is the starting point. You review with your team, note trade-offs, and decide how to proceed.
[00:34:35] Simon: What happens when there are conflicts between design and requirements?
[00:34:40] Nikhil: Right now they’re manually resolved. Typically, design reflects requirements. If I add back a requirement we removed earlier, the design will update accordingly.
[00:35:17] Richard: Good point. The process uses requirements, context engines, and backend tools in the Kiro API to carry everything forward. Requirements feed into design, then tasks. Customers could technically create a design that violates requirements, and it would still proceed. That’s intentional.
[00:36:04] Richard: Rejecting it would be a bad developer experience. In practice, developers often have domain knowledge that’s hard to encode in a document. Forcing rigid structures creates friction. We chose flexibility, letting customers override when necessary.
[00:36:55] Nikhil: Let’s get to the final stage: tasks. We’ve got a requirements file, a design file, and now a task list. The task list is the implementation plan—discrete items a developer can execute against.
[00:37:22] Nikhil: If I were vibe coding, I’d start with a prompt and iterate. Now, instead, I have a tasks.md file breaking the design into tasks: create Favorites context, add Favorites view, create a list component, enhance components, handle accessibility, error cases, etc. It’s very thorough—things like accessibility I might not think of otherwise.
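Because `tasks.md` is just markdown, tooling can read progress straight out of it. The sketch below assumes each task is a GitHub-style checkbox item (`- [ ]` or `- [x]`), which is an assumption about the file's format; the task titles echo the demo.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed format: one "- [ ]" / "- [x]" checkbox line per task.
TASK_LINE = re.compile(r"^\s*- \[([ xX])\] (.+)$")


@dataclass
class Task:
    title: str
    done: bool


def parse_tasks(markdown: str) -> List[Task]:
    tasks = []
    for line in markdown.splitlines():
        match = TASK_LINE.match(line)
        if match:
            tasks.append(Task(title=match.group(2).strip(), done=match.group(1) in "xX"))
    return tasks


def next_task(tasks: List[Task]) -> Optional[Task]:
    """The first unfinished task in plan order, or None when everything is done."""
    return next((t for t in tasks if not t.done), None)


TASKS_MD = """\
# Implementation Plan

- [x] 1. Create Favorites context and storage helpers
- [ ] 2. Add a heart toggle to the product card
- [ ] 3. Build the Favorites view with an empty state
- [ ] 4. Handle accessibility and error cases
"""

tasks = parse_tasks(TASKS_MD)
done = sum(t.done for t in tasks)
print(f"{done}/{len(tasks)} done; next: {next_task(tasks).title}")
# 1/4 done; next: 2. Add a heart toggle to the product card
```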
[00:38:22] Simon: How much of these tasks are for the LLM versus the user?
[00:38:44] Nikhil: Mostly the LLM. The structure helps it implement tasks completely. That’s been the experience of early testers. But because we optimize for flexibility, I can edit the task list, removing items or adjusting focus areas.
[00:39:31] Nikhil: I’ll finalize the task list. Each task has a “Start Task” button. We intentionally didn’t add a “Play All” button, because building incrementally in the editor helps users understand what’s happening.
[00:40:15] Nikhil: When I start a task, the UI updates and the system implements it. I can see all changes being made—mock storage, test components, infrastructure for Favorites.
[00:40:59] Simon: I’d love to see an LLM orchestrate this automatically, acting as judge and executor.
[00:41:21] Nikhil: Absolutely. That’s a feature request we’ve heard. Basically, can I offload the whole task list to the LLM in an async workflow? That’s what you’re suggesting?
[00:41:45] Simon: Exactly.
[00:41:49] Nikhil: Right now it’s installing Jest and setting up testing because I added the TDD rule in steering. While that runs, let me explain what you see at the end. Once a task completes, I can view the actual changes with full context.
[00:42:51] Richard: You’ll also notice requirements tracing here. It’s important across the software development lifecycle. It documents and tracks relationships across the state machine, including as we move tasks through the implementation plan.
[00:43:13] Simon: If I change a requirement, will it look into design and task lists, find dependencies, and only update what’s affected?
[00:43:33] Nikhil: Exactly. Let’s try that. Say I add back a requirement for counting favorites. I don’t need to define it perfectly; I just hit refine.
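Requirements tracing is what makes a targeted refresh possible: if each task records which requirement IDs it satisfies, a changed requirement maps directly to the tasks that need revisiting, and a newly added requirement (like the restored favorites count) shows up as uncovered. The trace data and helper names below are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical trace: the requirement IDs each task claims to satisfy.
TRACE: Dict[str, Set[str]] = {
    "1. Create Favorites context": {"1", "3"},
    "2. Add heart toggle to product card": {"1"},
    "3. Build Favorites view": {"2", "4"},
    "4. Persist favorites across sessions": {"3"},
}


def affected_tasks(changed: Set[str], trace: Dict[str, Set[str]]) -> List[str]:
    """Tasks tracing to any changed requirement; everything else can stay as is."""
    return [task for task, reqs in trace.items() if reqs & changed]


def uncovered_requirements(all_reqs: Set[str], trace: Dict[str, Set[str]]) -> Set[str]:
    """Requirements no task traces to yet, e.g. one that was just added back."""
    covered = set().union(*trace.values())
    return all_reqs - covered


print(affected_tasks({"3"}, TRACE))
# ['1. Create Favorites context', '4. Persist favorites across sessions']
print(uncovered_requirements({"1", "2", "3", "4", "5"}, TRACE))
# {'5'}
```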
[00:44:19] Nikhil: Oh, I see why it’s queued—I have a hook running. Hooks watch your repository for events: file create, save, delete, or manual trigger. For example, I have one that updates documentation whenever new TypeScript files are added. That’s what’s happening here.
[00:45:32] Nikhil: Another hook example: component validator. When new React components are added, it checks that they’re well-defined and not handling multiple concerns. Teams can enforce practices like this across workflows.
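The hooks Nikhil describes pair an event type with a file pattern and an action. This Python sketch models that pairing with hypothetical names; in Kiro the action would be an instruction sent to the agent, whereas here each action just returns a string describing what it would do. `fnmatchcase` is used so matching behaves the same on every operating system.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, List


@dataclass
class Hook:
    name: str
    event: str                    # "create", "save", "delete", or "manual"
    pattern: str                  # glob matched against the changed file's path
    action: Callable[[str], str]  # stands in for the agent instruction the hook sends


def dispatch(hooks: List[Hook], event: str, path: str) -> List[str]:
    """Run every hook whose event type and file pattern both match."""
    return [h.action(path) for h in hooks if h.event == event and fnmatchcase(path, h.pattern)]


hooks = [
    Hook("docs-updater", "create", "*.ts", lambda p: f"Update the README to cover {p}"),
    Hook("component-validator", "create", "*.tsx", lambda p: f"Check that {p} has a single concern"),
]

print(dispatch(hooks, "create", "src/favorites/storage.ts"))
print(dispatch(hooks, "save", "src/favorites/storage.ts"))  # [] since no hook listens for saves
```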
[00:46:03] Richard: My favorite feature is the context LSP. Experienced engineers might look at a design doc and want to adjust an interface or data structure based on domain knowledge. We let them.
[00:46:51] Richard: For example, in a markdown file, you can use a pound sign to reference files directly. That lets you reuse designs across projects—like a library import. Kiro respects those references as it builds implementation plans.
[00:47:32] Richard: One example I use is property-based testing. It’s more advanced than unit or integration tests, but less rigid than proofs. I’ll create markdown files for property testing in Python, for instance with Hypothesis, and inject them into designs.
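Richard's file references can be pictured as a simple include step: before the agent sees a design document, each reference is replaced with the referenced file's contents. The `#[[file:...]]` form used below is an assumption about the exact syntax (the episode only mentions a pound sign), and the one-level, non-recursive expansion is a simplification.

```python
import re
import tempfile
from pathlib import Path

# Assumed syntax for this sketch: "#[[file:relative/path.md]]" marks a file reference.
FILE_REF = re.compile(r"#\[\[file:([^\]]+)\]\]")


def expand_file_refs(markdown: str, root: Path) -> str:
    """Inline each referenced file's contents, one level deep (no recursion)."""

    def inline(match) -> str:
        rel = match.group(1).strip()
        target = root / rel
        if not target.is_file():
            return f"[missing reference: {rel}]"
        return target.read_text(encoding="utf-8").strip()

    return FILE_REF.sub(inline, markdown)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "property-testing.md").write_text(
        "Write property-based tests with Hypothesis.", encoding="utf-8"
    )
    design = "## Testing Strategy\n#[[file:property-testing.md]]"
    print(expand_file_refs(design, root))
```

Keeping the shared guidance in one file means every design that references it picks up later edits automatically.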
[00:47:55] Richard: And then I can say what the properties are, the ones I want to hold, which is something humans are really good at. Then the model translates them into code. When you run your tests, they execute against lots of generated inputs, with fuzzing and so forth.
[00:48:11] Richard: It’s a super powerful feature, and I’d encourage a lot of people who are building advanced systems to take full advantage of it.
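Here is a minimal property-based check for the favorites toggle, written with only Python's standard library so it runs without extra packages. It is a hand-rolled randomized test, not Hypothesis itself: it states properties a human would write down (toggling twice is a no-op, toggles never create duplicates, membership flips) and checks them against many generated inputs. The function names are hypothetical.

```python
import random
from typing import List


def toggle(favorites: List[str], item: str) -> List[str]:
    """Pure toggle: remove the item if present, otherwise append it."""
    if item in favorites:
        return [f for f in favorites if f != item]
    return favorites + [item]


def check_toggle_properties(trials: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)  # seeded, so any failing case can be replayed
    catalog = [f"item-{i}" for i in range(10)]
    for _ in range(trials):
        # Generate a random duplicate-free favorites list and a random item to toggle.
        favorites = rng.sample(catalog, rng.randint(0, len(catalog)))
        item = rng.choice(catalog)
        once = toggle(favorites, item)
        # Property 1: toggling the same item twice restores the original favorites.
        assert set(toggle(once, item)) == set(favorites), (favorites, item)
        # Property 2: a toggle never introduces duplicates.
        assert len(once) == len(set(once)), (favorites, item)
        # Property 3: after a toggle, the item's membership has flipped.
        assert (item in once) != (item in favorites), (favorites, item)


check_toggle_properties()
print("all properties held")
```

With Hypothesis, the same properties would typically be written as a test decorated with `@given(st.lists(st.sampled_from(catalog), unique=True), st.sampled_from(catalog))`; Hypothesis then generates the inputs and, on failure, shrinks them to a minimal counterexample, which this sketch does not do.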
[00:48:19] Simon: And when people get through implementation and suddenly think, oh, actually I’d love to change this thing… let’s say it’s an implementation detail. Of course, when we do that, code becomes the source of truth versus the spec. Then how do we regenerate and so forth? How does that cause problems, and what’s the best way of avoiding developers touching code?
[00:48:49] Richard: I would say we don’t really see that as a problem if they want to touch code. Now, it is possible the specification can drift a little from where the code is. I’d say that space is going to evolve in the future, and we’ve got some ideas about how to handle it more robustly.
[00:49:11] Richard: But I wouldn’t discourage engineers from editing code. In fact, if you think about it, we’re in a way encouraging them by offering both the spec and vibe mode. If you’re an expert and you know the targeted changes you want to make, by all means go ahead and vibe those small pieces.
[00:49:28] Simon: Right. So if we think about lifecycles, how long does a spec live? Is it the lifecycle of the change or the lifecycle of the app, given that people will want to change code?
[00:49:45] Richard: Yeah, this is a tough question. My common answer is to think about it like comments in code. Comments reflect the code at the point in time they were written, and they have to be maintained as people and teams edit the code over time. They can drift, that is true. So you have to treat the spec as a similar artifact in your system and build controls around it.
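One control a team could build around spec drift is to fingerprint the files a spec produced and flag any that change afterwards, so someone reviews whether the spec still describes them. This is a hypothetical Python sketch of that idea, not a Kiro feature; file names and contents are made up.

```python
import hashlib
from typing import Dict, List


def fingerprint(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def drifted_files(recorded: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Files changed or deleted since the spec last recorded their fingerprints."""
    return sorted(
        path
        for path, digest in recorded.items()
        if path not in current or fingerprint(current[path]) != digest
    )


# Fingerprints recorded when the favorites spec's tasks were completed (hypothetical).
code_at_spec_time = {
    "src/favorites/storage.ts": "export const KEY = 'favorites';",
    "src/components/FavoriteButton.tsx": "export function FavoriteButton() {}",
}
recorded = {path: fingerprint(src) for path, src in code_at_spec_time.items()}

# Later, someone edits one file by hand.
code_now = dict(code_at_spec_time)
code_now["src/favorites/storage.ts"] = "export const KEY = 'favs-v2';"
print(drifted_files(recorded, code_now))  # ['src/favorites/storage.ts']
```

A flagged file does not mean the edit was wrong, only that the spec and the code should be compared again.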
[00:50:11] Simon: One thing we haven’t talked about on this screen is MCP servers, down at the bottom left. What’s the best way of making the most out of MCP servers or other services from within Kiro?
[00:50:25] Nikhil: Yeah, MCP servers work with all the features. Whether you’re vibing or speccing, MCP servers will work. In this case, I have the Fetch MCP server, which lets the agent pull in content from the web.
[00:50:38] Nikhil: So when I’m building a feature, especially in the design phase, I can say, hey, can you search the web for best practices regarding maybe a framework or a new technology? You can do that. We have full local MCP support.
[00:50:56] Nikhil: You can add any MCP server via a JSON file. You can do that with chat or just update it yourself. Then the agent invokes those MCP servers, deciding in either spec mode or vibe mode when a given server needs to be called.
[00:51:18] Simon: Very interesting. For example, docs are a good case. If there was a docs vendor someone preferred, you could tell Kiro to invoke that docs provider: when you want docs, here’s the provider.
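For reference, the Python sketch below reads an MCP configuration in the common `mcpServers` JSON layout, where each server has a `command` and `args` used to launch it locally. Kiro's exact file location and schema are assumptions here; `uvx mcp-server-fetch` is how the reference Fetch server from the Model Context Protocol servers repository is commonly launched.

```python
import json
from typing import Dict, List

# Example in the common "mcpServers" JSON layout; Kiro's exact schema is assumed.
CONFIG = """
{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}
"""


def launch_commands(config_text: str) -> Dict[str, List[str]]:
    """Map each configured server name to the command line that starts it."""
    servers = json.loads(config_text).get("mcpServers", {})
    return {name: [spec["command"], *spec.get("args", [])] for name, spec in servers.items()}


print(launch_commands(CONFIG))  # {'fetch': ['uvx', 'mcp-server-fetch']}
```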
[00:51:35] Simon: In terms of LLMs choosing those MCP services versus doing it themselves, is there contention between when it goes to the MCP service versus doing it manually?
[00:51:46] Nikhil: Sorry, I didn’t fully grasp the question—could you just say it again?
[00:51:51] Simon: Yeah, absolutely. For example, if we provide a docs creation MCP server—of course, the LLM can create docs by itself too. Do you find challenges when the user wants to use an MCP server, but the LLM tries to do it itself because it has that capability? Do you ever see contention there?
[00:52:19] Nikhil: Basically you’re saying if there’s an MCP server overlapping with one of your hooks’ functionality?
[00:52:25] Simon: Yeah.
[00:52:26] Simon: Or even just another basic LLM capability.
[00:52:29] Nikhil: Absolutely, you can see that happen. But the way the system is built, it’s flexible enough for you to interrupt. When working with agents, you’re in systems that are non-deterministic in nature. From a UX perspective, you need the right level of flexibility so users can interrupt the agent.
[00:52:47] Nikhil: For example, I could have interrupted the agent while it was going and said, hey, you need to do this instead. That’s all possible with Kiro.
[00:53:04] Simon: Yeah, and I presume you could even add it into the task itself—say, I want you to use this service or tool to create documentation for this within the hooks.
[00:53:20] Simon: Okay, let’s go back to specs a bit more. Richard, I guess the specs you showed—which are requirements, design, tasks, all of that together—are the specification in the Kiro world.
[00:53:52] Simon: In terms of where we are today versus when you started, I suspect you learned a ton going through private betas and using it internally at AWS. What are some of the key learnings you took from that journey of working with specifications?
[00:53:58] Richard: A few things. You nailed it—specifications are the combination of those artifacts in Kiro. High level, they’re about behavior. Our view, and a key learning, is they need to be composable documents with functional business use cases, pseudocode, technical diagrams, behaviors around error handling, and properties.
[00:54:29] Richard: Some learnings there: when we kicked off this podcast, I mentioned building the Kiro product and also a research phase that ran in parallel and then joined up. Let’s talk about how that went, especially what we haven’t mentioned.
[00:54:57] Richard: There were a few iterations of spec internally that were big failures. So we proved what wouldn’t work too. For example, Amazon has this automated reasoning group that’s well known, and the formal methods community dealt with many of the same things. In fact, they influenced how we built this product.
[00:55:20] Richard: Early days at Amazon, specifications were heavily used in systems backing IAM, S3, and so forth. My biggest learning from talking to those teams was how much back and forth there was between people and engineers. You had experts on implementation and experts on modeling behaviors, and they had to sync constantly.
[00:55:43] Richard: I remember one presentation internally showing paths between offices like a triangle, and how the carpet was worn down. You’d define a behavior, then walk to the implementation expert, then back to the modeling expert. That’s really what we’re doing here—allowing you to iterate and go back on the system.
[00:56:25] Richard: Without sounding too boring, a lot of this is codified institutional knowledge and SDLC practices. But let’s talk about some of the things we tried while building this. We could have said, you know what, let’s reinvent the space—even though people have been building software for decades—and force certain patterns down people’s throats.
[00:56:43] Richard: One thing we tried at first was making every spec go through a TDD flow. Nikhil, do you want to talk about that?
[00:56:54] Nikhil: Yeah, I can talk about that and what we did next. The approach was, let’s start with TDD. TDD is great: you start with tests and then generate code from them. From early tests, we found it worked pretty well. But when we put it in front of users and customers, it wasn’t how people typically worked.
[00:57:23] Nikhil: TDD is great, and some follow it, but a lot of customers just weren’t doing that. It enforced rigidity in how people used specs. Feedback was, this isn’t really working for me.
[00:57:48] Nikhil: The next major thing we tried was generating a spec by right-clicking a file or folder and creating a new spec for what’s inside. That was interesting, but users found it just generated information about what was there. That’s kind of like what we have with steering, so the question was, how do I actually action on this and build something new?
[00:58:38] Nikhil: Finally, where we landed combined Richard’s years of work on requirements and designs with what we saw in the field. Power users had changed how they worked with agents—more planning and meticulous thinking upfront before tasks. We learned from that and applied it with a user experience centered on the development process: requirements, design, and implementation.
[00:59:22] Nikhil: The moment we shipped this, we got a lot of positive signal. Of course there were tons of issues at first, and what you’re seeing now is four iterations in. But the core is around requirements, design, and implementation planning.
[00:59:46] Nikhil: A lot of the questions you asked—how do I edit this, keep it in sync, review it—are things we’re actively thinking about, not just from a technology perspective but also user experience. How do we make this palatable to users?
[01:00:10] Simon: And when we say users, there are going to be multiple personas going forward. Some will have more opinion on requirements, others on tasks or design. Do you see different owners for these files as part of the spec?
[01:00:46] Nikhil: Absolutely. That’s why we’re dogfooding this internally. PMs own requirements, engineers own design. A big feature request we’ve been hearing is: we have requirements in Jira or Asana—how do we bring those in and surface them? That’s interesting from a problem perspective.
[01:01:16] Nikhil: How do we make coordination better? How do we save upfront time that goes into requirements and design planning in a way that makes sense for how teams operate? We don’t expect PMs on every team to be inside a code editor. We have to look at how people are doing things in the wild and enable them in the tools they’re already using.
[01:01:52] Simon: Absolutely. Let’s look a little further forward. Where does this space lead? What’s the next step from a spec-oriented perspective? How do you see people in future wanting to use specifications?
[01:02:14] Nikhil: Richard, do you want to take that first?
[01:02:16] Richard: Yeah, I’ll start and then hand back to Nikhil.
[01:02:26] Richard: There are a lot of topics Nikhil could talk about from his work with teams and products. The one that’s most interesting to me is that it shifts your focus from the post-building phase into the code review phase.
[01:02:53] Richard: Agents and models now generate entire code bases and huge numbers of lines of code. They’re getting quite good at it, no matter what the AI detractors say. They’re meeting higher standards, especially on many benchmarks. Now you’ve got these code bases and requirements, and you have teams that need to reason about components of them.
[01:03:19] Richard: That could be reviewing or reasoning about refactoring parts of a program, or reviewing and reasoning about new feature components, or even security implications of code changes. This is going to be a highlighted area, not just in Kiro, but across the industry over the next couple of years. Right now, it’s largely untouched in the AI and editor space.
[01:04:24] Nikhil: For me, I think about the problems teams face across the SDLC process. Upfront, there’s a lot of back and forth finalizing requirements and designs. When multiple teams work together, it gets even harder. How do we improve that process?
[01:04:47] Nikhil: Going further, once something ships and issues arise, how do we bring those issues back to the development team? We want to look at the full lifecycle. Some core things won’t change regardless of AI—when you ship software, there will be problems. How do you fix them? Requirements will be ambiguous. How do you make that easier? That’s how we’re approaching this space.
[01:05:37] Simon: Awesome. Where are we with the build right now? Looks like we’ve just completed section four?
[01:05:46] Nikhil: Yes, we’ve completed a bunch of tasks. One thing you’ll notice is that it’s very methodical about generating tests and updates. For example, I just executed a task for the enhanced product card component with favorite functionality, and it’s making sure the tests are set up correctly.
[01:06:23] Nikhil: It added a favorite option already, so I can mark things as favorites. They’re showing up correctly as favorites.
[01:06:31] Richard: Remember, this one had a steering file telling it to use TDD, so it changed his flow to write some of these tests up front rather than later.
[01:06:40] Nikhil: Exactly. I picked a relatively easy feature. Often, AI demos show building a game or a greenfield app. The reason I chose favorites is to show that even a simple feature involves a lot. You need tests, accessibility, and scenarios you wouldn’t normally think of.
[01:07:14] Simon: And the nice thing is the tests are built as part of the flow. Without TDD, they might have been added later; this way, you validate along the way. The weakness of vibe coding is validation: making sure you haven’t regressed anything else. Here, tests come by default.
[01:07:47] Nikhil: Absolutely. Another example: I had to update the README file. Many developers want that kept up to date. The hooks experience is one of my favorites—you can enforce things like updating docs, localization, or string handling across languages whenever new features are added. Customers are getting creative with hooks.
[01:08:29] Simon: That also avoids the debt that usually follows when you skip these steps.
[01:08:35] Nikhil: Exactly.
[01:08:36] Simon: Amazing. This has been a great demo and a really fun conversation. Congratulations on the success Kiro has had. From what I saw on Twitter, it’s on a waiting list right now because of the high demand. What advice would you give people who want to try it today?
[01:09:01] Nikhil: We’re so grateful. Thousands of users are trying to get on, and we’re optimizing the experience for those already in. That’s why we added the waitlist. We’re working diligently to get people off it, and you’ll be hearing from us soon.
[01:09:23] Simon: Awesome. And if you’re interested, it’s kiro.dev, right?
[01:09:27] Nikhil: Yes, kiro.dev.
[01:09:28] Simon: You can join the waitlist there. Wonderful. Nikhil and Richard, it’s been an absolute pleasure. Very insightful. Thank you for sharing your knowledge and demo.
[01:09:43] Simon: For those listening only on the podcast, do check out the YouTube video—the demo has a ton of value. Really appreciate you both joining.
[01:09:54] Nikhil: Thanks, Simon.
[01:09:56] Simon: And thanks for listening. Tune in to the next episode.
In this episode
In this episode of AI Native Dev, host Simon Maple is joined by Nikhil Swaminathan and Richard Threlkeld to discuss Kiro, an agentic IDE designed for spec-driven development. They delve into how Kiro emphasizes specifications as the core artifact in AI-era software, enabling developers to maintain clarity and intent amidst rapid code generation. Discover how this approach improves team coordination, enhances code quality, and ensures that behavior-first design becomes the foundation for sustainable software development.
Rapid “vibe coding” with agentic IDEs lets you ship features in hours, but it also leaves decision-making buried in chat logs and diffs. In this episode of AI Native Dev, host Simon Maple talks with Nikhil Swaminathan (Product Lead) and Richard Threlkeld (Engineer) about Kiro, an agentic IDE built around spec-driven development. They explore why specifications should become the primary artifact of AI-era software development, how Kiro operationalizes behavior-first design, and what practical workflows and lessons they’ve learned after a year of building and research with Amazon science teams.
From Vibe Coding to Spec-Driven Development
The team draws a sharp contrast between vibe coding—rapid, conversational code generation—and the discipline required to build maintainable systems. Vibe coding’s back-and-forth is fast but ephemeral; prompts are point-in-time and tied to transient context. After a flurry of iterations, even solo developers may step back and ask, “What did we build? Why did we choose this?” The rationale and constraints live in a chat stream of consciousness rather than a durable artifact.
Spec-driven development flips that script by making behavior the first-class concern. Instead of using prompts to chase implementations, you define what the system should do, the properties it must uphold, and how it should handle errors. The spec becomes the stable north star that survives code churn. As Simon notes, it’s the difference between a long Slack thread and a shared document everyone actually reviews, aligns on, and treats as the source of truth.
What Kiro Is: An Agentic IDE Built Around Behavior
Kiro’s tagline is “an agentic IDE that helps you do your best work,” and it comes with an opinionated workflow: center development on specifications. Kiro still gives you the speed of natural-language code generation, but its agent uses specs as the grounding reference when building and evolving code. That way, decisions, assumptions, and constraints are captured explicitly and carried forward rather than lost in chat.
Practically, Kiro turns specs into a living, composable artifact linked to code. The agent can generate implementations, create tests aligned to properties, and produce diffs that map back to spec sections. The goal isn’t “yet another AI editor,” but a way to preserve intent, improve team coordination, and make behavior visible, so you can move fast without erasing the why behind the what.
Specifications Demystified: Behaviors, Properties, and Invariants
Richard frames specifications as descriptions of behavior, not implementation. In formal methods, you might hear about safety properties (what must never happen) and liveness properties (what must eventually happen), or invariants that should hold across system states. For example, a read-after-write consistency guarantee across microservices is difficult to verify by skimming code; it needs to be stated, reasoned about, and checked at a higher level.
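As a concrete illustration of a safety property, the Python sketch below checks read-after-write consistency against a recorded history. It assumes a deliberately simplified model: operations are logged as `(kind, key, value)` tuples in the order they completed on one logical timeline. The function name and log format are hypothetical, not part of Kiro, and a real distributed system would need timestamps or a model checker to establish that ordering.

```python
from typing import Dict, List, Tuple

# Hypothetical log format: (kind, key, value), listed in completion order.
Op = Tuple[str, str, int]


def read_after_write_violations(history: List[Op]) -> List[int]:
    """Return indices of reads that did not observe the latest completed write.

    Safety property: once a write to `key` has completed, every later read
    of `key` must return that value (or the value of a newer write).
    Reads of keys that were never written are not checked.
    """
    latest: Dict[str, int] = {}
    violations: List[int] = []
    for i, (kind, key, value) in enumerate(history):
        if kind == "write":
            latest[key] = value
        elif kind == "read" and key in latest and value != latest[key]:
            violations.append(i)
    return violations


ok_history = [("write", "cart:42", 1), ("read", "cart:42", 1)]
stale_history = [("write", "cart:42", 1), ("write", "cart:42", 2), ("read", "cart:42", 1)]
```

A safety violation is witnessed by a finite prefix of the history, which is why a simple scan can detect it; a liveness property ("eventually") cannot be refuted by any finite trace in the same way.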
Traditional tooling for this includes specification languages and verifiers such as TLA+, Dafny, and P, as well as theorem provers like Lean. You might also encounter “design by contract,” where each function declares preconditions, postconditions, and loop invariants. These techniques are powerful but have historically been gated behind specialized expertise. Kiro’s contribution is to make behavior-centric thinking approachable for everyday development, using specs that mix natural language with structured constraints and machine-checkable elements, then letting the agent generate code, tests, and checks that honor those behaviors.
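Design by contract can be approximated in plain Python with `assert` statements. The sketch below annotates a hypothetical integer square root function with a precondition, a loop invariant, and a postcondition; it illustrates the idea rather than any tool mentioned in the episode.

```python
def isqrt_contract(n: int) -> int:
    """Integer square root, annotated in a design-by-contract style."""
    # Precondition: the caller must pass a non-negative integer.
    assert isinstance(n, int) and n >= 0, "precondition: n >= 0"

    lo, hi = 0, n + 1
    while hi - lo > 1:
        # Loop invariant: lo*lo <= n < hi*hi holds at the top of every iteration.
        assert lo * lo <= n < hi * hi, "loop invariant violated"
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid

    result = lo
    # Postcondition: result is the largest integer whose square is <= n.
    assert result * result <= n < (result + 1) * (result + 1), "postcondition violated"
    return result
```

Python strips `assert` statements when run with `-O`, so production code usually raises explicit exceptions or uses a contracts library instead; the value here is seeing the three kinds of contract written next to the code they constrain.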
Composable, Living Specs: What to Put In Them and Why It Matters
A major learning from the Kiro team: specs must be composable documents that blend business and technical context. A useful spec includes real business scenarios and acceptance criteria, interface definitions and data contracts, pseudocode for tricky algorithms, and notes on performance or SLAs. Because they’re modular, you can compose system-level properties from service-level ones and reuse patterns across features and teams.
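One way to read "compose system-level properties from service-level ones" is to treat each property as a predicate over state and combine predicates. The Python sketch below assumes a hypothetical inventory service and order service; the state shape, property names, and `all_of` combinator are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SystemState:
    """Hypothetical snapshot combining two services' state."""
    stock: Dict[str, int]     # inventory service: units on hand per SKU
    reserved: Dict[str, int]  # order service: units reserved per SKU


# A property is any predicate over a state snapshot.
Property = Callable[[SystemState], bool]


def all_of(*props: Property) -> Property:
    """Compose properties: the combined property holds only if every part holds."""
    return lambda state: all(p(state) for p in props)


# Service-level properties, each owned by one team's spec.
def stock_non_negative(state: SystemState) -> bool:
    return all(qty >= 0 for qty in state.stock.values())


def reservations_positive(state: SystemState) -> bool:
    return all(qty > 0 for qty in state.reserved.values())


# Cross-service property that only makes sense at the system level.
def no_oversell(state: SystemState) -> bool:
    return all(qty <= state.stock.get(sku, 0) for sku, qty in state.reserved.items())


# System-level invariant composed from reusable parts.
system_invariant = all_of(stock_non_negative, reservations_positive, no_oversell)
```

Because each predicate is a plain function, the same `stock_non_negative` check can be reused in another service's spec or in a test suite.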
Specs also benefit from visuals and operational rules. Include sequence or state diagrams (Mermaid or PlantUML work well), define error-handling behavior (idempotency, retries, compensation, timeouts), call out constraints (rate limits, concurrency ceilings), and declare properties/invariants (e.g., “Order status never moves backward,” “Payment cannot be captured after T+7 days”). These become testable and enforceable. Even if you’re not using a full formal model, you can align on terminology and behaviors, then let the agent and CI translate them into tests and checks.
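The two example invariants above translate directly into checkable code. The sketch assumes a hypothetical four-step order lifecycle and reads "T" as the time the payment was authorized; both are assumptions for illustration, since the spec text does not define them.

```python
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Sequence


class OrderStatus(IntEnum):
    # Hypothetical lifecycle; integer values encode the allowed forward order.
    CREATED = 1
    PAID = 2
    SHIPPED = 3
    DELIVERED = 4


def status_never_moves_backward(history: Sequence[OrderStatus]) -> bool:
    """Invariant: each recorded status is >= the one recorded before it."""
    return all(prev <= nxt for prev, nxt in zip(history, history[1:]))


# Assumption: T is the authorization time, and capture exactly at T+7 days is allowed.
CAPTURE_WINDOW = timedelta(days=7)


def capture_allowed(authorized_at: datetime, now: datetime) -> bool:
    """Invariant: a payment cannot be captured after T+7 days."""
    return now - authorized_at <= CAPTURE_WINDOW
```

Checks like these can run as assertions in tests, as guards in the capture path, or as monitors over production event logs.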
Just as importantly, treat the spec as a versioned artifact in Git. Kiro connects spec sections to generated code and tests, so PRs can be anchored to spec changes. Decision logs live in the spec instead of getting lost in chat. As a result, onboarding becomes faster, reviews become clearer, and refactors stay aligned with the original intent.
A Practical Workflow in Kiro
The day-to-day flow looks like this:
- Draft a spec with scenarios, interfaces, and properties. Start simple—capture the core behavior and a few key invariants.
- Ask Kiro to generate or modify code based on the spec. The agent proposes file diffs tied to spec sections.
- Generate tests—unit, integration, and property-based tests that exercise the declared behaviors. For instance, if you specified read-after-write, Kiro can scaffold tests that assert that behavior under concurrency.
- Run and iterate. When the agent or you propose implementation changes, update the spec and re-generate as needed. Kiro keeps the spec and code in sync.
- Review in context. PRs show how diffs satisfy or impact specific spec elements, keeping reviewers focused on behavior rather than just syntax.
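The workflow mentions tests that assert read-after-write behavior under concurrency. The Python sketch below shows the general shape of such a test using threads and randomized values; `InMemoryStore` is a hypothetical stand-in for the component under test, and this is not the test Kiro generates.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for the component under test; swap in a real client."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def write(self, key, value):
        with self._lock:
            self._data[key] = value

    def read(self, key):
        with self._lock:
            return self._data.get(key)


def writer_reads_own_write(store, worker_id, seed, rounds=100):
    """Each worker writes random values to its own key and immediately reads them back."""
    rng = random.Random(seed)
    key = f"user:{worker_id}"
    for _ in range(rounds):
        value = rng.randint(0, 1_000_000)
        store.write(key, value)
        if store.read(key) != value:
            return False  # stale read: the property is violated
    return True


def test_read_after_write_under_concurrency(workers=16, seed=1234):
    store = InMemoryStore()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda w: writer_reads_own_write(store, w, seed + w), range(workers)))
    assert all(results), "a worker read a stale value after its own write"
```

With a lock-protected dictionary the property trivially holds; the test becomes interesting when `InMemoryStore` is replaced by a client for a replicated or cached store, where a read may be served by a lagging replica.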
Teams can take this further by gating merges on spec coverage and property tests in CI. Over time, you’ll accumulate reusable spec components—common error-handling policies, API patterns, or cross-cutting invariants—that help enforce consistency and reduce redundant reasoning across services.
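A spec coverage gate can start as a small script. The sketch below assumes a hypothetical convention where each acceptance criterion in the spec carries an ID such as `REQ-1` and tests mention the IDs they cover; Kiro's actual spec format and any CI integration may differ.

```python
import re
from typing import Dict, Set

# Hypothetical convention: requirement IDs look like "REQ-12" in both spec and tests.
REQ_ID = re.compile(r"\bREQ-\d+\b")


def uncovered_requirements(spec_text: str, test_sources: Dict[str, str]) -> Set[str]:
    """Return requirement IDs declared in the spec that no test file mentions."""
    declared = set(REQ_ID.findall(spec_text))
    referenced: Set[str] = set()
    for source in test_sources.values():
        referenced.update(REQ_ID.findall(source))
    return declared - referenced


def gate(spec_text: str, test_sources: Dict[str, str]) -> int:
    """Return a process exit code: 0 if every requirement is covered, else 1."""
    missing = uncovered_requirements(spec_text, test_sources)
    if missing:
        print("Spec coverage gate failed; uncovered:", ", ".join(sorted(missing)))
        return 1
    print("Spec coverage gate passed")
    return 0
```

In CI, a step could read the spec and test files from the repository and call `sys.exit(gate(spec_text, test_sources))`, so any uncovered requirement fails the build.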
The Road Ahead: Research to Real-World, and How to Start
Kiro grew out of early research collaborations with Amazon science teams, but its mission is pragmatic: bring the power of behavior-first engineering to everyday developers. Rather than requiring everyone to master esoteric modeling languages, Kiro’s specs are approachable documents that still allow you to encode properties in a way agents and CI can act on.
To get started, pick a feature—not a monolith—and write a one-page spec. Name the core behavior, list 2–3 acceptance scenarios, define inputs/outputs and a few invariants, and specify error-handling. Generate a minimal implementation and tests, then iterate. You’ll quickly discover that the spec becomes a reusable asset that reduces ambiguity, keeps AI-generated code honest, and makes team communication smoother. As the team’s comfort grows, you can gradually add more formal properties, richer diagrams, and stronger CI gates.
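The one-page spec checklist above can be captured as a simple structure. This Python sketch uses a hypothetical `FeatureSpec` type, filled in with the favorites feature from the demo; the field names, wording, and checks are illustrative assumptions rather than Kiro's spec format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureSpec:
    """Hypothetical one-page spec skeleton mirroring the checklist above."""
    behavior: str
    scenarios: List[str]
    inputs: List[str]
    outputs: List[str]
    invariants: List[str]
    error_handling: List[str]

    def problems(self) -> List[str]:
        """Report gaps against the 'start small' checklist; empty means ready."""
        issues = []
        if len(self.scenarios) < 2:
            issues.append("list at least two acceptance scenarios")
        if not self.invariants:
            issues.append("declare at least one invariant")
        if not self.error_handling:
            issues.append("specify error handling")
        return issues

    def to_text(self) -> str:
        """Render the spec as a plain-text document for review."""
        sections = [
            ("Behavior", [self.behavior]),
            ("Scenarios", self.scenarios),
            ("Inputs", self.inputs),
            ("Outputs", self.outputs),
            ("Invariants", self.invariants),
            ("Error handling", self.error_handling),
        ]
        lines: List[str] = []
        for title, items in sections:
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Hypothetical example based on the favorites feature from the demo.
favorites_spec = FeatureSpec(
    behavior="A user can mark a product as a favorite and unmark it.",
    scenarios=[
        "When the user taps the heart on a product card, the product appears in Favorites.",
        "When the user taps the heart on a favorited product, it is removed from Favorites.",
    ],
    inputs=["user_id", "product_id"],
    outputs=["the user's updated favorites list"],
    invariants=["A product appears at most once in a user's favorites."],
    error_handling=["If saving fails, revert the heart icon and show a retry message."],
)
```

Keeping the spec this small makes it easy to review in a PR, and each invariant or scenario can later be tagged and traced to the tests that cover it.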