AI in software development is moving beyond just writing code. The next frontier is making sure that code actually does what it’s supposed to, including enabling AI to validate its own work against clearly defined requirements.
With that in mind, AI Native Dev last week reported on Amazon’s move to bring custom agents into its Q Developer CLI, allowing developers to spin up purpose-built versions of Q Developer (Amazon’s AI coding assistant) scoped to particular tasks. What makes that especially powerful is how those agents can plug directly into Kiro, an agentic IDE AWS debuted in July that turns natural-language prompts into structured specifications like requirements, design notes, and implementation guides, then uses them to steer the development process (listen to the AI Native Dev podcast all about Kiro).
Now, the folks at AWS are showcasing how Q CLI agents can be guided by those very specifications, using Kiro’s structured docs as both blueprint and test plan. Instead of a developer telling an agent what to check, the spec itself defines what should be validated.
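For context, Kiro’s `requirements.md` files express each requirement as a user story with EARS-style acceptance criteria, which is what makes them machine-checkable in the first place. A criterion for a simple voting app might look something like this (an illustrative sketch, not taken from Re Ferrè’s actual spec):

```markdown
<!-- Illustrative sketch of Kiro-style acceptance criteria; not from Re Ferrè's actual spec -->
### Requirement 1

**User Story:** As a visitor, I want to cast a vote for an option, so that my preference is counted.

#### Acceptance Criteria

1. WHEN a visitor submits a vote THEN the system SHALL increment the tally for the selected option.
2. WHEN a visitor requests the results page THEN the system SHALL display the current totals for all options.
```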
AI goes from code generation to self-validation
Massimo Re Ferrè, a product management director at AWS, introduced a hands-on experiment that did just that.
“I often spend time to check manually if the result of my prompt leads to good results… why should I (or someone in QA) follow that checklist, instead of having ‘an AI’ go through it, and report back with its findings?,” Re Ferrè wrote. “Enter Amazon Q CLI.”
Instead of manual QA, Re Ferrè crafted a custom Q CLI agent equipped with Playwright (for UI automation) and Fetch (for API testing), exposed as MCP servers. He defined it in a simple JSON config that gives the agent access to those tools, points it at Kiro’s `requirements.md`, and restricts its permissions so it can read files and run checks but not alter the codebase. From there, the agent could navigate the generated application and validate each acceptance criterion in the spec.
{ "$schema": "https://raw.githubusercontent.com/aws/amazon-q-developer-cli/refs/heads/main/schemas/agent-v1.json", "name": "kiroqa", "description": "An agent to QA Kiro specs requirements", "mcpServers": { "fetch": { "command": "uvx", "args": \["mcp-server-fetch"\] }, "playwright": { "command": "npx", "args": \["@playwright/mcp@latest"\] } }, "allowedTools": \["fs\_read", "execute\_bash", "@fetch", "@playwright"\], "resources": \["file://requirements.md"\] }
Next, Re Ferrè extended the setup with a companion markdown file, `test_requirements_prompt.md`, which laid out what the agent should do during its QA run. Rather than hand-craft it, he prompted Q CLI to generate the script, then iterated between manual tweaks and further AI refinements. With that in place, he ran his go-to “litmus test” (a small Flask voting app spec) and let the agent automatically check each acceptance criterion, outputting a structured report of passes, failures, and partial matches.
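To make the workflow concrete, here is a rough sketch of the kind of instructions such a prompt file could contain, based on how the article describes the agent’s behavior (purely illustrative, not Re Ferrè’s actual file):

```markdown
<!-- Hypothetical example; not Re Ferrè's actual test_requirements_prompt.md -->
# QA run instructions

1. Read every acceptance criterion in requirements.md.
2. For UI-facing criteria, use the Playwright tools to drive the running app in a browser.
3. For API-facing criteria, use the Fetch tools to call the relevant endpoints.
4. Do not modify any files; only read and observe.
5. For each criterion, record a verdict of PASS, FAIL, or PARTIAL with a one-line justification.
6. Finish with a summary of all verdicts.
```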
A proof-of-concept with promise
Re Ferrè is careful to frame the exercise as a proof-of-concept, rather than a production-ready setup. The experiment shows how agents can validate Kiro’s specs automatically, but he also emphasizes the importance of keeping them constrained and under human control. And while his conclusions are worth noting, they also come with the caveat that Re Ferrè is an AWS employee, so he is incentivized to present the company’s tools favorably. But of course, anyone is free to take the same approach and test it for themselves.
Nonetheless, the experiment demonstrates a practical loop where AI not only builds software but also validates it against the specifications that defined it. It’s a glimpse at a future where QA is continuous, automated, and spec-driven by design.