
Why 95% of Agents Fail

Reuven Cohen
Founder, Agentics Foundation

Can Agentic Engineering Really Deliver Enterprise-Grade Code?

with Reuven Cohen


Chapters

Trailer and Introduction
[00:00:00]
The Evolution of Agentic Engineering
[00:02:06]
Challenges and Breakthroughs in Agentic Systems
[00:04:23]
The SPARC Protocol and Test-Driven Development
[00:10:27]
Exploring the Hive Mind Approach
[00:33:54]
Introduction to Claude Flow
[00:38:30]
Developing with Claude Flow
[00:42:15]
Future of Agentic Engineering
[00:56:09]

In this episode

In this episode of AI Native Dev, host Simon Maple sits down with Reuven “Ruv” Cohen, founder of the Agentics Foundation and creator of Claude Flow, to unpack why 2025 feels like the year agents broke through. They trace the evolution from deterministic scripts to autonomous systems built on recursion and feedback loops, cover how practical agentic systems differ from old-school automation, examine the economic shifts that made long-horizon agentic swarms feasible, and explore why a new professional identity, the agentic engineer, is emerging around verification and repeatable outcomes.

From Deterministic Scripts to Agentic Engineering

Ruv traces agents back to their roots: early software “agents” were deterministic automations that operated within narrow, procedural bounds. They worked when the world behaved exactly as expected, and failed the moment reality deviated. The agentic shift began when large language models, starting around GPT‑3, enabled systems to use natural language alongside tools and infrastructure. That language layer unlocked flexible problem-solving, but the real catalyst wasn’t language alone.

The defining difference, Ruv argues, is agentic coding’s focus on autonomy with feedback. Agentic systems don’t just follow a chain of instructions; they operate in the environment, observe outcomes, and adapt through iterative loops with minimal human oversight. The agent doesn’t merely “do steps”: it uses the results of those steps (successes, errors, and logs) to decide what to do next. This changed the work itself, giving rise to a new practitioner: the agentic engineer.

Agentic engineering is not freeform “vibe coding.” It’s the deliberate design of architectures, processes, and controls that make outcomes repeatable and verifiable. The role centers on structuring context, tool use, and feedback so autonomy becomes reliable enough for production workflows.

Recursion, Feedback Loops, and Long‑Horizon Execution

The breakthrough Ruv emphasizes is recursion: feeding execution artifacts (compiler errors, stack traces, test failures, logs, and partial successes) back into the system. Early prompt-engineering patterns fixated on chain-of-thought, but the power came from closing the loop. Once the agent understands the context of what happened, not just the planned procedure, it can diagnose, repair, and continue. That’s what enables long-horizon behavior, where agents run for hours or even days to complete complex tasks.

Practically, that means instrumenting everything. Developers should capture stdout/stderr, diff outputs, test results, and any external tool responses, then persist that context and re-prompt the model with exactly what happened. Treat the feedback loop as first-class: budget and time constraints, retry policies, error classifiers, and structured error parsing all matter. The agent’s effectiveness is a product of how well you expose it to reality and how precisely you feed that reality back.
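As a rough illustration, here is a minimal Python sketch of that kind of instrumentation; the function names, truncation limits, and prompt wording are illustrative assumptions, not anything prescribed in the episode.

```python
import subprocess

def run_and_capture(cmd: list[str], timeout: int = 300) -> dict:
    """Execute a step and capture ground-truth artifacts for the next prompt."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return {
            "cmd": " ".join(cmd),
            "exit_code": proc.returncode,
            "stdout": proc.stdout[-4000:],  # keep the tail; context windows are finite
            "stderr": proc.stderr[-4000:],
        }
    except subprocess.TimeoutExpired:
        return {"cmd": " ".join(cmd), "exit_code": -1, "stdout": "", "stderr": "timed out"}

def feedback_prompt(task: str, artifact: dict) -> str:
    """Re-prompt the model with exactly what happened, not what was planned."""
    return (
        f"Task: {task}\n"
        f"Command: {artifact['cmd']} exited with code {artifact['exit_code']}\n"
        f"stdout (tail):\n{artifact['stdout']}\n"
        f"stderr (tail):\n{artifact['stderr']}\n"
        "Diagnose what happened and propose the next concrete step."
    )
```

The key design choice is that the model never sees a summary of the plan; it sees the raw exit code and output, which is the “reality” the paragraph above refers to.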

Trust isn’t the lever; verification is. Ruv is explicit: practitioners do not inherently trust model outputs. They build verification harnesses and guardrails because LLMs take the shortest path to an answer, which can mean “simulated” success. Engineers counter that with automated testing, environment sandboxes, deterministic tool interfaces, and explicit acceptance criteria. The model learns from hard, unambiguous signals (passing tests, zero runtime errors, validated API responses) fed back through the recursive loop.

The Economics Flip: Unlimited Tokens, Claude Code, and the SPARC Protocol

Agentic swarms (multiple coordinated agents tackling a problem) worked in prototypes but were economically brutal. Ruv’s team proved they could run agents for 36 hours, but even minimal operations cost around $4,000 per day. Scaling to concurrent swarms hit roughly $7,500 per hour for 10 agents. It was often cheaper to hire humans.

Then the cost curve flipped. In April, Anthropic’s Claude Code introduced “all-you-can-eat” style plans, effectively removing the hard token ceiling that had kept long-horizon recursion and swarms in the lab. Ruv and collaborators reimagined their SPARC protocol, which orchestrates recursion in agentic workflows, and suddenly hours-long, high-parallelism runs became feasible for a flat monthly fee, often starting near $20. Capability jumped while cost collapsed, an inflection you rarely see in software.

For developers, that unlocks new patterns: parallelize subtasks aggressively and use cross-verification among agents to lift quality; run longer horizons for complex refactors or multi-service changes; tolerate iterative retries because the budget no longer melts on every loop. Still, put budgets and watchdogs in place: cap concurrent agents, enforce timeouts, and log token usage so scale doesn’t surprise you.

Defining the Agentic Engineer: Beyond Vibe Coding

Alongside the technical shift, a professional identity is forming. The Agentics Foundation emerged from Ruv’s realization, shared by a growing community, that they were practicing a distinct engineering discipline. While “vibe coding” is great for ideation, agentic engineering prioritizes architectures and processes that are repeatable and outcome-driven. It’s about designing for autonomy with verification, not just prompting for inspiration.

The foundation also responds to AI-washing. Big vendors label chatbots as “agents,” diluting the term. The community’s view: if it doesn’t operate autonomously with tool use, feedback loops, and verifiable outcomes, it’s not an agent. The foundation operates as a member-led, meritocratic guild to articulate standards and protect practitioners’ interests as trillion-dollar companies shape the narrative.

Ruv’s own path underscores the moment. Early access to ChatGPT led him to ask a blunt question (how to become influential in AI) and then follow the model’s advice with remarkable consistency: build a focused community, post openly on GitHub, run weekly livecasts, let people buy your time. The outcome: 100k+ subreddit members and 100+ customers, including 20 Fortune 500s, run largely by “one guy and bots.” It’s a vivid example of agentic leverage in both code and career. Developers can plug into the community via agentics.org and the public Discord.

A Practical Playbook to Build Agentic Systems Today

Start with the loop. Build an orchestrator that: (1) plans a step, (2) executes via tools or code, (3) captures ground-truth artifacts (errors, logs, test results), (4) re-prompts with that context, and (5) repeats until acceptance criteria are met or budgets are exhausted. Make error handling explicit: parse stack traces and compiler output into structured fields the model can reason about. Persist a transcript of attempts so each iteration sees the full context.
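A skeleton of that orchestrator might look like the following sketch; the LLM and Tools interfaces are hypothetical stand-ins for whatever model client and tool layer you actually use.

```python
from typing import Protocol

class LLM(Protocol):
    def plan(self, task: str, transcript: list[dict]) -> str: ...
    def diagnose(self, task: str, artifact: dict) -> str: ...

class Tools(Protocol):
    def execute(self, step: str) -> dict: ...
    def acceptance_met(self, artifact: dict) -> bool: ...

def orchestrate(task: str, llm: LLM, tools: Tools, max_iters: int = 20) -> list[dict]:
    """Plan, execute, capture, re-prompt; repeat until acceptance or budget exhaustion."""
    transcript: list[dict] = []                # persisted history: each iteration sees it all
    for _ in range(max_iters):                 # hard iteration budget
        step = llm.plan(task, transcript)      # (1) plan the next step from full context
        artifact = tools.execute(step)         # (2) execute via a tool or code
        transcript.append({"step": step, "artifact": artifact})  # (3) capture ground truth
        if tools.acceptance_met(artifact):     # acceptance criteria met: stop
            return transcript
        task = llm.diagnose(task, artifact)    # (4) re-prompt with what actually happened
    raise RuntimeError("budget exhausted before acceptance criteria were met")  # (5)
```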

Design for verification, not trust. Run work in sandboxes with strict permissions. Use deterministic tool APIs (e.g., “run_tests”, “apply_patch”, “get_logs”) and check return codes. Gate progress behind tests or runtime checks. Detect “simulated success” by validating side effects: files really changed, services really deployed, endpoints really responded as expected. Treat passing tests and clean logs as the agent’s north star.
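Here is one possible shape for those tools, again as a sketch: run_tests and apply_patch echo the names above, while the specifics (pytest, git apply, watching .py file mtimes) are assumptions about your stack.

```python
import pathlib
import subprocess

def run_tests(test_dir: str = "tests") -> dict:
    """Deterministic tool: the contract is the exit code plus captured output."""
    proc = subprocess.run(["pytest", test_dir, "-q"], capture_output=True, text=True)
    return {"ok": proc.returncode == 0, "exit_code": proc.returncode, "output": proc.stdout}

def apply_patch(patch_path: str) -> dict:
    """Apply a patch, then verify the side effect instead of trusting the claim."""
    before = {p: p.stat().st_mtime for p in pathlib.Path(".").rglob("*.py")}
    proc = subprocess.run(["git", "apply", patch_path], capture_output=True, text=True)
    after = {p: p.stat().st_mtime for p in pathlib.Path(".").rglob("*.py")}
    changed = [str(p) for p in after if after[p] != before.get(p)]
    # Guard against "simulated success": a clean exit with no changed files is suspect.
    return {"ok": proc.returncode == 0 and bool(changed),
            "changed_files": changed, "stderr": proc.stderr}
```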

Exploit the new economics. When single-agent throughput stalls, shard the problem and spawn a small swarm (parallel agents exploring different solution paths or handling different files/components), then converge via a final reconciliation pass. Set hard caps: maximum iterations, maximum wall-clock time, and maximum concurrent agents. If you have access to models or IDE integrations with unlimited token plans (e.g., Claude Code), target long-horizon tasks like multi-file refactors or docstring and test generation at repository scale. Keep cost telemetry so you know when to roll back to a single agent or human-in-the-loop.
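A minimal sketch of that fan-out under hard caps, assuming run_agent is your single-agent entry point (such as the orchestrate loop above):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as SwarmTimeout

MAX_AGENTS = 4            # hard cap on concurrent agents
MAX_WALL_SECONDS = 3600   # hard cap on total wall-clock time

def run_swarm(shards: list[str], run_agent) -> dict[str, dict]:
    """Fan shards out to parallel agents under hard caps, then hand off to reconciliation."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=MAX_AGENTS) as pool:
        futures = {pool.submit(run_agent, shard): shard for shard in shards}
        try:
            for fut in as_completed(futures, timeout=MAX_WALL_SECONDS):
                shard = futures[fut]
                try:
                    results[shard] = fut.result()
                except Exception as exc:  # one failed shard shouldn't sink the swarm
                    results[shard] = {"ok": False, "error": str(exc)}
        except SwarmTimeout:
            for fut in futures:
                fut.cancel()  # best effort: shards that never started are dropped
    return results  # feed into a final reconciliation pass (e.g., one reviewing agent)
```

The returned per-shard results are deliberately partial on timeout, so the reconciliation pass can see which shards finished and which were cut off by the wall-clock cap.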

Key Takeaways

  • Recursion is the agentic unlock: feed real errors, logs, and outcomes back into the model so it can self-correct across long horizons.
  • Verification beats trust: rely on tests, runtime checks, and deterministic tool APIs. Guard against “simulated success.”
  • Cost changed the game: unlimited token plans let you run hours-long loops and small swarms for a flat fee, making parallelization practical.
  • Engineer, don’t vibe-code: design architectures and processes with defined outcomes. Agents ≠ chatbots.
  • Start small, scale wisely: cap iterations, time, and concurrency; log everything; converge parallel work with a final reconciliation pass.
  • Join the guild: the Agentics Foundation is a member-led community setting standards for agentic engineering. Explore agentics.org and the Discord to learn, share patterns, and find collaborators.

