We stress‑test the coding agents Claude Code and Codex at scale and report the patterns that actually survive. Across thousands of runs on a representative golden set of agentic coding issues, we compare orchestration (single vs. parallel vs. lightweight hierarchy), reasoning styles (ReAct, Reflexion, Self‑Refine, Least‑to‑Most), and context practices (refresh, compaction, dedup). The core move is turning ephemeral prompt tweaks into durable, versioned AGENTS.md files, both central and per-component, so improvements persist across repos and projects. We’ll augment this with a GitHub study of AGENTS.md in popular projects (adoption, typical sections, section sizes), then show how we applied the findings to Claude Code and Codex to stabilize outcomes under load. Attendees leave with defaults that improved speed, cost, size, and performance, and a template they can adopt immediately.
20‑minute run‑of‑show
- 2’ Why results drift and why rules beat one‑off prompts
- 4’ Orchestration ladders and reasoning styles (what held up at scale)
- 5’ AGENTS.md in practice: central vs. distributed, ordering, decision criteria
- 5’ Context engineering that sticks and trace‑driven updates
- 4’ “In the wild” snapshot (GitHub stats) and quickstart templates
Top‑3 takeaways
1. A reproducible template to convert traces into AGENTS.md rules that survive across runs.
2. When to use parallel runs vs. a lightweight hierarchy, and how to stage reflection without ballooning tokens.
3. Context defaults that reduce cost and latency without cratering quality.
Thomas Krier is an AI engineer and entrepreneur with 20+ years of experience building data- and AI-driven systems. As CEO of Krier Intelligence, he develops multi-agent systems for competitive intelligence featuring advanced RAG pipelines, computer-use agents, and universal crawlers. His work ranges from pioneering dynamic pricing in insurance to creating AI-powered recruitment solutions at Umynd with intelligent agents and tools. A strong advocate for high-quality data engineering, Thomas focuses on building "agents that build agents" and competitive evaluation frameworks. As founder, architect, and engineer of data- and AI-driven companies, he has developed production-ready data pipelines and AI- and agent-based systems across diverse industries.