Day 2•2:35 PM•25m

From Prompts to AGENTS.md: What Survives Across Thousands of Runs

We stress‑test the coding agents Claude Code and Codex at scale and report the patterns that actually survive. Across thousands of runs on a representative golden set of agentic coding issues, we compare orchestration (single vs. parallel vs. lightweight hierarchy), reasoning styles (ReAct, Reflexion, Self‑Refine, Least‑to‑Most), and context practices (refresh, compaction, dedup). The core move is turning ephemeral prompt tweaks into durable, versioned AGENTS.md files, one central and one per component, so improvements persist across repos and projects. We’ll augment this with a GitHub study of AGENTS.md in popular projects (adoption, typical sections, section sizes), then show how we applied the findings to Claude Code and Codex to stabilize outcomes under load. Attendees leave with defaults that improved speed, cost, size, and performance, plus a template they can adopt immediately.
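To make the central plus per‑component layout concrete, here is a minimal sketch of how the AGENTS.md files that apply to one file could be collected. It is not the speaker's tooling: the function name `applicable_agents_files`, the example paths, and the "central first, most specific last" ordering are assumptions made for illustration.

```python
from pathlib import Path


def applicable_agents_files(repo_root: Path, target: Path) -> list[Path]:
    """List AGENTS.md files from the repo root down to the target's directory.

    Assumed policy (hypothetical): the central file comes first and the most
    specific per-component file comes last, so later entries can refine
    earlier ones.
    """
    repo_root = repo_root.resolve()
    target = target.resolve()
    start = target if target.is_dir() else target.parent

    # Walk upward from the target's directory until the repo root is reached.
    chain = []
    current = start
    while True:
        chain.append(current)
        if current == repo_root:
            break
        if current.parent == current:
            raise ValueError(f"{target} is not inside {repo_root}")
        current = current.parent

    # Reverse so the root comes first, then keep directories with a rules file.
    found = []
    for directory in reversed(chain):
        candidate = directory / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

For a hypothetical repo with AGENTS.md at the root and in services/api/, calling `applicable_agents_files(root, root / "services/api/handler.py")` would return the root file followed by the services/api one. How an agent merges or prioritizes these files is tool‑specific and not covered by this sketch.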

20‑minute run‑of‑show
- 2’ Why results drift and why rules beat one‑off prompts
- 4’ Orchestration ladders and reasoning styles (what held up at scale)
- 5’ AGENTS.md in practice: central vs. distributed, ordering, decision criteria
- 5’ Context engineering that sticks and trace‑driven updates
- 4’ “In the wild” snapshot (GitHub stats) and quickstart templates

Top‑3 takeaways
1. A reproducible template to convert traces into AGENTS.md rules that survive across runs.
2. When to use parallel runs vs. light hierarchy and how to stage reflection without ballooning tokens.
3. Context defaults that reduce cost and latency without cratering quality.

Edge

The Landing (Ground floor)

talk

Krier Intelligence

Thomas Krier

Founder & CEO