
Disposable Software: The Next Wave?
AI-First Project Management for Developers
Transcript
[00:02:02] Simon: Hello and welcome to another episode of the AI Native Dev. And you may hear in the background, there's a bit of noise. We are at Devoxx Belgium. And joining me is Alex Gavrilescu. How was my, how was my pronunciation? We were saying just off air, it was a very British pronunciation.
[00:02:20] Alex: Yeah, this is the most British,
Simon: Oh, is it?
[00:02:21] Alex: pronunciation of my family name, I’ve ever heard. I like it.
[00:02:24] Simon: We'll take that, we'll take that. So, Alex and I are both at Devoxx Belgium as I mentioned. And by the way about, probably about three, four weeks ago we released an episode with Stephan, who is the Chief Head, God of Devoxx.
[00:02:42] Simon: And I am happy to say his app is working perfectly. If you did watch that session, it was very much a vibe-coded application. And he was a little bit worried. We were a little bit worried, like, oh my gosh. It's very vibe coded, with specifications, but there's only one real test, and that's at a conference, at Devoxx, to test it out, and it works.
[00:03:01] Simon: It's perfect. It's working very, very well. So well done, Stephan. So, yeah, we're talking a lot about spec-driven development here at Devoxx. And, Alex, first of all, tell us a little bit about yourself, 'cause I think some of the things that we're gonna talk about, with Backlog and things like that, are very much built in a spec-driven mindset.
[00:03:42] Simon: Yeah. In terms of going forward. So tell us a little bit about yourself, a little bit about Backlog.md.
[00:03:48] Alex: Right, so I'm a lead engineer in Vienna for a gaming company. And I'm coming from an agile environment, using Scrum, having all the requirements with PRDs, thinking a few months ahead with all of the features, like really thinking them through.
[00:04:07] Alex: And then there's the coding part. And right now everything happens like, between humans. So it's, no AI, no nothing, just classic development. We have a backend team. We have a mobile development team. So they all collaborate together. This is what I'm doing during the day, but obviously in my free time I'm trying to learn as much as possible about AI and trying to keep up to date with the latest trends.
[00:04:37] Alex: And Backlog.md was basically my trial of AI and like a challenge for myself to really try to be as autonomous as possible and try to let AI code 100% of the time in my tasks. So, the reason why I started building Backlog.md is, I started with side projects. I started working with AI.
[00:05:05] Alex: I was just prompting, prompting all the time, and I was just getting really bad results. I was basically doing vibe coding, and then I realized how the processes I use at work make the collaboration between humans effective enough to build successful gaming features.
[00:05:27] Alex: And then I tried to learn from the human process and try to adapt it to AI. By doing this, I realized the most important part is the specs. So everything starts with specifications, requirements, not just about the feature you want to build, but also everything around it, like security specifications, CI/CD specifications, just the language that you should use, like C#, TypeScript, Java, whatever language.
[00:05:56] Alex: And you should have all of this context right before you start. So in order to be really successful with your tasks, really…
[00:06:06] Simon: Really interesting. Let's just break that down a little bit. Yeah. So, you were doing a lot of vibe coding. What typical tools were you using for vibe coding?
[00:06:13] Alex: CLAUDE.md and Claude Code.
[00:06:18] Simon: Yeah. So, what are examples of those problems that you hit in Claude Code?
[00:06:30] Alex: So the problems were like single-task problems that would repeat themselves on every single task. What I mean by this is, I tell the agent to build a certain feature, and he reached the goal and managed to build this feature.
[00:06:47] Alex: But by going back and forth on a lot of things. And with the next task, I would repeat the same instructions again, and I would stumble upon the same issues again and again and again. Gotcha. This is the problem with vibe coding. Basically, every time you have a new session with your agent and you try to build as much as possible within your context window.
[00:07:11] Alex: And this doesn't scale, obviously.
[00:07:13] Simon: Yeah. And as soon as you close that window, context is lost. You have to build that up again in your next window that you created.
[00:07:21] Alex: Exactly. And maybe you forgot the instructions that you told him to prevent certain issues. And you have again the same problems.
[00:07:27] Simon: Yeah.
[00:07:28] Simon: And a lot of the things that you mentioned that you would put into specs, like security considerations and things like that. How would you describe those needs to your vibe coding tool?
[00:07:38] Alex: Yeah, so obviously with Vibe coding, you can do a lot of damage. You can deploy some changes, some patches to production that actually break things.
[00:07:51] Alex: And you need guardrails, but you need these guardrails also with humans. We all use specifications about security, and measures and checkpoints, for example a staging environment, to test all of these measures and prevent issues. And with vibe coding, suddenly everyone forgot about them.
[00:08:14] Alex: Yeah.
[00:08:15] Simon: So then you mentioned you moved all into the spec environment. Tell us a little bit about that.
[00:08:21] Alex: Yes. So I started creating markdown tasks manually. Actually, before that I started creating a huge markdown with all of the specifications and all of the features that I wanted to build.
[00:08:32] Alex: The problem with that: it was a huge context, and I would immediately inject this context into the model. The agent with this context would maybe be effective, maybe not, and I would not have a good success rate, and sometimes I would have to roll back, but it's very hard to roll back a whole product or a whole feature.
[00:08:53] Simon: And what's interesting there is you're nowhere near the context window. You're nowhere near the maximum context size. You're just hitting that mark where the results degrade massively based on the amount of context you provide.
[00:09:10] Alex: Yes. These agents have a feature called compaction.
[00:09:15] Alex: This basically tells the agent to make a summary of all of the previous conversation and start with the minimum context from that summary. But the problem is that the instructions contained in this summary are maybe half of the ones that you gave him initially, and you most likely have to start from scratch again because you don't know what he's missing.
[00:09:40] Simon: Okay, so what's next in the spec journey?
[00:09:44] Alex: What was the immediate next step? The immediate next step was to split the big markdown file into smaller tasks. So I would have, similar to what we have in Jira or in Linear or other project management tools, single tasks that just define what has to be built.
[00:10:01] Alex: And I basically translated this into markdown files because markdown is a universal language that is plain text with formatting, that both humans and agents can understand very well. And agents were actually more effective and I could easily roll back single tasks at the time.
[00:10:21] Simon: And you did this manually, or did you…?
[00:10:23] Alex: Yes, at the very beginning I was creating them manually. When I reached around 50 manually created tasks, I thought, like any proper software engineer would: how can I automate this manual creation of tasks? So I built Backlog.md to enable this and have an easy way to create tasks using your terminal.
[00:10:47] Simon: The best way to learn about it is to see it, right?
[00:10:51] Simon: Yeah. Should we take a look at a quick demo of Backlog?
[00:10:53] Alex: Sure.
Simon: Talk us, talk us through.
Alex: So the first thing you have to do to even run Backlog is install it. You can use Bun, npm, or brew, and you should install it globally on your computer.
Simon: Why globally?
[00:11:11] Alex: Because then you don't have to install it in every single project that you're using, and you can immediately use it in every folder. This is going to take a few seconds, and even on conference Wi-Fi, it works. This is because Bun is very fast at installing dependencies.
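For readers following along, here is a minimal install sketch based on what Alex describes; the exact package and formula names are assumptions, so check backlog.md for the canonical instructions.

```bash
# Install Backlog.md globally (package/formula names are assumptions; see backlog.md)
bun add -g backlog.md       # Bun, the fast path Alex mentions
npm install -g backlog.md   # npm alternative
brew install backlog-md     # Homebrew alternative

# Confirm the CLI is reachable from any folder
backlog --help
```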
[00:11:30] Alex: And afterwards, you can run backlog just as is, with no other instructions. This is going to give you some initial hints about what you can do with Backlog. So you can see here, you can create tasks, you can list the tasks, you can see the board. I will show it very quickly. You can also run a browser.
[00:11:53] Alex: And again, I will go into detail about this, and then an overview that shows the statistics about your project, how many tasks are finished and how many are remaining, and the link to the docs. The next thing, normally, would be to create the tasks, but I'm running Backlog inside the Backlog project, and I'm using Backlog to keep track of the tasks for Backlog itself.
[00:12:16] Alex: I already have hundreds of tasks. So let me show you how it looks. The board: Backlog gives you a board that is configurable with how many statuses you want. So the default ones are To Do, In Progress, and Done. And here you can see what is going on in this project. And all the tasks that you see here are in sync between multiple Git branches.
[00:12:39] Alex: So Backlog uses natively Git to fetch information from other branches. So if you work with other people, as long as they push the changes on a task. So, for example, I take over task 200 and I assign it to myself and I put it in progress. As I make a push on my feature branch, you will see it also in your Backlog instance on the main branch, for example.
[00:13:03] Alex: So this is also nice to use it as a collaboration tool.
[00:13:07] Simon: And can you filter based on branches and things like that here?
[00:13:10] Alex: So, the logic behind it is it's a bit hidden from the developers. Yep. Backlog should be smart enough to find what is the task with the latest updated date.
[00:13:22] Alex: And that means it's the newest state. Obviously, if you change a task by mistake, that task will then be treated as the latest version. But in normal use cases, this should not happen. Gotcha. Okay. And what you can do from here, you can find the details of the tasks.
[00:13:41] Alex: So for example, let's see, one task that I would like to build next. This one, I press enter and I immediately see the details of the task. So what I would like to have in the near future is drag and drop in the terminal.
[00:14:00] Simon: Yeah.
[00:14:00] Alex: So like you do in Jira, you drag your task from To Do to In Progress. I would like to achieve the same in this board.
[00:14:08] Alex: So this is the task for that feature. It's not implemented yet. It's in To Do. And what we can see here are the task ID, the title, some metadata about the task, some dependencies, because Backlog also supports dependencies, which basically prevent the agent from working on tasks that are not ready to be started.
[00:14:30] Alex: So this is also important in spec-driven development. We have a description that basically tells us why we are even doing this feature. So adding the drag-and-drop functionality using Shift plus arrow will allow the users to not leave this terminal interface and to change the progress very easily.
[00:14:52] Alex: So this is very nice from a user perspective. And we have some acceptance criteria, which are the core of Backlog, basically. Yeah, the acceptance criteria are something that can be tested, something that can be easily verified and measured, and they basically represent the increments of the implementation.
[00:15:14] Alex: Smaller increments within this single task. And whenever an acceptance criterion is done, it'll be ticked. So I will show you very soon how that looks. Let's open a task that has been completed. This one, for example, so you can see the status is Done. Yep. It has been taken over by OpenAI Codex.
[00:15:37] Alex: Also another feature about Backlog is labels. So you can label your tasks and filter later by labels. And here we can see the acceptance criteria are all obviously implemented. Yep. We have here an implementation plan. I will talk about the implementation plan a bit later, but it's part of the development.
[00:16:02] Alex: You always want to ask the agent how he would like to develop this task, and then you review. And then at the end we have the implementation notes, which are sort of permanent context of what has been done in this task. So if for any reason a human or an agent would like to see what happened in this task.
[00:16:22] Alex: They would read the implementation notes and have this permanent context about it. Yep. So yeah, this is the primary interface. We also have a task list view, which is this one here. We have just a long backlog of tasks. We can also check the details, and if it's longer, scroll. We can also search for tasks, or maybe let's find tasks about Tailwind.
[00:16:57] Alex: Yeah, this one, there is no task actually about Tailwind, but anyways, we can also filter by status. So To Do, Done, In Progress, priority again. So this is also like a second command panel where you manage your tasks. Yep. And that's basically it for the UI. We also have a web interface.
[00:17:23] Alex: For people that like to have more visual GUI for interacting with tasks, we also have a plain mode for AI agents. Let me show you that one because I think it's more important. So for example, we were looking at task 200. So Backlog task 200 --plain. This will show the same information but in plain text.
[00:17:52] Alex: And this is what agents should use.
[00:17:54] Simon: Yeah. Yeah. Yeah. I see. Yeah. Okay. And so, let's talk a little bit about, I want to kind of talk a little bit about the way in which, so you mentioned you built this a lot through specifications and things like that. How did you go about building those specifications?
[00:18:10] Simon: Did you use the tool, did you handcraft the specifications yourself? How did you deliver that to them?
[00:18:17] Alex: That's quite interesting because the first tasks I created manually. So this kind of format, I came up with manually by creating them myself. And after four or five tasks, I already had the Backlog CLI command to create tasks.
[00:18:32] Alex: So afterwards I just recursively started using Backlog to manage tasks about Backlog.
[00:18:38] Simon: And so Backlog creates the specification about the task, which is essentially the thing, or not necessarily the specification of the task, but the information in that task that can be used as a spec for an agent to implement that change. Would you say…
[00:18:51] Alex: So, yes and no. So Backlog itself doesn't create specifications. Yeah. So Backlog is a tool that AI agents use. But it's not calling agents, it's not connected to AI agents by default. So you'd have to start the flow this way: I start Claude or Codex, or, I mean, any AI CLI.
[00:19:12] Alex: And I tell these agents, "Hey, I want to build this feature, use the Backlog.md CLI to keep track of the tasks and to split it into multiple sub-tasks." There are some agent instructions that come with Backlog that tell the agent how to use Backlog itself. So afterwards the agents will know how to split a bigger task that you have in mind into smaller tasks that feed Backlog.
[00:19:37] Alex: But Backlog itself, it should be as minimal as possible. Right. And should not be in the way. It should be sort of a tool that both the human and the agent use on the side.
[00:19:47] Simon: Gotcha, gotcha. So it's Claude that kind of determines the split almost and then adds that into Backlog.
[00:19:53] Simon: Exactly. Yeah. I see. Cool. So let's talk a little bit more about that in terms of how you would use Backlog with AI tools. When you say "use Backlog," is it using it as a CLI and using Bash? Is it using it as an MCP server?
[00:20:13] Alex: So for the moment we only have CLI. Okay, so the agent instructions that ship with Backlog, let me show you how you would configure this.
[00:20:23] Alex: When you initialize Backlog on a project, you will run backlog init. You'll put the project name, and here, the most important part of using Backlog is to choose the agent instructions for the agent that you're using. So if you're using Claude, you will use CLAUDE.md, or now AGENTS.md is becoming a standard.
[00:20:42] Alex: So once you select one or many agents, Backlog will create this file, or append to an existing file if you already have it, with all the instructions on how to use Backlog. So it'll tell Claude: when the human asks you to create a task, you should run backlog task create; when the human asks you to edit a task, you should run backlog task edit.
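A rough sketch of that initialization flow; the exact prompts, flags, and task IDs below are illustrative assumptions rather than verbatim CLI output.

```bash
# Initialize Backlog.md in a repository; the wizard asks for a project name and
# which agent instruction files (CLAUDE.md, AGENTS.md, ...) to create or append to.
cd my-project
backlog init

# The generated instructions point the agent at the same commands a human uses,
# along these lines (task id and flags are illustrative assumptions):
backlog task create "Add MCP support"
backlog task edit 200 -s "In Progress"
```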
[00:21:04] Simon: Awesome. And so it adds that into the AGENTS.md or the CLAUDE.md. Right. It automatically knows when starting: this is how it does it, this is what it should do. Awesome. And I think you mentioned before there was an MCP server on its way.
[00:21:19] Alex: Yes. So, I didn't start with MCP at the very beginning, because when I started with Backlog in June, the support was still not there for a lot of agents. Yep. But recently MCP is getting more and more momentum, and I want to also add MCP support to Backlog. So in the next days we're probably gonna see it live.
[00:21:39] Simon: Amazing. Amazing. And given that we're probably gonna be releasing this next week.
[00:21:42] Alex: Yeah.
[00:21:42] Simon: It could even be there right now, so do check that out. Exactly. On the Backlog tool. So, yeah, Claude and other agents can use Backlog to break down tasks and add them into the backlog, but also, of course, Claude is quite proficient as a coding agent.
[00:22:01] Simon: It can go ahead and take tasks off the backlog. And I think one of the tasks that we did see there was potentially even worked on and completed by Claude, so it could effectively do some work, update the tickets, open pull requests.
[00:22:16] Alex: Yeah. Maybe something very important that I didn't mention is that Backlog's
[00:22:19] Alex: code was written 99% by AI. Yeah, because this was my initial challenge. Yeah. And the only files that I wrote manually were the agent instructions, so how to use Backlog, obviously. But afterwards, every single line of code was written by AI. So Backlog's code is really, really AI spec-driven, AI-developed.
[00:22:41] Simon: Amazing. Amazing. Cool. Let's talk a little bit about a key piece here, which is context. Obviously, there's a right amount of context that we can provide agents. You provide too little, it won't have enough to do the job effectively. Yeah. You can provide too much, as you mentioned, and it degrades.
[00:23:01] Simon: There's that Goldilocks zone of context. As we were saying off air, a lot of context engineering has gone into Backlog.md. Yeah. Talk us through some of the areas where you felt pain and some of the key decisions that you made.
[00:23:16] Alex: Yeah, so let me start with a comparison with how humans work, because I think it's also not very straightforward there.
[00:23:22] Alex: So at work, me and other developers struggle sometimes to break down a bigger feature into smaller tasks. And what happens for humans when you don't break it down correctly, then the scope explodes. Basically, you have delays, then you rush, and then you have a lot of technical issues to solve later.
[00:23:46] Alex: So for humans, this is also a big issue. Yeah. But for agents, it's even more of a problem because the agents are sort of like humans that you hired five seconds ago, and you have to give them enough information for them to immediately start working on a task. Which is something that you will normally not expect from any human without proper onboarding.
[00:24:07] Alex: And this is basically what software engineers in the upcoming months will have to learn more and more when dealing with AI. So how can I give as much information as possible about the software that I'm building and the task itself so that the agent will work as a human teammate.
[00:24:30] Simon: And so how did you, what was the balance that you struck in terms of too much or too little?
[00:24:35] Alex: So I was struggling a lot to tell AI agents how big a task should be. And then I had a sort of magic moment, how can I say it? It was unexpected. I told Claude, "Hey, Claude, break down this feature into smaller tasks so that each of them can fit in a good PR."
[00:25:01] Alex: Okay. And while for me, a good PR is very subjective, Claude immediately started splitting them into the most atomic tasks that can implement the requirements without blowing the scope.
[00:25:16] Simon: That's really interesting. When we think about breaking tasks down, what we might consider is breaking it down into bite-sized chunks that we, as people, would think naturally fits together.
[00:25:30] Simon: But what you're saying there is Claude breaks it down. Perhaps it does it similarly, but it maybe more intentionally does it, whereby these are consumable pieces of change that as an agent, it can actually capably deliver.
[00:25:47] Alex: Yeah. I think this is because the training data that they have been trained on, so probably they've been trained on a lot of repositories and a lot of PRs, especially from GitHub.
[00:25:57] Alex: And there they could see good examples of how much can fit in a good PR that doesn't have a lot of comments and lands as quickly as possible without issues.
[00:26:09] Simon: So if I was to say, look, I've got this feature. I would go over to Claude and say, break this feature down and add a bunch of stuff into Backlog.
[00:26:17] Simon: And it would then break it down, add it to Backlog, and then you would say, right. Can you go ahead and start working on these ones? What would the next step be beyond, well,
[00:26:27] Alex: Obviously you would have to review this stuff. Okay. The human in the loop is essential at this point in time.
[00:26:34] Alex: And actually the human in the loop is important right before starting the task. So, in my experience, the coding part is always done well if the task and the specs are well described. Yes. So what I'm putting a lot of effort into, and I'm trying to not become lazy, is to really review every single task.
[00:26:59] Alex: And then I will ask the agent to come up with an implementation plan, because this is also what humans do, or what some humans do, and what I like to have in my team. If we want to build a feature together, we will create an implementation plan and agree on it, or maybe review it, and build this architecture, at least on paper, together.
[00:27:21] Alex: And then someone will implement it. And this is how I effectively work in a team.
[00:27:27] Simon: How automatable is that process? Can you do, have you tried using LLMs as a judge or using different models to critique almost the breakdown of various tasks?
[00:27:41] Alex: I think this is changing a lot. So between when I started working on Backlog and today, a few things have changed. So before, I would really use one agent for creating the tasks, one agent for creating the implementation plan, and another agent for the execution. So three sessions with different context windows, so that they are not biased by each other.
[00:28:04] Alex: Right now, models are becoming better at following a lot of instructions. So at least the task creation and the implementation plan could go together, or the implementation plan and execution could go together. But I don't use specialized agents anymore as of today, because I feel that with the latest models available today, the results are good enough.
[00:28:29] Simon: Yeah. Sounds good. Let's talk a little bit about what could be done with AI, what couldn't be done with AI. Obviously, the whole thing was built with AI under your guidance, so
[00:28:39] Alex: Right.
[00:28:41] Alex: So disclaimer, Backlog doesn't have a production database.
[00:28:45] Alex: t's not hosted online. It doesn't have any login or user information. I was—the data
[00:28:53] Simon: Because the data exists in Git repos, right?
[00:28:55] Alex: Exactly. All of the tasks are stored in your Git repository and you can use Git security to enhance the security. So I don't have to deal with this part.
[00:29:05] Simon: You're just using the whole of GitHub as your database. That's what you're doing?
Alex: Yeah, GitHub or any other Git provider.
[00:29:12] Alex: Yeah. Okay. So they implement the security for me. Yes. So I didn't have to face serious production issues as I would in my normal job. Yeah. But I think this is a big limitation right now.
[00:29:24] Alex: And the next steps would be how we can effectively do security reviews, performance reviews, and this kind of guardrails for a production application.
[00:29:40] Simon: Yeah. And from the point of view of AI, using Backlog MD, I guess how much would you say AI empowers the usage of Backlog MD? Because it sounds like using something like Claude really unlocks the full features of that autonomy, breaking tasks down, throwing it into Backlog MD, and the CLI-based usage fits so well with the terminal UIs of these agents.
[00:30:09] Alex: Yeah. I usually just have two tabs: one for the execution of the task, task creation, and one to look at what is going on. And I don't have to leave the terminal. And this can also run in a sandbox machine without any GUI or any browser. We can just SSH into a virtual environment and there we can run Backlog and Claude.
[00:30:35] Simon: Yeah. Would you find Backlog MD useful if it wasn't for AI agents like Claude, in terms of using the UI?
[00:30:45] Alex: So I heard of lots of people using it successfully for non-AI tasks. So they create tasks manually from the web interface. They manage them. They basically have a Trello that just works with the Git repository, right, and the Backlog.md CLI installed on their computer. That's it. And they don't have to think about hosting. They don't have to think about creating accounts or anything. So it's very simple. There are some use cases where Backlog can actually be used outside of AI.
[00:31:18] Alex: Yeah. But I think the stronger points are when you need to really break down a big feature within smaller context windows.
[00:31:31] Simon: We, here on the podcast and at Tessl, and hearing a lot about it here at the conference as well, we love to use the word AI-native. Similar to Cloud Native, it's really about rethinking how we would go ahead and change our processes, change our tools, change our way of working using a technology as a first-class citizen.
[00:31:51] Simon: So using a technology natively in your organization, in your ways of working. So when we look at some of the, if we were to redesign how we develop software, like our basic processes and things like that, how does an Agile workflow change with these new AI development processes and tools?
[00:32:18] Alex: So I think Agile frameworks, the ones that are built today, are built mostly for humans. Yes. And they don't fit very well for AI agents or the whole AI-native world. But I think Agile itself is a very, very important way of working. If you look at the core principles of Agile, they really allow you to build solid software.
[00:32:43] Alex: Maybe you don't know exactly what you will end up building at the end, because that's the whole point of Agile: to steer, to review, to have some stakeholder communication.
[00:32:56] Alex: And those core principles are there to stay. We need to empower these core principles and connect them with AI.
[00:33:05] Alex: What I see not being so important in the future is the bandwidth calculation that comes from certain Scrum processes, because with AI we have virtually infinite bandwidth. And this is a big change that companies will start to look into and try to find a solution for, because if now we have sprints and we try to understand how much we can fit in a two-week sprint, in the future you can have AI agents working the whole night and can burn through five sprints until the morning, given that they have this automatic review step. So I'm not talking as of today, but maybe in a few months from now, this is going to be a challenge that we need to solve.
[00:33:48] Simon: So what is that bottleneck now then that's stopping that? Is that, like you say, the fact that we're doing—
[00:33:53] Alex: Yeah.
[00:33:53] Simon: In that world it's automated code reviews. Today we don't have that. Is it still that human interaction that is the bottleneck?
[00:34:00] Alex: Currently, as of today, this is the biggest bottleneck.
[00:34:03] Alex: Yes. So a human has to tell that a certain task is done successfully and that it really implements the specs. Even though, if the specs are correct, using Backlog in my case, the implementation is also most of the time going to be successful, I still want to review at least one time. And I don't see right now a moment where you can reliably have an automated review process and automated merging into your main branch, and then you can say, okay, now we can deploy this to production.
[00:34:34] Alex: I would myself not feel confident enough in such a thing. But assuming that the review steps, security, performance, quality, all of these things will be automated, then the next big bottleneck will be conflicts. And by conflicts, I mean not just merge conflicts; really, building a product requires you to start by building the fundamentals.
[00:35:01] Alex: Then you build layers on top of layers, and then you have the final product at the end. But each layer is dependent on previous layers. And we have to bake this into the AI flow where the agents can first of all not clash into each other. And second, they know they have to wait for certain things to be ready in order for them to start.
[00:35:20] Alex: And once you automate this part as well, then you really have a product where, once you give the whole idea and you verify that the idea is what you expect, you let the agents run for hours, days, maybe even weeks, fully automated.
[00:35:36] Simon: And what does a, it's interesting. What does a retro look like? Is it, do we do a retro with agents? Do we have an agent retro, or do we have humans trying to give feedback?
[00:35:47] Alex: We kind of have the agent retros right now; they're called AGENTS.md or CLAUDE.md. I think those are the closest thing to human retros. So basically you finish working on a task using Backlog or any other tool.
[00:36:02] Alex: Then you realize you stumbled across some patterns, some negative patterns that you want to avoid in the future. So what you normally should do is go to your agent instructions and say: please don't do it like this, or do it like that, and try to prevent these issues from happening again. So the retro, in the sense of looking back and trying to do more of what went well and less of what went badly, exists more or less with these agent instructions.
[00:36:26] Alex: But if we have to think about a more structured process where humans and AI agents look together at what went wrong in a certain product, I think there's still room to implement that. And also, what is probably missing right now is letting the agents see the final product.
[00:36:50] Alex: Yes. And this is where the retrospective is really effective. So we, together as a team, we need to build a big product and this is how it looks and this is how users are using it. And we did a good job or a bad job implementing these features. This is missing right now.
[00:37:08] Simon: And, and, and that's key though.
[00:37:08] Simon: The thing that you said there about users giving their feedback after something's happened, sprints are nice because maybe you do some work for a week or a couple of weeks, right. And you give it to someone, you get some feedback. And actually in the time you're getting that feedback, it's not like you can accelerate and do three, four times, four x the amount of work and keep adding.
[00:37:30] Simon: So if we can develop and deliver so fast. Is there a risk? And what is that risk of us being able to deliver faster than people can consume it and people can give good feedback? Is that a risk or is that actually something that's very easily fixed? Because well, building is cheap now.
[00:37:51] Simon: If we make a mistake, we roll back quite quickly and you continue to build.
[00:37:54] Alex: Yeah, I totally agree with you on this. I heard a story at one of my meetups from a guy who is a plumber and doesn't have coding experience. And he vibe coded a small application to calculate the measurements for the tools that he has to use and how to be a good plumber.
[00:38:14] Alex: And he said he got a quote for a few thousand euros from a software developer for a thing that he built himself in a couple of days by vibe coding.
[00:38:24] Simon: We're in trouble. 'Cause if plumbers can vibe code, I can't, I can't do plumbing. We're in trouble, Alex.
[00:38:32] Alex: Yeah. So I think what we're gonna witness in the next months is also a sort of single-use software. Maybe like use-once, disposable software, yeah. Everyone can just implement small and medium-sized applications, and they can use them once, and maybe they're not necessary afterwards, so they don't even have to have production quality.
[00:38:58] Simon: So it's almost like prototyping changes, right?
[00:39:00] Simon: Being able to say, hey, here's closer to the main vision. Play with it. Let me know what you like and dislike. And of course, I can just spin up parts of those changes in my new version, maybe change it a little bit more. So it's maybe more about using that speed of development to create.
[00:39:18] Simon: More fully fledged prototypes that we can then actually test on folks rather than—
Alex: Yes, exactly.
Simon: Very interesting.
[00:39:25] Alex: One very important thing that I noticed when working with AI is that even if you give enough context for an agent to work on a single task, so you put in all the information that is necessary to start, at least what would be necessary for a human, they still have no idea about your project.
[00:39:44] Alex: Okay. Like, effectively they start reading the files. Yeah. The ones they have to modify when they start working on the task. Yeah. And by the time they are done, that's the moment where they really know almost everything about how the task should be done, but the task is already finished at this point. Yeah. So maybe what we are gonna see from now on is: I do a quick prototype, a quick run, let the agent ingest all the context, not just from me, but also from the code itself automatically.
[00:40:12] Alex: Build it. Throw it away. And if we still have context window left, we start from scratch with a clean architecture, a clean, like, domain-driven design done really nicely. And maybe the result is gonna be better on the second try.
[00:40:29] Simon: Right, right. Interesting. Let's trigger people. Alex, you ready? Are you ready to trigger the audience?
[00:40:34] Simon: Yes. Let's trigger the audience. Does Scrum even have a role in this future?
[00:40:42] Alex: So, this is a tough question. I know a lot of people are putting a lot of effort into becoming good Scrum masters, good agile coaches. And while this is very important when working with humans, we really need to think about refactoring the processes the same way you refactor code.
[00:41:01] Alex: So when you refactor code, you don't throw away everything. You look at what's working and try to improve it, to streamline it, and simplify it. And this is what we need to do also with Scrum, or all these agile processes that are built for humans to work in a certain way. But now suddenly we have infinite bandwidth.
[00:41:23] Alex: We have different processes, like we said, different deliverables, checkpoints where the humans have to review things, and it has to be much, much faster than waiting two weeks to have the Scrum review where we present the output to the stakeholders. We might need two, three reviews per day in the future.
[00:41:43] Simon: So what would you say are, like, the key AI Scrum measurements or flags we need to raise at various points? What's the equivalent?
[00:42:11] Alex: So what are the core fundamental features of Scrum? It allows you to have certain deliverables in a certain amount of time and to coordinate these deliverables between teams and stakeholders, so that you can verify whether you should proceed or have to change things. And maybe what we can learn from Scrum is to create AI checkpoints where all of the AIs that worked in parallel the whole night reach the same point.
[00:42:50] Alex: Maybe they handshake with each other and say, okay, now we need the human in the loop. And afterwards we can proceed with another batch of parallel agents, but not before a human has validated the agents.
[00:43:04] Simon: So almost, it's almost an AI Scrum that then introduces a human at the right time.
Alex: Yes. Yeah.
[00:43:08] Simon: Very interesting. Let's talk a little bit about the future. Let's get the crystal ball out and say, okay, six months' time, 12 months' time, maybe even longer. What do you expect to see, I guess, both in and around agents or specs? What are the major changes that you foresee in software development in and around those two areas?
[00:43:29] Alex: So we have to look a little bit back to understand where the curve is coming from. Yeah. So, Claude Code was released somewhere in March, April, which is insane. That's just a few months ago, and now we already have a few iterations of it; it can run in parallel, it can use nested agents.
[00:43:48] Alex: It's so powerful. And if I think that in six months it's gonna be even more powerful, given that we're on an exponential curve, it's very hard to predict where we're gonna go. But what are the fundamentals? The fundamentals are AI agents that are becoming more and more a hub, a central hub that can decide what to do based on the human interaction.
[00:44:11] Alex: So, I expect them to run completely 24/7. Yep. So I don't expect you to have to start Claude, give him a task, finish the task, close him, and start a new task in another session.
[00:44:40] Alex: I expect to have a persistent agent that runs 24/7. That is your central hub. This agent should be your input point. It should keep a context window that has an understanding of what you need to achieve as a human, but then it'll use subagents to actually solve the tasks, and it should keep its own context window as small as possible.
[00:45:07] Alex: So it can coordinate, as sort of a team lead of agents basically, and spawn agents when needed. And this agent can also be proactive. We saw this recently with some ChatGPT features where you get proactive information every morning that is relevant for you. And I think agents can do the same but better, because an agent can call tools, can connect to other sources of information, and can maybe get trained on your own data.
[00:45:40] Alex: And then they can actually actively work on things that you delegate to them, or the agent can come to you and say, hey, you should prepare for the birthday of your wife that's coming next month; I think it's a good moment to start looking at presents or organizing a party.
[00:45:56] Simon: Yeah. When you said that, my heart, my heart skipped a beat. Oh my God, is it next month for my wife? No, it's fine. It's next year. We're good. But yeah. Super interesting. Super interesting. Alex, we could go on for many more sessions, actually. And the booth and the expert area are getting very, very busy. So I think we should wrap up there, but super interesting topic. Where can people go, Backlog.md, for learning more and getting started on Backlog?
[00:46:26] Simon: Right.
[00:46:27] Alex: So I was very lucky that the backlog.md Moldova domain was available. So now when I talk about Backlog.md, I just tell people: just type backlog.md in your browser, and it'll immediately show my GitHub repository. And they can learn how to use it.
[00:46:44] Simon: Yeah. Amazing. So check out backlog.
[00:46:46] Simon: And Alex, absolute pleasure. Let's go and enjoy the rest of the conference. But for now, thanks very much for listening. Thanks for tuning in and see you next time.
In this episode
In this live episode from Devoxx Belgium, AI Native Dev host Simon Maple chats with Alex Gavrilescu, creator of Backlog.md, about transforming ad-hoc "vibe coding" into a structured engineering practice with AI. They explore how spec-driven development and atomic Markdown tasks can make AI coding effective, safe, and scalable, offering a practical blueprint for AI-native teams to enhance workflow consistency and reliability.
Recorded live at Devoxx Belgium, this episode of AI Native Dev brings a candid, practical look at turning “vibe coding” with LLMs into a disciplined engineering practice. Host Simon Maple sits down with Vienna-based lead engineer Alex Gavrilescu—creator of Backlog.md—to unpack how spec-driven development makes AI coding effective, safe, and scalable. From repeated prompt fatigue and lost context to a CLI-first workflow that bakes in acceptance criteria and dependencies, Alex shares a blueprint any AI-native team can adopt.
From Vibes to Velocity: Why Ad-hoc Prompting Breaks at Scale
Alex started where many developers do: throwing prompts at a capable coding agent like Claude Code and letting it write the code. The early results were tantalizing: the agent could often reach the feature goal. But each task required heavy back-and-forth, and the same instructions had to be re-typed in every session. As soon as a chat closed, the agent forgot crucial constraints, leading to recurring mistakes and inconsistent output across tasks.
He also hit the limits of LLM conversation mechanics. Even when he wasn’t close to max context, quality degraded as he injected larger specs. Some agents “compact” conversation history into summaries, but critical instructions often get dropped. The upshot: ad-hoc prompting scales poorly because it lacks persistent context, repeatable guardrails, and a way to enforce consistency across tasks.
Just as importantly, vibe coding ignores the operational guardrails engineering teams rely on. Security practices, CI/CD constraints, staging environments, and language/framework standards all got lost in the shuffle. Alex’s insight was to import the rigor of human processes—PRDs, Scrum discipline, acceptance criteria—into the AI collaboration model. With the right specs in the right shape, agents become much more reliable teammates.
Atomic Markdown Tasks: The Spec Format That LLMs (and Teams) Can Execute
Alex’s first attempt at structure was a giant Markdown document that captured everything: feature specs, security requirements, CI/CD expectations, language decisions (C#, TypeScript, Java), and more. It was comprehensive—but not usable. Large monolithic context proved brittle. Summarization/compaction dropped key rules, rollback was painful, and the agent’s performance was inconsistent.
The breakthrough was to split the monolith into atomic Markdown tasks, each mirroring a Jira/Linear ticket: a clear title, a short description that explains the “why,” and acceptance criteria that are testable and measurable. Dependencies between tasks ensure the agent doesn’t start work until prerequisites are met. This format provides a minimum viable context for the agent to do high-quality work while keeping specs legible for humans.
Two more ideas make the loop robust. First, an “Implementation Plan” drafted by the agent before coding creates an explicit, reviewable approach. This forces alignment and surfaces risks early. Second, “Implementation Notes” capture what actually happened—permanent context the team and future agents can rely on. Together, these elements create a feedback-safe, auditable trail. If a change needs rolling back, you revert a single task and its notes—no more all-or-nothing spec reversions.
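To make the format concrete, here is a hypothetical task file in the spirit described above; the field names, checkbox syntax, and overall layout are assumptions about Backlog.md's exact on-disk format, shown as a shell heredoc for illustration.

```bash
# Print a hypothetical task file (fields and layout are assumptions, not Backlog.md's exact schema)
cat <<'EOF'
# task-201 - Add drag and drop to the terminal board

status: To Do
labels: [frontend]
dependencies: [task-180]

## Description
Let users move a task between board columns with Shift+Arrow without leaving the terminal.

## Acceptance Criteria
- [ ] Shift+Right moves the selected task one column to the right
- [ ] Shift+Left moves it one column to the left
- [ ] The new status is written back to the task file so it syncs through Git

## Implementation Plan
(Proposed by the agent and reviewed by a human before any code is written.)

## Implementation Notes
(Filled in after completion as permanent context for future agents and teammates.)
EOF
```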
Backlog.md: CLI-First, Git-Native, and Agent-Friendly by Design
To streamline this workflow, Alex built Backlog.md, a developer-first backlog you manage entirely from your terminal. Install it globally via bun, npm, or Homebrew, and it’s instantly available in any repo. The CLI guides you with a command palette: create tasks, list them, open a Kanban board, launch a web interface, or get an overview of progress.
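A sketch of that command palette; the subcommand names follow the hints the CLI prints when run with no arguments and should be treated as close approximations rather than exact syntax.

```bash
backlog task create "Ship MCP support"   # create a task
backlog task list                        # list existing tasks
backlog board                            # Kanban board in the terminal
backlog browser                          # optional web interface
backlog overview                         # progress statistics for the project
```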
The board is configurable (default: To Do, In Progress, Done). Hit enter to drill into a task and you’ll see the full spec: ID, title, metadata, dependencies, description, acceptance criteria, and sections for Implementation Plan and Implementation Notes. You can label tasks (e.g., “security,” “CICD,” “frontend”), filter by status or priority, and manage everything without leaving the terminal. A web UI is available for those who prefer a visual interface; terminal drag-and-drop (think Shift+Arrow to move tasks between columns) is on the roadmap.
Backlog.md is Git-native. Tasks live in your repository and sync across branches. If you pick up Task #200, assign it to yourself, and set it to In Progress on your feature branch, teammates see that update on the main branch as soon as you push. The tool reconciles task state based on last-updated timestamps, keeping the "source of truth" simple and distributed. Crucially, Backlog.md also offers a plain mode (for example, backlog task 200 --plain) that prints the same task details as plain text, the format agents consume most reliably.
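In practice, that plain output can be piped straight into an agent's non-interactive mode; the claude -p invocation below is an assumption about your agent CLI, not something Backlog.md mandates.

```bash
# Human-friendly view of task 200
backlog task 200

# Plain-text view, the format agents parse most reliably
backlog task 200 --plain

# Hypothetical hand-off to a coding agent running non-interactively
backlog task 200 --plain | claude -p "Implement this task and tick its acceptance criteria"
```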
The result is a tight loop: specs and code live side by side, status changes are versioned, and agents consume the same canonical task text that humans review. Alex even demoed tasks completed by an agent, with acceptance criteria automatically ticked and notes captured—proof that the model and the workflow can meet in the middle.
A Practical AI Dev Loop: Plans, Guardrails, and Easy Rollbacks
What emerges is a repeatable, low-friction workflow any AI-native team can adopt (a command-level sketch follows the list):
- Start with a small, atomic task. Define the why (description), the what (acceptance criteria), and the constraints (language choice, security posture, CI/CD requirements, staging gates). Add labels and set dependencies so the agent can’t start prematurely.
- Ask the agent for an Implementation Plan before coding. Review and refine it. This is where you catch risky changes to auth flows, performance assumptions, or schema migrations.
- Let the agent implement against the acceptance criteria. Because criteria are testable and measurable, they double as your validation checklist and can hook into automated tests or smoke checks in CI.
- Commit task updates and push to a feature branch. The board reflects real-time state. Use staging to validate security and operational constraints—guardrails that vibe coding often forgets.
- Capture Implementation Notes. This permanent context prevents the “Groundhog Day” effect where instructions are repeatedly reintroduced. If something goes wrong, roll back at the task level without losing unrelated work.
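A condensed, hypothetical pass through that loop from a terminal; the Backlog.md flags and Git steps below are illustrative conventions rather than required syntax.

```bash
# 1. Create an atomic task with testable acceptance criteria (flag names are assumptions)
backlog task create "Validate staging secrets in CI" -l security,cicd

# 2. Have the agent draft an implementation plan, review it, then let it implement;
#    the agent updates status and ticks criteria through the same CLI.

# 3. Work on a feature branch so task state syncs through Git
git checkout -b task/validate-staging-secrets
git add backlog/ && git commit -m "task: validate staging secrets in CI"
git push -u origin task/validate-staging-secrets

# 4. Capture implementation notes and close out; rolling back means reverting one task
backlog task edit 201 -s Done --notes "Added CI check for missing staging secrets"
```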
Alex cautions against “kitchen sink context.” Feeding everything to the model reduces reliability; summarization and compaction can silently drop the very rules you care about. Instead, give the agent just enough well-structured context per task. Combine that with a plan-review step, explicit dependencies, and notes, and your AI code contributions become both predictable and auditable.
Looking ahead, Backlog.md will continue to refine the UX (e.g., terminal drag-and-drop) while preserving its agent-friendly primitives. The philosophy remains the same: keep specs small, precise, and close to the code—and make it trivial for both humans and LLMs to execute them.
Key Takeaways
- Don’t rely on vibe coding. Ad-hoc prompting leads to repeated mistakes, lost constraints, and risky changes landing in production.
- Use atomic Markdown tasks with acceptance criteria. Keep specs small, testable, and measurable; avoid monolithic context dumps.
- Capture dependencies and guardrails. Block tasks until prerequisites are met and include security and CI/CD requirements in the spec.
- Add an Implementation Plan step. Have the agent propose a plan, review it, and only then proceed—this prevents avoidable rework.
- Keep permanent context with Implementation Notes. Persist what changed and why, so future agents and teammates don’t repeat past errors.
- Make specs agent-friendly. Use Backlog.md’s --plain output when feeding tasks to LLMs to avoid formatting ambiguity.
- Keep tasks and state in Git. Backlog.md syncs across branches, making status updates and rollbacks versioned and collaborative.
- Install once, use anywhere. Backlog.md via bun/npm/Homebrew gives a fast, CLI-first workflow with an optional web UI for visibility.
- Optimize context, don’t maximize it. Big summaries and compaction can drop critical rules; minimal viable context per task is more reliable.
- Treat AI like a teammate in your process. Bring your team’s agile discipline—PRDs, staging, reviews—into your AI development loop.
Related episodes

- MCP: The USB-C For AI: Redefining Developer Workflows in the AI Era with MCP (7 Oct 2025, with Steve Manuel)
- Why 95% of Agents Fail: Can Agentic Engineering Really Deliver Enterprise-Grade Code? (23 Sept 2025, with Reuven Cohen)
- AI To Help Devs 10x? Why AI Coding Agents Are Here To Stay (17 Jul 2025, with Patrick Debois)