News

News

The Dark Side of “Just Hooking Up" AI Agents to GitHub

6 Jun 2025

Simon Maple

When AI Agents Go Rogue
When AI Agents Go Rogue
When AI Agents Go Rogue

How a Malicious GitHub Issue Can Hijack Your AI Dev Workflow

We posted about the GitHub MCP server a couple of times recently, highlighting the productivity value it can offer to a developer’s workflow. Invariant, an AI security company recently exposed a critical issue in GitHub’s MCP integration, which can be exploited by a single malicious GitHub issue created on a public GitHub repository.

Two critical things to mention. Firstly, there isn’t a currently known active attack in this space, and more worryingly, there isn’t an easy and complete fix. Secondly, while this mentions the GitHub MCP, there isn’t a patch that one can easily apply to fix this issue. This is a problem inherent to the way LLMs fail to distinguish between the control and data planes they are given and make poor decisions thereafter.


TL;DR

  • Agents with GitHub MCP (and similar integrations) are vulnerable to prompt injections from public inputs.

  • A malicious GitHub issue can hijack your agent to expose private repo data.

  • Model alignment and trusted APIs aren’t enough.

  • As we automate workflows, toxic agent flows will become a more mainstream concern for security and trust.

We are becoming increasingly comfortable connecting our IDEs, terminals, and agents with tools like GitHub, Slack, Docker, and AWS, utilizing MCP servers. While this is extremely valuable, we’re also unknowingly building up an attack surface. This is the dark side of AI-native development, some of which we recently covered talking with Danny Allan, CTO at Snyk, at the 2025 AI Native DevCon Spring event.


Am I vulnerable, and what can a successful attack do?

First of all, you’ll need a specific setup to be vulnerable, such as the following:

  • You’re building with agents. Maybe you’re using Claude Desktop or Cursor, and you’re using the GitHub MCP to fetch issues and create PRs.

  • You have a public repo accepting issues.

  • You have a private repo filled with data that the world cannot access.

With this setup, a successful exploit, LLM can share private data from the private github repo to a pull request that it creates on the public repo. This is a data leakage vulnerability that circumvents access control via the LLM, and exposes it in a place that the MCP server can access, in this case, a PR in the public Github repo.


How does the exploit work?

Let’s play through the scenario in which sensitive data could be leaked. You first need to have a configuration as mentioned above. Then one day, you prompt your agent from your client, something like the following:

“Can you summarize the latest issues in the public repo and fix them?”

Seems simple enough, and not unusual behavior. I imagine many of you have taken similar actions with good results and might instinctively defend your approach if questioned..I expect many of you reading this will have done similar and with good results, that you can reply to in a classicly defensive way. However, there’s an additional step needed that makes the exploit possible. Let’s say a GitHub user has created an issue on your public repository that contains a maliciously crafted prompt. Perhaps this is text directly in the comment, or perhaps it's in a code block.

Your agent loads the issue. It follows instructions from inside the issue itself. We’ve seen these classic prompt injection attacks before, but what’s interesting is that the exploits require the GitHub integration to pull private code into context. The LLM then skips the authorization engine, which would determine whether the user has access to this data, and it would, without user interaction, create a new PR with that content in the public repository.

No alarms. No model warnings. Just vibes, and a leak.

Invariant refer to these as “toxic agent flows”, and they’re fundamentally different from the kinds of tool-based vulnerabilities devs are used to defending against.


What You Can Do Now

  1. Validate your data - Treat public inputs from your GitHub repo as you would user data in your app. Any data there, even metadata, can be considered something that is used for prompt injection.

  2. Review tool calls - Where possible, validate manually for sensitive data or in a more automated way otherwise.

  3. Isolate workflows - Tools can help here, and the organisation that found the issue might also be able to help with their tooling to restrict cross-repo behavior.

To read Invariant’s post on their findings, the disclosure post is on their blog.

AIND Newsletter

AIND Newsletter

AIND Newsletter

Datadog CEO Olivier Pomel on AI Security, Trust, and the Future of Observability

Visit the podcasts page

Listen and watch
our podcasts

Datadog CEO Olivier Pomel on AI Security, Trust, and the Future of Observability

Visit the podcasts page

Datadog CEO Olivier Pomel on AI Security, Trust, and the Future of Observability

Visit the podcasts page

THE WEEKLY DIGEST

Subscribe

Sign up to be notified when we post.

Subscribe

THE WEEKLY DIGEST

Subscribe

Sign up to be notified when we post.

Subscribe

THE WEEKLY DIGEST

Subscribe

Sign up to be notified when we post.

Subscribe

JOIN US ON

Discord

Come and join the discussion.

Join

JOIN US ON

Discord

Come and join the discussion.

Join

JOIN US ON

Discord

Come and join the discussion.

Join