usatoday24

In some development teams it is already becoming common to rely on artificial intelligence agents to review incidents, analyze code changes and move through tasks that were previously left in human hands. The problem appears when these systems not only read information that may come from outside, but also operate in spaces where they coexist. sensitive keys, tokens and permissions. That is what recent research puts on the table: we are not simply facing a useful tool that can make mistakes, but rather an architecture that can also become dangerous if it is deployed without very clear limits.

The alarm has been turned on Aonan Guan and Johns Hopkins researchers Zhengyu Liu and Gavin Zhong after demonstrating attacks against three agents deployed on the aforementioned platform: Claude Code Security Review, from Anthropic, Gemini CLI Action, from Google, and GitHub Copilot Agent, a GitHub tool under Microsoft. According to your documentation, The failures were communicated in a coordinated manner and ended in financial rewards paid by the companies, but what is relevant is that they point to a broader problem.

This is how they managed to twist the agents from within

The name that Guan gives to the discovery helps a lot to understand what this is all about: “Comment and Control.” The idea is simple to explain, although the substance is not so simple. Instead of setting up an external infrastructure to direct the attack, GitHub itself acts as an entry and exit channel: the attacker leave the instruction in a titlean incident or a comment, the agent processes it as if it were part of normal work and the result ends up reappearing within that same environment. Everything stays at home, and that is precisely the key to the problem.

And that “everything stays at home” is not a minor detail, but the basis of what the research describes. The three agents share a very similar logic: they read normal content from GitHub, incorporate it as a work context, and from there, execute actions within automated flows. The clash appears because that same space not only contains text sent by third parties, but also tools, permissions and secrets that the agent needs to operate.

The first case Guan details concerns Claude Code Security Review, an Anthropic GitHub action designed to review code changes and look for possible security flaws. Up to this point, everything is within what was expected. The problem, as the researcher explains, is that it was enough to introduce malicious instructions in the title of a pull requestwhich is the request that someone sends to propose changes to a project, so that the agent will execute commands and return the result as if it were part of your review. The team then managed to go a step further and demonstrate that it could also extract credentials from the environment.

The interesting thing is that the same scheme also appeared in the other two services, although with nuances. At Google, Gemini CLI Action could be pushed to reveal the GEMINI_API_KEY from instructions snuck into an issue and its comments; In GitHub Copilot Agent, the variant was even more worrying, because the attack was hidden in an HTML comment that a person did not see on the screen, but the agent did process when another person assigned it to the case. In both scenarios, the background was the same again: apparently normal content that ended up twisting the behavior of the system until exposing credentials or sensitive information within GitHub itself.

Guan assures that the pattern made it possible to leak API keys, GitHub tokens and other secrets exposed in the environment where the agent ran, that is, just the credentials that can later open the door to much more delicate actions. Who does this affect? Especially to repositories that run agents in GitHub Actions on content sent by untrustworthy collaborators and, in addition, give them access to secrets or powerful tools. The researcher himself clarifies that the risk depends a lot on the configuration: by default GitHub does not expose secrets to pull requests from forksbut there are deployments that open that door.

And here another layer of the matter appears, less technical but just as important. As published by The RegisterAnthropic, Google, and GitHub ended up paying bounties for the findings, but none of the three had published public notices or assigned CVE at the time of that information. Guan was quite clear about this: he said he knew “for certain” that some users were still stuck on vulnerable versions and warned that, without visible communication, many may never know that they were exposed or even being attacked. So although there were mitigations and changes in documentation or in the internal treatment of reports, there was no equivalent public notice for all those potentially affected.

Anthropic settled the case on November 25, 2025 and paid $100
Google rewarded the discovery on January 20, 2026 with $1,337
GitHub closed the case on March 9, 2026 with a payment of $500

What makes this case especially delicate is that GitHub does not seem like the end of the road, but rather the first visible showcase. Guan argues that the same pattern can probably be reproduced in other agents who work with tools and secrets within automatic flows, and there he mentions from Slack-connected bots to Jira agentsmail or deployment automation. The logic is the same again: if the system has to read external content to do its job and also has enough access to act, the field is fertile for someone to try to twist it from within.

How often should we change ALL our passwords according to three cybersecurity experts

The conclusion that Guan reaches is not about selling a magic solution, but about returning to a fairly classic idea in security: giving each system only what is essential to do its job. If an agent reviews code, they shouldn’t have access to tools or secrets they don’t need; If you’re just summarizing issues, it wouldn’t make sense for you to write to GitHub or touch sensitive credentials. That is why it insists on thinking about these deployments with least privilege logic and very closed permission lists.

Images | DC Studio | Aonan Guan

In Xataka | AI is crucial for the US military. So he’s naming OpenAI and Palantir leaders as lieutenant generals

Leave your vote

0 Points

Upvote Downvote

They have kidnapped agents from Anthropic, Google and Microsoft for the sake of science. The three companies ended up paying

This is how they managed to twist the agents from within

Leave your vote

Leave a CommentCancel reply

This is how they managed to twist the agents from within

Leave your vote

Leave a CommentCancel reply

Log In

Sign In

Forgot password?

Your password reset link appears to be invalid or expired.

Log in

Privacy Policy

Add to Collection

No Collections