Attackers can turn AI agent guardrails into denial-of-service weapons, according to new research that found a single poisoned document can dramatically slow shared AI agent workflows by trapping reasoning-based safety systems in extended thinking loops.
“Reasoning-based guardrails introduce a new attack surface where security mechanisms themselves become the target,” the researchers from Hong Kong University of Science and Technology and collaborators wrote in the paper.
They added that “a single poisoned document can saturate shared guardrail infrastructures, effectively starving co-located agents and paralyzing the entire system,” describing a reasoning-extension denial-of-service (DoS) attack that targets the security layer rather than the underlying AI model.
The researchers tested the technique against four AI agent frameworks (LangGraph, BrowserGym, OpenHands, and OSWorld) and found that processing times increased in every deployment.
LangGraph recorded the biggest slowdown at 148x, followed by BrowserGym at 131x, OpenHands at 36.3x, and OSWorld at 18x, according to the paper.
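To illustrate why one slow check can starve co-located agents, here is a toy model, not the researchers' setup. It assumes a single shared guardrail worker that handles requests in arrival order, a normal check that costs one time unit, and a poisoned check that costs 148 units, the LangGraph figure reported in the paper. The function name `fifo_wait_times` and the queueing model are illustrative assumptions.

```python
def fifo_wait_times(service_times):
    """Return how long each request waits before a single shared worker starts it."""
    waits, clock = [], 0.0
    for service_time in service_times:
        waits.append(clock)      # request starts once all earlier ones finish
        clock += service_time
    return waits


BASELINE = 1.0   # assumed cost of one normal guardrail check (arbitrary units)
SLOWDOWN = 148   # LangGraph slowdown factor reported in the paper

# Five agents submit one check each; in the second run the first is poisoned.
normal = fifo_wait_times([BASELINE] * 5)
attacked = fifo_wait_times([BASELINE * SLOWDOWN] + [BASELINE] * 4)

print("normal waits:  ", normal)
print("attacked waits:", attacked)
```

In this simplified model, every agent queued behind the poisoned document waits at least 148 units even though its own check is cheap. Real deployments with parallel workers would behave differently, but the same head-of-line effect applies whenever attacker-controlled inputs can occupy shared capacity.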
Attack exploits reasoning rather than bypassing security
Unlike prompt injection and jailbreak attacks that seek to manipulate model outputs or circumvent safety controls, the new technique targets the reasoning process used by AI agent guardrails, the researchers wrote in the paper.
“Unlike traditional LLM attacks that primarily compromise integrity, reasoning-extension DoS targets availability,” the researchers wrote, arguing that AI security discussions have focused largely on preventing unsafe outputs while overlooking resource exhaustion.
The researchers also found that stronger AI safety checks may come at the cost of slower performance.
“The stronger the guardrail reasons, the longer it reasons,” the researchers wrote, explaining that more sophisticated reasoning can inadvertently increase the time and resources required to process malicious inputs.
The attack also worked across eight different LLM families. According to the paper, prompts designed for one open-source model remained effective against other models, suggesting attackers would not need detailed knowledge of a specific proprietary system.
OpenAI and Anthropic, whose reasoning-based guardrails are referenced in the paper as examples of LLM-powered security mechanisms, did not immediately respond to requests for comment.
Shared AI governance creates concentration risk
“The more important takeaway is not necessarily whether a specific ‘guardrail DoS’ technique proves practical at scale, but that AI governance infrastructure is increasingly becoming critical infrastructure,” said Sakshi Grover, senior research manager for cybersecurity services at IDC Asia/Pacific.
“As agentic AI deployments mature, organizations will need to think about resilience, scalability, and fault tolerance for AI control planes in the same way they already do for identity services, API gateways, and other business-critical platforms,” she said.
Grover said centralized AI governance also introduces concentration risk.
“The consolidation dynamic is real — organizations are rationalizing AI governance by routing multiple agents through shared safety infrastructure, which creates concentration risk,” she said. “A successful guardrail DoS doesn’t need to breach anything; it just needs to make the system unusable at a critical moment.”
For business-critical workflows such as automated claims processing, AI-assisted incident response, and real-time fraud detection, even temporary latency or resource exhaustion could have material consequences, she added.
Existing mitigations offer only partial protection
The researchers found that conventional prompt injection filters remained susceptible to the attack, while strict token limits simply shifted deployments between fail-open and fail-closed behavior: once a limit cuts the guardrail's reasoning short, the system must either let the unchecked input through or block it. Smaller reasoning budgets reduced latency but also weakened security decisions, creating a tradeoff between availability and protection.
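The token-limit tradeoff can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `check_with_budget`, `Verdict`, and `OnBudgetExhausted` are hypothetical names, and `tokens_needed` stands in for however long a real guardrail would reason about an input.

```python
from dataclasses import dataclass
from enum import Enum


class OnBudgetExhausted(Enum):
    FAIL_OPEN = "allow"     # let the input through without a verdict
    FAIL_CLOSED = "block"   # reject the input without a verdict


@dataclass
class Verdict:
    allowed: bool
    tokens_used: int
    completed: bool  # False when the budget cut the reasoning short


def check_with_budget(tokens_needed, is_safe, budget, policy):
    """Hypothetical guardrail call capped at `budget` reasoning tokens.

    A poisoned document inflates `tokens_needed` far beyond the budget.
    """
    if tokens_needed <= budget:
        # The guardrail finished reasoning and reached a real decision.
        return Verdict(allowed=is_safe, tokens_used=tokens_needed, completed=True)
    # Budget exhausted: the outcome is decided by policy, not by analysis.
    allowed = policy is OnBudgetExhausted.FAIL_OPEN
    return Verdict(allowed=allowed, tokens_used=budget, completed=False)


# Assumed numbers: an unsafe, poisoned input that would need 50,000 tokens.
poisoned = dict(tokens_needed=50_000, is_safe=False, budget=4_000)
open_v = check_with_budget(policy=OnBudgetExhausted.FAIL_OPEN, **poisoned)
closed_v = check_with_budget(policy=OnBudgetExhausted.FAIL_CLOSED, **poisoned)
print(open_v)    # unsafe input is allowed: cost is bounded, protection is lost
print(closed_v)  # input is blocked: cost is bounded, but so is any legitimate
                 # document that happens to exceed the budget
```

Either policy caps the cost of a single check, but neither restores the decision the guardrail would have made with enough reasoning time.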
The study also found that larger reasoning models often spent more time following the injected reasoning structure, amplifying rather than mitigating the attack.
The findings also reinforce the need for enterprises to move beyond model-level security and focus on governance of autonomous AI systems, analysts said.
Through 2029, more than 50% of successful cybersecurity attacks against AI agents will exploit access control issues, using direct or indirect prompt injection as an attack vector, said Apeksha Kaushik, senior principal analyst at Gartner. Through 2028, at least 80% of unauthorized AI agent transactions will result from internal policy violations or misguided AI behavior rather than malicious attacks, she added.
“The transition to autonomous multiagent systems introduces new risks, such as behavioral drift and destructive actions,” Kaushik said, adding that organizations should implement AI agent security lifecycle management that continuously validates agent integrity from deployment through retirement.
Current fragmented tools cannot effectively govern complex multi-agent systems, she said, adding that organizations need unified discovery, identity, and guardian capabilities to monitor and block rogue behaviors at scale.
AI governance moves to the forefront
Grover said that organizations should begin preparing now by decoupling guardrail infrastructure from agent compute, implementing tiered or asynchronous guardrail checks where possible, monitoring for anomalous reasoning depth, and explicitly red-teaming AI safety stacks for availability failures rather than focusing exclusively on harmful outputs.
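One of those recommendations, monitoring for anomalous reasoning depth, might look something like the following sketch. It assumes the guardrail reports a reasoning-token count for each check; the class name `ReasoningDepthMonitor`, the rolling-median baseline, and the 10x threshold are all illustrative choices rather than anything specified by Grover or the paper.

```python
from collections import deque
from statistics import median


class ReasoningDepthMonitor:
    """Flag guardrail checks whose reasoning length far exceeds a rolling baseline."""

    def __init__(self, window=100, factor=10.0, min_samples=5):
        self.history = deque(maxlen=window)  # recent normal token counts
        self.factor = factor                 # assumed anomaly threshold multiplier
        self.min_samples = min_samples       # wait for a baseline before flagging

    def observe(self, reasoning_tokens):
        """Return True if this check's reasoning length looks anomalous."""
        anomalous = (
            len(self.history) >= self.min_samples
            and reasoning_tokens > self.factor * median(self.history)
        )
        if not anomalous:
            # Only normal samples update the baseline, so a burst of
            # attacks cannot drag the median upward.
            self.history.append(reasoning_tokens)
        return anomalous


monitor = ReasoningDepthMonitor()
warmup = [monitor.observe(n) for n in (180, 220, 200, 210, 190)]
spike = monitor.observe(30_000)   # far above 10x the median of 200
after = monitor.observe(205)      # ordinary traffic is still accepted
print(warmup, spike, after)
```

A flagged check could then be routed to a separate queue or cut off early, which ties into the recommendation to decouple guardrail infrastructure from agent compute.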
“Architecture choices are becoming as consequential as model safety choices,” Grover said. “The organizations that treat agentic AI infrastructure with the same rigor they apply to critical application infrastructure will be better positioned. The ones that don’t will find out the hard way.”