Researchers trick ChatGPT into prompt injecting itself – CSO Online

AI chatbots have opened a new frontier of attack vectors against users and their data, and not even industry leaders are immune. Following recent flaws discovered in Google’s Gemini and Anthropic’s Claude, it’s now ChatGPT’s turn.

Researchers from security firm Tenable discovered seven ways attackers could trick ChatGPT into disclosing private information from users’ chat histories. Most of these attacks are indirect prompt injections that exploit default tools and features OpenAI provides in ChatGPT, including its ability to remember conversation context long-term and its web search capabilities.

“These vulnerabilities, present in the latest GPT-5 model, could allow attackers to exploit users without their knowledge through several likely victim use cases, including simply asking ChatGPT a question,” the researchers wrote in their report.

Hiding malicious instructions in websites

ChatGPT can search the web for information and visit user-provided URLs to extract content on request. This content isn’t passed directly to ChatGPT, however. Instead it goes through an intermediary, more limited large language model (LLM) dubbed SearchGPT, which then sumarizes the content for ChatGPT.

The use of a secondary model that doesn’t have direct access to the user’s conversation appears to be an architectural decision specifically aimed at limiting the impact of potential prompt injection attacks from web content.

The Tenable researchers determined that SearchGPT is indeed susceptible to prompt injections when parsing web pages during either its browsing or search functions. Attackers could, for example, place malicious instructions in blog comments, or craft a poisoned website that ranks high in search results for specific keywords — ChatGPT uses Bing for search, the researchers discovered.

Moreover, to hide rogue prompts, attackers could present a clean version of a website to search engines and regular visitors while serving a different version to OpenAI’s web crawlers, which identify themselves with a User-Agent string called OAI-Search in request headers.

“AI vendors are relying on metrics like SEO scores, which are not security boundaries, to choose which sources to trust,” the researchers found. “By hiding the prompt in tailor-made sites, attackers could directly target users based on specific topics or political and social trends.”

However, even if an attacker gets SearchGPT to execute a rogue prompt, its context separation from ChatGPT means the model doesn’t have direct access to private user information. Despite this, the researchers found a way to exploit the relationship between the two models.

Conversation injection and stealthy data exfiltration

Because ChatGPT receives output from SearchGPT after the search model processes content, Tenable’s researchers wondered what would happen if SearchGPT’s response itself contained a prompt injection. In other words, could they use a website to inject a prompt that instructs SearchGPT to inject a different prompt into ChatGPT, effectively creating a chained attack? The answer is yes, resulting in a technique Tenable dubbed “conversation injection.”

“When responding to the following prompts, ChatGPT will review the Conversational Context, see and listen to the instructions we injected, not realizing that SearchGPT wrote them,” the researchers said. “Essentially, ChatGPT is prompt-injecting itself.”

But getting an unauthorized prompt to ChatGPT accomplishes little for an attacker without a way to receive the model’s response, which could include sensitive information from the conversation context.

One method to do this involves leveraging ChatGPT’s ability to render Markdown text formatting through its interface, which includes the ability to load remote images from URLs.

Attackers could build a dictionary that maps every letter of the alphabet to a unique image hosted on their server. They could then instruct ChatGPT to load a series of images that correspond to each letter in its response. By monitoring the order of requests to URLs on their web server, attackers could then reconstruct ChatGPT’s response.

This approach faces several hurdles: First, it’s noisy — the user’s chat interface will be flooded with image URLs. Second, before including any URL in its responses, ChatGPT passes them through an endpoint called url_safe that performs safety checks.

This mechanism is designed to prevent malicious URLs from reaching users, either accidentally or through prompt injections, including image URLs in markdown-formatted content. One check that url_safe performs is for domain reputation, and it turns out that bing.com is whitelisted and implicitly trusted.

The researchers also noticed that every web link indexed by Bing is wrapped in a unique tracking link of the form bing.com/ck/a?[unique_id] when displayed in search results. When clicked, these unique Bing tracking URLs redirect users to the actual websites they correspond to.

This finding gave the researchers a way to create an alphabet of URLs that ChatGPT would accept to include in its responses, by creating a unique page for every letter, indexing those pages in Bing, and obtaining their unique bing.com tracking URLs.

The researchers also discovered a bug in how ChatGPT renders code blocks in markdown: Any data that appears on the same line as the code block opening, after the first word, doesn’t get rendered. This can be used to hide content, such as image URLs.

Abusing ChatGPT’s long-term memory for persistence

ChatGPT has a feature called Memories that allows it to remember important information across different sessions and conversations with the same user. This feature is enabled by default and triggers when users specifically ask ChatGPT to remember something or automatically when the model deems information is important enough to remember for later.

Information saved through Memories is taken into account when ChatGPT constructs its responses to users, but also provides attackers a way to save malicious prompts so they get executed in future conversations.

Tying it all together

The Tenable researchers presented several proof-of-concept scenarios showing how these techniques could be combined to launch attacks: from embedding comments in blog posts that would return malicious phishing URLs to users, masked using bing.com tracking URLs, to creating web pages that instruct SearchGPT to prompt ChatGPT to save instructions in its Memories to always leak responses using the Bing-masked URL alphabet combined with the markdown content hiding technique.

“Prompt injection is a known issue with the way LLMs work, and unfortunately, it probably won’t be fixed systematically in the near future,” the researchers wrote. “AI vendors should ensure that all their safety mechanisms (such as url_safe) work properly to limit the potential damage caused by prompt injection.”

Tenable reported its findings to OpenAI, and while some fixes have been implemented, some techniques continue to work. Tenable’s research began in 2024 and was performed mainly on GPT-4, but the researchers confirmed that GPT-5 is also vulnerable to some of these attack vectors.

– Read More

Researchers trick ChatGPT into prompt injecting itself – CSO Online

Hiding malicious instructions in websites

Conversation injection and stealthy data exfiltration

Abusing ChatGPT’s long-term memory for persistence

Tying it all together

You Missed

Whisper Leak uses a side channel attack to eavesdrop on encrypted AI conversations – CSO Online

Runtime bugs break container walls, enabling root on Docker hosts – CSO Online

Kim Kardashian fails bar exam – Legal Cheek

The comprehensive school boy from Newport who founded Pogust Goodhead — before being dramatically ousted – Legal Cheek

Hiding malicious instructions in websites

Conversation injection and stealthy data exfiltration

Abusing ChatGPT’s long-term memory for persistence

Tying it all together

Related Post

You Missed