Recently, security researcher Johann Rehberger uncovered a vulnerability in ChatGPT’s memory feature, which allowed attackers to store false information and harmful instructions in a user’s long-term memory settings. Despite this discovery, OpenAI initially dismissed it as a safety issue rather than a security threat.
Refusing to back down, Rehberger demonstrated how this vulnerability could be exploited by creating a proof-of-concept (PoC) that extracted all user input continuously. This caught OpenAI’s attention, prompting them to release a partial fix earlier this month.
The Vulnerability Exploited Memory Features
The flaw involved ChatGPT’s long-term memory feature, introduced in February and expanded in September. This feature allows ChatGPT to remember details like a user’s preferences and past conversations, making future interactions smoother. However, Rehberger found that attackers could abuse this feature through indirect prompt injection—a technique that makes the AI follow instructions from untrusted sources such as emails or blog posts.
Using this method, Rehberger demonstrated how he could manipulate ChatGPT into permanently storing false information. For instance, he made the AI believe a user was 102 years old, lived in a fictional world, and believed Earth was flat. These fabricated details then influenced all future conversations.
The attack didn’t stop there. Rehberger also showed how malicious files hosted on platforms like Google Drive or Bing could be used to plant these false memories, making the flaw a real threat.
OpenAI’s Response and Ongoing Risks
Rehberger reported the issue to OpenAI in May, but the company initially closed the case. A month later, after submitting a more detailed report and PoC, OpenAI engineers took action. His PoC revealed that by tricking ChatGPT into viewing a malicious web link, all user interactions could be copied to a server controlled by the attacker. This was especially concerning because the data exfiltration persisted across multiple sessions.
While OpenAI has fixed part of the problem by preventing memory abuse for data exfiltration, Rehberger noted that prompt injections can still be used to plant long-term false information.
Staying Safe
To avoid these types of attacks, users should be cautious when new memories are added during sessions and regularly review stored memories for anything unusual. OpenAI offers tools for managing and reviewing these memories, but the issue of prompt injections still lingers.
Stay informed on security updates and other tech insights at brightmindai.com!
Read about: Is your Job safe from AI
Can AI generate better ideas than HUMAN?
I can’t focus on my studies-mobile scrolling-lazy and unmotivated
Can AI Help Fix Political Divides? Insights from Duke Professor Christopher Bail
Scientists solved the mystery of how the great pyramids of Egypt were constructed
0 Comments