OpenAI’s Long Term Memory Can Be Maliciously Manipulated

Source: Ars Technica OpenAI’s Long Term Memory Can Be Maliciously Manipulated

OpenAI First Called It A Safety Issue But Have Since Changed Their Tune

A security researcher was playing with OpenAI’s new long-term conversation memory feature and discovered a very concerning flaw.  When he reached out to OpenAI they claimed it wasn’t really a security concern so he created a proof of concept hack which made the company change their mind and start working on a fix.  What he did was modify the long-term conversation memory to convince ChatGPT that he was a 102 year old flat earther from the Matrix, and from then on any questions to OpenAI he asked were answered with that in mind.

If someone can get at your long-term conversation history they could insert whatever they wanted, and forever taint the results you get from your inquiries.  It’s not an easy hack to pull thankfully, and you should be able to set OpenAI to notify you when a new memory has been added, which you should probably pay very close attention to.

On the other hand, it is amusing to think what you could do to someone who depends on ChatGPT or other LLMs to provide the answers to all their questions; far better than a simple rickroll!

So Rehberger did what all good researchers do: He created a proof-of-concept exploit that used the vulnerability to exfiltrate all user input in perpetuity. OpenAI engineers took notice and issued a partial fix earlier this month.

Video News

About The Author

Jeremy Hellstrom

Call it K7M.com, AMDMB.com, or PC Perspective, Jeremy has been hanging out and then working with the gang here for years. Apart from the front page you might find him on the BOINC Forums or possibly the Fraggin' Frogs if he has the time.

Leave a reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Podcasts

Archive & Timeline

Previous 12 months
Explore: All The Years!