A hot potato: The right to be forgotten (RTBF), also known as the right to erasure under Europe's GDPR, gives individuals the power to ask organizations to delete their personal data. However, when it comes to LLMs and AI chatbots, technology has yet to offer a clear path for users who want their digital persona to vanish from the web.

A new study from researchers at Data61, the arm of Australia's national science agency (CSIRO) specializing in artificial intelligence, robotics, and cybersecurity, evaluates what the growing popularity of large language models (LLMs) and chatbot-based services means for the right to be forgotten (RTBF). The study concludes that the technology has outpaced the existing legal framework.

The right to be forgotten is not limited to Europe's GDPR: citizens can invoke similar laws elsewhere, including Canada's proposed Consumer Privacy Protection Act and Japan's Act on the Protection of Personal Information (APPI). RTBF procedures were designed primarily with internet search engines in mind, making it relatively straightforward for Google, Microsoft, and other search providers to identify specific data and delist it from their proprietary web indexes.
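To see why search engines are the easy case, consider what delisting amounts to: deleting a URL from every posting list that references it. The sketch below is a deliberately toy, hypothetical inverted index; no real search engine works at this scale or simplicity, and every name in it is made up for illustration.

```python
# Toy inverted index illustrating RTBF delisting -- a hypothetical
# sketch, not how Google's or Bing's production indexes work.
from collections import defaultdict

class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs

    def add(self, url: str, text: str):
        for term in text.lower().split():
            self.postings[term].add(url)

    def search(self, term: str) -> set:
        return self.postings.get(term.lower(), set())

    def delist(self, url: str):
        # An RTBF delisting is a direct delete: remove the URL
        # from every posting list that mentions it.
        for urls in self.postings.values():
            urls.discard(url)

index = TinyIndex()
index.add("https://example.com/jane", "jane doe court record")
index.delist("https://example.com/jane")
assert not index.search("jane")  # the page no longer surfaces
```

The key property is that the index is an explicit, enumerable record: once the offending URL is known, the deletion is mechanical. Nothing comparable exists inside a trained model's weights.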

However, when it comes to LLMs, things become significantly more complex. According to the Australian researchers, machine learning models are not as transparent as search engines: determining which personal data was used to train a model, and attributing that data to a specific individual, is exceedingly difficult.

According to the researchers, users can only gain insight into their personal data within these models "by either inspecting the original training dataset or perhaps by prompting the model." However, the companies behind chatbot services may choose not to disclose their training data, and conversing with a chatbot is no guarantee that its output will surface the precise information a user pursuing an RTBF request is looking for.
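In code, the two discovery routes the researchers describe would look roughly like the sketch below. Everything here is hypothetical: the file path, the function names, and the `generate` callable (standing in for any chat model API) are illustrative assumptions, not part of the study.

```python
# Hypothetical sketch of the two ways a user could look for their
# data: inspecting the training corpus (if it were available) or
# prompting the model. Names and paths are made up.

def mentions_in_corpus(corpus_path: str, name: str) -> list[int]:
    """Route 1: scan the raw training data directly.
    Only possible if the vendor discloses the corpus."""
    hits = []
    with open(corpus_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if name.lower() in line.lower():
                hits.append(lineno)
    return hits

def probe_model(generate, name: str, n: int = 5) -> list[str]:
    """Route 2: prompt the model and read what it emits.
    `generate` is any text-generation callable. Outputs are
    stochastic, so a name's absence proves nothing about the
    training data, and its presence may be a hallucination."""
    prompt = f"What do you know about {name}?"
    return [generate(prompt) for _ in range(n)]
```

Route 1 fails when the corpus is withheld; route 2 fails because, as the researchers note next, a model's output is not a faithful readout of what it was trained on.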

Furthermore, chatbots can fabricate responses, the so-called "hallucinations," which makes prompting an unreliable way to probe the data underlying the model. The researchers highlight that LLMs store and process information "in a completely different way" than the indexing approach employed by search engines.

These emerging and increasingly popular AI services pose new challenges for the RTBF, but LLMs are not exempt from complying with privacy rights. To bridge the gap, the researchers point to several techniques for removing data from trained models, such as the "machine unlearning" SISA technique, Inductive Graph Unlearning, and Approximate Data Deletion, among others.
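The paper surveys these methods rather than prescribing one, so the following is only a toy illustration of the core SISA idea (sharded, isolated, sliced, aggregated training, from Bourtoule et al.): split the training set into shards, train an isolated model per shard, aggregate their predictions, and honor a deletion by retraining only the shard that held the record. Slicing and checkpointing are omitted for brevity, and the stand-in classifier is an arbitrary choice, not what the unlearning literature benchmarks.

```python
# Toy SISA-style unlearning sketch. Assumes each shard ends up with
# examples of at least two classes; real systems handle this and
# much more (slices, checkpoints, deep models).
import numpy as np
from sklearn.linear_model import LogisticRegression

class SISAEnsemble:
    def __init__(self, n_shards: int = 4):
        self.n_shards = n_shards
        self.shards = [[] for _ in range(n_shards)]  # (id, x, y) triples
        self.models = [None] * n_shards

    def _shard_of(self, record_id: int) -> int:
        return record_id % self.n_shards

    def add(self, record_id: int, x, y):
        self.shards[self._shard_of(record_id)].append((record_id, x, y))

    def _fit_shard(self, s: int):
        X = np.array([x for _, x, _ in self.shards[s]])
        Y = np.array([y for _, _, y in self.shards[s]])
        self.models[s] = LogisticRegression().fit(X, Y)

    def fit(self):
        for s in range(self.n_shards):
            self._fit_shard(s)

    def forget(self, record_id: int):
        # An RTBF deletion: drop the record, then retrain ONLY the
        # shard that contained it -- the other models are untouched.
        s = self._shard_of(record_id)
        self.shards[s] = [r for r in self.shards[s] if r[0] != record_id]
        self._fit_shard(s)

    def predict(self, x):
        votes = [m.predict([x])[0] for m in self.models]
        return max(set(votes), key=votes.count)  # majority vote
```

The appeal is that forgetting one record costs a fraction of a full retrain; the trade-off is the overhead of maintaining many models, which is one reason unlearning at LLM scale remains an open problem.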

Major companies in the LLM business are also trying to tackle RTBF compliance. OpenAI, likely the most prominent player in modern generative AI, offers a form through which users can request the removal of their personal data from ChatGPT's outputs, though how those requests are actually handled remains unclear.