Researchers make AI models ‘forget’ data

Researchers at Tokyo University of Science (TUS) have developed a method that enables large-scale AI models to selectively “forget” certain classes of data.
Advances in AI have provided tools capable of revolutionizing domains ranging from healthcare to autonomous driving. However, as technology advances, so do its complexities and ethical considerations.
The emergence of large-scale pre-trained AI systems, such as OpenAI’s ChatGPT and CLIP (Contrastive Language-Image Pre-training), has reshaped expectations for machines. These highly generalist models, capable of handling a wide range of tasks with consistent precision, have been widely adopted for both professional and personal use.
However, such versatility comes at a heavy price. Training and running these models demands enormous amounts of energy and time, raising sustainability concerns, and requires sophisticated hardware far more expensive than standard computers. Compounding these issues, their generalist tendencies can hinder efficiency when the models are applied to specific tasks.
For instance, “in practical applications, the classification of all types of object classes is rarely necessary,” explains Associate Professor Go Irie, who led the research. “For example, in an autonomous driving system, it would be sufficient to recognize a limited set of object classes such as cars, pedestrians and traffic signs.
“We don’t need to identify food, furniture or animal species. Retaining classes that do not need to be identified can reduce overall classification accuracy, as well as cause operational disadvantages such as wastage of computational resources and the risk of information leakage.”
One possible solution lies in training models to “forget” redundant or unnecessary information – streamlining their processes to focus solely on what is needed. While some existing methods already meet this requirement, they assume a “white-box” setting in which users have access to the model’s internal architecture and parameters. Often, however, users get no such visibility.
“Black-box” AI systems, which are more common due to commercial and ethical restrictions, hide their internal mechanisms, making traditional forgetting techniques impractical. To address this gap, the research team turned to derivative-free optimization—an approach that removes reliance on the model’s inaccessible inner workings.
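To make the idea concrete, here is a minimal sketch of derivative-free optimization: the optimizer only queries the model for a score and never touches gradients or weights. The `score()` function below is a toy placeholder standing in for whatever a deployed black-box model would return, not part of the researchers’ actual pipeline.

```python
import numpy as np

# Placeholder objective: in the real setting this would query the deployed
# black-box model with a candidate prompt and return a loss (for example, how
# strongly it still recognizes the classes we want it to forget).
def score(prompt_vector: np.ndarray) -> float:
    return float(np.sum(prompt_vector ** 2))  # toy objective: lower is better

rng = np.random.default_rng(0)
current = rng.normal(size=16)   # candidate prompt parameters
best = score(current)

# Simple (1+1) evolutionary search: propose a random perturbation and keep it
# only if the black-box score improves. No gradients or model internals needed.
for step in range(500):
    candidate = current + 0.1 * rng.normal(size=current.shape)
    value = score(candidate)
    if value < best:
        current, best = candidate, value

print(f"best score after search: {best:.4f}")
```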
Forget and move on
The study, to be presented at the Neural Information Processing Systems (NeurIPS) conference in 2024, introduces a method known as “black-box forgetting.”
The process modifies the input prompts (the text instructions given to the model) in iterative rounds so that the AI progressively “forgets” certain classes. Associate Professor Irie collaborated on the work with co-authors Yusuke Kuwana and Yuta Goto (both from TUS), as well as Dr. Takashi Shibata of NEC Corporation.
For their experiments, the researchers targeted CLIP, a vision-language model with image classification capabilities. The method they developed is built on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), an evolutionary algorithm designed to optimize solutions step by step. In this study, CMA-ES was used to evaluate and improve the prompts given to CLIP, ultimately suppressing its ability to classify specific image categories.
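A hedged sketch of what such a loop might look like is shown below, using the open-source `cma` package. The `forgetting_loss` objective and the black-box `query_accuracy` call are hypothetical stand-ins for the paper’s actual objective and for whatever interface the deployed model exposes; only the CMA-ES ask/tell pattern follows the library as it really works.

```python
import numpy as np
import cma

CONTEXT_DIM = 32          # assumed size of the learnable prompt context
FORGET = ["dog", "cat"]   # classes the model should no longer recognize
KEEP = ["car", "pedestrian", "traffic sign"]

def query_accuracy(context: np.ndarray, class_names: list[str]) -> float:
    """Hypothetical black-box call: send the prompt context plus class names
    to the deployed model and get back its classification accuracy.
    Replaced here by a toy function so the sketch runs end to end."""
    rng = np.random.default_rng(abs(hash(tuple(class_names))) % (2**32))
    weights = rng.normal(size=context.shape)
    return float(1.0 / (1.0 + np.exp(-weights @ context)))

def forgetting_loss(context: np.ndarray) -> float:
    # We want low accuracy on the classes to forget and high accuracy on the
    # classes to keep; CMA-ES minimizes this combined score.
    return query_accuracy(context, FORGET) - query_accuracy(context, KEEP)

# Standard CMA-ES ask/tell loop from the `cma` package.
es = cma.CMAEvolutionStrategy(np.zeros(CONTEXT_DIM), 0.5,
                              {"maxiter": 50, "verbose": -9})
while not es.stop():
    candidates = es.ask()   # sample candidate prompt contexts
    es.tell(candidates, [forgetting_loss(np.asarray(c)) for c in candidates])

best_context = es.result.xbest
print("optimized prompt context (first values):", best_context[:5])
```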
As the project progressed, challenges arose. Existing optimization techniques struggled to scale to the large number of target categories, leading the team to devise a novel parametrization strategy called “latent context sharing”.
This approach breaks the latent context—the representation of information generated by the prompt—into smaller, more manageable pieces. By allocating certain elements to a single token (a word or sub-word unit) while reusing others across multiple tokens, the team dramatically reduced the dimensionality of the problem. Crucially, this made the process computationally tractable even for extensive forgetting applications.
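The following is a minimal sketch of how such sharing could shrink the search space, under the assumption that each token’s latent context is assembled from a small token-specific part plus a block of components reused by every token; the exact decomposition in the paper may differ.

```python
import numpy as np

NUM_TOKENS = 16   # tokens in the learnable prompt context
TOKEN_DIM = 64    # latent dimension per token
UNIQUE_DIM = 8    # assumed token-specific components
SHARED_DIM = 8    # assumed components reused across all tokens

# Naive parametrization: optimize every element of every token independently.
naive_params = NUM_TOKENS * TOKEN_DIM

# Latent-context-sharing style parametrization (as described above): each token
# keeps a few unique components, and the remaining dimensions are filled by
# repeating a small shared block, so far fewer numbers need to be optimized.
shared_params = NUM_TOKENS * UNIQUE_DIM + SHARED_DIM

def assemble_context(unique: np.ndarray, shared: np.ndarray) -> np.ndarray:
    """Build the full (NUM_TOKENS x TOKEN_DIM) context from compact parameters."""
    repeats = (TOKEN_DIM - UNIQUE_DIM) // SHARED_DIM
    tiled = np.tile(shared, (NUM_TOKENS, repeats))   # shared part, reused per token
    return np.concatenate([unique, tiled], axis=1)   # unique part + shared part

unique = np.zeros((NUM_TOKENS, UNIQUE_DIM))
shared = np.zeros(SHARED_DIM)
context = assemble_context(unique, shared)

print(f"full context shape: {context.shape}")                 # (16, 64)
print(f"parameters, naive vs shared: {naive_params} vs {shared_params}")
```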
Through benchmark tests on multiple image classification datasets, the researchers validated the effectiveness of black-box forgetting—achieving the goal of making CLIP “forget” about 40% of target classes without direct access to the AI model’s internal architecture.
This research represents the first successful attempt to induce selective forgetting in a black-box vision-language model, showing promising results.
Benefits of helping AI models forget data
Beyond its technical ingenuity, this innovation holds significant potential for real-world applications where task-specific precision is paramount.
Simplifying models for specific tasks can make them faster, more resource-efficient, and able to run on less powerful devices – accelerating the adoption of AI in areas previously considered unfeasible.
Another major use lies in image generation, where forgetting entire categories of visual content could prevent a model from inadvertently generating unwanted or harmful material, be it offensive imagery or misinformation.
Perhaps most importantly, this method addresses one of AI’s biggest ethical problems: privacy.
AI models, especially at large scale, are often trained on large datasets that may inadvertently contain sensitive or outdated information. Requests to remove such data – especially in light of laws advocating the “right to be forgotten” – pose significant challenges.
Retraining an entire model to exclude problematic data is expensive and time-intensive, yet the risks of leaving it unaddressed can have far-reaching consequences.
“Retraining large-scale models consumes a large amount of energy,” notes Associate Professor Irie. “‘Selective forgetting,’ or so-called machine unlearning, may provide an efficient solution to this problem.”
These privacy-focused applications are particularly relevant in high-stakes industries such as healthcare and finance, where sensitive data is central to operations.
As the global race to advance AI accelerates, Tokyo University of Science’s black-box forgetting approach charts an important path forward—not only by making the technology more adaptable and efficient but also by adding significant safety for users.
While the potential for abuse remains, methods such as selective forgetting demonstrate that researchers are proactively addressing both ethical and practical challenges.
See also: Why QwQ-32B-Preview is the reasoning AI to watch
