Synthetic data has its limits: human data can help prevent AI model collapse


My, how quickly the tables turn in the tech world. Just two years ago, AI was being hailed as the “next transformative technology to rule them all.” Now, instead of reaching Skynet levels and taking over the world, AI is, ironically, degrading.

Once the harbinger of a new era of intelligence, AI is now choking on its own output, struggling to live up to the brilliance it promised. But why, exactly? The simple fact is that we are starving AI of the one thing that makes it truly intelligent: human-generated data.

To feed these data-hungry models, researchers and organizations are increasingly turning to synthetic data. While this practice has long been a staple of AI development, we are now crossing into dangerous territory by over-relying on it, causing the gradual degradation of AI models. And this is not just a minor concern about ChatGPT producing sub-par results; the consequences are far more dangerous.

When AI models are trained on outputs generated by previous iterations, they tend to propagate errors and introduce noise, leading to a decline in output quality. This recursive process turns the familiar “garbage in, garbage out” cycle into a self-perpetuating problem, significantly reducing the effectiveness of the system. As AI drifts further from human-like intelligence and accuracy, it not only undermines performance but also raises critical concerns about the long-term viability of relying on self-generated data for continued AI development.

But this is not just a degradation of technology; it is a degradation of the reality, identity and authenticity of data, which poses serious risks to humanity and society. The ripple effects could be profound, leading to a rise in critical errors. As these models lose accuracy and reliability, the consequences could be dire: think medical misdiagnoses, financial losses and even life-threatening accidents.

Another major implication is that the development of AI could stall completely, leaving AI systems unable to ingest new data and essentially becoming “stuck in time”. This stagnation not only impedes progress, but also traps AI in a cycle of diminishing returns, with potentially catastrophic effects on technology and society.

But, in practice, what can companies do to ensure the safety of their customers and users? Before answering this question, we need to understand how it all works.

When a model collapses, trust goes out the window

The more AI-generated content spreads online, the faster it will infiltrate datasets and, subsequently, the models themselves. And it’s happening at an accelerating rate, making it increasingly difficult for developers to filter out anything that isn’t pure, human-created training data. The fact is that the use of synthetic content in training can trigger a harmful phenomenon known as “model collapse” or “model autophagy disorder (MAD).”

Model collapse is the degenerative process in which AI systems progressively lose their grip on the true underlying data distribution they are meant to model. This often happens when the AI is trained recursively on generated content, which leads to a number of problems:

  • Loss of nuance: Models begin to forget outlier data or less-represented information, which is crucial for a complete understanding of any dataset.
  • Reduced diversity: There is a noticeable decrease in the diversity and quality of the outputs the models produce.
  • Amplification of biases: Existing biases, particularly against marginalized groups, may be exacerbated as the model overlooks the nuanced data that could mitigate them.
  • Generating nonsensical outputs: Over time, models can start producing outputs that are completely unrelated or meaningless.

A case in point: a study published in Nature highlighted the rapid degeneration of language models trained recursively on AI-generated text. By the ninth iteration, these models were found to produce entirely irrelevant and meaningless content, demonstrating the rapid decline in data quality and model usefulness.
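To make that recursive degeneration concrete, here is a minimal, illustrative Python sketch (a toy example of my own, not code from the Nature study). It repeatedly fits a simple statistical “model” to samples drawn from the previous generation’s model; under these assumptions, the estimated spread of the data tends to shrink generation after generation, a toy analogue of the loss of nuance and diversity described above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 25  # a small per-generation "dataset" makes the effect visible quickly

# Generation 0: data drawn from the true, human-generated distribution.
data = rng.normal(loc=0.0, scale=1.0, size=N)

for gen in range(1, 101):
    # "Train" a toy model by estimating the data's parameters...
    mu, sigma = data.mean(), data.std()
    # ...then build the next training set purely from the model's own samples.
    data = rng.normal(loc=mu, scale=sigma, size=N)
    if gen % 20 == 0:
        print(f"generation {gen:3d}: mean={mu:+.3f}, std={sigma:.3f}")

# Across generations the estimated spread (std) tends toward zero: rare,
# outlying values disappear first, a toy analogue of the "loss of nuance"
# and "reduced diversity" failure modes listed above.
```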

Saving the Future of AI: Steps Businesses Can Take Today

Business organizations are in a unique position to shape the future of AI responsibly, and there are clear and actionable steps they can take to keep AI systems accurate and reliable:

  • Invest in data sourcing tools: Tools that track where each piece of data comes from and how it changes over time give companies confidence in their AI inputs. With clear visibility into the origins of data, organizations can avoid feeding models unreliable or biased information.
  • Implement AI-powered filters to detect synthetic content: Advanced filters can catch AI-generated or low-quality content before it slips into training datasets (see the sketch after this list). These filters help ensure that models learn from authentic, human-created information rather than synthetic data that lacks real-world complexity.
  • Partner with trusted data providers: Strong relationships with verified data providers give organizations a steady supply of high-quality, authentic data. This means AI models receive real, nuanced information that reflects actual scenarios, boosting both performance and relevance.
  • Promote digital literacy and awareness: By educating teams and customers about the importance of data authenticity, organizations can help people recognize AI-generated content and understand the risks of synthetic data. Awareness of responsible data use fosters a culture that values accuracy and integrity in AI development.
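To illustrate how the first two recommendations might fit together in practice, here is a minimal, hypothetical Python sketch of a training-data ingestion gate. Every name in it is illustrative: the detector callable stands in for whatever synthetic-content classifier an organization adopts (no specific product is implied), and the trusted-source set stands in for the output of its data-provenance tooling:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Document:
    text: str
    source: str                   # provenance: where this record came from
    synthetic_score: float = 0.0  # filled in by the detector below

def filter_for_training(
    docs: Iterable[Document],
    detector: Callable[[str], float],  # returns estimated P(text is AI-generated)
    trusted_sources: set[str],
    max_synthetic_score: float = 0.3,
) -> list[Document]:
    """Keep only documents that are provenance-verified and unlikely
    to be machine-generated."""
    kept = []
    for doc in docs:
        # Provenance gate: unknown origins are excluded outright rather
        # than risking contamination of the training set.
        if doc.source not in trusted_sources:
            continue
        # Synthetic-content gate: score the text and drop likely AI output.
        doc.synthetic_score = detector(doc.text)
        if doc.synthetic_score <= max_synthetic_score:
            kept.append(doc)
    return kept

# Example usage with a stub detector; a real system would plug in a
# trained classifier here.
docs = [
    Document("Quarterly earnings rose...", source="licensed-news-feed"),
    Document("As an AI language model, I...", source="web-crawl"),
]
clean = filter_for_training(
    docs,
    detector=lambda text: 0.9 if "As an AI" in text else 0.1,
    trusted_sources={"licensed-news-feed"},
)
print([d.source for d in clean])  # -> ['licensed-news-feed']
```

In a real pipeline, the detector would be a trained classifier and the threshold would be tuned empirically; the point here is the shape of the gate, not the specific values.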

The future of AI depends on responsible action. Businesses have a real opportunity to keep AI grounded in accuracy and integrity. By choosing real, human-sourced data over shortcuts, prioritizing tools that catch and filter out low-quality content, and encouraging awareness around digital authenticity, organizations can set AI on a safer, smarter path. Let’s focus on building a future where AI is both powerful and genuinely beneficial to society.

Rick Song is the CEO and co-founder of Persona.
