For AI model success, utilize MLops and get the data right


It’s critical to adopt a data-centric mindset and support it with ML operations 

Artificial intelligence (AI) in the lab is one thing; in the real world, it’s another. Many AI models fail to yield reliable results when deployed. Others start well, but then results erode, leaving their owners frustrated. Many businesses do not get the return on AI they expect. Why do AI models fail and what is the remedy? 

As companies have experimented with AI models more, there have been some successes, but numerous disappointments. Dimensional Research reports that 96% of AI projects encounter problems with data quality, data labeling and building model confidence.

AI researchers and developers for business often use the traditional academic method of boosting accuracy. That is, hold the model’s data constant while tinkering with model architectures and fine-tuning algorithms. That’s akin to mending the sails when the boat has a leak — it is an improvement, but the wrong one. Why? Good code cannot overcome bad data.

Instead, they should ensure the datasets are suited to the application. Traditional software is powered by code, whereas AI systems are built using both code (models + algorithms) and data. Take facial recognition, for instance, in which AI-driven apps were trained on mostly Caucasian faces, instead of ethnically diverse faces. Not surprisingly, results were less accurate for non-Caucasian users. 
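
One way to surface this kind of gap before deployment is to break accuracy out by subgroup rather than reporting a single overall number. The sketch below is illustrative only: the `accuracy_by_group` helper and the `(group, true_label, predicted_label)` tuple layout are assumptions made for this example, not part of any particular library.

```python
from collections import defaultdict

def accuracy_by_group(examples):
    """Compute accuracy separately for each subgroup.

    `examples` is an iterable of (group, true_label, predicted_label)
    tuples; this layout is a hypothetical one chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in examples:
        total[group] += 1
        correct[group] += int(truth == pred)
    # Report per-group accuracy so an underserved group is visible
    return {group: correct[group] / total[group] for group in total}

# Toy evaluation set: the model does well on group "a", poorly on "b"
results = accuracy_by_group([
    ("a", 1, 1), ("a", 0, 0),
    ("b", 1, 0), ("b", 1, 1),
])
```

A large spread between groups suggests the training data under-represents some users, which is a data problem rather than a modeling one.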

Good training data is only the starting point. In the real world, AI applications are often initially accurate, but then deteriorate. When accuracy degrades, many teams respond by tuning the software code. That doesn’t work because the underlying problem was changing real-world conditions. The answer: to increase reliability, improve the data rather than the algorithms. 

Since AI failures are usually related to data quality and data drift, practitioners can use a data-centric approach to keep AI applications healthy. Data is like food for AI. In your application, data should be a first-class citizen. Endorsing this idea isn’t sufficient; organizations need an “infrastructure” to keep the right data coming. 

MLops: The “how” of data-centric AI

Continuous good data requires ongoing processes and practices known as MLops, for machine learning (ML) operations. The key mission of MLops: make high-quality data available because it’s essential to a data-centric AI approach.

MLops works by tackling the specific challenges of data-centric AI, which are complicated enough to ensure steady employment for data scientists. Here is a sampling: 

  • The wrong amount of data: Noisy data can distort smaller datasets, while larger volumes of data can make labeling difficult. Both issues throw models off. The right size of dataset for your AI model depends on the problem you are addressing. 
  • Outliers in the data: A common shortcoming in data used to train AI applications, outliers can skew results. 
  • Insufficient data range: This can cause an inability to properly handle outliers in the real world. 
  • Data drift: When production data shifts away from the data the model was trained on, model accuracy often degrades over time. 
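
As a rough illustration of the outlier issue, the sketch below flags values that fall outside Tukey's fences (1.5 times the interquartile range beyond the quartiles). This is one common rule of thumb, not a method the article prescribes; the `iqr_outliers` name and the multiplier `k` are assumptions for this example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# A single extreme reading among otherwise similar values
flagged = iqr_outliers([10, 12, 11, 13, 12, 11, 10, 95])
```

Whether a flagged value should be dropped, capped or kept depends on the application: an outlier may be a sensor glitch or exactly the rare case the model needs to learn.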

These issues are serious. A Google survey of 53 AI practitioners found that “data cascades—compounding events causing negative, downstream effects from data issues — triggered by conventional AI/ML practices that undervalue data quality… are pervasive (92% prevalence), invisible, delayed, but often avoidable.”

How does MLops work?

Before deploying an AI model, researchers need to plan to maintain its accuracy with new data. Key steps: 

  • Audit and monitor model predictions to continuously ensure that the outcomes are accurate
  • Monitor the health of the data powering the model; make sure there are no surges, missing values, duplicates or anomalies in distributions
  • Confirm the system complies with privacy and consent regulations
  • When the model’s accuracy drops, figure out why
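
The data-health step above can be sketched as a small report over a batch of records. This is a minimal illustration under stated assumptions: records are plain dictionaries with hashable values, and `data_health_report`, `baseline_count` and `surge_ratio` are hypothetical names invented for this example.

```python
from collections import Counter

def data_health_report(rows, required_fields, baseline_count, surge_ratio=2.0):
    """Summarize missing values, duplicates and volume surges in a batch.

    `rows` is a list of dicts with hashable values; `baseline_count` is the
    row count you normally expect per batch (an assumed input).
    """
    total = len(rows)
    # Count records where a required field is absent or None
    missing = {
        field: sum(1 for row in rows if row.get(field) is None)
        for field in required_fields
    }
    # Identical records beyond the first occurrence count as duplicates
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    # Flag a surge when volume exceeds surge_ratio times the baseline
    surge = baseline_count > 0 and total > surge_ratio * baseline_count
    return {
        "rows": total,
        "missing": missing,
        "duplicates": duplicates,
        "volume_surge": surge,
    }

report = data_health_report(
    [{"id": 1, "age": 30}, {"id": 2, "age": None}, {"id": 1, "age": 30}],
    required_fields=["id", "age"],
    baseline_count=1,
)
```

In practice a report like this would be logged for every batch so that a drop in model accuracy can be traced back to a specific change in the data.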

To practice good MLops and responsibly develop AI, here are several questions to address: 

  • How do you catch data drift in your pipeline? Data drift can be more difficult to catch than data quality shortcomings. Data changes that appear subtle may have an outsized impact on particular model predictions and particular customers.
  • Does your system reliably move data from point A to B without jeopardizing data quality? Thankfully, moving data in bulk from one system to another has become much easier as tools for ML improve.
  • Can you track and analyze data automatically, with alerts when data quality issues arise? 
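
One possible way to automate the drift question is to compare a feature's production distribution against a reference sample and raise an alert when they diverge. The sketch below uses the population stability index (PSI), a commonly used drift score; it is offered as an example technique, not the approach the article endorses. The function names, the ten equal-width bins and the 0.2 alert threshold are assumptions (0.2 is a widely quoted rule of thumb, not a standard).

```python
import math

def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index between two numeric samples.

    Bin edges are equal-width over the range of `reference`; values in
    `current` outside that range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero
        return [max(c / len(sample), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def drift_alert(reference, current, threshold=0.2):
    """Return (alert, score); alert is True when PSI exceeds the threshold."""
    score = psi(reference, current)
    return score > threshold, score

training_sample = [i / 100 for i in range(100)]
shifted_sample = [x + 0.5 for x in training_sample]
alert, score = drift_alert(training_sample, shifted_sample)
```

A score near zero means the two samples look alike; the right threshold for alerting is something each team would need to tune per feature and per application.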

MLops: How to start now

You may be wondering how to gear up to address these problems. Building an MLops capability can begin modestly, with a data expert and your AI developer. As an early-stage discipline, MLops is still evolving. There is no gold standard or approved framework yet to define a good MLops system or organization, but here are a few fundamentals:

  • In developing models, AI researchers need to consider data at each step, from product development through deployment and post-deployment. The ML community needs mature MLops tools that help make high-quality, reliable and representative datasets to power AI systems.
  • Post-deployment maintenance of the AI application cannot be an afterthought. Production systems should implement ML equivalents of devops best practices, including logging, monitoring and CI/CD pipelines that account for data lineage, data drift and data quality. 
  • Structure ongoing collaboration across stakeholders, from executive leadership to subject-matter experts, ML and data scientists, ML engineers and SREs.

Sustained success for AI/ML applications demands a shift from “get the code right and you’re done” to an ongoing focus on data. Systematically improving data quality for a basic model is better than chasing state-of-the-art models with low-quality data.

Not yet a defined science, MLops encompasses practices that make data-centric AI workable. We will learn much in the upcoming years about what works most effectively. Meanwhile, you and your AI team can proactively – and creatively – devise an MLops framework and tune it to your models and applications. 

Alessya Visnjic is the CEO of WhyLabs.




The data economy: How AI helps us understand and utilize our data 

This article is part of a Technology and Innovation Insights series paid for by Samsung. 

Similar to the relationship between an engine and oil, data and artificial intelligence (AI) are symbiotic. Data fuels AI, and AI helps us to understand the data available to us. Data and AI are two of the biggest topics in technology in recent years, as both work together to shape our lives on a daily basis. The sheer amount of data available right now is staggering and it doubles every two years. However, we currently only use about 2 percent of the data available to us. Much like when oil was first discovered, it is taking time for humans to figure out what to do with the new data available to us and how to make it useful.

Whether pulled from the cloud, your phone, TV, or an IoT device, the vast range of connected streams provides data on just about everything that goes on in our daily lives. But what do we do with it?

Earlier this month, HARMAN’s Chairman Young Sohn sat down with international journalist Ali Aslan in Berlin, Germany at the “New Data Economy and its Consequences” video symposium held by Global Bridges. Young and Ali discussed the importance of data, why AI without data is useless, and what needs to be considered when we look at the ethical use of data and AI — including bias, privacy, and security.


Unlike humans, technology and data are not inherently biased. As the old adage goes, data never lies. Bias in data and AI comes into play when humans train an AI algorithm or interpret data. Much of what we consume is influenced by where the data comes from and what data goes into the system. Understanding and eliminating our own bias is essential to ensuring a neutral algorithm and system.

Controlling data access and permissions is a key first step in removing bias. Having a diverse and inclusive team when developing algorithms and systems is essential. Not everyone has the same experiences and background. Diversity in both can help curb biases by providing different ways of interpreting data inputs and outputs.


Permission and access are paramount when we look at the privacy aspect of data. Privacy is extremely important in our increasingly digital society. As such, consumers should have a choice at the beginning of a relationship with an organization and be asked whether they want to opt in, rather than having to opt out. GDPR has been a good first step in helping to protect consumers with regard to the capture and use of their data. While GDPR has many well-designed and important initiatives, the legislation could be more efficient.


Whereas data privacy is more of a concern to consumers and individuals, data security has become a global concern for consumers, organizations, and nation-states.

It seems like every day we read about another cyber-attack or threat that we should be aware of. Chief among these concerns is the influx of ransomware attacks. Companies and individuals are paying increasingly large amounts of money to bad actors in an attempt to mitigate risk, attention, and embarrassment. These attacks are being carried out by individuals, collectives, and even nation-states in an attempt to cripple the systems of enemies, gather classified information, or make a profit.

So how do we trust that our data and information are safe, and what can we do to be better protected? While there may be bad actors using technology and data for their own nefarious purposes, there are also many positive uses for technology. The education and investment devoted to the cybersecurity space have helped many organizations train employees and adopt technologies designed to prevent cybercrime at the source: human error. And while we may not be able to stop all cybercrime, we are making progress.

Data and AI for good

While data — both from a collection and storage viewpoint — and AI have gotten negative press around biases, privacy, and security, both can also be used to do an immense amount of good. For example, both data and AI have been crucial in the biomedical and agtech industries. Whether it’s COVID-19 detection and vaccine creation or the creation of biomes and removal of toxins in soil, data and AI have incredible potential. However, one cannot move forward without the other. A solid and stable infrastructure and network are also needed to ensure that we can make use of the other 98 percent of the global data available.
