Quality data equals truth sets, and nothing less


Today’s consumer packaged goods (CPG) landscape bears little resemblance to that of 1990. There are currently 9,000 more products in U.S. grocery stores than there were back then, but the average retail store is almost 7,500 square feet smaller. Organic, gluten-free, heart-healthy, and sustainable products are everywhere in today’s stores because consumers demand them. And with a new product hitting the shelves approximately every two minutes, the list is growing.

AI is enabling companies to better understand how and why consumers shop and, most importantly, to predict what they will buy in the future. This is fundamentally shifting how companies approach product development cycles, pricing models, and the task of changing the minds of fickle consumers.

But AI doesn’t operate in isolation; far from it. Instead, it requires clean inputs to achieve clean outputs. To deliver consumers the right products at the right time, businesses need to curate millions of data points. Data that might be considered “good enough” requires human intervention to make corrections, which in turn introduces the chance of more problems and inefficiencies.

Few businesses realize the high costs of settling for “adequate” data or take proactive measures to improve data quality. Worse still, correcting data after it has been created can be 10 times more costly than implementing upstream controls at the point of data entry. A failure to invest in the right data on the front end inevitably leads to unexpected costs on the back end.

One critical starting point is identifying what outputs will be required of the base data and what decisions will be made on those outputs. For example, are you marketing products to specific demographics within a population or targeting a particular group for a product release? Next, you need to have concrete metrics for what qualifies as clean data, e.g., the accuracy, completeness, and aggregation of the data. And if there are issues with the data, what are the most important attributes that will need correcting? What are the inherent consequences and risks? These are all questions a tech team needs to be able to answer, particularly if they’re deploying the data in an AI environment. Ideally, they’ll start with clean data and 100% confidence in how it is expressed.
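As a rough illustration of what "concrete metrics" could look like, the sketch below scores a handful of product records for completeness (share of required fields that are filled in) and accuracy (share of checked fields that match a verified truth set). The field names, the `upc` key, the sample records, and the `assess_quality` helper are all hypothetical, chosen only for illustration; a real team would define its own attributes and thresholds.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    completeness: float  # filled required fields / all required fields
    accuracy: float      # fields matching the truth set / fields checked


def assess_quality(records, truth_set, required_fields):
    """Score records against a truth set keyed by a hypothetical 'upc' id."""
    total_cells = len(records) * len(required_fields)
    filled = checked = correct = 0
    for record in records:
        truth = truth_set.get(record["upc"])
        for field in required_fields:
            value = record.get(field)
            # A field counts as filled if it is neither missing nor blank.
            if value is not None and value != "":
                filled += 1
            # Accuracy is only measurable where a verified value exists.
            if truth is not None and field in truth:
                checked += 1
                if value == truth[field]:
                    correct += 1
    completeness = filled / total_cells if total_cells else 0.0
    accuracy = correct / checked if checked else 0.0
    return QualityReport(completeness, accuracy)


# Illustrative data only (not from any real catalog).
records = [
    {"upc": "001", "category": "snacks", "organic": True},
    {"upc": "002", "category": "", "organic": False},
    {"upc": "003", "category": "dairy", "organic": None},
]
truth_set = {
    "001": {"category": "snacks", "organic": True},
    "002": {"category": "beverages", "organic": False},
}

report = assess_quality(records, truth_set, ["category", "organic"])
print(f"completeness={report.completeness:.2f}, accuracy={report.accuracy:.2f}")
```

Keeping the two numbers separate matters: a data set can be fully populated and still wrong, or sparse but correct where it is filled in, and the fix for each case is different.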

Marketers are increasingly turning to an AI-first strategy because it not only enables the management of data but also ties that data to a strategy. These strategies are typically driven by turning increasing volumes of big data into smaller, local, manageable, and personal data sets that can be easily activated. Cultivating a data-driven organization has become incredibly important to enabling businesses to completely disrupt their own product innovation pipeline and to pivot from episodic measurement of marketing outcomes to agile measurement that forces continuous improvement. These capabilities enable tech-forward CPG companies to move from chasing market share to capturing a larger share of their target consumer’s life.

An increasing percentage of data is created passively as consumers interact with technology, and truly big, passively generated data has many pitfalls, such as selection bias, misattribution, compliance issues, and missing data. In many instances, the burden of managing data quality has moved from data generator to end-user. AI unlocks the value of big data but requires data expertise to understand what pitfalls are being solved for, and truth sets against which to train and benchmark. Without the data expertise and the truth set, there is a great risk of applying an inefficient algorithm that ultimately generates misleading insight and incorrect predictions. In today’s fast-moving world of consumer preferences and retail dynamics, no CPG company can afford a misstep in its product innovation, distribution, pricing strategy, or promotions.
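One simple way a truth set can expose selection bias is to compare how segments are represented in a passively collected sample with how they are represented in the truth set. The sketch below is a minimal example under stated assumptions: the age-band labels, the counts, and the `share_gaps` helper are hypothetical and exist only to show the comparison.

```python
from collections import Counter


def segment_shares(labels):
    """Return each segment's share of the given labels (empty input -> {})."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {segment: n / total for segment, n in counts.items()}


def share_gaps(sample_labels, truth_labels):
    """Sample share minus truth-set share per segment.

    Positive values mean the segment is over-represented in the sample.
    """
    sample = segment_shares(sample_labels)
    truth = segment_shares(truth_labels)
    segments = set(sample) | set(truth)
    return {s: sample.get(s, 0.0) - truth.get(s, 0.0) for s in segments}


# Illustrative labels only: a passive sample that skews young
# compared with a hypothetical truth set.
truth_labels = ["18-34"] * 3 + ["35-54"] * 4 + ["55+"] * 3
sample_labels = ["18-34"] * 6 + ["35-54"] * 3 + ["55+"] * 1

for segment, gap in sorted(share_gaps(sample_labels, truth_labels).items()):
    print(f"{segment}: {gap:+.2f}")
```

Large gaps suggest the passive data would need reweighting or supplementing before a model is trained or benchmarked on it; what counts as "large" is a judgment call that depends on the decision being made.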

We need to think about data as a truth set and use technology and real people to calibrate those truth sets. Increasingly, the marketplace will look for trust in data and transparency in how it is collected, cleaned, codified, aggregated, permissioned, and ultimately used.

Fine wine gets better with age; data does not. Data quality and transparency in its collection and development should never be optional, which makes the deployment of data scientists a requirement. Identify problems and take proactive measures to better manage your data within a larger data strategy, or spend an exorbitant amount of time and money rectifying your mistakes.

This article was originally published on