Inconvenient Truths of Data Quality

Data is an asset. But it is also a liability. Data quality issues, when not managed correctly, can become a critical enterprise risk. KPMG's 2016 Global CEO Outlook reported that 84% of CEOs were concerned about the quality of the data they base their decisions on. However, there are seven inconvenient truths of data quality that, if ignored, will make it much harder for companies' data quality improvement initiatives to generate impact. Companies should proactively address the fundamental root causes and design the necessary changes accordingly.

Data quality improvement projects are among the top three client requests I have received in my 20+ years of technology strategy consulting at McKinsey and Accenture. Across the world, and across all industries, I don't think I have ever met a client who would declare that their data quality is good enough. Instead, clients complain that their data is not clean enough and that their data quality initiatives haven't made enough progress, and they wonder whether there is anything they can do differently to fix the data quality problem once and for all.

Why is fixing data quality so difficult? Typing "data quality" into the Google search bar returns over 7 billion hits. These results show that plenty of companies have developed perspectives and technology solutions aimed at improving the situation. So why do we still have the problem? In my humble opinion, there are a number of "inconvenient truths" behind data quality problems. Until those root causes are fixed, it will be difficult for companies to make fundamental progress on data quality.

 

Inconvenient Truth No. 1 – Data will always be dirty

According to the U.S. Bureau of Labor Statistics, in January 2018 the median number of years that wage and salary workers had been with their current employer was 4.2 years, which implies that roughly a quarter of workers change jobs every year. People move homes and change jobs. Companies merge or get acquired. Customers don't always notify their service providers when their information changes. Even when customers do call to update their information, there is no guarantee that the update will cascade throughout the enterprise to every system that stores customer information (e.g., the sales database, billing systems, and service systems). The problem gets even worse when external partners are involved: keeping data consistent across a network of companies is a daunting task. Furthermore, during analytics efforts, data engineers might fix data quality issues to get to the desired results, but those fixes often never make it back to the original data source, so the effort is wasted as a one-time patch.
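To make the propagation problem concrete, here is a minimal illustrative sketch in Python; the system names, fields, and values are hypothetical, not drawn from any particular client. It compares the same customer record as it might exist in two internal systems after an address change reached only one of them, and flags the fields that have drifted apart.

```python
# Hypothetical records for the same customer in two internal systems,
# after an address change was applied in the CRM but never cascaded
# to the billing system.
crm_record = {"customer_id": "C-1001", "email": "ana@example.com",
              "address": "12 New Street, Austin, TX"}
billing_record = {"customer_id": "C-1001", "email": "ana@example.com",
                  "address": "48 Old Road, Dallas, TX"}

def find_inconsistencies(a: dict, b: dict) -> dict:
    """Return the fields whose values differ between the two records."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}

print(find_inconsistencies(crm_record, billing_record))
# {'address': ('12 New Street, Austin, TX', '48 Old Road, Dallas, TX')}
```

A check like this is trivial to write for one pair of systems; the hard part is running it continuously across dozens of systems and, more importantly, deciding which record is the source of truth once a difference is found.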

Data will never be 100% clean. There will likely always be wrong data, inconsistent data, and missing data.

 

Inconvenient Truth No. 2 – No one really cares that much about Data Quality

Analytics is sexy. Artificial intelligence is sexy. Insights generated from analytics are amazing and eye-opening. New sales and revenue growth generated from analytical insights is the best. Data quality, unfortunately, is not sexy. While many companies understand how critical data is to AI, analytics, and data-driven decision making, they often are not willing to make the fundamental changes required to create a quality data foundation. Shortcuts can always be found to work around data quality issues, and since the data will never be perfect anyway, good-enough insights can still generate business outcomes. Thus, data quality initiatives get deprioritized and postponed. Those near-term actions and gains from analytics efforts often distract companies from the long-term commitment required to fix the data quality problems.

 

Inconvenient Truth No. 3 – Data Quality issues reflect the way a company functions

Data is everywhere. Data quality issues are often rooted in the way a company is organized and operates. For example:

  • Inconsistent data definitions across departments or geographies are often among the largest data quality issues. This is typically driven by the company's organization structure and inherent culture. If a company tends to operate in a siloed fashion, each BU, geography, or product group will generate its own data definitions and push for its own systems. The resulting inconsistent definitions and data quality issues won't be solved until the different BUs or geographies are willing to work together and agree on a standard. This is the United Nations problem of data quality.
  • Given that data is truly an enterprise asset, people from different groups must be willing to collaborate to make changes. But if the company's culture doesn't encourage collaboration, this can be a tall order for data quality teams to overcome on their own.
  • As discussed above, data quality requires long-term commitment. If a company is more inclined to focus on quick wins and near-term actions, data quality issues often get deferred and deprioritized.

Given that data quality issues reflect the way a company functions, fixing them requires fundamental changes to how the company is organized and operates. For example, to solve the data definition standardization problem, ING (a global financial institution) established a common language called "ING Esperanto." It aims to become the global glossary that describes the business terms ING uses frequently across all of its global entities, creating one consistent language throughout ING. The ING Esperanto started with a glossary of more than 1,000 agreed-upon business terms.
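As a rough illustration of what an entry in such a shared glossary might carry, the sketch below pairs one agreed definition with an owner and the local synonyms it replaces. The structure and field names are my own assumptions for illustration, not ING's actual Esperanto schema.

```python
# Hypothetical shared-glossary entry; field names are illustrative
# assumptions, not ING's actual Esperanto schema.
glossary = {
    "customer": {
        "definition": "A party holding at least one active product with the firm.",
        "owner": "Group Data Office",
        "replaces_local_terms": ["client", "account holder", "policyholder"],
    },
}

def lookup(term: str) -> dict:
    """Resolve a local or canonical term to its single agreed-upon entry."""
    for canonical, entry in glossary.items():
        if term == canonical or term in entry["replaces_local_terms"]:
            return {"canonical_term": canonical, **entry}
    raise KeyError(f"'{term}' is not defined in the shared glossary")

print(lookup("policyholder")["canonical_term"])  # -> customer
```

The value is not in the code but in the governance behind it: someone has to own each term, and every BU has to agree to retire its local synonyms.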

 

Inconvenient Truth No. 4 – IT alone can’t solve the data quality issues

Even though many companies have hired Chief Data Officers and are now asking them to take charge of solving data quality issues, the perception that data quality is a technology problem still persists. Unfortunately, data quality is not merely a technology issue. For example, IT can't force the front line to change sales processes to capture customer information. IT can't force the customer service department to validate customer information when customers call in. The data quality problem must be solved both in the business and in IT. As the saying goes, "garbage in, garbage out": fixing data quality issues requires significant business process changes, and those changes must be made by the business, not by IT. When I was working in Asia, one of my life insurance clients, to fix its customer data issues, fined agents $50 for every new policy sold with incorrect customer information. Imagine IT trying to launch such a company policy.

 

Inconvenient Truth No. 5 – There is no silver-bullet technical solution for data quality

Reading through the 7 billion results from a Google search of "data quality," one would think there are a number of silver bullets for data quality issues: a Master Data Management solution will fix everything, or a data lake will fix everything. Unfortunately, as described above, data quality issues require a holistic set of tools, processes, and cultural changes. No single technology solution can address all the issues associated with data quality. But software and consulting vendors don't always tell customers that; to make the sale, they market their solutions as the magic silver bullet.

Technology might be able to solve 30% of the data quality issues. The remaining 70% must be resolved with people and process changes.

 

Inconvenient Truth No. 6 – It's a journey, neither a marathon nor a sprint

Data quality issues are an everlasting challenge for most companies. They will not disappear overnight and will most likely always persist. Fixing them requires companies to build a data quality muscle, much like compliance, risk, or quality management, and to work it continuously. Data quality improvement is not a project, a transformation, or an initiative. It is a discipline that must be built and continuously exercised.

 

Inconvenient Truth No. 7 – Data is as big a liability as it is an asset

"Data is the new oil" – this assertion is everywhere. True, without data, companies can't do analytics, artificial intelligence, data-driven decision making, and so on. However, data is as much a liability as it is an asset. Information security issues, data privacy issues, AI bias issues, compliance issues, and inaccurate analytics resulting from bad data quality are just some of the ways data becomes a liability, sometimes before it even becomes an asset. Unfortunately, a lot of the time, compliance and legal issues are a much stronger impetus for change. For example, banks have made significant strides on data quality because of regulations such as Basel II/III and CCAR.

Now that we have gone through the seven inconvenient truths, the good news is that there are untapped opportunities for companies to address these problems. Companies should consider the following fundamental changes:

 

  • Make this a board-level issue

Every public company's annual report discusses potential risks to the company. Data quality must be one of the risks explicitly discussed. The Board of Directors should periodically review the current state of data management, from both the asset and liability points of view, to understand the progress being made and the additional steps that must be undertaken. This level of scrutiny will bring the enterprise-wide attention required and initiate the fundamental changes needed to address data quality issues.

 

  • Take a just-in-time analytics-back approach

Data only becomes an asset when it is analyzed and the resulting insights are generated and leveraged. Companies should take a just-in-time approach: fix the underlying data quality and simultaneously use the data being fixed for analytical purposes. This creates a flywheel: the more data quality improves, the better the analytics insights become; and the better the insights become, the more attention is paid to addressing the underlying data quality issues. This flywheel effect ensures that companies focus on the most burning issues and also creates momentum for departments to fix the underlying data.

 

  • Don't let a good crisis go to waste

"Never let a good crisis go to waste" – it's a cliché that has been attributed to a number of people, including Winston Churchill. But the cliché still works for many problems. Regulatory changes, for example, will mobilize companies to do whatever it takes to make change happen. Don't waste the crisis: leverage those regulatory shifts as opportunities to install fundamental changes that will improve data quality over time. A number of my clients, for example, leveraged IFRS, GDPR, and the CCPA (California Consumer Privacy Act) to address their underlying customer data quality and privacy control issues.

 

  • Remember the underlying technology architecture

While data quality is not just an IT problem, data quality issues will often persist unless the underlying technology architecture changes. Technology band-aids can improve the quality of point-in-time data snapshots, but the underlying issues remain. To achieve lasting data quality improvements, companies must consider modifications to the underlying technology architecture and invest in those changes.

 

  • Align people on data quality metrics

Airbnb recently announced that it will factor metrics such as guest safety into employee bonuses as it grapples with how best to address crime and other problems at rentals listed on its platform. The same principle can be applied to data quality: companies should consider adding data quality metrics to the relevant stakeholders' scorecards and incentive systems to raise the importance of data quality.
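To make the idea tangible, here is a minimal sketch of how two common data quality metrics, completeness and validity, might be computed on a customer table. The table, field names, and email pattern are assumptions for illustration; real programs would define metrics against agreed business rules and track them over time.

```python
import re

# Hypothetical customer rows; in practice these would come from a real table.
customers = [
    {"id": 1, "email": "ana@example.com", "phone": "+1-512-555-0134"},
    {"id": 2, "email": None,              "phone": "+1-214-555-0199"},
    {"id": 3, "email": "not-an-email",    "phone": None},
]

EMAIL_PATTERN = r"[^@]+@[^@]+\.[^@]+"  # deliberately simple, for illustration

def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    return sum(1 for r in rows if r.get(field)) / len(rows)

def validity(rows, field, pattern):
    """Share of populated values that match the expected pattern."""
    populated = [r[field] for r in rows if r.get(field)]
    return sum(1 for v in populated if re.fullmatch(pattern, v)) / len(populated)

print(f"email completeness: {completeness(customers, 'email'):.0%}")             # 67%
print(f"email validity:     {validity(customers, 'email', EMAIL_PATTERN):.0%}")  # 50%
```

Once metrics like these are published on a recurring scorecard and tied to the teams that own data entry and the systems involved, they start to change behavior.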

 

 

Data is an asset. But it is also a liability. Data quality issues, when not managed right, can become a critical enterprise risk. KPMG's 2016 Global CEO Outlook reported that 84% of CEOs were concerned about the quality of the data they base their decisions on. With the rise of AI and advanced analytics, data quality is a must-have before companies can capture those opportunities. The inconvenient truths and associated action items listed in this blog can be a great starting point for companies to make progress on this critical issue.

Copyright © 2023 Parker Shi. All rights reserved.

