In the quest to empower everyone to become more data-driven in decision making, the rigid ETL and data collection processes of the past have given way to the rapid gathering of raw, unstructured and crowdsourced data. As a result, the quality of reporting data has been declining.

Issues with data are often quickly exposed in self-service visual analytics solutions. Unfortunately, inaccurate data undermines the powerful value of self-service analytics. If your reports can’t be trusted, they won’t get used. Because the credibility, adoption and success of self-service analytics hinge on accurate data, data quality deserves attention as you implement these solutions.


Mission-Critical for Automating Decisions

Furthermore, as organizations advance in analytics maturity by adding predictive and prescriptive algorithms to automate decisions, data accuracy can become mission-critical. While most organizations say that data supports their business objectives, according to a recent Experian Data Quality survey an average of one third of organizations believe their data is inaccurate or incomplete, which undermines their ability to automate decisions.

Where to Begin

Self-Service Analytics Credibility

Please join me in a live webinar on establishing a foundation for improving data quality. I’ll share recommendations and cover the spectrum of available tools, common data quality metrics, how to gain support, and tips for addressing cultural change. After the webinar, participants will receive a link to a white paper on this topic.

Common Issues

The top challenge for data accuracy is still the age-old problem of human data entry error. System migrations, application changes and honest mistakes are frequently the root causes of problems. When most of the data originates within the organization or is acquired through a controlled method, data quality tools and cleansing and correction processes can address most of these issues. Ideally, data entry and system errors should be resolved at the source, through validation or references to master data, rather than in ETL processes after the fact.
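As a rough illustration of fixing errors at the source, the sketch below checks a record at entry time against a small master data reference before it is saved. Everything in it is hypothetical: the `OrderEntry` record, the `MASTER_REGIONS` lookup standing in for real master data, and the specific rules in `validate_entry`. Real applications would typically enforce similar checks in the entry form or the database.

```python
from dataclasses import dataclass

# Hypothetical master data: the authoritative list of valid region codes.
MASTER_REGIONS = {
    "NA": "North America",
    "EMEA": "Europe, Middle East and Africa",
    "APAC": "Asia Pacific",
}

@dataclass
class OrderEntry:  # hypothetical record captured by a data entry screen
    order_id: str
    region: str
    quantity: int

def validate_entry(entry: OrderEntry) -> list[str]:
    """Return validation errors; an empty list means the entry can be saved."""
    errors = []
    if not entry.order_id.strip():
        errors.append("order_id is required")
    # Normalize case and whitespace, then require a match in the master data.
    if entry.region.strip().upper() not in MASTER_REGIONS:
        errors.append(f"unknown region {entry.region!r}; expected one of {sorted(MASTER_REGIONS)}")
    if entry.quantity <= 0:
        errors.append("quantity must be positive")
    return errors

# Usage: reject the entry (and show the errors to the user) instead of cleaning it later in ETL.
ok = validate_entry(OrderEntry("A-100", "emea", 3))
bad = validate_entry(OrderEntry("A-101", "Europe", 0))
```

Rejecting the record while the person who typed it is still present is usually cheaper than reconciling it downstream, which is the point of resolving errors at the source.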

Human error is still the number one source of data accuracy issues.

Newer sources of data, such as social media, are highly prone to human error. When data originates outside the organization, ensuring its quality becomes more challenging. The consumers of that information generally need to discuss and decide how to decipher or standardize values, maintain consistent values across sources for traceability, store both original and assumed values, and apply scalable techniques to larger volumes of incoming data in varied formats.
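One way to keep both original and assumed values is to never overwrite what was received. The minimal sketch below, assuming a hypothetical `COUNTRY_STANDARDS` lookup that the data's consumers have agreed on, records the source, the raw value and the standardized value side by side, and flags anything no rule could interpret for human review. The names `StandardizedValue` and `standardize_country` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical agreed-upon lookup: variant spellings -> standard value.
COUNTRY_STANDARDS = {
    "us": "United States", "usa": "United States",
    "u.s.": "United States", "united states": "United States",
    "uk": "United Kingdom", "u.k.": "United Kingdom",
    "great britain": "United Kingdom",
}

@dataclass
class StandardizedValue:
    source: str             # where the value came from, kept for traceability
    original: str           # exactly what was received, never overwritten
    assumed: Optional[str]  # standardized value, or None if no rule matched

    @property
    def needs_review(self) -> bool:
        return self.assumed is None

def standardize_country(source: str, raw: str) -> StandardizedValue:
    # Lowercase and collapse whitespace before looking up the variant.
    key = " ".join(raw.strip().lower().split())
    return StandardizedValue(source=source, original=raw, assumed=COUNTRY_STANDARDS.get(key))

# Usage: values from different sources end up comparable, yet still traceable.
rows = [("survey", " USA "), ("social", "u.k."), ("social", "Merica")]
results = [standardize_country(src, raw) for src, raw in rows]
```

Keeping the original value alongside the assumed one means a questionable mapping can be revisited later without going back to the source system.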

Getting Users to Care

Data quality is not an exhilarating topic for most business users until it embarrasses them or prevents them from reporting accurately or making a decision. You will therefore need to find creative ways to get people to care. Data catalogs, logical data warehousing, and personalized data preparation, combined with data quality and cleansing services, can all be woven into your overall self-service BI strategy. Related solutions with strong data quality capabilities should appeal to reporting users. Helping users find available data for reporting is far more exciting than confronting them with the dirty data issue.

Moving from Unaware to Optimal

Much like self-service BI governance, data quality is an ongoing, continuous improvement initiative. You will want to actively monitor and reconcile high-priority data sets to validate their compliance with defined accuracy expectations. Many solutions provide basic alerting for assigned data stewards or trigger prescriptive actions when thresholds are not met.
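To make the threshold idea concrete, here is a minimal sketch of rule-based monitoring: each hypothetical `QualityRule` scores a data set by the share of records that pass a check, and the steward is notified when that pass rate falls below the rule's threshold. The rule names, the 95% threshold and the `notify_steward` callback are assumptions for illustration; commercial tools expose their own metrics and alerting.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict

@dataclass
class QualityRule:
    name: str
    check: Callable[[Record], bool]  # True if a record passes the rule
    threshold: float                 # minimum acceptable pass rate, 0.0 to 1.0

def pass_rate(records: list[Record], check: Callable[[Record], bool]) -> float:
    """Share of records that pass `check`."""
    if not records:
        raise ValueError("cannot score an empty data set")
    return sum(1 for r in records if check(r)) / len(records)

def monitor(records: list[Record], rules: list[QualityRule],
            notify_steward: Callable[[str], None]) -> dict[str, float]:
    """Score every rule and alert the steward for each one below its threshold."""
    results = {}
    for rule in rules:
        rate = pass_rate(records, rule.check)
        results[rule.name] = rate
        if rate < rule.threshold:
            notify_steward(f"{rule.name}: pass rate {rate:.0%} below threshold {rule.threshold:.0%}")
    return results

# Usage with a tiny, made-up customer data set; alerts are collected in a list here.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": None},
]
rules = [
    QualityRule("email completeness", lambda r: bool(r.get("email")), 0.95),
    QualityRule("id present", lambda r: r.get("id") is not None, 1.0),
]
alerts: list[str] = []
scores = monitor(customers, rules, alerts.append)
```

In practice `notify_steward` might send an email or open a ticket, and a scheduler would rerun `monitor` as new data arrives, which is what makes this an ongoing process rather than a one-time cleanup.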

Figure: Data quality progression

Despite being tied to clear business objectives, most current approaches to data quality, in terms of people, processes and technology, lag behind analytics initiatives. According to Experian research, as of 2016 only 1 in 5 companies was operating at the most sophisticated level of data quality management, which includes having a data quality role and executive sponsorship. Clearly there is much room for improvement.


In all aspects of data management, and especially in reporting to outside entities, there is an expectation that data will be correct. Deliberate negligence cannot serve as a foundation for plausible deniability. Data management practices should be proactive.

Handling Data Changes for Usability

For self-service analytics, non-technical users will often need to integrate and enhance data from a wide variety of internal and external sources. To relate disparate data sources, these users will need to make changes, add identifiers or combine fields. As changes are made, multiple diverging copies of the data can emerge. Self-service BI governance and data quality processes will increase data usability for popular data blending scenarios while maintaining a logical repository of personalized data changes, their meaning and context.

Spectrum of Data Quality Solutions

Large companies often combine enterprise master data management (MDM), a proven technology that has been too expensive for most organizations, with niche data quality services and tools. Mid-market groups simply don’t have the time, budget or resources for those approaches. Recently, several business-user-driven solutions have entered the market. New enterprise data catalogs and self-service data preparation offerings are designed to serve both data quality and self-service analytics needs.

Additional nuances of self-service BI, human behavior, and potential solutions to these challenges will be explored in the upcoming webinar.

Additional Information

Every organization today depends on data to understand its customers and employees, design new products, reach target markets, and plan for the future. For more detailed information on improving self-service analytics governance and data quality, here are several additional resources.