Data-driven organizations that empower everyone with self-service analytics are more vulnerable to classic shadow IT pains: unmanageable data sprawl, reporting inaccuracies, and gaps in governance, security, regulatory compliance, and privacy. Don’t be naive. Databases quietly hidden under analysts’ desks are literally keeping the lights on in numerous companies. Do you know what data is actually being used to make mission-critical decisions? Where did that data come from? Is it accurate? Did anyone make changes? What did they do to the data? Can you trust them?

Dealing with Daily Data Issues

The struggle for analysts is real, and it is not a cheap problem. According to an article in Harvard Business Review, IBM estimated that the yearly cost of poor-quality data in the US alone exceeds $3.1 trillion.

The top reason cited for why bad data becomes such a big expense is that analysts, decision makers, knowledge workers, data scientists, and many other people in an organization spend a lot of time dealing with data issues on a daily basis. Here are highlights from IBM’s research on the time wasted dealing with the same data issues over and over again.

  • 50% — the amount of time that knowledge workers waste in hidden data factories, hunting for data, finding and correcting errors, and searching for confirmatory sources for data they don’t trust.
  • 60% — the estimated fraction of time that data scientists spend cleaning and organizing data, according to CrowdFlower.
  • 75% — an estimate of the fraction of total cost associated with hidden data factories in simple operations, based on two simple tools, the so-called Friday Afternoon Measurement and the “rule of ten.”

Responsibly Empowering the Masses

When enabling and managing self-empowered data preparation and reporting solutions, governance should be a high priority. For organizations in highly regulated industries—financial services, pharmaceutical or biotechnology, and energy—effective data management solutions for supporting legal and regulatory compliance, mitigating risk, and improving efficiency are simply not negotiable.

If you’d like to better understand how to recognize and mitigate common self-service data preparation issues, please join Datawatch and me in a webinar on November 30th. We will share practical self-service data preparation governance considerations, tips, and guidelines for balancing agility with the enterprise need for data governance. These tips apply to any solution, even though we will be showcasing how Datawatch handles them.

The market and legal climate for data is complex and ever-changing, especially when it comes to personal data legislation such as the General Data Protection Regulation (GDPR). Data ownership, data usage consent, ethical, compliance, legislative, and privacy policies need to be thoroughly researched and clearly understood. Processes for recording data collection consent, building an inventory of assets, organizing, classifying, licensing, governing, securing, distributing and auditing data should be in place. There needs to be a careful balance between data risks and business value.

Self-Service Data Governance

To comply with GDPR or similar legislation, you’ll need visibility into data workflows and tracking of data lineage throughout the entire data life-cycle. Be ready to report on data sources, usage, access, changes and data transfer destinations. Sensitive data might require encryption of data at rest and in motion, de-identification, data masking and higher levels of aggregation.
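To make the de-identification, masking, and aggregation ideas above a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a compliance recipe: the field names, the `PSEUDONYM_KEY` value, and the helper functions are all hypothetical, and encryption at rest and in motion is not shown. Keep in mind that keyed hashing is pseudonymization, and pseudonymized data is still treated as personal data under GDPR.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key for illustration only; a real key would be
# generated securely and stored in a secrets manager, not in source code.
PSEUDONYM_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str) -> str:
    """De-identify an ID with a keyed hash (HMAC-SHA256).

    The same input always yields the same token, so records can still be
    joined, but the original ID cannot be read from the token.
    """
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_email(email: str) -> str:
    """Mask an email: keep the first character and the domain."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"
    return f"{local[0]}***@{domain}"


def age_band(age: int, width: int = 10) -> str:
    """Aggregate an exact age into a coarser band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict) -> dict:
    """Return a copy of the record that is safer to share for reporting."""
    return {
        "customer_token": pseudonymize(record["customer_id"]),
        "email": mask_email(record["email"]),
        "age_band": age_band(record["age"]),
    }


# Made-up sample records, not real data.
records = [
    {"customer_id": "C-1001", "email": "alice@example.com", "age": 34},
    {"customer_id": "C-1002", "email": "bob@example.com", "age": 37},
    {"customer_id": "C-1003", "email": "carol@example.com", "age": 52},
]

safe_records = [deidentify(r) for r in records]

# Report at a higher level of aggregation: customers per age band.
band_counts = Counter(r["age_band"] for r in safe_records)
```

In this sketch, analysts would work with `safe_records` and `band_counts`, while the raw `records` stay behind whatever access controls the organization already has in place.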

Changing Attitudes towards Self-Service Analytics

Over the past twenty years, attitudes towards self-service analytics have changed. From 2000 to 2010, there was a dominant preference for tightly controlled reporting environments. Strong control and governance, although great for IT, was problematic for the business. Approved, official reports from IT-prepared data sources could take weeks or months to build. To function day-to-day, the business could not wait. Thus, a handful of tech-savvy data analysts developed off-the-radar shadow IT efforts. These projects were usually built inefficiently with numerous Excel spreadsheets or rogue Access databases that contained VBA, SQL, copies of imported data, and linked tables to disparate official and unofficial data sources.

Excel Hell

Image source: Carola Lissel

When modern self-service visual analytics and data preparation tools entered the market, reporting approaches shifted from highly controlled IT-led reporting operations to loosely understood, widely distributed business-led initiatives. This laissez-faire attitude dominated the market for approximately five years. The honeymoon ended a year or two after adoption, once the speed and ease of these offerings had replaced Excel hell with a new set of challenges.

In my professional opinion, the business never fully understood the technical complexity and nuances of enterprise reporting across numerous, constantly changing data sources, regardless of how easy it was to create a report. The extreme volumes of data can also rapidly become overwhelming for non-technical report authors. These issues are not easy even for the experts to conquer, despite what the friendly sales rep shows you in a demo. I previously wrote an extremely popular article on that topic, “Did you Buy a Self-Service BI Fantasy?”

If the business merely installs self-service analytics tools with no governance in place and invites the masses to join in on the fun, it usually ends up with a costly data mess to clean up. No matter how tempting it seems, don’t do it. Enterprise-wide reporting initiatives need a little governance to succeed.

Governance is a Critical Success Factor

Today we are finally seeing self-service analytics sanity return. Current attitudes towards self-service analytics value a reasonable balance between self-service reporting agility and central governance. Current solutions also include key data stewardship features for assigning data quality accountability.

Fundamental concepts of self-service data preparation governance

You’ll want to provide controlled data access to everyone, everywhere via a “one-stop-shop” of certified, curated and raw data sets. Personalization, search and innovative machine learning-powered capabilities simplify finding, transforming and using data that has been shared and made accessible within a governed self-service data preparation ecosystem.

Datawatch Monarch Swarm

Datawatch Monarch Swarm Logical Architecture

Self-service data prep governance capabilities include, but are not limited to, the following:

  • Data Prep: Simple, point-and-click data acquisition, transformation, blending and export capabilities. No coding required. Anyone can use it.
  • Data Marketplace: Creates a social network of certified, curated and raw data sets, with controls and limitations defined for each individual.
  • Intuitive Search: Search cataloged data, metadata and data preparation models indexed by user, type, application and data values to quickly find information.
  • Crowdsourcing: Leverage user ratings, recommendations, discussions, comments and data source popularity to make better decisions about which data to use. Share and comment on workspaces and data sources.
  • Collaboration: Understand the relevance of data in relation to how it’s utilized by different user roles in the organization.
  • Machine Learning: Benefit from machine learning capabilities, which identify patterns of use and success, perform data quality scoring, suggest relevant sources, and automatically recommend data preparation actions based on user persona.
  • Data Quality and Governance: Provide sanctioned, curated data sets to promote reuse and consistency. Comprehensive governance features, including data masking, data retention, data lineage, and role-based permissions.
  • Automated Operations: Define, schedule and execute recurring processes and data prep workflows.
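
To show how a few of these capabilities fit together, here is a small Python sketch of a governed data catalog with role-based permissions, search, and a lineage trail. It is a toy model under my own assumptions and not how Datawatch or any other product implements these features: the `DataSet` and `Catalog` classes, the role names, and the sample data sets are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataSet:
    name: str
    status: str                 # "certified", "curated", or "raw"
    allowed_roles: set[str]     # roles permitted to find and read this data set
    lineage: list[dict] = field(default_factory=list)


class Catalog:
    """Hypothetical one-stop shop for shared data sets."""

    def __init__(self) -> None:
        self._datasets: dict[str, DataSet] = {}

    def register(self, dataset: DataSet, source: str, user: str) -> None:
        """Add a data set and record where it came from."""
        self._datasets[dataset.name] = dataset
        self._log(dataset, user, "registered", source)

    def search(self, term: str, role: str) -> list[str]:
        """Return matching data set names that the role is allowed to see."""
        return sorted(
            ds.name
            for ds in self._datasets.values()
            if term.lower() in ds.name.lower() and role in ds.allowed_roles
        )

    def read(self, name: str, user: str, role: str) -> DataSet:
        """Enforce role-based permissions and log every access attempt."""
        ds = self._datasets[name]
        if role not in ds.allowed_roles:
            self._log(ds, user, "access_denied", "")
            raise PermissionError(f"role '{role}' may not read '{name}'")
        self._log(ds, user, "read", "")
        return ds

    @staticmethod
    def _log(ds: DataSet, user: str, action: str, detail: str) -> None:
        # Append-only trail: who did what to which data set, and when.
        ds.lineage.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "detail": detail,
        })


catalog = Catalog()
catalog.register(
    DataSet("sales_certified", "certified", {"analyst", "steward"}),
    source="warehouse.sales",
    user="steward_1",
)
catalog.register(
    DataSet("sales_raw_export", "raw", {"steward"}),
    source="crm_export.csv",
    user="steward_1",
)

# An analyst searching for "sales" only sees what the role is permitted to use.
visible_to_analyst = catalog.search("sales", role="analyst")
```

The design choice worth noticing is that the permission check and the lineage log live in the same place, so a denied request leaves an audit entry just like a successful read does.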