Obtaining a 360-degree customer view means having a holistic customer profile record that captures different types of data from across channels and systems, aggregates that data to understand what’s important to customers, and applies those insights to deliver personalized, engaging customer experiences. The road to a customer 360 view usually involves the development of a data lake that unifies data from all customer touchpoints: streaming sources, structured data files, unstructured documents, emails, webchats, web logs, transaction logs, and social posts.

Why Design a Data Lake

Data lakes are more agile and flexible than traditional, relational data management systems. When customer profiles and customer segmentation include an integrated mix of demographic, transactional, environmental, behavioral and social data attributes, organizations gain several powerful capabilities:

  • Improved engagement and relationships with their most valuable customers
  • Systematic migration of low or marginally profitable customers to become highly profitable customers, a.k.a. increasing share of wallet
  • Ability to pinpoint unsuccessful product promotions, levels of service or other factors that result in less profitable customers
  • Metrics for calculating the costs to acquire, serve and retain customers for each targeted demographic
  • Enhanced personalization and targeted marketing
  • Better understanding of customer expectations and calculated propensity to purchase
  • Learning where to reduce costs by understanding low value channels and services

In this article and upcoming online event, we’ll further explore how to break down customer data silos using a data lake and an agile, modern integration platform.

To learn more about this topic, please join me and Isabelle Nuage, Director of Big Data Solutions at Talend, for a webinar on November 15th on how to architect a Customer 360 data lake. We will explore how Talend’s comprehensive data integration platform helps establish processes for data cleansing, governance, security and privacy.

Customer 360 – Top Use Case for Data Lakes

Developing a 360-degree view of the customer is no easy feat. Customer data seems to be everywhere: streaming data sources, structured data files, unstructured documents, emails, webchats, web logs, transaction logs, and social posts inside and outside an organization. The average number of applications that store customer data keeps growing. According to the May 2017 ChiefMartec Marketing Technology Landscape, there are now 5,381 different solutions that generate customer data. Per Gartner, a typical $1B company will likely be running more than 50 customer experience-related projects at the same time.

[Figure: BARC data lake research findings. Source: Talend-sponsored BARC Data Lake Report]

Per 2016 BARC Research, customer intelligence is the top use case for data lakes today. Data lakes are flexible, highly scalable and enable organizations to quickly integrate and process massive volumes of customer data from heterogeneous sources and formats (structured and unstructured). Notably, over 95 percent of BARC’s 380 surveyed data lake early adopters cited being able to recognize business benefits from a data lake. (Note: For more detail on demographics, please download the source survey.)

Additional key findings from the BARC study include:

  • Over 40% reported improved competitive capabilities
  • Over 30% noted faster response to change
  • Over 46% cited improved retention by better predicting customer behavior

Unlike many other data platform and integration projects, the benefits of a data lake can be seen and understood by the business.

Data Lake Design Considerations

Data lakes evolved from the realm of big data. When the volume, velocity and variety of data from IoT sensors, clickstreams, streaming social media channels, and other unstructured sources began to overwhelm traditional relational enterprise data warehouse processes, more agile approaches using Hadoop, NoSQL, hybrid or cloud computing technologies were adopted. While “ingest and store” designs solved the data acquisition problem, they also created “data swamps”: big pools of ungoverned, mismanaged, poor-quality data. To solve for both needs, gathering and using information, data lake designs needed to mature.

When architected and implemented properly using a modern integration platform, a data lake can solve your Customer 360 pains and adapt to future business needs. A modern integration platform is vital for properly loading both streaming and batch data into a data lake, making raw unstructured data “business ready” by improving its quality, reducing the liability of customer data, and adapting for emerging technologies.

[Figure: Data Lake – Conceptual Architecture]

Properly Loading Data into a Data Lake

A major challenge in today’s world of big data is getting data into the data lake in a simple, scalable, automated manner. The real-world complexity of loading data from source to store is often underestimated. Hand-coding on Hadoop, NoSQL or cloud platforms is brittle, inefficient and cumbersome to change, manage and maintain.

To ease data ingestion into a data lake, use the right tools for the job: tools that can handle streaming, batch and real-time data sources without having to rewrite code. Automated ingestion tools with enterprise connectivity for your ERP, data warehouse and CRM systems can greatly improve your team’s productivity. Streaming ingestion, in particular, must handle huge bursts of data, enable real-time analysis on those data streams, and elegantly orchestrate data movement, as the sketch below illustrates.
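As a concrete illustration, here is a minimal sketch of streaming ingestion into a data lake using PySpark Structured Streaming and Kafka, one common open source approach (a tool like Talend generates comparable pipelines without hand-coding). The broker address, topic and paths are hypothetical:

    # Minimal streaming-ingestion sketch: land raw Kafka events in the lake.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("customer-360-ingest").getOrCreate()

    # Read a continuous stream of customer events from Kafka.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
              .option("subscribe", "customer-events")            # hypothetical topic
              .load())

    # Land the raw payloads unchanged; cleansing and shaping happen later.
    (events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
           .writeStream
           .format("parquet")
           .option("path", "/datalake/raw/customer_events")
           .option("checkpointLocation", "/datalake/_checkpoints/customer_events")
           .start())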

[Figure: Talend Streaming Ingestion Framework]

Making Unstructured Data in a Data Lake “Business Ready”

In traditional enterprise data warehousing, extracting, transforming and loading data into defined schemas was a time-consuming process, but it provided data in a format the business could use. To get data quickly into a data lake, schemas are not pre-defined and data may not be cleansed. Data is often stored in raw, native, unstructured formats such as delimited, JSON and log files, making it difficult for business users to consume or analyze. Relying purely on ad-hoc, schema-on-read access to unstructured data formats will limit data lake usage to small groups of data professionals and data scientists with access to specialized tools, as the sketch below illustrates.
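To illustrate the schema-on-read burden, here is a hypothetical PySpark example: because the raw JSON in the lake carries no predefined schema, every consumer must know and re-apply this structure at query time (field names and paths are illustrative assumptions):

    # Schema-on-read: structure is applied when the data is queried, not stored.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

    # Every analyst must know (and repeat) this schema definition.
    schema = StructType([
        StructField("customer_id", StringType()),
        StructField("channel", StringType()),
        StructField("event_time", TimestampType()),
    ])

    chats = spark.read.schema(schema).json("/datalake/raw/webchats/")
    chats.createOrReplaceTempView("webchats")  # now queryable with SQL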

To make your data lake “business ready” and extract the most value from collected customer data, businesses need to consider how to best deliver information to different stakeholders. For example, business-ready data in the data lake should be cleansed, flattened, blended and mapped, and governed with lineage history and role-based security. A sketch of this curation step follows.
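Continuing the hypothetical webchat example, a curation job might cleanse, de-duplicate and map the raw records into a governed “curated” zone (the rules and paths here are illustrative assumptions, not a prescribed standard):

    # Promote raw webchat records to a business-ready, curated zone.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("curate-webchats").getOrCreate()
    raw = spark.read.json("/datalake/raw/webchats/")  # illustrative path

    curated = (raw
               .withColumn("channel", F.lower(F.trim("channel")))  # cleanse values
               .withColumn("event_date", F.to_date("event_time"))  # derive a mapping key
               .dropDuplicates(["customer_id", "event_time"]))     # remove duplicates

    # Partitioned, columnar output is far friendlier to BI tools than raw JSON.
    (curated.write.mode("append")
            .partitionBy("event_date")
            .parquet("/datalake/curated/webchats"))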

Gains in data lake ingestion speed and data load volume are often achieved by skipping initial data quality processes. Customer data collected from social media and other third-party sources is notorious for quality, completeness and consistency issues. Unlike the structured, cleansed data stored in enterprise data warehouses, raw data in a data lake includes dirty data that cannot be relied on for analysis without cleansing, validation and preparation.

Last year, an Experian Data Quality survey revealed that 33 percent of organizations believed their data was inaccurate or incomplete, undermining their ability to automate decisions. This year, a Harvard Business Review study found that only 3 percent of companies’ data met basic quality standards. Data quality and protection efforts should be ongoing and pervasive across all data lake orchestration pipelines. To address data quality challenges in a data lake, put processes in place to quarantine dirty data (as sketched below) and leverage machine learning within data integration to automate and improve cleansing.
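As a minimal sketch of the quarantine pattern, assuming hypothetical validity rules and paths, records that fail validation are routed to a separate zone for repair instead of polluting downstream analysis:

    # Split incoming customer records into clean and quarantine zones.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-quarantine").getOrCreate()
    customers = spark.read.parquet("/datalake/raw/customers")

    # Simple illustrative rules; real pipelines would centralize and version these.
    is_valid = (F.col("customer_id").isNotNull()
                & F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"))

    customers.filter(is_valid).write.mode("append").parquet("/datalake/clean/customers")
    customers.filter(~is_valid).write.mode("append").parquet("/datalake/quarantine/customers")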

Data quality might not seem exciting to the business, but it can impact the bottom line. Travis Perkins in the United Kingdom shared a 30 percent boost in conversion rates after cleaning its online product catalog data. Air France KLM uses Talend for data quality and metadata management to personalize traveler experiences. The group created a complete 360-degree customer view by integrating data from trip searches, bookings, and flight operations, along with external web, social media, call center, and airport lounge interactions. One fantastic benefit for travelers: Air France KLM can now find and return 80 percent of lost items to their owners. To get support for data quality work, connect data quality to metrics your business already measures.

Reducing Liability of Customer Data

A data lake without proper governance can become a liability, with multiple versions of data and no traceability of what happened to it, making it more difficult to meet compliance regulations such as the European General Data Protection Regulation (GDPR).

In less than one year, GDPR will be enforced. GDPR is the most significant change in customer data privacy regulation in 20 years. It will impact Customer 360 data lake projects, artificial intelligence, reporting, self-service BI, data warehousing, master data management, and even personalization in line-of-business applications. The regulation requires a legal basis to justify the collection and processing of personal data. You’ll have to document customer data management and reporting practices. You need to know what customer data you have, where it resides and where it is sent, be able to delete it on request, and provide proof for compliance inquiries. You’ll also need to protect and mask sensitive data attributes, as sketched below.
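As one small, hedged illustration of protecting sensitive attributes, a pipeline might pseudonymize identifiers and redact direct PII before data reaches the analytics zone (column names are hypothetical, and masking alone is only a fraction of GDPR compliance):

    # Pseudonymize and mask sensitive customer attributes for the analytics zone.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("gdpr-masking").getOrCreate()
    profiles = spark.read.parquet("/datalake/curated/customers")

    masked = (profiles
              # One-way hash preserves joinability without exposing the raw ID.
              .withColumn("customer_key", F.sha2(F.col("customer_id"), 256))
              # Redact direct identifiers outright.
              .withColumn("email", F.lit("***REDACTED***"))
              .drop("customer_id", "full_name"))

    masked.write.mode("overwrite").parquet("/datalake/analytics/customers")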

Only when you have proper data governance in place can you open up the accessibility of your customer data lake to the right person at the right time, enabling more self-service for your business users.

Adapting for Emerging Technologies

The only constant with customer data is continuous change. Technical teams designing and building data lakes need to be able to change swiftly and embrace emerging technologies. Digital customer data arriving from thousands of sources is inherently unpredictable: new digital channels, apps and customer data sources pop up all the time, existing channels update frequently, and some disappear suddenly. Collecting customer data from external digital touchpoints can be overwhelming.

When planning your data lake, think through what should happen when a data source field gets added, moved or deleted. Decide how you will deal with unplanned data source outages or pipeline issues. How will you swap a data source or upgrade message queues without a data or service blackout? One common tactic for tolerating schema drift is sketched below.
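For instance, assuming Spark as the processing engine, reads can be configured to tolerate schema drift: rows whose fields have moved or disappeared are routed to a corrupt-record column for inspection instead of failing the pipeline (paths and fields are illustrative):

    # Tolerate schema drift on read instead of failing the whole pipeline.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("schema-drift").getOrCreate()

    expected = StructType([
        StructField("customer_id", StringType()),
        StructField("channel", StringType()),
        StructField("_corrupt_record", StringType()),  # catches drifted/bad rows
    ])

    events = (spark.read
              .schema(expected)
              .option("mode", "PERMISSIVE")  # keep bad rows rather than erroring out
              .option("columnNameOfCorruptRecord", "_corrupt_record")
              .json("/datalake/raw/events/"))

    # Route drifted records to quarantine for inspection, not silent loss.
    # (cache() works around Spark's restriction on querying only the corrupt column.)
    drifted = events.cache().filter("_corrupt_record IS NOT NULL")
    drifted.write.mode("append").json("/datalake/quarantine/events")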

To deal with constantly evolving customer data sources, make sure your data lake is designed using patterns that can easily adjust to change. Automated data ingestion solutions with rich libraries of plug-and-play data connectors and transformation components are a necessity. Be sure you can handle intermittent live streams and sudden bursts of incoming customer data from sources such as Kafka, Amazon Kinesis or Google Pub/Sub, import from cloud apps, and bulk load from web service APIs.

Next Steps

To unlock the power of Customer 360, reduce risks and make your data lake “business ready”, proper load management and data governance are vital. In our webinar, we’ll dive into these topics and provide actionable guidance to help you plan and develop a successful Customer 360 data lake. Talend’s modern, comprehensive platform for big data and cloud data integration, with data quality that extends to MDM, enterprise service bus (ESB), metadata management, and data preparation capabilities, helps you deliver a future-proof data lake and avoid the pain of starting over.

For additional information on designing and implementing a data lake, please explore the following excellent resources.