Process failed. That’s what happens when your data outgrows your OLAP technology. To solve that problem, OLAP on Hadoop was born. In a recent article, I shared classic OLAP dimensional design tips from my archives. This time I’ll share five “need to know” tips to start taking your OLAP skills into a big data world.
When designing OLAP systems, you can use MOLAP (multi-dimensional OLAP), ROLAP (relational OLAP, direct query) or HOLAP (a combination of MOLAP and ROLAP). Newer in-memory solutions such as Analysis Services Tabular mode and other columnar, in-memory databases are also available. Each approach has benefits and limitations. When working with big data, those limitations become show stoppers. Data stored natively in Hadoop is not practical for classic OLAP technology to query, or for non-technical business users to analyze directly in BI tools.
The following OLAP on Hadoop design best practices have been collected from AtScale, a market-leading OLAP on Hadoop vendor. AtScale was born to deliver on the promise of making big data useful for the business. Aside from the vendor making these claims, early adopters such as Sumit Pal, author of SQL on Big Data: Technology, Architecture, and Innovation, are also sharing success stories.
“Products like AtScale reincarnate OLAP for Big Data.” – Sumit Pal
Fun fact! One of AtScale’s founders managed the largest known Analysis Services OLAP cube implementation in the world while he was at Yahoo! Behind the scenes, Yahoo!’s team had to work around numerous issues and limitations in Analysis Services. Analysis Services was not designed for the nature, volume, or variety of big data sources Yahoo! needed for reporting.
Top Design Lessons Learned
After the Yahoo! experience, the AtScale team designed a solution specifically for OLAP on Hadoop, aiming to overcome those limitations while retaining the business-friendly usability of OLAP. Along the way, they learned what works best and what to avoid. Here are their top tips for working with OLAP and big data.
1. Avoid moving or copying data
Whenever possible, query data from one place. In classic OLAP, cumbersome ETL routines copy, transform and move data to relational dimensional data warehouses. Although you could use ROLAP (direct query) storage techniques that don’t require moving data, only a handful of relational data source types are supported today. Those come with a long list of warnings. If you do get past ROLAP development hurdles, production performance will be your next massive challenge.
Moving and copying large volumes of data in a big data world is simply not efficient or even practical. When designing OLAP on Hadoop, you don’t need to move or copy data. You can create a universal semantic layer on Hadoop using schema-on-read queries.
2. Create a single, unified semantic layer
Semantic layers are essential for making big data useful to the business. In the big data world, you likely have a Hadoop data lake, a relational dimensional enterprise data warehouse, and multiple departmental or application-level data marts. Each one might use different technologies, may or may not reference organizational master data, and may introduce alternate versions of reality, creating data messes. Data messes are expensive to manage, and they can evolve into bigger problems with big data. Key design lesson learned: do not store multiple versions of reality.
Evolve Hadoop to become your centralized data lake. Ingest raw unstructured or structured data quickly into Hadoop, then apply transforms at query time (schema-on-read) to develop a single, unified semantic layer for the business to use with Excel, Tableau, TIBCO Spotfire and other popular BI tools.
With AtScale, business users can connect to and explore the data stored in Hadoop using standard OLAP XMLA or Analysis Services connections in BI tools. AtScale’s semantic layer is simple to navigate, looks just like an OLAP cube, accelerates big data queries, and adapts flexibly to change.
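To make the idea of a single semantic layer more concrete, here is a minimal sketch in Python. It is not AtScale's implementation or API. The model structure, the `build_query` helper, and the table and column names (`sales_raw`, `region`, `amount`) are all hypothetical and exist only for illustration. The point is that business-friendly names are defined once, in one place, and every tool's request is translated from that one definition.

```python
# Hypothetical semantic model: business names are defined once and mapped
# to physical columns and aggregations. All names here are illustrative.
SEMANTIC_MODEL = {
    "table": "sales_raw",
    "dimensions": {"Region": "region", "Order Year": "order_year"},
    "measures": {"Total Sales": "SUM(amount)", "Order Count": "COUNT(*)"},
}


def build_query(model, dimensions, measures):
    """Translate business-friendly names into one SQL statement."""
    dim_cols = [f'{model["dimensions"][d]} AS "{d}"' for d in dimensions]
    meas_cols = [f'{model["measures"][m]} AS "{m}"' for m in measures]
    sql = f'SELECT {", ".join(dim_cols + meas_cols)} FROM {model["table"]}'
    if dimensions:
        # Group by the physical columns behind the requested dimensions.
        group_by = ", ".join(model["dimensions"][d] for d in dimensions)
        sql += f" GROUP BY {group_by}"
    return sql


if __name__ == "__main__":
    print(build_query(SEMANTIC_MODEL, ["Region"], ["Total Sales"]))
```

Because every report asks for "Total Sales" by name, the definition behind it can change in one place without each tool keeping its own version of reality.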
If you are an OLAP professional who already knows how to design Analysis Services cubes, the AtScale design experience may look familiar. There is a minimal learning curve. When I first saw this solution in 2015, I teased the AtScale founders. I asked them if I was secretly looking at the next version of Analysis Services.
Apparently, my instincts on AtScale were right. It is not difficult for Analysis Services professionals to learn. Andy Tegethoff, Senior Principal – Technical Consultant at Clarity Insights, shared the following comments.
“I’ve found AtScale to be intuitive to learn coming from Analysis Services. Performance is another plus…it does not require processing or related downtime windows.” – Andy Tegethoff
A nice Hadoop-aware feature is AtScale’s automatic parsing of key-value pairs into attributes. Here is a sneak peek of it.
3. Do NOT scale up, scale out
Embrace the open Hadoop ecosystem, which is stellar for scale-out use cases. Scale up is NOT the big data way to expand.
4. Learn to LOVE schema-on-read
Unstructured data types in the Hadoop world do not fit neatly into rows and columns. To shift your skills to the big data world, you will learn to love and leverage Hadoop’s powerful schema-on-read (schema on demand) capabilities. This is what it looks like in AtScale.
In the image above, notice that raw key-value pairs are mapped to a table “on demand” using SQL external table syntax. From there, you can simply query the data with SQL functions. We see this design pattern frequently now when working with hybrid cloud and big data solutions. Schema-on-read is much more elegant when working with big data than attempting to apply older ETL approaches.
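The schema-on-read pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AtScale or Hive syntax. An in-memory SQLite database stands in for the Hadoop engine, the `key=value;key=value` payload format is assumed, and `kv_get` is a hypothetical helper registered as a SQL function. The raw rows are stored untouched, and the schema is applied only when the data is queried.

```python
import sqlite3


def kv_get(raw, key):
    """Return the value for `key` in a 'k1=v1;k2=v2' string, or None."""
    for pair in raw.split(";"):
        name, sep, value = pair.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


conn = sqlite3.connect(":memory:")
# Register the hypothetical helper so SQL can call it at read time.
conn.create_function("kv_get", 2, kv_get)

# Raw events land as-is: one untyped text column, no upfront modeling.
conn.execute("CREATE TABLE raw_events (payload TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?)",
    [
        ("user=ann;country=US;amount=20",),
        ("user=bob;country=DE;amount=35",),
        ("user=cat;country=US;amount=5",),
    ],
)

# The "schema" is just a query over the raw payload, applied on read.
SCHEMA_ON_READ = """
    SELECT kv_get(payload, 'user') AS user_name,
           kv_get(payload, 'country') AS country,
           CAST(kv_get(payload, 'amount') AS INTEGER) AS amount
    FROM raw_events
"""

rows = conn.execute(
    f"SELECT country, SUM(amount) FROM ({SCHEMA_ON_READ}) "
    "GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('DE', 35), ('US', 25)]
```

If a new key shows up in the payload later, only the read-time query changes. Nothing already stored has to be reloaded or transformed again.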
5. Use open engines to enjoy diverse analytics
Do NOT lock yourself into proprietary stacks. Embrace open source. Be open in terms of the Hadoop engines used to store data and the methods used to query it. Enjoy Spark, Impala, and future open engines. The beauty of being open is that your users can enjoy a diverse range of BI and analytics tools, also known as “bring your own reporting tool”. Data-savvy folks should not be tied to a single vendor! Analytics is strategic.
For More Information
In this article, I briefly touched on OLAP on Hadoop design differences and best practices. For in-depth OLAP on Hadoop educational content, check out the following recommended resources.
- AtScale http://www.atscale.com
- AtScale Resources http://www.atscale.com/resources/
- AtScale OLAP for Hadoop White Paper
- AtScale Single Semantic Layer White Paper