Incorta is one of the most intriguing startups in Silicon Valley. While in stealth mode, the company landed and expanded in a Fortune 10 company and a reputable university, and it currently powers analytics for some of the world’s largest companies. Last month Incorta raised $10 million in a funding round led by GV (formerly Google Ventures), citing a novel technology that “rethinks how data is stored and accessed”. The company claims its Direct Data Mapping™ technology can reduce or remove the need to create data warehouses.
To learn about Incorta, please join my peer Matt Morris in a webinar on April 26, 2017 at 2PM EST.
Fact or Fiction
Having been in this space for over twenty years, I have heard my fair share of data warehouse death predictions. Data warehouse demise buzz peaked from 2012 to 2015 when big data was center stage, data lakes were all the rage and self-service data discovery tools were gaining significant momentum. During that time, I wrote a popular article called “Did you Buy a Self-Service BI Fantasy?” to provide my readers a technical sanity check.
Despite advances with Spark, GPU, and in-memory computing, until now I have not come across anything that has been performant enough to eliminate data warehouse design patterns. Even mega-vendor “latest and greatest” in-memory self-service BI solutions still use OLAP approaches from the 1990s underneath pretty HTML5 front-ends. OLAP is not exactly innovation in a world where automation, machine learning and artificial intelligence will be disrupting analytics again soon.
Although Incorta has an Excel Add-In, responsive HTML5 dashboards with natural language search and a lovely array of visualization types, the true beauty of this solution lies within the patent-pending engine. This engine can optionally be used with third-party visualization solutions. After completing a “hands-on” review of Incorta, I am most fascinated by the sub-second query response times on complex joins with real-time aggregation for data sources with hundreds of millions of records.
Incorta is using the right modern technologies – Apache Spark, Parquet and open source big data analytics libraries. It can be deployed in the cloud or on-premises for better control of costs. For cloud clients, they do provide templates for Amazon Web Services (AWS) and Google Cloud platforms.
Incorta has an elastic architecture that enables plug-and-play data sources and automated workload changes. It connects to flat files, Excel, CSV/JSON, Hive, JDBC data sources, Oracle, Microsoft, MySQL, IBM DB2, IBM Netezza, HP Vertica, SAP HANA, Teradata, Amazon AWS Redshift, Google BigQuery, Kafka streaming and even cloud application web services such as Salesforce.com, NetSuite, Oracle Cloud, Zuora and ServiceNow.
Direct Data Mapping™
Incorta’s Direct Data Mapping™ is thought-provoking. It delivers near real-time analytics on top of original, intricate, transactional data such as ERP systems. Direct Data Mapping™ excels in executing real-time joins with aggregations. It is designed to eradicate cumbersome, time-consuming ETL routines, dimensional data stores and traditional OLAP semantic layers.
Honestly, Incorta’s Direct Data Mapping™ pitch reminds me of when I first heard about Qlik back in 2008, or experienced Spotfire and Tableau’s simple drag-and-drop development approach around that same time. Notably, several modern data discovery tools made a similar pitch about eliminating traditional OLAP pain, proved to be successful and did disrupt the market.
I initially doubted data discovery tools could be used for enterprise BI. However, there were enterprise capabilities bundled into wonderful, rapidly deployable, scalable solutions that were hidden by strange marketing labels. After testing and experiencing agile data discovery technology “hands-on”, I fell in love with it. I never wanted to build, maintain or be awakened again by cell phone alerts of failed ETL jobs or Analysis Services cube partition processes… nor did I want to code complex multidimensional MDX or DAX just to get a basic calculation. I know there is a lot of legacy work out there but I won’t invest my time on it. I refer ETL and Analysis Services projects to other consultants.
One key difference between Incorta and current data discovery tools is that the data does not need to be shaped for optimal performance. Usually with data discovery tools, you do need to query, shape and move data into an in-memory engine. With Incorta Direct Data Mapping™, performance is exceptional even with many tables. You don’t need to flatten tables or even limit the number of tables being joined in Incorta.
That is why Incorta is challenging traditional data warehousing patterns. Direct Data Mapping™ enables users to make decisions with up-to-date information stored in multiple databases across an organization without requiring a data warehouse.
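To make “no shaping” a little more concrete, here is a minimal sketch of joining and aggregating normalized transactional tables at query time instead of flattening them first. It uses Python’s built-in SQLite purely as a stand-in; it is not Incorta’s engine and says nothing about how Direct Data Mapping™ is implemented. The table names, columns and sample rows are all hypothetical.

```python
import sqlite3


def build_source(conn):
    """Create a tiny, normalized 'transactional' schema (hypothetical tables)."""
    conn.executescript("""
        CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE products    (product_id  INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE orders      (order_id    INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE order_lines (order_id INTEGER, product_id INTEGER,
                                  quantity INTEGER, unit_price REAL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(10, "Hardware"), (11, "Software")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(100, 1), (101, 2), (102, 1)])
    conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?)", [
        (100, 10, 2, 50.0),
        (100, 11, 1, 200.0),
        (101, 10, 1, 50.0),
        (102, 11, 3, 200.0),
    ])


def revenue_by_region_and_category(conn):
    """Join four source tables and aggregate at query time.

    No flattened reporting table or star schema is built first; the
    tables keep their original shape.
    """
    return conn.execute("""
        SELECT c.region, p.category, SUM(l.quantity * l.unit_price) AS revenue
        FROM order_lines AS l
        JOIN orders    AS o ON o.order_id    = l.order_id
        JOIN customers AS c ON c.customer_id = o.customer_id
        JOIN products  AS p ON p.product_id  = l.product_id
        GROUP BY c.region, p.category
        ORDER BY c.region, p.category
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_source(conn)
    for row in revenue_by_region_and_category(conn):
        print(row)
```

On a toy dataset any engine handles this easily. What impressed me in testing was seeing this style of query stay fast across many tables and hundreds of millions of records.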
Much as data discovery tools once did, Direct Data Mapping™ conceptually contradicts my experience with reporting database development and joins, so I approached it with similar skepticism. Let’s dig into how Direct Data Mapping™ works.
Incorta Direct Data Mapping™ technology enables direct consumption of disparate transactional application data sources that require a high number of joins. The engine is capable of low-impact, extremely fast data ingestion and integration from disparate source systems.
With Direct Data Mapping™, you can quickly unify hundreds of complex tables while reporting directly on real-time data. It leverages 1:1 data mapping to preserve the data’s original shape and security parameters. Data can be persisted as Parquet files in HDFS, allowing it to be leveraged by other popular big data tools such as Presto. These capabilities eliminate time-consuming ETL data movement workflows into dimensional data stores and OLAP analytical data layers.
Old versus New Analytics Approaches
In the past, traditional analytical dimensional data warehouse design was a slow process. New data warehouses could cost millions of dollars and take years to roll out. It was not uncommon for new data sources, fields or even minor reporting changes to take months of work to deliver. By the time work was complete, the business requirements had already changed.
Analytical questions needed to be asked in advance along with desired drill paths called hierarchies. As a data architect, I would design a data model and then test it with my users. We would pivot a subset of data using Excel or an analytical desktop tool. Remember ProClarity Desktop? I reveal my age. Modern data discovery solutions sure have improved rapid prototyping today!
After my data model design was approved, I’d develop the dimensional data stores and ETL processes. Those are not easy to build and usually accounted for up to 80% of the total project effort. ETL is brittle and miserable to manage. I would struggle through ETL to get to the truly valuable part of analytics – building analytical reports, dashboards and predictive models. In case you did not realize it, most business stakeholders don’t appreciate ETL. They just want the end result: analytical insights.
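For readers who have never built one, the following is a deliberately tiny sketch of the kind of ETL step I am describing: splitting flat source rows into a dimension table with surrogate keys and a fact table that references it. The row layout, field names and function names are hypothetical, and real ETL adds change tracking, error handling, scheduling and much more, which is where the effort goes.

```python
# Hypothetical flat rows extracted from a source system.
SOURCE_ROWS = [
    {"order_id": 100, "customer": "Acme",   "region": "East", "amount": 300.0},
    {"order_id": 101, "customer": "Globex", "region": "West", "amount": 50.0},
    {"order_id": 102, "customer": "Acme",   "region": "East", "amount": 600.0},
]


def load_star_schema(rows):
    """Transform flat rows into a customer dimension and a sales fact table."""
    dim_by_name = {}  # natural key (customer name) -> dimension row
    fact_sales = []
    for row in rows:
        name = row["customer"]
        if name not in dim_by_name:
            # Assign the next surrogate key the first time a customer is seen.
            dim_by_name[name] = {
                "customer_key": len(dim_by_name) + 1,
                "customer": name,
                "region": row["region"],
            }
        fact_sales.append({
            "order_id": row["order_id"],
            "customer_key": dim_by_name[name]["customer_key"],
            "amount": row["amount"],
        })
    return list(dim_by_name.values()), fact_sales


def sales_by_region(dim_customer, fact_sales):
    """Report over the star schema: look up region via the surrogate key."""
    region_of = {d["customer_key"]: d["region"] for d in dim_customer}
    totals = {}
    for fact in fact_sales:
        region = region_of[fact["customer_key"]]
        totals[region] = totals.get(region, 0.0) + fact["amount"]
    return totals


if __name__ == "__main__":
    dim_customer, fact_sales = load_star_schema(SOURCE_ROWS)
    print(sales_by_region(dim_customer, fact_sales))
```

Every new source field or reporting change means revisiting code like this, the target tables and the jobs that run them. That is the maintenance burden the newer approaches aim to remove.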
Connect, Map and Get Insights
New approaches to analytics bring almost immediate value to the business. Incorta’s Direct Data Mapping™ provides the following benefits.
- Delivers fast performance on complex data sources without shaping data
- Eliminates the need for a dimensional data warehouse and its associated star-schema iterations and relational-based queries
- Allows IT to easily aggregate and secure complex data residing on-premises or on popular cloud platforms
- Enables all users to view and report on joined data in apps across the enterprise in real-time, on their own, without delay
Modern analytics platforms like Incorta also fulfill four core tenets.
- No data left behind
- Speed and simplicity
- Secure, governed self-service
- User-led versus IT-led
If you are still using old analytics approaches, I encourage you to explore newer, better ways to expedite delivery of information and improve operational intelligence. In a digital world, speed to insight will become a critical success factor as more business processes and customer interactions are automated.
Will Incorta Kill the Data Warehouse?
In my professional opinion after reviewing Incorta’s solution “hands-on”, Direct Data Mapping™ can reduce the need to build a data warehouse. Will it kill off the data warehouse? I don’t know. I do know that Incorta does remove the need to move data to dimensional data stores or use legacy OLAP cube approaches to build reports from multiple data sources that span across an enterprise.
The Incorta Direct Data Mapping™ queries that I tested across data sources were incredibly fast. I am guessing that is one of the reasons why GV (formerly Google Ventures) made a big bet on this startup. Incorta is impressive and innovative.
For More Information
If you’d like to delve deeper into Incorta and its proprietary Direct Data Mapping™ technology, check out the following resources.
- Incorta website
- Rethinking the Data Warehouse Webinar
- Rethinking the Data Warehouse White Paper
- Gartner Technology Insight for Modern BI and Analytics Platforms