After two glorious days of vacation in Palm Cove, Australia, my inner spirit is recharged and I am ready to write. Last week I presented Tackling Data Gravity with Hybrid BI at the Asia/Pacific Gartner BI Summit series. I chose that topic since it is a fascinating area of analytics that I am seeing deployed more often now with early adopters and BI market leaders. Although I showcased Microsoft’s answer to defying data gravity with a variety of hybrid BI capabilities, I predict the entire BI industry will be talking about this topic in the future.
Data gravity introduces significant industry challenges. BI has lived primarily on-premises, with a tiny 2% of BI applications living in the cloud. Even as the industry shifts more and more apps to the cloud at a much faster pace, with more BI also heading there, data warehouses and many other data sources may still live on-premises for a long time. Thus there will be an increased need for BI apps to query across both realms, on-premises and cloud.
What is Data Gravity?
Data gravity is an undeniable market force that I am seeing play out in our BI industry's mid-life crisis. The mobile-first, cloud-first world of a bazillion apps for everything generates more data in the cloud than on-premises. As more apps are being delivered via mobile, cloud and Software as a Service (SaaS), the center of data gravity is already shifting.
Last year at the Gartner BI Summit, data gravity came up in several sessions. After doing a bit of research on the concept, I found Dave McCrory’s blogs to be the most enlightening and educational. He has even guesstimated a few cool formulas for calculating data gravity, data physics models, application mass and other related areas.
“Consider Data as if it were a Planet or other object with sufficient mass. As Data accumulates (builds mass) there is a greater likelihood that additional Services and Applications will be attracted to this data. This is the same effect Gravity has on objects around a planet. As the mass or density increases, so does the strength of gravitational pull. As things get closer to the mass, they accelerate toward the mass at an increasingly faster velocity.
How does one defy data gravity? You can only shuffle data around so quickly. Even with the fastest networks, you are still bound by distance, bandwidth, and latency. All of these are bound by time, which brings us to speed of light. You can only transfer so much data across the distance of your network. In a perfect world, the speed of light becomes the limitation. At some point, it becomes impossible to move an app, service, or workload outside of the boundaries of its present location.” – Dave McCrory
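To put rough numbers on the distance, bandwidth and latency limits in that quote, here is a minimal back-of-the-envelope sketch in Python. It is my own illustration, not one of Dave McCrory's formulas. The function names are hypothetical, the fiber signal speed is an approximation of about two-thirds the speed of light, and the transfer estimate ignores protocol overhead, congestion and retries, so real transfers will be slower.

```python
# Approximate signal speed in optical fiber, about two-thirds of c (an assumption).
SPEED_OF_LIGHT_IN_FIBER_M_PER_S = 2.0e8


def min_round_trip_seconds(distance_km: float) -> float:
    """Lower bound on round-trip latency imposed by signal speed over a distance."""
    return 2 * distance_km * 1000 / SPEED_OF_LIGHT_IN_FIBER_M_PER_S


def bulk_transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealized time to move a payload over a link, ignoring all overhead."""
    return size_bytes * 8 / bandwidth_bits_per_s


if __name__ == "__main__":
    ten_terabytes = 10 * 10**12   # illustrative data volume
    one_gigabit_link = 10**9      # illustrative link speed in bits per second

    hours = bulk_transfer_seconds(ten_terabytes, one_gigabit_link) / 3600
    print(f"Best case to move 10 TB over 1 Gbps: {hours:.1f} hours")

    latency_ms = min_round_trip_seconds(15000) * 1000
    print(f"Best case round trip over 15,000 km: {latency_ms:.0f} ms")
```

Even in this idealized case, a single bulk copy takes the better part of a day, and every chatty query across a long distance pays the round-trip floor again. That is the practical meaning of data gravity for BI: at some volume it is cheaper to bring the query to the data than the data to the query.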
Data Gravity Impacts on BI
BI and analytics pros cannot ignore the value of data in the cloud. The gorgeous dashboards they create typically are not at the dead end of a one-way street. Most analytics projects are highly iterative in nature. Dashboards enlighten a decision maker, who in turn sends action items for adjustment. The business process being monitored by a dashboard is continuously tuned for optimal performance at various points from various data sources. To effectively deliver iterative intelligence, data and added context flow rapidly back and forth between apps, data sources and the analytical assets regardless of where they live…on-premises or in the cloud.
In the past I extracted and downloaded Google Analytics and Salesforce cloud data into a client’s on-premises data warehouse. Today there are many more line-of-business apps in the cloud such as Marketo, Dynamics and Workday. The decisions to copy or move cloud data for analytical purposes are getting a bit more challenging as cloud data volumes grow. In a variety of use cases today, it makes no sense to copy or move data from the cloud to on-premises database servers, especially if your data center managers are in the process of migrating workloads to the cloud. Wherever your data lives, you should be able to get value from it. That is where hybrid BI capabilities are becoming essential for BI pros to understand and to use where it makes sense.
Hybrid BI Data Source Patterns
Salesforce has been in the cloud game for a long time. For years BI pros struggled with common requests to download all of a company’s Salesforce history into a local data warehouse for analysis. It was a nightmare trying to pull massive amounts of data with the default Salesforce bulk APIs. You’d initially think it would work but inevitably you would run into timeout issues. The best solution that I found to that challenge was DBAmp. DBAmp created a SQL Server linked server connection to the Salesforce bulk API with much richer capabilities. Using DBAmp, Salesforce tables looked like local SQL Server tables making them easier to query with BI apps.
As cloud and hybrid BI technologies continue to evolve, I am seeing that linked server pattern per se in a few places from data virtualization to remote distributed queries. In Microsoft’s offerings, we have SQL Server stretch database tables, PolyBase and Elastic Queries. All use external table pointers to remote data sources. All enable users to easily query hybrid cloud and on-premises data from their favorite BI tool just as if the data were stored in a local table.
Stretch database tables were introduced with SQL Server 2016. This feature allows older data to be stored in the cloud and more recent data to remain on-premises. The stretched cloud portion of data remains accessible for querying – essentially providing online “cold” data. That is a fabulous feature for analytics pros that need to run algorithms over large periods of time.
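To make the hot and cold split concrete, here is a small analogy in Python using SQLite attached databases. This is not Stretch Database T-SQL, and the table names (orders, orders_archive) and the "cloud" schema name are hypothetical. With a real stretched table the split is transparent and you query a single table; in this sketch the query spells out the union explicitly so you can see that one statement reaches both stores.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create a 'local' store plus an attached store standing in for the cloud."""
    conn = sqlite3.connect(":memory:")
    # The attached in-memory database plays the role of the remote, cold tier.
    conn.execute("ATTACH DATABASE ':memory:' AS cloud")
    conn.execute(
        "CREATE TABLE main.orders (order_id INTEGER, order_year INTEGER, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE cloud.orders_archive (order_id INTEGER, order_year INTEGER, amount REAL)"
    )
    # Recent rows stay local; older rows live in the stand-in cloud store.
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?, ?)",
        [(3, 2016, 50.0), (4, 2016, 70.0)],
    )
    conn.executemany(
        "INSERT INTO cloud.orders_archive VALUES (?, ?, ?)",
        [(1, 2014, 20.0), (2, 2015, 30.0)],
    )
    return conn


def yearly_totals(conn: sqlite3.Connection) -> list:
    """Aggregate across hot and cold rows in a single SQL statement."""
    sql = """
        SELECT order_year, SUM(amount) AS total
        FROM (
            SELECT order_year, amount FROM main.orders
            UNION ALL
            SELECT order_year, amount FROM cloud.orders_archive
        )
        GROUP BY order_year
        ORDER BY order_year
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for year, total in yearly_totals(build_demo_connection()):
        print(year, total)
```

The point of the sketch is the shape of the query, not the engine: a trend analysis over several years does not need to know which rows were moved to cheaper remote storage.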
PolyBase enables the use of simple T-SQL queries against both relational data and remote ‘semi-structured’ big data in Hadoop HDFS or Azure. Although data may be stored remotely, the external table looks like a local table to users. Getting insight from these less-structured data types used to be much more challenging, requiring web service JSON hoop jumping or other parsing routines. Now with PolyBase, it is a whole lot easier.
Elastic queries are another potent technology, allowing users to query multiple databases with T-SQL. Cross-database queries can scale out across large data tiers residing in the cloud and return results to BI apps on-premises.
Azure Data Factory is another hybrid BI Microsoft technology, bundled in the Cortana Analytics Suite. It orchestrates larger data movement activities between on-premises and cloud or between cloud-to-cloud data sources. Azure Data Factory can connect to a wide variety of data sources including but not limited to Azure Blob, Azure Table, Azure SQL Database, Azure SQL Data Warehouse, Azure DocumentDB, Azure Data Lake Store, SQL Server, file systems, Oracle, MySQL, DB2 Database, Teradata, Sybase, PostgreSQL, ODBC data sources and Hadoop Distributed File System (HDFS), as well as OData and HTML tables, on-premises or in cloud Azure IaaS. For more information on Azure Data Factory, check out the online docs.
Another pattern that is fairly unique to Microsoft is the Power BI Data Management Gateway. Data Management Gateway enables centralized registration of on-premises data sources that can be remotely live queried and refreshed in cloud Power BI reports. Most cloud BI solutions today require copying data to cloud…not Power BI. Power BI provides an option to copy data OR live query data on-premises from the cloud.
Secure queried data transfer occurs between Power BI’s cloud service and the on-premises Data Management Gateway through Azure Service Bus without opening an inbound firewall port. Cloud reports issue live queries to the remote on-premises data sources in the native data source language (SQL, MDX, DAX, etc.). The local data source on-premises then checks user role-based security and returns a data set to the cloud app. For on-premises data warehouses, direct query with Power BI Data Management Gateway for self-service BI in the cloud adds a lot of value and effectively tackles classic hybrid BI data challenges, eliminating the need for large data movement.
Another differentiated design pattern with Microsoft BI in the hybrid BI world is reporting across the realms: not only unifying on-premises Power BI, Excel and Reporting Services reports into one user experience, but also bridging on-premises reporting with cloud reporting.
With Microsoft PowerBI.com “pinning” capabilities, users can combine on-premises Power BI, Excel or even Reporting Services reports with cloud Power BI reports. Combined “pinned reports” are also available in one set of Power BI native mobile apps across iOS, Android and Windows mobile devices – phones and tablets. Regardless of where and what Microsoft BI tool creates a report, PowerBI.com can render it.
I can honestly say that I believe Microsoft is the only vendor in the BI market that can achieve this level of unified hybrid BI reporting today. We all know that as much as competing BI vendors bash Excel, business users still love and need Excel. Ideally, all types of reports would be made available for your business users regardless of report type or where those reports live in a modern hybrid BI world.
The last highlighted area of hybrid BI that I mentioned in my session was iterative, actionable analytics with Azure ML, R, Office and new Power Apps integration with Power BI.
Azure ML and R enable embedding of predictive analytics capabilities in on-premises or cloud applications. A catalog of predefined analytics solution templates enables rapid deployment of intelligent analytics regardless of deployment location choice simply by connecting a data source.
Office is still the world’s most popular productivity tool with millions of users already using Office 365 in the cloud. There are many synergies between Office apps and Power BI including awesome data storytelling with Sway and group collaboration. Just click the icon at the top left of the Power BI screen and explore what you can combine today with Office 365.
New Power Apps allows business users to design, build and deploy business process apps from templates that integrate, query and provide write-back to cloud or on-premises data sources. Using no code, non-technical users can create forms, apply business logic and even add workflows to make Power BI an actionable and iterative decision-making solution. From end-to-end business process to insights to reporting, Power Apps + Power BI = a powerful combination (pun intended).
For More Information
If data gravity and hybrid BI are interesting to you, check out my presentation and notes posted on SlideShare. For Gartner BI Summit Australia attendees, there may also be a recording available of the live session and hybrid BI solution demonstrations.