It is a lovely Saturday afternoon, birds chirping, light breeze flowing into my home office window, pups napping, a good change is now imminent and the kick-off of a new football season is entertaining other family members. What a perfect time to continue the Exploring Oceans of Data series. In the introduction, a New World of Data was discussed. In this article, I will dig into cloud data warehousing and business intelligence with Azure SQL Data Warehouse.
Azure SQL Data Warehouse
Azure SQL Data Warehouse is an enterprise-class distributed database capable of processing petabyte volumes of relational and non-relational data. It is the first elastic-scale cloud data warehouse that offers full indexing including clustered columnstore index, stored procedures, functions, partitions, common table expressions and auditing. It also can grow, shrink, and pause within seconds versus long, expensive waits.
In my opinion, what is even more amazing about this technology is the ability to get insights across all data formats, structured and unstructured, with seamless T-SQL integration and Polybase™. Data stored in a regular Azure Storage Account (either Page Blob or Block Blob) can also be queried or imported into Azure SQL Data Warehouse. Supported file formats include delimited text, Hive RCFile or Hive ORC file formats. The new Azure SQL Data Warehouse service was announced at //build in April 2015 and is now in public preview. The preview is targeting data warehouses in the 500GB -10 TB range.
My favorite videos thus far that introduce and dive deeper into Azure SQL Data Warehouse are available on Channel 9:
You can find more videos on Microsoft Cloud and Enterprise YouTube Channel.
Getting Started with the Preview
Recently I received my email invitation to preview Azure SQL Data Warehouse. After you get the email, you will run through a few steps to configure it, add a user, add firewall permission settings and load data into it.
The team developing and supporting this new technology along with MVPs in the wider SQL Server community have done a superb job providing articles to get started. I won’t reiterate what they have already done. Instead, I will provide links to a few of their articles that I found invaluable to get up and running.
- Getting Started with Azure SQL Data Warehouse
- Connecting, Setting Up Users and Querying
- Loading Data in Azure SQL Data Warehouse
- Using Power BI with Azure SQL Data Warehouse
Note: Be sure to update your Visual Studio Data Tools to include the “SSDT Preview version 12.0.50623 or later“. I initially did not do that, started playing before reading all the docs and got blocked on trying to connect. You will need to have that Visual Studio update to connect and set up a user first. Then you can go wild loading data, querying and connecting your super cool cloud data warehouse to Power BI or many other BI tools that are also early adopters.
Designing for Azure SQL Data Warehouse
One of the beautiful aspects of being able to use familiar T-SQL, stored procedures and other supported SQL Server syntax is not having to change much of your application code. Azure SQL Data Warehouse supports many of the same query and stored procedure features found in SQL Server. Notably there are a few differences to be aware of that are designed specifically for higher scale and improved performance when using a cloud distributed data warehouse. James Rowland-Jones has covered these topics in his must-read series of articles at “Develop solutions for Azure SQL Data Warehouse” that details distributed data warehouse development principles, best practices and coding techniques.
For my demo playground, I used the sample provided by the product team that uses a classic SQL Server bulk data loader bcp utility approach. Initially, I did run into issues with the bcp timing out from my home office development laptop in Tampa, Florida. Once I added, -l 120 to the provided batch script the data loaded within minutes and I was good to go.
In addition to bcp, you can also load data into Azure SQL Data Warehouse with Azure Data Factory, PolyBase, SQL Server Integration Services (SSIS) and other 3rd party tools. These other options are covered in the Load section of the Getting Started documentation.
Now that you have data loaded, you can begin querying that data with familiar T-SQL. SQL Data Warehouse uses an advanced query optimizer that figures out how to optimize distributed queries based on assessing the cost of query operations. It also has advanced algorithms and techniques that efficiently move data among the computing resources as necessary to perform the query. The columnstore index technology enables incredible query performance with up to 5x compression gains over traditional row-oriented storage, and up to 10x query time improvement. Columnstore indexes are ideal for dimensionally designed data warehouses since reporting queries often scan through extremely large fact tables or an entire partition of a fact table. For example, a query that gets and summarizes sales over all available time periods.
Visually Exploring Data
Last but not least, you can easily connect Power BI or many other self-service BI tools to visually explore and create reports on Azure SQL Data Warehouse. Connecting in Power BI is incredibly easy and available in both the Azure SQL Data Warehouse user interface as well as in Power BI under Get Data in Databases & More section.
With the direct connect capability, you can begin visually exploring your cloud data warehouse by dragging and dropping fields onto the Power BI report canvas. What is truly nice about direct connect is that you get to see latest and greatest available data without having to schedule data refresh and you don’t use Power BI storage space. Note that in Power BI Dashboards, the tiles get automatically updated every 10 or 15 minutes. If you want to see the most up to date information, you can do that with Power BI reports. Power BI reports issue live SQL queries to the Azure SQL Data Warehouse as you browse them.
Here is a quick peek at visually exploring the sample data provided by the Azure SQL Data Warehouse team in a Power BI report. Currently the underlying data model used by the Power BI web service does not automatically detect relationships – it is a work in progress. If you want to define Azure SQL Data Warehouse relationships, you can create views on the Azure SQL Data Warehouse side or use the Power BI Desktop Azure SQL Data Warehouse option to optimize the visual web exploration experience.
What I found in my early preview experience is that it is really quite simple to get started with Azure SQL Data Warehouse. The instant Azure cloud SaaS apps provided by Microsoft open up a lot of amazing potential to rapidly and cost effectively deliver insights anytime, anywhere on large scale, hybrid data in a new world of data.