Old-time Microsoft BI fans, do you remember Project Barcelona? Fast-forward a few years, wrinkles, pounds, and grey strands in my hair, and now we have an even better version of it available in public preview today called Microsoft Azure Data Catalog. General Availability is targeted for later this fall.
Azure Data Catalog, aka Project Tokyo, is an enterprise-wide data source metadata catalog that enables rich self-service data source discovery. It provides capabilities that enable any user – from analysts to data scientists to developers – to register, discover, understand, and consume data sources. Adding Azure Data Catalog into the mix can ease the search for the right data, or for a specific version of a business-defined metric, for use within popular self-service business intelligence tools.
From Barcelona to Tokyo
In the SQL Server 2012 launch days, Project Barcelona was initially explored as a way to solve the enterprise metadata management, lineage, impact and data flow analysis pain points in organizations. In the early concepts, query views were stored in a Dependency Catalog database to give DBAs and BI professionals the ability to search data sources and check a dependency window in SQL Server Management Studio or a web browser. Slide 23 in my June 2012 Microsoft Business Intelligence Overview deck on SlideShare includes a few more tidbits on the original project.
Following Project Barcelona, a popular Data Catalog and query sharing capability was designed and shipped as part of Power BI for Office 365. The latest Azure Data Catalog is an evolution of that existing Data Catalog. Shortly after Azure Data Catalog becomes generally available, the two catalogs will merge into a single service.
The public preview version of Azure Data Catalog is currently available side-by-side with the existing Data Catalog. When the two services merge, your data in the existing version will be migrated to the new Azure Data Catalog. The intent is that any customer who would like to take advantage of the additional capabilities delivered with the preview version can do so immediately after migration.
Expanded Audience and Data Discovery Tools Usage
One of the most notable differences in Azure Data Catalog is the expanded target audience for data source search, review and tagging:
- Data Developers, BI and Analytics Professionals: Individuals responsible for producing data and analytics content for others to consume.
- Data Stewards: The domain and data subject matter experts with knowledge of what the data means and how it is intended to be used.
- Data Consumers: Anyone who wants to discover, understand and connect to the data needed to do their job using the tool of their choice.
- IT Professionals: Individuals responsible for empowering data-driven cultures in a responsible manner. These folks provide secure access to hundreds or thousands of data sources for a business. This group also needs to maintain oversight over how data is being used and by whom.
Unlike traditional metadata management solutions that are typically IT driven and managed, Azure Data Catalog focuses on crowdsourced annotations that empower business experts who have detailed domain knowledge of reporting data to enrich the catalog. The goal is to reduce the amount of time self-service data consumers spend looking for the appropriate data to use. IT can maintain control and oversight of the system as it evolves. Since it is a fully managed cloud SaaS service, no hardware is required and only minimal effort and investment are needed to evaluate it.
In contrast to the prior version of Power BI for Office 365 Data Catalog, the preview of Azure Data Catalog can literally be used with any data discovery tool – Power BI, Tableau, Spotfire, Qlik, SAP Lumira, etc. The engineering team concentrated on the pain point of searching for data. This is an activity that is performed in many different reporting tools throughout an organization. To successfully solve that problem, Azure Data Catalog’s design needed to be tool agnostic and support the bring your own reporting tool (BYORT) world that we live in today.
You can think of Azure Data Catalog as a directory of data about your data. It does not copy or move your data. It does support registering data from virtually any source, structured and unstructured, on premises and in the cloud. In the preview today, registration and metadata extraction are supported for the following data sources:
- Azure SQL Database
- SQL Server Tables and Views
- Analysis Services Multidimensional Dimensions, Measures, and KPIs
- Analysis Services Tabular Dimensions, Measures, and KPIs
- Oracle Database Tables and Views
- SQL Server Reporting Services Reports
- Azure Storage Blobs and Directories
- Azure Data Lake Store
- HDFS Files and Directories
- Apache Hive Tables
- Teradata Tables and Views
- Azure Data Lake Store Files and Directories
- MySQL Tables and Views
- SAP HANA Tables and Objects
More data sources will be added incrementally over time based on customer demand. Open APIs will allow customers to add their own custom data sources.
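To make the "data about your data" idea a bit more concrete, here is a minimal Python sketch of a metadata catalog that registers sources, accepts crowdsourced annotations, and supports search. This is not the Azure Data Catalog API: `ToyCatalog`, `CatalogEntry`, and the `example://` location string are hypothetical names made up purely for illustration, and the real service is certainly far richer than this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one registered data source (hypothetical shape)."""
    name: str
    source_type: str   # e.g. "SQL Server Table"
    location: str      # a pointer to the source; the data itself is never copied
    tags: set[str] = field(default_factory=set)
    descriptions: list[str] = field(default_factory=list)


class ToyCatalog:
    """In-memory stand-in for a metadata catalog (illustration only)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, name: str, source_type: str, location: str) -> CatalogEntry:
        # Registration stores metadata only: a name, a type, and where to find it.
        entry = CatalogEntry(name, source_type, location)
        self._entries[name] = entry
        return entry

    def annotate(self, name: str, *, tags=(), description: str | None = None) -> None:
        # Anyone with domain knowledge can add tags or a description.
        entry = self._entries[name]
        entry.tags.update(t.lower() for t in tags)
        if description:
            entry.descriptions.append(description)

    def search(self, term: str) -> list[CatalogEntry]:
        # Case-insensitive match on the name, an exact tag, or any description.
        term = term.lower()
        return [
            e for e in self._entries.values()
            if term in e.name.lower()
            or term in e.tags
            or any(term in d.lower() for d in e.descriptions)
        ]


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.register("DimCustomer", "SQL Server Table", "example://host/AdventureWorksDW")
    catalog.annotate("DimCustomer", tags=["Sales"], description="One row per customer.")
    print([e.name for e in catalog.search("sales")])
```

The point of the sketch is the shape of the thing: the catalog holds a pointer to each source plus whatever the crowd knows about it, so consumers can find data without the catalog ever copying or moving it.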
Editions and Pricing
Azure Data Catalog is offered as both a Free and a Standard Edition. The Standard Edition is free through July 2015. The Public Preview pricing starts on August 1, 2015. Pricing will be at a 50% discount until general availability. For more details on pricing and edition-specific capabilities, please review the Azure Data Catalog pricing page.
To dig in and play with the improved preview of enterprise data catalog capabilities, go to the new web site, provision a catalog, add users and populate a few of your favorite reporting data sources. There is a short video and a fantastic step-by-step tutorial using Microsoft’s classic Adventure Works sample database already posted. If you are like me and prefer to read, here is a link to the online documentation.