I recently blogged about Self-Service ETL Tool Options. That blog was pretty much a high level summary of current market solutions in that space with links to further explore the reviewed tools and also find more related tools. Having now spent some hands-on time with the new Excel Data Explorer Preview add-in, I’d like to supplement that post with additional feedback. To be fair to the other excellent self-service ETL vendors, Alteryx and LavaStorm, I will explore their offerings deeper in future blogs too. I do know that Alteryx has released an upgrade that works with varied local R installations so I can better evaluate their predictive features in my next review. Since Excel touches the lives of at least 2 billion people globally and the Data Explorer add-in is currently free, I wanted to first dig deeper into the new Excel Data Explorer Preview add-in since most people will naturally start there and then decide if another tool is needed.
Just to refresh you, the Data Explorer Preview is an Excel add-in for Excel 2010 and 2013. You can download and install it from http://www.microsoft.com/en-us/download/details.aspx?id=36803. Data Explorer provides easy, self-service ETL type functionality within Excel. It has a simple, non-technical wizard interface for loading, combining, and transforming both structured and unstructured data into Excel – it also has some complex scripting capability via a new language called “M”. You can read up on M in the M technical docs. Additional resources to ramp up on Data Explorer include an introduction tutorial and a more advanced one on combining data data sources.
To get started in, navigate to the Excel Data Explorer menu, choose Online Search or a Data Source Type to Find, Query, Prepare, Cleanse, and Load Data into Excel. The current list of supported data source is fairly large: Web Page, Excel or CSV files, XML, Text, Folders and file metadata, SQL Server database, Windows Azure SQL database, Access, Oracle, IBM DB2, MySQL, PostgreSQL, Teradata, SharePoint Lists, OData, Hadoop HDFS, Windows Azure Marketplace, Active Directory, Windows Azure HDInsight, and Facebook. More data sources are sure to be added over time – I wish Twitter and LinkedIn for example were much easier. TweetSQL, DataSift, R packages, and many other third-parties can get you your social data to analyze. The Data Explorer Web Page and Online Search data sources were interesting and quite excellent. I can’t begin to count the number of times I have had search and then copy, paste and restructure data from a web page somewhere where the figures were embedded in messy HTML tables for analysis.
A few of the other features that I liked and foresee being really invaluable to typical data analysts are the pivot/unpivot, scripting, joining, filtering, deduplicating, grouping, splitting, and transforms. The Data Explorer formula language is pretty straight forward though I do wish learning yet another a new programming language was not necessary. Many of the Microsoft MVPs have great blogs on Data Explorer – Chris Webb, Paul Turley, Melissa Coates, Bill Pearson, Mark Tabladillo, Teo Lachev, Rob Kerr, and Jason Thomas’ are probably some of my top of mind, favorite bloggers that have good technical coverage. I know I am forgetting a few of the best of the best out there. One of the things that I found “hidden” or not best implemented was the nice menu for cleansing and data preparation shown above. The “trick” to see/access this menu is to right-click on top of the data column header name in the Query Editor window. That right-click is not at all intuitive given the little drop-down arrow the user is naturally drawn to click on. If the user does not know to right-click on top of the data column header, they miss out on the true beauty and power of all the transformation capabilities. I do hope that nit pick is fixed or I fear many analysts may overlook Data Explorer depth of functionality altogether or fear the script window being too complex. After visually choosing your transformations, you can then click on the script icon to see the generated M scripts. If you are a hard-core scripting fan, you can write your own M scripts in the script window.
Personally I don’t think Excel Data Explorer got the love it deserved at PASS BAC most likely since the UI does not “sizzle” and there were some grumbles about PASS 2012 Day 2 keynotes. Regardless, it is one of the best things to come out of Microsoft BI this year and it is something to add to your bag of analytical tricks. Since Excel Data Explorer is more for personal ETL versus enterprise ETL, you may be asking when should you use it. This is where I think Excel Data Explorer seems to make the most sense:
- You need basic ETL on data sources with less than one million rows. Refer to Data Explorer limits)
- You are struggling with a mess of VBA routines or are using Access linked servers with merge queries, and so forth (There are much better ways to do data mash-ups today though I keep seeing the VBA and Access messes all over the place in the real world.)
- You need personal data cleansing
- You want lightweight web page content imports or text mining capabilities
I feel that Excel Data Explorer currently is not a go-to solution for the following scenarios:
- You need large scale data for ETL or want true web or text mining
- You need ETL scheduling, monitoring, automation, etc.
- You want a data quality solution that references third-party data, sources, can be automated, uses workflows, data stewards, reporting, etc. (use a real DQS tool for this type of thing)
- You need needs to merge same attributes (i.e. Customer, Product) from different systems, system of record priority, workflows, data stewards, and automation (use a real MDM tool for this type of thing)
That concludes my “slightly” deeper dive into Excel Data Explorer. I know it is still pretty shallow from a “true techy” perspective but I do hope that it was enough to provide a better understanding for what Data Explorer is, what it can do, and where it fits.