As analytics maturity grows alongside advances in modern business intelligence, we are seeing more innovative players in the field of analytics automation. From automated visualizations and descriptive and predictive models to intelligent natural language summaries of analytical findings, analytics automation has arrived along with the era of citizen data science. In this article, I will showcase a few analytics automation technologies and discuss which aspects of these offerings should be embraced.
Smart Data Discovery
Automating analytics is not a new concept. I have previously written about SAP Predictive Analytics (KXEN), Tableau, TIBCO Spotfire, Microsoft Power BI and other vendors whose visual analytics tools offer wonderful suggested data visualizations, outlier detection, forecasting, clustering and intelligent predictive analytics capabilities. These applications do not replace an analyst; they aid one.
If an analyst or business user feeds one of these tools poor-quality data, the predictive results will be poor: garbage in, garbage out. I believe there is still an art to designing the data elements fed into these tools so that they accurately reflect a business process. Even if automated analytics can work through millions of variable combinations, far more than a human could reasonably evaluate, a human might not understand the results, and a machine might not be able to decipher nuances of business context. The true beauty of automated analytics emerges when the human mind is combined with the power of intelligent machine learning. Best-in-class automated analytics presents machine learning results in human-friendly natural language, along with guided recommendations.
Analytics and data science communities have been debating the strengths and weaknesses of analytics automation for a while now. Calls for data control, security, quality and statistical domain knowledge, set against the push for fast, empowered business decisions, echo many of the same battle cries heard at the beginning of the self-service BI revolution. Don’t waste your energy trying to block analytics automation. Be a hero and help less data-savvy business users understand how to use these tools properly.
Historically, advanced analytics and data mining practitioners did have tools that automated steps in the overall machine learning process. For example, during the data understanding step of the Cross Industry Standard Process for Data Mining (CRISP-DM), a data scientist might run routines to rapidly identify the attributes with the most information gain. The example below is from the classic open source Weka Explorer for data mining, using the Bike Buyers sample data from Microsoft Adventure Works.
On the Preprocess screen, where data is loaded, the data quality and skewness of manually selected fields can be reviewed. On the Select attributes screen, information gain and ranking routines can be run. Weka provides a vast library of predictive algorithms and granular control of model settings. The steps are not hard, but there is a steep learning curve in predictive modeling data preparation and technique. The CRISP-DM process can be quite time-consuming, so predictive modeling projects are often time-boxed.
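The information gain ranking described above can be sketched in a few lines. The Python below is a minimal illustration, assuming categorical attributes and a categorical target; it is not Weka's implementation, and real tools also handle numeric attributes, which this sketch omits. The rows are a hypothetical toy sample, not the Adventure Works Bike Buyers data, and the field names `cars`, `region` and `bike_buyer` are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_attributes(rows, attributes, target):
    """Return (attribute, gain) pairs, most informative first."""
    scores = [(a, information_gain(rows, a, target)) for a in attributes]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy sample (not the Adventure Works data): cars owned, region, bought a bike.
data = [
    ("0", "North", "yes"), ("0", "South", "yes"), ("0", "North", "yes"), ("0", "South", "no"),
    ("2", "North", "no"), ("2", "South", "no"), ("2", "North", "no"), ("2", "South", "yes"),
]
rows = [dict(zip(("cars", "region", "bike_buyer"), t)) for t in data]

ranking = rank_attributes(rows, ["cars", "region"], "bike_buyer")
for attribute, gain in ranking:
    print(f"{attribute}: {gain:.3f} bits")
```

In this toy sample the hypothetical `region` field splits buyers and non-buyers evenly, so its gain is zero, while `cars` reduces the target entropy by about 0.19 bits and ranks first.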
After working through all of the CRISP-DM steps and finding a reliable predictive model, the algorithm output might be embedded into a business application, report or dashboard. Even if no reliable predictive model is found, simply knowing which attributes are most relevant to bike sales could provide fantastic business value to a marketer deciding where to invest a limited budget.
This same information gain concept can also be found in IBM Watson Analytics, BeyondCore and other modern BI tools designed specifically for non-technical business users. In citizen data science tools, however, the approach and presentation are different. These offerings provide “easy button” point-and-click analytics that automate the CRISP-DM process and create attractive, presentation-ready output. Here are the findings from the same Bike Buyers data set run in IBM Watson Analytics.
Here is the IBM Watson Analytics data quality report, which identifies outliers and skewed distributions.
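Checks like these are straightforward to approximate. This sketch profiles one numeric column: it counts missing values, computes the moment-based skewness coefficient, and flags outliers with Tukey's 1.5 × IQR fences. The fence rule and the sample income values are assumptions for illustration; this is not how IBM Watson Analytics actually scores data quality.

```python
import statistics


def profile_column(raw):
    """Summarize a numeric column: missing count, skewness and IQR-based outliers."""
    values = [v for v in raw if v is not None]
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments; skewness g1 = m3 / m2**1.5.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    # Tukey fences: values more than 1.5 * IQR beyond the quartiles are flagged.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"missing": len(raw) - n, "skewness": skewness, "outliers": outliers}


# Hypothetical yearly income values (in thousands), for illustration only.
income = [30, 35, None, 40, 40, 45, 50, 55, 60, 250]
report = profile_column(income)
print(report)
```

A strongly positive skewness, as in this hypothetical column, signals a long right tail that a modeler might transform or cap before fitting a model.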
To check individual attributes, the user would go to the Explore area of IBM Watson Analytics and manually drag and drop fields to the canvas.
Just like the free Weka, the free IBM Watson Analytics found that number of cars owned, age and commute distance provide the most information gain for predicting bike purchases. Its visual display of the same information is more modern and appealing. However, the analyst did not get an automated presentation or model export capability like the one I saw in BeyondCore; IBM Watson Analytics seems to take a black box approach.
Taking automated information gain concepts much further, BeyondCore also provides interactive, guided predictive analytics. Natural language descriptions of predictive model results help business users understand machine learning findings.
Interactive forms let the user experiment and see how changing attribute values influences the outcome. This capability is especially useful for rapid decision-making in an uncertain world.
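Behind a what-if form, a tool essentially rescores a record after one input changes. The sketch below assumes a logistic regression with hand-picked, hypothetical coefficients; `COEFFICIENTS`, `purchase_probability` and `what_if` are illustrative names, not BeyondCore's API or model.

```python
import math

# Hypothetical logistic regression coefficients, for illustration only.
COEFFICIENTS = {"cars": -0.6, "commute_miles": -0.1, "age": -0.01}
INTERCEPT = 0.5


def purchase_probability(profile):
    """Logistic model: P(buy) = 1 / (1 + exp(-(intercept + sum(coef * value))))."""
    z = INTERCEPT + sum(coef * profile[name] for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def what_if(profile, attribute, new_value):
    """Rescore a record with one attribute changed; return (before, after)."""
    changed = {**profile, attribute: new_value}
    return purchase_probability(profile), purchase_probability(changed)


customer = {"cars": 2, "commute_miles": 5, "age": 45}
before, after = what_if(customer, "cars", 0)
print(f"P(buy) with 2 cars: {before:.2f}; with 0 cars: {after:.2f}")
```

Because the original record is left untouched, a form can call `what_if` repeatedly as the user flips different attributes.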
BeyondCore analyses can be exported in presentation-ready format to Word or PowerPoint. Automatically created predictive models can be exported as SQL or R code to embed decision intelligence in other business applications.
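Exporting a model to SQL generally means translating its decision logic into an expression a database can evaluate. As a minimal sketch, assuming the model has already been reduced to an ordered list of rules (as a decision list or flattened tree might be), the function below renders it as a SQL `CASE` expression. The columns `cars_owned`, `commute_distance`, `age` and `customer_id`, the table `customers`, and the rules themselves are hypothetical; BeyondCore's actual SQL output may look quite different.

```python
def rules_to_sql_case(rules, default, alias="predicted_bike_buyer"):
    """Render ordered (condition, prediction) rules as a SQL CASE expression.

    The first matching WHEN wins, mirroring how a decision list is evaluated.
    Conditions are inserted verbatim, so they must come from the trusted model,
    never from user input.
    """
    lines = ["CASE"]
    for condition, prediction in rules:
        lines.append(f"    WHEN {condition} THEN {prediction}")
    lines.append(f"    ELSE {default}")
    lines.append(f"END AS {alias}")
    return "\n".join(lines)


# Hypothetical rules for illustration only; not learned from real data.
rules = [
    ("cars_owned = 0", 1),
    ("commute_distance = '10+ Miles'", 0),
    ("age < 40", 1),
]
case_expr = rules_to_sql_case(rules, default=0)
query = "SELECT customer_id,\n" + case_expr + "\nFROM customers;"
print(query)
```

Emitting plain SQL keeps scoring inside the database, so developers can operationalize the model without a separate runtime.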
Ideally, citizen data science tools that automatically generate predictive models would also include detailed performance results, such as ROC and lift curves, error rates and confusion (error) matrices, in an advanced menu for the power business user or professional data analyst to review. Before relying on predictive models for business decisions, I like to understand where they perform acceptably and where they err. Having sold these solutions to finance, operations and marketing professionals, I learned that I was not alone in wanting predictive model performance information. Business buyers do ask where they can get more detail on the statistical significance of results shown in cool reports, and business users also cite estimated lift or return on investment when pitching projects internally.
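The performance details described above are not hard to compute once predictions are available. The sketch below derives a confusion (error) matrix at a chosen score threshold and the lift within the top-scored fraction of records; the 0.5 threshold, the outcomes and the scores are hypothetical. An ROC curve would repeat the confusion-matrix step across many thresholds.

```python
def confusion_matrix(actual, scores, threshold=0.5):
    """Count true/false positives/negatives for 0/1 outcomes at a score threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, s in zip(actual, scores):
        predicted = s >= threshold
        if predicted and a:
            counts["tp"] += 1
        elif predicted:
            counts["fp"] += 1
        elif a:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def lift_at(actual, scores, fraction):
    """Response rate among the top-scored fraction divided by the overall response rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)]
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(ranked[:k]) / k
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


# Hypothetical outcomes (1 = bought a bike) and model scores, for illustration only.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.75, 0.7, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1]

cm = confusion_matrix(actual, scores)
accuracy = (cm["tp"] + cm["tn"]) / len(actual)
print(cm, f"accuracy={accuracy:.2f}", f"lift@20%={lift_at(actual, scores, 0.2):.2f}")
```

With these made-up numbers, the top 20% of scored records contain buyers at 2.5 times the overall rate, the kind of estimated lift figure business users quote when pitching projects.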
Another important feature to evaluate, which I have not seen in all citizen data science tools, is the ability to export automatically generated predictive models in reusable formats or to query them through a web service. I prefer output formats that developers can use to operationalize intelligence in decision-making processes. In my opinion, black box models leave a lot to be desired when it comes to providing actionable analytics for everyone.
Analytics Automation is Awesome
I found analytics automation offerings easy to use and truly valuable for many use cases. With open source and freemium pricing models, there is no excuse not to try them with your own data and see what insights they identify. Learn to love the shift to citizen data science. Dig into how these technologies actually work, what data to feed them and how to get the most value out of them.