One of the questions I get most often these days when talking to analysts about Tableau solutions is, “What does Tableau offer around predictive analytics?” There does not seem to be much information available on this topic. The Tableau community has a fantastic How To tutorial on using R with Tableau, including some great pointers from Bora Beran on using SQL user-defined functions (UDFs) with R. I have also been continuously updating a SlideShare presentation as I learn more about this topic.
If I can find the time, I will write up a few white papers on various predictive solutions with Tableau alone and by combining Tableau with open source, R, SAS, SPSS, Alteryx, Microsoft SQL Server and Excel Data Mining, Oracle Data Mining, Oracle SQL Extensions, SAP HANA, Sybase RAP, and other options. In the meantime, I hope my blog posts on this fun topic will be helpful. In my January post, I covered how to combine Microsoft SQL Server Data Mining with Tableau. In this post, I want to cover the super-popular R. Using R with Tableau is actually very EASY: if you prefer simple point-and-click user interfaces, it does not require R scripting, or any scripting at all.
Before we jump into R, keep in mind that Tableau has some native, out-of-the-box predictive features around Trending and Forecasting (new in v8) that do not require R or any other statistics program. There are also multi-pass aggregation capabilities that can be used for some predictive processing scenarios and that are easily overlooked. The Tableau Trending feature covers both linear and nonlinear model types, including Linear, Logarithmic, Exponential and Polynomial models, with statistical significance details: model formulas, p-value, t-value, degrees of freedom, standard error, mean squared error, confidence bands, residual analysis and related model evaluation information. The Tableau Forecasting feature, new with Tableau v8, uses exponential smoothing algorithms with and without trend and seasonality detection. Both Trending and Forecasting can be accessed by right-clicking on a visualization in Tableau or via the Analysis menu. Anyone can easily use these Tableau predictive features; they do not need to be a data scientist. It does help to understand statistical significance, though, in order to interpret the generated model formulas, tell a good model from a not-so-good one, or dig deeper into the model evaluation details.
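Tableau calculates all of this for you, but if you want a feel for what some of those numbers mean, here is a rough sketch in Python (my choice for illustration only; it is not how Tableau implements anything). It fits an ordinary least-squares linear trend and reports degrees of freedom, mean squared error, the slope’s standard error and its t-value. The p-value is left out because it needs a t-distribution CDF, which the Python standard library does not provide. It also shows the simplest flavor of exponential smoothing, without the trend and seasonality components Tableau can add. The function names are hypothetical.

```python
import math

def linear_trend(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept, with basic diagnostics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sse = sum(r * r for r in residuals)          # sum of squared errors
    sst = sum((y - mean_y) ** 2 for y in ys)     # total sum of squares
    df = n - 2                                   # two parameters were estimated
    mse = sse / df                               # mean squared error
    se_slope = math.sqrt(mse / sxx)              # standard error of the slope
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / sst,
        "df": df,
        "mse": mse,
        "se_slope": se_slope,
        "t_value": slope / se_slope,             # compare against a t-distribution with df degrees of freedom
        "residuals": residuals,
    }

def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level, which serves as a flat forecast for future periods."""
    level = series[0]
    for value in series[1:]:
        # Each new level blends the latest observation with the previous level.
        level = alpha * value + (1 - alpha) * level
    return level

fit = linear_trend([1, 2, 3, 4], [2, 4, 5, 8])
print(round(fit["slope"], 2), fit["df"], round(fit["t_value"], 2))  # prints: 1.9 2 7.18
print(simple_exponential_smoothing([10, 12, 14], alpha=0.5))        # prints: 12.5
```

A larger alpha makes the smoothed level react faster to recent values; a smaller one smooths more aggressively.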
If you are looking for more sophisticated data mining models, you can use a combined R and Tableau solution. You can use R, with or without the nice point-and-click Rattle user interface, to generate predictive models such as classification and regression (decision trees, regression trees), clustering (k-means, hierarchical, ewkm, bicluster), association (market basket), neural networks, support vector machines, or other types of R models. You then export the R predictive model scoring output to a .csv file or a database and visually analyze those outputs in Tableau just as you would any other data source. For most scenarios you can use the free versions of R and Rattle. For very large predictive models with hundreds of attributes and variables that exceed the memory limits of the free version of R, something like Revolution Analytics’ version of R may be a better option for creating the predictive models and scoring.
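If you go the database route rather than a .csv file, the idea is simply to land the scores in a table that Tableau can connect to. Here is a minimal sketch in Python, assuming SQLite (chosen only because sqlite3 ships with Python; use whatever database your Tableau environment actually connects to) and a hypothetical model_scores table with made-up column names:

```python
import sqlite3

def save_scores(conn, rows):
    """Store (customer_id, predicted_class, probability) tuples in a hypothetical model_scores table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_scores ("
        "customer_id TEXT PRIMARY KEY, predicted_class TEXT, probability REAL)"
    )
    # INSERT OR REPLACE lets you re-run scoring without duplicating customers.
    conn.executemany("INSERT OR REPLACE INTO model_scores VALUES (?, ?, ?)", rows)
    conn.commit()

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
save_scores(conn, [("C001", "churn", 0.82), ("C002", "stay", 0.15)])
print(conn.execute("SELECT COUNT(*) FROM model_scores").fetchone()[0])  # prints: 2
```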
Let’s walk through it. 1) Your data should already be prepared for data mining: a single data set in a flattened format, with variables and transformed variables like those used in statistics. Dorian Pyle’s Data Preparation for Data Mining is my favorite book for learning how to best structure data for predictive analytics. Regardless of which predictive analytics tool you choose, data preparation has to be done, and it is a bit of an art and a science. If none of this makes sense to you so far, you need to learn the basics of data mining to understand what you are doing, even if it is easy to do. The classic book used to teach data mining 101 is Witten, Frank and Hall’s Data Mining: Practical Machine Learning Tools and Techniques. Data preparation typically takes the most time in any predictive project, much like ETL in a data warehouse project.
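To make “flattened” a bit more concrete: transactional data usually has many rows per customer, while most data mining models expect one row per case. Here is a minimal Python sketch, assuming hypothetical (customer_id, amount) transaction records, that rolls them up into one row per customer and adds a log-transformed variable. The field names are made up for illustration.

```python
import math
from collections import defaultdict

def flatten_transactions(transactions):
    """Collapse many (customer_id, amount) rows into one analysis row per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, amount in transactions:
        totals[customer_id] += amount
        counts[customer_id] += 1
    rows = []
    for customer_id in sorted(totals):
        total = totals[customer_id]
        rows.append({
            "customer_id": customer_id,
            "txn_count": counts[customer_id],
            "total_spend": total,
            "avg_spend": total / counts[customer_id],
            # Transformed variable: log1p tames the skew typical of spend amounts.
            "log_total_spend": math.log1p(total),
        })
    return rows

rows = flatten_transactions([("A", 10.0), ("B", 5.0), ("A", 30.0)])
print(rows[0]["customer_id"], rows[0]["txn_count"], rows[0]["avg_spend"])  # prints: A 2 20.0
```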
Your flattened, prepared data can reside in a variety of places; R can load data from .csv files, Excel, databases via ODBC, and other formats. 2) To load your data into Rattle, choose the data source and click the Execute button. Then choose the field/attribute/variable that you want to predict (Target) and the variables that influence the prediction (Inputs). You can optionally assign a Weight variable as well.
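Rattle handles loading and role assignment through its user interface, so no code is required. Purely to illustrate what the Target and Input roles mean, here is a Python sketch, assuming a hypothetical flattened file with an identifier column, numeric input columns, and a churned target column. None of these names come from Rattle.

```python
import csv
import io

def load_with_roles(csv_text, target, ident=None):
    """Read a flattened CSV and split each row into input values, a target value, and an ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column that is neither the target nor the identifier is treated as an input.
    inputs = [c for c in reader.fieldnames if c not in (target, ident)]
    X, y, ids = [], [], []
    for row in reader:
        X.append([float(row[c]) for c in inputs])
        y.append(row[target])
        ids.append(row[ident] if ident else None)
    return inputs, X, y, ids

data = "customer_id,txn_count,avg_spend,churned\nC1,2,20.5,yes\nC2,1,5.0,no\n"
inputs, X, y, ids = load_with_roles(data, target="churned", ident="customer_id")
print(inputs, y)  # prints: ['txn_count', 'avg_spend'] ['yes', 'no']
```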
Now you can 3) start creating predictive models by choosing the desired predictive model type tab and model option.
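Behind the Cluster tab, Rattle runs R’s own clustering functions; the Python sketch below is not that code. It is a bare-bones version of the k-means (Lloyd’s) algorithm, just to show what a clustering model does: repeatedly assign each row to its nearest centroid, then move each centroid to the mean of its members. Using the first k rows as starting centroids is a simplifying assumption to keep the example deterministic; real implementations use random or smarter starting points.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's k-means on lists of floats; the first k points are the initial centroids (simplistic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

labels, centroids = kmeans([[1.0, 1.0], [10.0, 10.0], [1.5, 1.0], [10.0, 11.0]], k=2)
print(labels)  # prints: [0, 1, 0, 1]
```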
Then you can 4) evaluate the model to check if it is a good or poor predictive model and continue experimenting until you have a good model.
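Rattle draws the Lift and ROC charts for you. If you want to see what sits behind them, here is a small Python sketch (function names hypothetical). It computes ROC AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and lift as the response rate in the top-scored fraction of cases divided by the overall response rate.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def lift_at(labels, scores, fraction):
    """Response rate among the top-scored fraction divided by the overall response rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate

labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(round(roc_auc(labels, scores), 3), round(lift_at(labels, scores, 0.5), 3))  # prints: 0.556 1.333
```

An AUC near 0.5 is no better than guessing and 1.0 is perfect ranking; a lift above 1 means the model’s top-scored cases respond more often than average.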
5) When you are happy with your model’s Lift/ROC results and want to run new data through the model, score it for predictions, and visualize it in Tableau, go to the Evaluate tab and choose Type: Score, Report: Class or Probability, and Include: All to generate the model scoring output as a .csv file. Then click the Execute button and select where to save the output.
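The scored file is essentially the input rows with prediction columns appended. As a hedged illustration of that shape (Rattle’s actual column names may differ), here is a Python sketch with hypothetical predicted_class and probability columns and an assumed 0.5 classification threshold:

```python
import csv
import io

def write_scores(rows, probabilities, threshold=0.5, positive="yes", negative="no"):
    """Append a predicted class and probability to each input row and return CSV text."""
    out = io.StringIO()
    fieldnames = list(rows[0].keys()) + ["predicted_class", "probability"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row, prob in zip(rows, probabilities):
        scored = dict(row)  # keep all original columns, like Include: All
        scored["predicted_class"] = positive if prob >= threshold else negative
        scored["probability"] = round(prob, 4)
        writer.writerow(scored)
    return out.getvalue()

rows = [{"customer_id": "C1", "avg_spend": "20.5"}, {"customer_id": "C2", "avg_spend": "5.0"}]
print(write_scores(rows, [0.82, 0.15]))
```

To save to disk instead, open the file with open(path, "w", newline="") and hand it to csv.DictWriter; the result is exactly the kind of text file Tableau connects to.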
Yay, now we have predictive model output that we can explore and visualize with all the awesome Tableau features you already know: create a dashboard, publish it, or explore it further on a mobile device like an iPad, Android, or Surface tablet. 6) To load the predictive model output, open Tableau, choose Connect to Data, select Text file and “ta da”, you can now use Tableau with the predictive model output just like any other data source. If you want to incorporate some of the R visuals that are not available out of the box, like the decision tree, you can embed the image in your Tableau workbook or dashboard, as I did in the sample pictures at the start of this article.
Now there is no excuse for not adding some advanced predictive modeling into your Tableau analytics. It is FREE, quite easy to do, fun, and powerful!
I do confess that this blog post was rushed; I wrote it on a plane trip from Washington DC to Tampa because of high demand after I mentioned doing this for a recent POC. There will be much more to come on this topic when I have a few spare moments to breathe! In addition to the white paper, I have a SlideShare presentation with a sample live demo of a Tableau packaged workbook from my previous post for you to play with yourself. Enjoy!