Analytics is often a misunderstood art and science. Although there have been exponential technological improvements in adaptive algorithms and automated model development with high-end solutions, there often is not a simple button click or software wizard answer for non-trivial analytical questions. I do see oversold analytic automation capabilities across a variety of business intelligence solutions in the market today that can be a bit misleading. Do not assume that you can buy a tool, simply point it at Hadoop, a data warehouse or a raw data set, push a button and get immediate, valid predictive answers. That would be a false expectation to set with your stakeholders. Once a valid model is designed and developed, then it can be automated into that magical, easy button response.
In this article, I will share a compelling Southern States Cooperative analytics case study project designed by data artisan, Dan Putler of Alteryx, which resulted in an average marketing ROI of 186%. The exceptional results were accomplished by putting together multiple model pieces and automating the process as a set of incremental steps. This story illustrates a realistic approach to successful analytic projects today.
Analytical tools such as Alteryx, SAS, SPSS, SAP InfiniteInsight (KXEN) and many others do dramatically speed up model development times. However, even in highly automated analytic use cases there is still human confirmation, tweaking and monitoring. The incredible human mind has not been given a pink slip yet!
Southern States Cooperative was founded in 1923 and is currently one of the largest farmer-owned cooperatives in the United States, with over 300,000 members and more than 1,200 retail locations in 23 states. Their prescriptive analytics project was a classic direct marketing optimization situation. The Southern States team’s goals were to better target direct marketing initiatives and maximize the profits they produce. This was not a straightforward, point-click-done, single predictive modeling task. The assignment would entail a multistep process of developing several individual models and then combining a few of them.
In trying to develop a solution, the Southern States team was challenged by an inability to unify customer and marketing data across multiple sources. They also struggled with cumbersome data manipulation and inconsistent modeling in Excel. Ultimately they found working in Excel was taking far too much time and it lacked the depth of sophisticated analytical capabilities needed for optimization modeling. Alteryx was engaged to help solve the direct marketing optimization problem in a proof of concept project.
Modeling the Optimal Business Solution
In discussions, I learned that the initial Southern States analytic project was started, completed and fully automated by Alteryx in an impressive two-day period. The approach to developing a final model solution was to break the problem down into four distinct parts.
- A probability model of catalog use
- A revenue model to assess the incremental gross margin that positive customer catalog use provides
- An estimate of expected gross margin percentage for each customer
- An optimization module to assist direct marketing managers in selecting the mailing list and catalog items
One of the takeaway points is that the outputs of several algorithms were used as inputs to another in order to solve the final prescriptive, joint optimization problem. Multistep analytic solutions like this are quite common, even for seemingly simple business problems.
When Putler began the Southern States project, it took him approximately 2½ hours to clean the data. Note that data cleansing time for predictive and prescriptive projects can vary dramatically depending on the initial state of data quality. There is a plethora of smart, sophisticated, automated data quality tools in the market today that can vastly improve data for analytical usage and be integrated into tools like Alteryx. Historically, data cleansing and data preparation tasks have been the most time-consuming aspect of analytical processes. The Alteryx solution is an excellent example of a tool that aids in expediting data extraction, cleansing and preparation. What sets Alteryx apart is how easily users can visually drag, drop and design complex workflows and automate those routines without any programming or IT skills. Alteryx also embeds rich libraries of third-party data sources such as MOSAIC cluster membership information, TomTom geospatial data, Dun & Bradstreet firmographic information, Experian ConsumerView (household file) data, and Census demographic estimates and projections. It also provides predictive algorithm tasks via R integration.
Once the data was cleansed, transformed and ready in analytical model formats, Putler ran routines within the Alteryx solution to identify the most influential, significant variables. Separate RFM (recency, frequency, monetary) measures and a measure of past purchases of catalog items were found to be statistically significant predictors. The most important factor underlying customer use of a catalog was past purchases of catalog items, but there were also notable diminishing returns. The key point is that not every variable collected in the sea of available data was relevant. Model predictive performance suffers from overfitting when too many variables are used. Overfitting is a model design error that happens when a model is fit too closely to the data it was built on and captures random noise rather than the underlying pattern. Many analytical tools have features that automatically identify the significant variables. The human mind then steps in to validate that the identified variables make logical and business sense to use for modeling purposes.
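To make the RFM idea concrete, here is a minimal Python sketch of deriving the three measures from raw transactions. It is not the Alteryx workflow used in the project; the function name, the tuple layout and the sample transactions are all hypothetical.

```python
from datetime import date

def rfm_measures(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    This layout is an assumption made for illustration only.
    """
    summary = {}
    for customer_id, purchase_date, amount in transactions:
        last, count, total = summary.get(customer_id, (None, 0, 0.0))
        # Track the most recent purchase date seen for this customer.
        if last is None or purchase_date > last:
            last = purchase_date
        summary[customer_id] = (last, count + 1, total + amount)
    # Recency is the number of days from the last purchase to the as_of date.
    return {
        cid: ((as_of - last).days, count, total)
        for cid, (last, count, total) in summary.items()
    }

# Hypothetical sample data, not taken from the case study.
transactions = [
    ("A", date(2013, 1, 10), 40.0),
    ("A", date(2013, 3, 5), 25.0),
    ("B", date(2012, 11, 20), 120.0),
]
rfm = rfm_measures(transactions, as_of=date(2013, 4, 1))
```

Each customer ends up with one row of candidate predictors, which is the shape a modeling tool typically expects before variable selection.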
Using the chosen predictor variables, Putler developed and evaluated multiple logistic regression, decision tree, and random forest models in an iterative process. Logistic regression ended up providing the best lift. Conversations around the initial logistic regression model results led to additional thoughts on how to improve the mailing list and how to jointly optimize the specific items contained within the catalog sent to a customer.
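For readers unfamiliar with lift, the sketch below shows one common way to compute it: the response rate among the top-scored fraction of customers divided by the overall response rate. This is an illustrative Python version under that definition, not the evaluation routine used in the project, and the scores and responses are made up.

```python
def lift_at(scores, responses, fraction):
    """Lift of the top `fraction` of customers when ranked by model score.

    scores: predicted response probabilities.
    responses: 1 if the customer actually responded, else 0.
    """
    overall_rate = sum(responses) / len(responses)
    if overall_rate == 0:
        raise ValueError("lift is undefined when nobody responded")
    # Rank customers from highest to lowest score.
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(response for _, response in ranked[:top_n]) / top_n
    return top_rate / overall_rate

# Hypothetical scores and outcomes for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responses = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
top_20_lift = lift_at(scores, responses, 0.2)
```

A lift above 1 means that mailing only the top-scored customers would find responders at a higher rate than mailing everyone. Comparing candidate models on this measure is one way to decide which "provides the best lift".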
Moving on from logistic regression to optimization, Putler selected and defined the following optimization model variables:
- Fit defined as the subset of possible merchandise items that “matched” the target market segment of the catalog
- Maximum number of items allowed in the catalog
- Potential set of target market customers who could receive the catalog
- Subset of target market customers that have an expected positive return
- Probability that a target market customer positively responds to the catalog
- Expected contribution dollars associated with a positive response
- Mailing cost
A break-even condition was applied to find the optimal set of customers to mail catalogs containing a specific set of items. The selection of the included catalog items was a variant of the “knapsack” problem. Ultimately, to improve the measure of past catalog purchases, Southern States added a metric to measure the value of an item SKU in a catalog.
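The break-even idea can be sketched in a few lines of Python: mail a customer only when the probability of a positive response times the expected contribution dollars exceeds the mailing cost. The item selection shown here is a deliberately simplified stand-in (pick the highest-valued items that fit the segment, up to the item cap) and ignores the interaction between item choice and customer response that makes the real problem a joint optimization. All names and numbers are hypothetical.

```python
def select_customers(customers, mailing_cost):
    """Keep customers whose expected contribution exceeds the mailing cost.

    customers: {customer_id: (p_response, contribution_dollars)}
    """
    return [
        cid
        for cid, (p_response, contribution) in customers.items()
        if p_response * contribution > mailing_cost
    ]

def expected_profit(customers, selected, mailing_cost):
    """Sum of expected contribution minus mailing cost over the mailed list."""
    return sum(customers[c][0] * customers[c][1] - mailing_cost for c in selected)

def select_items(item_values, fits_segment, max_items):
    """Simplified item choice: top-valued items that fit, up to the cap."""
    candidates = [item for item in item_values if item in fits_segment]
    candidates.sort(key=lambda item: item_values[item], reverse=True)
    return candidates[:max_items]

# Hypothetical inputs for illustration only.
customers = {"A": (0.10, 30.0), "B": (0.02, 30.0), "C": (0.05, 12.0)}
mail_list = select_customers(customers, mailing_cost=1.0)
profit = expected_profit(customers, mail_list, mailing_cost=1.0)

item_values = {"feed": 5.0, "fencing": 8.0, "seed": 3.0, "boots": 9.0}
catalog = select_items(item_values, fits_segment={"feed", "fencing", "seed"}, max_items=2)
```

In this toy example only customer A clears the break-even threshold, and "boots" is excluded despite its high value because it does not fit the target segment.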
After each mailing, the joint optimization prescriptive model results are evaluated with A/B testing to see whether the model really does live up to its implied ROI promise and continues to accomplish the Southern States project objectives. In the real world, conditions do change in ways that materially affect models and variables. Some changes require only a few variable tweaks and continued model training with current data. Other changes may require a new model build.
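The article does not say which statistical test was used for the A/B evaluation. One common choice for comparing response rates between a modeled group and a control group is a two-proportion z-test, sketched below in Python with hypothetical counts.

```python
import math

def two_proportion_z_test(resp_a, n_a, resp_b, n_b):
    """Two-sided z-test for a difference in response rates.

    resp_a, resp_b: number of responders in each group.
    n_a, n_b: number of customers mailed in each group.
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = resp_a / n_a, resp_b / n_b
    # Pooled response rate under the null hypothesis of no difference.
    pooled = (resp_a + resp_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: modeled list (group A) versus control list (group B).
z, p_value = two_proportion_z_test(resp_a=80, n_a=1000, resp_b=50, n_b=1000)
```

A small p-value suggests the difference in response rates is unlikely to be chance alone. Whether that difference translates into the expected ROI still has to be checked against margins and mailing costs.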
Business Value of Analytics is Undeniable
According to Greg Bucko, Senior Manager at Southern States Cooperative, “We have now fully adopted the predictive modeling for direct mail approach that we were experimenting with in the case study. An example of the impact of that change was dramatically highlighted by an October direct mail campaign. In October 2013 we conducted a direct mail campaign that was identical to the previous year’s campaign with one exception: the 2013 campaign was modeled. In 2012 we saw results that were about average for our mail campaigns pre-predictive: a redemption rate of 3% and a marketing ROI of -39%. Not a great campaign, and it cost us more than the incremental margins generated. Fast forward to 2013 with a predictive modeling approach and the results are dramatically different: a redemption rate of 10% and an ROI of +59%. In fact, over the last quarter we have done 8 direct mailer campaigns (all modeled) with an average marketing ROI of 186%.”
The time and resources needed to deliver high-performing predictive and prescriptive models have been significantly reduced by modern technologies and tools. The business value of analytics is undeniable. However, there are times when a decent performing model simply can’t be found. As a result, it is difficult to promise up front that a particular level of ROI will be achieved on these types of projects. Effectively communicating analytic process steps and sharing how modern tools like Alteryx can be properly used to expedite predictive and prescriptive model delivery helps ensure realistic stakeholder expectations are set at the beginning of a project. It is prudent to dispel common misconceptions about instant, easy-button predictive magic that seem to be swirling around the market.