In our previous article, we introduced IBM Watson Studio and discussed helping our CMO and marketing team better utilize limited resources with advanced analytics. In this article, we will reveal our findings.
Our CMO has asked us to identify the specific prospect segments that historically convert into paying customers, so that advertising spend can be focused on the right prospects. Once we understand the best prospects to target, marketing wants us to determine which types of social media content are most effective for reaching them with a minimal content marketing budget. They also want to understand what makes a social media post go viral.
Finding Top Prospect Segments to Target
To analyze prospect data, we begin by exporting contact data from our Salesforce CRM. We then combine contact data with third-party demographic data sources and prepare it for machine learning using IBM Data Refinery. Data Refinery is a visual data preparation solution that allows anyone to interactively discover, cleanse, and transform data. It includes over 100 built-in operations for simple transformations and preparation processes – no coding needed.
After loading our data, we quickly review profiles of each attribute and look for data quality issues.
To prepare our dataset for machine learning, we add descriptive bins for numeric variables and fix errors using the built-in charts, statistics and menu options.
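For readers who want to see what binning a numeric variable amounts to, here is a minimal sketch in plain Python. Data Refinery does this through its menus, so this is not its API. The attribute (age), the bin edges and the labels are hypothetical values chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels for an "age" attribute (assumptions, not our real bins).
AGE_EDGES = [30, 45, 60]  # each edge is the first value of the next bin
AGE_LABELS = ["Under 30", "30-44", "45-59", "60 and over"]

def age_bin(age):
    """Return the descriptive bin label for a numeric age."""
    # bisect_right counts how many edges are <= age, which is the bin index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

ages = [22, 30, 44, 45, 61]
print([age_bin(a) for a in ages])
```

A label such as "30-44" is easier for a decision tree or a marketing stakeholder to read than a raw number.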
Since we are using sensitive personally identifiable information (PII), we also publish our prepared dataset with automatic, dynamic masking of PII attributes and apply governance policies in IBM Watson Knowledge Catalog that limit discovery and usage of the prospect dataset.
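To give a feel for what masking does to a record, the sketch below hides an email address and replaces a name with a stable token. It is only an illustration in plain Python. Watson Knowledge Catalog applies masking through governance policies, and the function names, the salt and the sample record here are all hypothetical.

```python
import hashlib

def mask_email(email):
    """Hide the local part of an email address but keep the domain."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain

def pseudonymize(value, salt="demo-salt"):
    """Replace a value with a short, stable token so records can still be matched."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]

# Made-up prospect record (assumption for illustration only).
record = {"name": "Pat Example", "email": "pat@example.com", "cars": 1}

masked = {
    "name": pseudonymize(record["name"]),
    "email": mask_email(record["email"]),
    "cars": record["cars"],  # non-sensitive attribute is left untouched
}
print(masked)
```

The attributes used for modeling stay intact while the identifying ones are no longer readable.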
Now we are going to build and evaluate machine learning models in Watson Studio. We start with a binary classifier since our target, “Buyer”, is a binary yes/no variable.
We run through the training and validation results using a Train: 60%, Test: 20%, Holdout: 20% split, and the results are not compelling. We noted an Area Under ROC of 0.688. Since random guessing scores 0.5 on this measure, the binary classifier is only about 0.19 better than chance.
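For readers who like to see the mechanics, here is a minimal sketch of a 60/20/20 split and of Area Under ROC in plain Python. Watson Studio computes both for us, so this is not its implementation. The function names, the seed and the toy labels and scores are assumptions for illustration.

```python
import random

def split_60_20_20(rows, seed=42):
    """Shuffle rows and split them into train, test and holdout partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * 0.6)
    n_test = int(len(rows) * 0.2)
    return rows[:n_train], rows[n_train:n_train + n_test], rows[n_train + n_test:]

def roc_auc(labels, scores):
    """Area under ROC: the chance that a random buyer outscores a random non-buyer."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A buyer scored above a non-buyer counts as 1, a tie counts as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train, test, holdout = split_60_20_20(range(100))
print(len(train), len(test), len(holdout))

labels = [1, 1, 0, 0]  # 1 = buyer, 0 = non-buyer (toy data)
print(roc_auc(labels, [0.9, 0.8, 0.2, 0.1]))  # every buyer ranked above every non-buyer
print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # scores carry no information
```

Identical scores for everyone give 0.5, which is the baseline we compare our models against.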
Next, we try a Logistic Regression model and let Watson automatically prepare the dataset for us. This time we get a slightly better Area Under ROC result of 0.706. It is still not a strong model. Thus, we keep on experimenting.
This time we spin up IBM® SPSS® Modeler Flow Editor to develop predictive models. Here we have more control over the specific algorithms used, the input parameters and the output. We try a Decision List algorithm from the library of available machine learning options since the output of a Decision List is easy for anyone to understand.
Looking through the results, we identify several segments to target that have 77% purchase probability. This is better than our two previous models. Let’s try one more model type, and get these insights ready to share with marketing.
For our last machine learning model, we select a C5.0 Tree Model. Reviewing the results, we learn this model seems to perform the best. The C5.0 Tree Model also makes it straightforward to explain the results to our marketing stakeholders. The first key finding is obvious, but it also confirms the model works: Cars and Age are the most important attributes for finding the prospects most likely to buy a bike. We also see that Commute Distance is relevant.
Diving deeper into our predictive tree model, we can find more specific business rules. Here we identify the ideal segments for marketing to target and the segments to ignore. This will help them use their limited advertising budget wisely.
Peeking through the C5.0 Tree Model Decision Rules listing, we see the category, record count, percentage and rule confidence level for each rule. Some of these business rules exceed 90% likelihood to purchase. These are fabulous findings. We then highlight the high conversion segments and add those to a presentation for our CMO.
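To make the rule listing less abstract, here is a minimal sketch of how a record count, percentage and confidence could be worked out for a single rule. The records, attribute names and the rule itself are made up for illustration, and SPSS Modeler may calculate its confidence figure somewhat differently.

```python
def rule_stats(records, condition, target="buyer"):
    """Return (record count, share of all records, confidence) for one rule."""
    matched = [r for r in records if condition(r)]
    if not matched:
        return 0, 0.0, 0.0
    buyers = sum(1 for r in matched if r[target])
    # Confidence here is simply the fraction of matched records that bought.
    return len(matched), len(matched) / len(records), buyers / len(matched)

# Made-up prospect records (assumptions for illustration only).
records = [
    {"cars": 0, "age": 34, "buyer": True},
    {"cars": 0, "age": 41, "buyer": True},
    {"cars": 0, "age": 38, "buyer": False},
    {"cars": 2, "age": 63, "buyer": False},
    {"cars": 3, "age": 58, "buyer": False},
]

# Hypothetical rule: no cars and younger than 45.
count, share, confidence = rule_stats(
    records, lambda r: r["cars"] == 0 and r["age"] < 45
)
print(count, round(share, 2), round(confidence, 2))
```

A segment is worth targeting when it has both a high confidence and enough records behind it to be trusted.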
Content Marketing: What Drives Viral Posts
Now that we know specifically who to target, the next question marketing needs answered is how to reach these segments cost-effectively with social media. Last year our marketing team noticed that social media engagement and post shares declined. They were not alone. According to BuzzSumo, a tool used to analyze what content performs best for any topic or competitor, in 2015 approximately 50% of randomly selected posts received 8 shares or fewer. That number dropped to 4 shares or fewer in 2017.
50% of social media posts receive 4 shares or fewer
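A figure like "50% of posts receive 4 shares or fewer" is a median. The short sketch below, using made-up share counts, shows why the median is a more honest summary than the average when a handful of posts go viral.

```python
from statistics import mean, median

# Made-up share counts for ten posts (assumption for illustration only).
shares = [0, 1, 2, 3, 4, 4, 6, 15, 120, 2500]

print("median shares:", median(shares))  # half of the posts are at or below this value
print("mean shares:", mean(shares))      # pulled far upward by the two viral posts
```

Here half of the posts have 4 shares or fewer, even though the average suggests hundreds.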
To be seen by prospects, marketing needs an amplification plan that includes advertising spend. Knowing what social media content gets shared escalated from a “nice to know” to a “need to know”. To learn more about what drives a viral post on social media platforms, our team downloaded and analyzed a public dataset on News Popularity.
For this analysis, we began with an effortless Watson Studio Automatic Model. After defining a name, we picked Automatic, clicked next and assigned our News Popularity dataset. Since the target value in News Popularity is the numeric count of article shares, we opted to use a Regression model. The input attributes in this dataset include number of pictures, videos, day of the week, article length, industry, sentiment and so forth.
Ultimately, we learned social media posts with many links, more than 12 images, a video or more than three keywords consistently performed better than other posts. It makes sense that posts containing a high number of images might go viral. Think about a natural disaster, catastrophe or storm article. Those posts usually do have a high number of photos and indeed are shared and viewed by many more people than an ordinary news story.
Posts with many images, keywords or a video were consistently shared more than other social media posts.
Another key finding was that grumbly articles performed better than happy ones! Posts with avg_negative_polarity were shared more often than posts with avg_positive_polarity. Last but not least, weekend posts were less likely to get shared. If you think about all the people who surf the social web at work, that insight also seems intuitive.
Now that marketing is armed with the right prospect segments to target and knows what specifically makes a social media post more likely to be shared, we will annotate our baseline results and share our findings using Watson Studio collaboration capabilities. You can add collaborators at the project level, giving team members across the enterprise governed access to the project data sources, analytical notebooks, predictive models and other assets.
If you’d like to learn more about Watson Studio, please review the following recommended resources.
- Watson Studio: biz/watsonstudio
- Watson Studio online docs: https://dataplatform.ibm.com/docs/content/getting-started/overview-ws.html
- Watson Studio Deep Learning webinar: https://event.on24.com/wcc/r/1666635/7F38AE27FE3E8E2E02C8AFEB2F95B00B
This post was brought to you by IBM Watson Studio. I received compensation to write this post but all opinions expressed are my own.