Machine learning is a type of artificial intelligence (AI) that enables computers to learn without being explicitly programmed. Algorithms identify patterns in data and use them to generate predictive models. Typically, machine learning tasks fall into three categories:
Supervised Learning – Computers train on labeled data and learn general rules. Commonly used algorithms include Support Vector Machines, Linear Regression, Logistic Regression, Naive Bayes, and Neural Networks.
Unsupervised Learning – Data fed into the computer is not labeled. The goal is to explore and find structure. Popular unsupervised learning algorithms include Cluster Analysis and Market Basket Analysis.
Reinforcement Learning – Computers learn through feedback systems. Reinforcement learning powers self-driven systems and robotics.
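To make the idea of learning from labeled data concrete, here is a minimal sketch in plain Python. It uses a one-nearest-neighbor rule, which is not one of the algorithms listed above and was chosen only because it fits in a few lines. The data points and the labels "A" and "B" are made up for illustration.

```python
# Toy labeled data: (feature vector, label). All values are invented.
labeled = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"),
    ((4.8, 5.2), "B"),
]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def predict_nearest(point, training):
    """Return the label of the closest labeled training example."""
    _, label = min(training, key=lambda pair: squared_distance(point, pair[0]))
    return label


print(predict_nearest((0.9, 1.1), labeled))  # closest examples are labeled "A"
print(predict_nearest((5.1, 4.9), labeled))  # closest examples are labeled "B"
```

In an unsupervised setting, the same points would arrive without the "A" and "B" labels, and the algorithm would have to discover the two groups on its own.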
In this lesson, we will walk through creating a supervised learning model with IBM SPSS Modeler, a service within IBM Watson Studio. Let’s get started.
Getting Started with IBM SPSS Modeler
With IBM SPSS Modeler, you can build machine learning models with drag-and-drop ease. Using a visual canvas, you can load data, sample it, transform it, apply algorithms, and evaluate predictive model performance through a series of nodes to find hidden patterns or variables that influence outcomes.
For our first foray into machine learning, we will download and explore Titanic, a publicly available dataset from Kaggle about the infamous shipwreck. The Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. Unfortunately, the ship did not carry enough lifeboats for everyone. We will create a supervised learning model to predict which groups of people were more likely to survive than others.
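Before building any model, it helps to know the baseline implied by those figures. The short calculation below uses only the two numbers quoted above. They describe everyone aboard, so the proportions in the Kaggle training file may differ.

```python
total_aboard = 2224  # passengers and crew, as quoted above
deaths = 1502        # as quoted above

survivors = total_aboard - deaths
survival_rate = survivors / total_aboard

# A "model" that predicts "did not survive" for everyone would be right
# this often, so a useful classifier needs to beat this number.
baseline_accuracy = deaths / total_aboard

print(f"{survivors} survivors, {survival_rate:.1%} survival rate")
print(f"Always-predict-death baseline accuracy: {baseline_accuracy:.1%}")
```

Roughly one person in three survived, so a model is only interesting if it is right noticeably more often than the two-in-three baseline.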
1. Creating a Project
After logging into Watson Studio, select New Modeler Flow. Enter a name, keep the default settings, and then click Create.
2. Loading Training Data
Next, expand the Import menu, drag the Data Asset node onto the stream canvas, and select the Titanic training data file (train.csv) in the node settings to load the data into the project. Right-click the node and select Preview to inspect the dataset.
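For readers who want to see what the Preview step amounts to outside the visual canvas, here is a rough equivalent in plain Python. This is a sketch, not what SPSS Modeler does internally. The column names are assumed to follow the Kaggle file's header, and the two data rows are made-up placeholders rather than real passenger records.

```python
import csv
import io

# Stand-in for train.csv. Header names are an assumption based on the
# Kaggle layout; the rows below are invented placeholders.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,30
2,1,1,female,25
"""


def load_rows(file_obj):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(file_obj))


rows = load_rows(io.StringIO(SAMPLE_CSV))

# With the real file you would instead write:
#     with open("train.csv", newline="") as f:
#         rows = load_rows(f)

for row in rows[:5]:  # preview the first few records
    print(row)
```

Note that the csv module reads every value as a string, so numeric columns such as Age need converting before any arithmetic.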
3. Designing a Stream
To build a modeler stream, look under Record Operations. Pick Sample and drag it onto the canvas. Then click the circle on the right side of the Data Asset node and drag the line to the left side of the Sample node to connect the two nodes. Now right-click Sample to view its settings. For Titanic, we will use the First n defaults.
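As a rough illustration of first-n sampling, the sketch below assumes it simply keeps the first n records in file order. The helper name first_n and the placeholder record IDs are hypothetical.

```python
from itertools import islice


def first_n(records, n):
    """Return the first n records of any iterable, preserving order."""
    return list(islice(records, n))


records = range(1, 11)  # placeholder record IDs 1..10
print(first_n(records, 3))  # [1, 2, 3]
```

Because this kind of sample is not random, it is only representative when the file is not sorted by something related to the outcome.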
4. Choosing Model Algorithms
Now we will experiment with algorithms. Expand the Modeling menu and explore the library of available machine learning models. For classifying Titanic survivors, we will pick Decision List, Classification & Regression Tree (C&R Tree), and Neural Net. Drag those three nodes onto the canvas and connect them to the Data Types node. Now let’s run the stream.
To run the stream, click the small blue triangle in the top menu of the stream canvas. SPSS will process the data through the selected machine learning models. When the run completes, new orange nodes appear. These nodes contain the model performance results.
5. Evaluating Model Performance
To review the findings, right-click each of the model result nodes and investigate the evaluation menus. Note that each algorithm has different options. In the Titanic C&R Tree model, women with first-class tickets had the highest probability of survival, at 97.33%. Other groups did not fare nearly as well.
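A finding like this is, at heart, a survival rate computed for one group of passengers. The sketch below computes such rates by sex and ticket class in plain Python. It is not how a C&R Tree chooses its splits, and the eight records are made up for illustration, so the resulting percentages are not the Titanic figures.

```python
from collections import defaultdict

# Made-up records for illustration only: (sex, ticket class, survived flag).
passengers = [
    ("female", 1, 1), ("female", 1, 1), ("female", 3, 0), ("female", 3, 1),
    ("male", 1, 0), ("male", 1, 1), ("male", 3, 0), ("male", 3, 0),
]


def survival_rate_by_group(records):
    """Map each (sex, class) group to its fraction of survivors."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for sex, pclass, survived in records:
        counts[(sex, pclass)][0] += survived
        counts[(sex, pclass)][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


rates = survival_rate_by_group(passengers)
for group, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(group, f"{rate:.0%}")
```

A decision tree effectively searches for the groupings that separate survivors from non-survivors most cleanly, rather than having them specified by hand as here.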
6. Deploying and Using Models
Now that you have created several simple supervised machine learning models with IBM SPSS Modeler, the next step is to test them with the unlabeled Titanic test dataset (test.csv) to see whether they remain accurate at predicting survival outcomes on new data.
Keep in mind that finding an optimal machine learning model on your first run is unusual. Typically, you will iterate, refining the model inputs and algorithm settings to improve predictive accuracy.
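One simple way to check whether an iteration actually helped is to compare each model's predictions against records whose outcomes are known. The sketch below shows the calculation. The accuracy helper and both lists are hypothetical, and the outcomes would have to come from labeled data that was held back from training.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the known outcomes."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be the same length")
    if not actuals:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


# Hypothetical held-back outcomes and model outputs (1 = survived).
actual = [1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0]

print(accuracy(predicted, actual))  # 4 of 5 correct -> 0.8
```

Tracking this number across runs makes it easier to tell a genuine improvement from a change that merely fits the training data more closely.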
Once you have built a strong-performing model, you can use it to make predictions on new data. To deploy a machine learning model, right-click a final output node and then click Save branch as a model. Navigate to your model list on your Watson Studio project overview page. On the right side of that list, click Add Deployment and choose Web Deployment, Batch Prediction, or Real-time Streaming Prediction. That’s all there is to it.
For More Information
In this tutorial, we introduced how to get started building machine learning models using IBM SPSS Modeler. If you’d like to learn more, please review the following recommended resources.
- Watson Studio: ibm.biz/watsonstudio
- Watson Studio online docs
- SPSS Flow online docs
This tutorial was written by Jen Underwood (@idigdata) courtesy of IBM Watson Studio. She received compensation to write it. However, all opinions expressed are her own.