Recently I had the honor of walking through Predixion Insight 3.1 and getting a glimpse of the upcoming 3.2 directly with Jamie MacLennan, Co-Founder and CTO at Predixion Software. I have been a fan of Jamie’s for almost 10 years now. He was formerly the development manager of SQL Server Analysis Services and the development lead of the SQL Server Data Mining platform.
In previous blogs I have written about SQL Server Data Mining in What-If Analytic Simulation Options, Predictive Analytics with Tableau and Practical Predictive Analytics. I also have a myriad of predictive analytics decks on SlideShare that cover SQL Server Data Mining solutions, such as Predictive Analytics with Excel and Predictive Analytics with SQL Server. In most of these materials I mention Predixion, but I never go deep into why I mention them, so this blog post is long overdue.
(Note: If you are not familiar with the base SQL Server Analysis Services Data Mining features, please see my SlideShare decks on this topic: Predictive Analytics with Excel, and Predictive Analytics with SQL Server. This blog article assumes an existing understanding of that offering.)
Historically, I recommended Predixion only when a customer wanted more than could be accomplished out-of-the-box with the base SQL Server Analysis Services Data Mining features. Microsoft also slowed down investments in that offering a few years ago, leaving some of the features, like PMML, behind and out of date. After reviewing this latest version of Predixion Insight 3.1, I will be suggesting use of it much more often for Microsoft-centric accounts. Predixion Insight is far more feature rich than the base offering that I have been showing in my web casts, SQL Saturday sessions and blogs. Here is a peek at what the latest version of Predixion Insight 3.1 brings to the table.
Predixion Insight is a supplement to the base SQL Server Analysis Services Data Mining features. However, starting with version 3.0, Predixion Insight also works with R and some Mahout via a plug-in that allows various machine learning libraries to be used. That is a really significant, warmly welcomed enhancement that I will dig deeper into a little later on. Predixion Insight also has an Excel add-in, a Server and an optional Cloud offering. The user-friendly Excel add-in adds two new tabs to Excel, INSIGHT NOW and INSIGHT ANALYTICS. The INSIGHT NOW tab contains the enhancements to the base Table Analysis Tools. Things like Analyze Key Influencers, Detect Categories and Market Basket analysis reside here but have been improved upon. For example, the Analyze Key Influencers reports are more detailed and the report presentation is nicer.
Most of the great, robust features data miners and analysts will use are located in the INSIGHT ANALYTICS tab. Here is where you will immediately notice the sheer breadth and depth of enhancements over the base Microsoft offering. Highlights include: the ability to use Power Pivot as a data source; a wonderful data profiling feature that I could see using on non-predictive projects as well as predictive ones; easier use of external data sources and an option for in-database model scoring; better sampling, discretization and labeling capabilities, including a new predictive analytical expression (PAX) function that provides better binning; added features for normalizing the data or adding calculated fields with a statistical function library; a link to the Predixion Marketplace, where you can share or get already developed predictive models to fast track your project; a link to the Predixion Server or Cloud to centralize, share and collaborate on models; a feature rich Insight Workbench for developing predictive models; and so on. The list is simply too long to do justice in one blog. Refer to the online documentation to get a much better idea of all the goodies Predixion Insight 3.1 offers.
One of my favorite features that I longed for in the base Microsoft offering but never did have a nice work-around for is the instant base statistical information for predictive model variables. I simply love the way Jamie and his team have implemented this feature to show the variable distributions, core stats and correlation relationships with any other variables in the predictive model. It is fantastic and obvious to me that they know what information predictive modelers need and when they need it in the predictive life-cycle work flow!
Other enhancements that I really appreciated include the improved model performance and exploration capabilities that allowed me to test my demo predictive model using the classic Bike Buyer data mining sample data set with various variable combinations, a form of live predictive what-if analysis, to see how changes would impact model prediction and the related prediction score.
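For readers new to the idea, that kind of live what-if analysis boils down to re-scoring one record while varying a single input. Here is a minimal Python sketch of the concept, not Predixion's implementation: the `score` function is a made-up logistic model with invented coefficients, and the field names are hypothetical ones only loosely inspired by the Bike Buyer sample.

```python
import math

# Hypothetical coefficients for a toy logistic model. These numbers are
# invented for illustration and are not learned from the Bike Buyer data.
COEFFICIENTS = {"age": -0.03, "cars_owned": -0.6, "commute_miles": -0.05}
INTERCEPT = 2.0


def score(record):
    """Return the toy model's probability that the customer buys a bike."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * record[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def what_if(record, field, values):
    """Re-score `record` once per candidate value of `field`."""
    results = []
    for value in values:
        variant = dict(record, **{field: value})  # copy with one field changed
        results.append((value, score(variant)))
    return results


baseline = {"age": 35, "cars_owned": 1, "commute_miles": 5}
for value, probability in what_if(baseline, "cars_owned", [0, 1, 2, 3]):
    print(f"cars_owned={value}: probability={probability:.3f}")
```

Each printed line shows how the prediction score moves as one variable changes while the others stay fixed, which is the same question the interactive tool answers with a real trained model.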
One of the biggest new features is the Machine Learning Semantic Model (MLSM), which allows data scientists and predictive modelers to change their workflow from creating predictive models to creating predictive applications. The MLSM packages all of the necessary data transformations, predictive modeling logic, sampling, validation, and model selection generated while creating a predictive solution into a reusable application. That is a big deal for automating and speeding up predictive model development. In addition, it empowers predictive model sharing and collaboration amongst a team.
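To make the "model versus application" distinction concrete, here is a rough Python sketch of bundling data transformations and scoring logic into one reusable object. It is only an illustration of the packaging idea, not the MLSM format or API: `PredictiveApplication`, the transform functions and the toy scoring rule are all hypothetical names and numbers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class PredictiveApplication:
    """Hypothetical bundle of transformations plus a scoring function."""
    name: str
    model: Callable[[Record], float]
    transforms: List[Callable[[Record], Record]] = field(default_factory=list)

    def predict(self, raw: Record) -> float:
        record = dict(raw)  # leave the caller's input untouched
        for transform in self.transforms:
            record = transform(record)
        return self.model(record)


def bin_age(record: Record) -> Record:
    """Discretize age into a band, as a stand-in for a binning step."""
    age = record["age"]
    band = "under_30" if age < 30 else "30_to_49" if age < 50 else "50_plus"
    return {**record, "age_band": band}


def normalize_gender(record: Record) -> Record:
    """Clean up the gender label so the model sees consistent values."""
    return {**record, "gender": str(record["gender"]).strip().upper()}


def toy_model(record: Record) -> float:
    """Invented scoring rule; a real model would be trained on data."""
    base = {"under_30": 0.35, "30_to_49": 0.55, "50_plus": 0.25}[record["age_band"]]
    return base + (0.05 if record["gender"] == "M" else 0.0)


app = PredictiveApplication(
    name="bike_buyer_demo",
    model=toy_model,
    transforms=[bin_age, normalize_gender],
)
print(app.predict({"age": 35, "gender": " m "}))
```

The point of the design is that a caller hands over raw data and never needs to know which cleanup or binning steps the model expects, so the whole bundle can be shared and reused as a unit.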
Revisiting the Predixion plug-ins for R and Mahout, I wanted to better understand how they work, since I saw what looked like the classic Analysis Services data mining models that I was used to seeing in the past, except with GUID names. I was told not to be fooled: it is possible that no model resides on Analysis Services at all, so do not try viewing, exploring or building Prediction Queries (DMX) on the under-the-covers Predixion Analysis Services data sources. After training a predictive model and finding patterns, regardless of library format (Analysis Services, R, etc.), Predixion extracts and stores the needed information in a Predixion-specific format. In that process, adjustments may be made to improve the model's predictive accuracy. Bottom line: the raw models created in Analysis Services or R by Predixion cannot and should not be executed directly.
The only area that I thought was more difficult than the base Microsoft offering, or could use some improvement for us mere mortals, was prediction queries and embedding those into applications. In the base Microsoft offering I have been able to easily create and embed dynamic Prediction Queries (DMX) in applications and reports simply by using an Analysis Services connection and a DMX script (like a SQL script) with variables, allowing real-time predictions. That is a powerful concept for making predictive models useful and actionable. I have used the Prediction Queries (DMX) solution for check fraud, healthcare payments and other use cases. The equivalent Predixion capability for real-time prediction scoring/queries was a bit more difficult. I pinged Jamie and his team to see if I was missing something. They came back with a few ways to achieve similar capability.
1) Predixion’s API can be invoked with a row of data (singleton), a batch of data or a pointer to an external data source (e.g. a database table or a Hadoop hdfs:// store). From an application, the prediction query is invoked much like Analysis Services DMX: an ADO.NET-like connection, a query with parameters, and results. The query is an XML representation of the request. From a modeling tool, such as Excel, the query is presented as a Visual Macro with a tabular format that can be easily copied to SSIS ETL or other applications. You can execute a singleton query in “real-time” mode from Excel by means of the VBA API, or from a .NET application. A real-time predictive call using the VBA API looks like this:
Dim pred As New PredixionVBA.Prediction
' Specify the target MLSM and Model
pred.Application = "Bike Buyer Demo Application"
pred.Model = "BikeBuyer_Classification"
' Add singleton inputs
pred.Inputs.AddField "Age", 35
pred.Inputs.AddField "Gender", "M"
' Specify desired output
pred.Outputs.Add ("PredictProbability([Purchased Bike], 1)")
' Execute prediction and collect result
Set result = pred.Execute
All the scoring is done in the Predixion code, without calling into whatever machine learning library was used in training. Because all of the MLSM transformation code and model scoring belongs to Predixion, it can be encapsulated and moved out of Predixion for embedding into applications.
2) Alternatively, the MLSM can be downloaded from the Predixion server and brought into a process running .NET or Java for true real-time scoring, without the price of the network latency – another enhancement over base Analysis Services DMX since that is simply not possible. Scoring code can also run inside certain databases, such as SQL Server (via SQLCLR), Greenplum (Java UDFs) or Hadoop (Java) and the result is in-database scoring, with no latency. If a developer wants to change the execution of a query from Server-side to Local, all that needs to be added is a few lines of code:
using (IDbCommand icmd = cn.CreateCommand())
{
    PredixionCommand cmd = icmd as PredixionCommand;
    // Download the MLSM and execute it locally
    cmd.ScoringExecutionMode = PredixionCommand.ScoringQueryExecutionMode.LocalExecution;
    // Check the MLSM for updates every 10 seconds
    cmd.CachedExecutionPlanExpiration = new TimeSpan(0, 0, 10);
    cmd.CommandText = query;
    // ... execute the query as before
}
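The expiration setting in the snippet above suggests a cache-with-refresh pattern: score in-process against a downloaded copy and only go back to the server once the interval has elapsed. Here is a rough Python sketch of that general pattern. It is one reading of the behavior, not Predixion's code, and `CachedPlan`, `fetch_plan` and the injected clock are hypothetical names introduced purely for illustration.

```python
import time
from typing import Callable, Optional

Plan = Callable[[dict], float]


class CachedPlan:
    """Keep a locally cached scoring plan and refresh it after `ttl_seconds`."""

    def __init__(self, fetch: Callable[[], Plan], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # stand-in for downloading the model
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can fake time
        self._plan: Optional[Plan] = None
        self._fetched_at = 0.0

    def score(self, row: dict) -> float:
        now = self._clock()
        if self._plan is None or now - self._fetched_at >= self._ttl:
            self._plan = self._fetch()
            self._fetched_at = now
        return self._plan(row)       # scoring happens in-process


downloads = []


def fetch_plan() -> Plan:
    """Pretend download; the returned scoring rule is invented."""
    downloads.append(1)
    return lambda row: 0.5 if row["age"] < 40 else 0.2


fake_now = [0.0]
plan = CachedPlan(fetch_plan, ttl_seconds=10, clock=lambda: fake_now[0])

plan.score({"age": 35})   # first call: plan is fetched
fake_now[0] = 5.0
plan.score({"age": 35})   # within 10 seconds: cached plan is reused
fake_now[0] = 12.0
plan.score({"age": 35})   # past 10 seconds: plan is fetched again
print("downloads:", len(downloads))
```

The trade-off is the usual one for caching: a shorter interval picks up model updates sooner, while a longer one keeps more calls free of any network round trip.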
Easy querying and embedding is an important capability, as we will see more predictive functionality being embedded into business processes and reporting. Is it easier than before? Ah, not really. I still don’t think so, but it is not much more difficult, and I do like the added packaging and in-database scoring options for removing network latency.
Some other features for embedding predictive analytics into data, reporting or business processes with Predixion include an API for VBA, components for SSIS ETL packages, updated PMML support for exchanging models (versions 2.0 through 4.0 of the PMML standard), and ODBC connectivity that allows querying of Predixion job results. Last but not least, Predixion is compatible with the latest Microsoft technologies including Office 365, Excel 2013, Windows 8 and SQL Server 2012.
Stay tuned for additional articles on this topic here and in SQL Server Pro Magazine. I will also make a video for my YouTube channel and hold a web cast on this excellent Excel Add-In and Predictive Platform offering soon. In the meantime, if you want to dive in and check it out yourself, download a trial and do the walk-throughs in the Predixion online documentation.