Shhhh…everyone is asleep. Finally I can sneak into my home office, light up the lovely Sun & Sand Yankee Candle, and spin up the Azure Microsoft Data Science Virtual Machine with Revolution R, Python, Visual Studio, SQL Server and Power BI Desktop. Last week Microsoft made an array of announcements, including news that Revolution Analytics R Enterprise (RRE) is now Microsoft R Server for Hadoop, Linux and Teradata. New R Tools for Visual Studio, which are much like the Python Tools for Visual Studio, were also unveiled. As a longtime Revolution R and RStudio fan, I wanted to dig in as soon as I heard the word. Here is what I find exciting about “revolutionizing” Microsoft analytics offerings.
Baking R into Microsoft Solutions
It has been almost one year since the acquisition of Revolution Analytics, the leading commercial provider of software and services for R, was revealed. R is the world’s most widely used programming language for statistical computing and predictive analytics. To say I was ecstatic about that news would be an understatement. Since that time, I have already seen R capabilities cooked into SQL Server 2016 CTP3 as SQL Server R Services, bringing advanced analytics into one of the world’s most popular relational databases. In my blog post “R you ready for SQL Server 2016,” I shared that these new capabilities allow analysts to make parameterized calls to the scalable R runtime from SQL code or stored procedures and get back R-computed result sets or R data visualizations. This enables you to operationalize advanced analytics in apps, Reporting Services reports or dashboard projects.
Last November, Power BI Desktop added R script as a data source and R data visualization types, opening up a whole new world of data discovery possibilities. Now Power BI reports can intelligently render forecasts, scored predictive models and a plethora of other truly powerful analytics algorithm results.
R visualizations provide deep granular control of many additional chart properties for ultimate design flexibility. Another little secret…R visualizations in Power BI Desktop can display more data points than the out-of-the-box visualizations for big data analytics scenarios. Last but not least, there is a vast sea of R data visualization types powered by ggplot2 and other R community libraries now available for Power BI users to create awesome analytics. Eventually I will make time to blog about several fantastic Power BI Desktop + R use cases. In the meantime, Jan Mulkens has a good getting started blog series.
So What is New Now
The buzz last week was that Microsoft actually launched a new enterprise server offering. That rarely happens. Microsoft is taking the product formerly sold as Revolution Analytics R Enterprise (RRE) to market across operating system platforms, not just Windows, both on-premises and in the cloud, as the newly branded Microsoft R Server for Hadoop, Linux and Teradata. This move means that you can standardize on advanced analytics with R if you are using Hadoop (Hortonworks, Cloudera and MapR), Linux (Red Hat and SUSE) or Teradata. Integration into Cortana Analytics Suite solutions such as Azure HDInsight and Azure Machine Learning was also mentioned.
For Windows shops, Microsoft R Server is included in SQL Server 2016 as SQL Server R Services.
Microsoft R Server supports a variety of big data statistics, predictive modeling and machine-learning capabilities, from exploration to analysis, visualization and modeling. It is compatible with open source R scripts, functions and Comprehensive R Archive Network (CRAN) packages. Microsoft R Server can handle massive data sizes and run multi-threaded, processor-optimized computations on hundreds of server nodes. From a technical architecture perspective, ScaleR, DistributedR and ConnectR are the keys to powering up big data analytics: they bring parallel and chunked processing to R, enabling analytics on data sizes that far exceed available memory.
- ScaleR: Provides algorithms optimized for parallel execution on big data. These algorithms are optimized for transparent distributed execution, eliminate memory limitations and scale from laptops to servers to large clustered systems.
- DistributedR: An adaptable parallel execution framework that includes services for communications, storage integration and memory management that enable ScaleR.
- ConnectR: Brings access to any data source ranging from simple workstation file systems to complex distributed file systems and MPP databases.
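To make the "chunked processing" idea a little more concrete, here is a minimal Python sketch of how a statistic can be computed over data that never sits in memory all at once. To be clear, this is not ScaleR's actual implementation or API; the function names (`chunks`, `summarize`, `combine`, `chunked_mean_variance`) are hypothetical and made up for illustration. Each chunk is reduced to a small summary (count, mean, sum of squared deviations), and the summaries are merged with the standard pairwise formula for combining means and variances. Because the summaries merge independently of one another, the same pattern could in principle be spread across cores or nodes.

```python
from typing import Iterable, Iterator, List, Tuple

# A summary is (count, mean, sum of squared deviations from the mean).
Summary = Tuple[int, float, float]


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield lists of at most `size` items so only one chunk is held at a time."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def summarize(chunk: List[float]) -> Summary:
    """Reduce one non-empty chunk to its count, mean and squared deviations."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def combine(a: Summary, b: Summary) -> Summary:
    """Merge two summaries without revisiting the underlying data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def chunked_mean_variance(values: Iterable[float],
                          chunk_size: int = 1000) -> Tuple[float, float]:
    """Return (mean, sample variance) by processing `values` chunk by chunk."""
    total: Summary = (0, 0.0, 0.0)
    for chunk in chunks(values, chunk_size):
        total = combine(total, summarize(chunk))
    n, mean, m2 = total
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance


if __name__ == "__main__":
    # A range is lazy, so the full data set is never materialized as a list.
    mean, variance = chunked_mean_variance(range(1, 1_000_001), chunk_size=10_000)
    print(mean, variance)
```

The design choice worth noticing is that memory use depends on the chunk size, not on the total number of rows, which is the general property that lets this style of algorithm work on data larger than RAM.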
During these latest announcements, Microsoft also shared that a new Microsoft R Server Developer Edition, with all the features of the commercial version, will be made available as a free download and baked into a future version of the Azure Data Science VM. Microsoft R Server is available to students for free academic use via DreamSpark, and Revolution R Open, now called Microsoft R Open, is another free download for anyone.