So much for a slow summer. In August 2017, Python overtook R in popularity in the latest KD Nuggets poll. Both the Gartner Magic Quadrant for Data Integration 2017 and The Forrester Wave™: Insight Platforms-As-A-Service, Q3 2017 reports were released. IDC shared updated Worldwide Big Data and Analytics Software Market Share stats. Tableau, Salesforce, Drastin and several other vendors made natural language analytics announcements. Amazon AWS unveiled Glue, Macie and other analytics updates at AWS Summit NYC. Lastly, HiQ won round one of the data scraping case. Here’s the latest industry pulse.
Python New #1 for Data Science
The Python ecosystem overtook R in 2017 as the leading platform for Analytics, Data Science, and Machine Learning per KD Nuggets research. R usage is slowly declining (from about 50% in 2015 to 36% in 2017), while Python share is steadily growing – from 23% in 2014 to 47% in 2017. The share of other platforms is also steadily declining.
For the full story, read the detailed article on KD Nuggets. If you are interested in getting started with Python, check out my recent articles on that topic.
- Getting Started with Python [Part 1]
- Getting Started with Python [Part 2]
- Getting Started with Python [Part 3]
Gartner Magic Quadrant for Data Integration 2017
Gartner, Inc. released the annual Magic Quadrant for Data Integration Tools report last week. It is a “must read” for enterprise architects, data warehousing, big data and analytics professionals. Data integration is a crucial capability for the modern digital business. Selecting the right data integration platform has become more important than ever. For a complimentary copy of the full report that covers 15 leading data integration vendors, click here.
The Forrester Wave™: Insight Platforms-As-A-Service, Q3 2017
Another fascinating report is the new Forrester Wave™: Insight Platforms-As-A-Service, Q3 2017. This year Google leads the pack with Machine Learning, Real-Time, and Insight Application Tools. Here is a link to a complimentary copy of the full report. It is a good read for analytics pros.
In the Insight Platform-As-A-Service Wave, Forrester evaluated the most significant providers in the market. Within the 36-criteria evaluation of insight platform-as-a-service (PaaS) providers, analysts narrowed the field down to the eight most significant ones. The report shows how each provider measures up and helps enterprise architecture (EA) professionals make the right choice.
Historically, Google BigQuery has been a best-in-class cloud data warehouse offering, winning extraordinarily large deals in enterprise and government sector projects. Google data visualization is another story: the free Google Data Studio is lagging. I’ll cover Data Studio soon and try to find time in my schedule to learn more about Google’s other offerings.
IDC Market Share Stats
IDC shared annual Worldwide Big Data and Analytics Software Market Share stats from 2016. SAS has provided a free copy of the full report. I like this report for getting a glimpse into macro-market adoption and migration trends.
Per the IDC report, “In 2016, the overall worldwide market grew 8.5%. The top 3 fastest-growing areas were nonrelational analytic data stores (58.0%); cognitive/AI software platforms, content analytics, and search systems (15.7%); and customer relationship analytic applications (12.0%). The two largest market areas were end-user query, reporting, and analysis, and relational data warehouse management, with growth rates of 6.6% and 5.2%, respectively.
In 2016, Oracle continued as the largest big data and analytics software vendor with 14.3% share, followed by SAP, Microsoft, IBM, and SAS. Together, these top 5 vendors had 50% market share, down from 55% in 2014. Among the top 35 vendors (all with over $100 million in 2016 BDA software revenue), the fastest growth came from Anaplan, Hortonworks, and Amazon Web Services (AWS).”
More Natural Language Analytics Buzz
Last month I shared with you my latest white paper, “How Advanced NLG is Evolving Application Design,” sponsored by Narrative Science. I truly believe natural language will improve human-computer experiences in the analytics landscape and beyond. In less than two years, Advanced NLG is expected to become a standard capability in analytics applications. I anticipate similarly swift adoption in other industries.
This month we saw Tableau acquire ClearGraph for its natural language query technology. A super savvy reader of mine let me know that Drastin, a Gartner 2017 Cool Vendor, had been bought. The founder shared the news on LinkedIn but did not reveal who acquired them. ThoughtSpot announced additional funding. Salesforce introduced Seq2SQL, which translates natural language questions into SQL queries for relational databases. Here is an animation of it.
Amazon AWS Summit NYC
This month Amazon AWS finally revealed Glue, an extract, transform, and load (ETL) service that eases data pipeline development. They also announced Macie, a security service that helps prevent data loss by using machine learning to automatically discover, classify, and protect sensitive data in AWS. Several other analytics updates from AWS Summit NYC can be reviewed on the official site.
At a quick glance, Glue does not look like other ETL tools. It looks like a PySpark code generator for data orchestration with a related data catalog. From the AWS Management Console, you can point AWS Glue to data stored on AWS. It discovers your data and stores the associated table definitions and schema metadata in an AWS Glue Data Catalog. Once cataloged, data is searchable, queryable, and available for ETL. AWS Glue then generates PySpark code that you can customize, reuse, and transfer to execute data transformation and data loading processes. The AWS Glue “serverless” service provides a fully managed, scale-out Apache Spark environment with scheduling, dependency resolution, job monitoring, and alerting.
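To make the discover, catalog, then transform flow described above more concrete, here is a minimal sketch in plain Python. It is a toy model only, not the AWS Glue API and not the PySpark code that Glue generates. The function names (crawl, apply_mapping), the sample orders, and the "sales.*" table names are all hypothetical and exist purely for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Mapping = Tuple[str, str, Callable[[Any], Any]]  # (source column, target column, cast)


def crawl(records: List[Record]) -> Dict[str, str]:
    """Infer a column -> type-name schema from sample records (the 'discover' step)."""
    schema: Dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            # Keep the first type seen for each column.
            schema.setdefault(column, type(value).__name__)
    return schema


def apply_mapping(records: List[Record], mappings: List[Mapping]) -> List[Record]:
    """Rename and cast the mapped columns; unmapped columns are dropped."""
    return [
        {target: cast(record[source]) for source, target, cast in mappings}
        for record in records
    ]


# Hypothetical sample data standing in for files stored on AWS.
raw_orders = [
    {"order_id": "1001", "amount": "19.99", "region": "east"},
    {"order_id": "1002", "amount": "5.00", "region": "west"},
]

# 1. Discover the data and store its schema in a (toy) data catalog.
catalog = {"sales.raw_orders": crawl(raw_orders)}

# 2. Transform: the kind of rename-and-cast step a generated ETL script performs.
clean_orders = apply_mapping(
    raw_orders,
    [("order_id", "id", int), ("amount", "amount_usd", float)],
)

# 3. Load: register the transformed table so it is searchable as well.
catalog["sales.clean_orders"] = crawl(clean_orders)

print(catalog)
```

In the real service, as I understand it, the catalog is managed by AWS and the transformation runs on Spark rather than on in-memory lists. The sketch only shows why cataloging the schema first makes the later mapping step straightforward to generate and customize.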
From what I have tested so far, Amazon AWS appears to be focused on empowering ISVs, app developers and businesses with infrastructure services, app services, and code frameworks. I have not seen much from a business apps perspective. There have been minimal updates to Amazon QuickSight this year since I created an O’Reilly class on it. I’ll be repeating that class on September 12th if anyone wants to learn the basics of building dashboards with it.
I can’t figure out why Google and AWS don’t invest in data visualization. Data visualization is the key to selling cloud deals. Show a prospect a database or ETL pipeline diagram – they don’t care, no deal. Show a prospect a couple of interactive charts with a sample of their own data – they do care, you win the deal. Most of the time, it really is that simple! For Google and AWS to not invest in the key apps for selling cloud makes me wonder.
HiQ Wins Round One in Public Data Scraping Case
Public web site data scraping is widely used today for a variety of analytics use cases, data curation, market intelligence, and the growing practice of data monetization. The public web site data scraping case that I mentioned in last month’s update now has a ruling. HiQ won round one – but this case is far from over. Laurence Henry Tribe, a professor of constitutional law at Harvard Law School, presented for HiQ. The documents are now posted at https://www.hiqlabs.com/legal/ for review, along with a litany of media coverage discussing the potential implications for public web data ownership. A variety of laws related to data privacy, free speech and fair use of data could be impacted by the final outcome.
Until Next Month
That wraps up my selected analytics industry highlights from this month. Next month I will cover Strata New York, big data adoption trends and a plethora of other updates as we head into the busy fall conference season.
In the meantime, I’m signing off to get an adorably tiny, furry family member named Angel outside along with my recent rescue Summer. Once again, we have a joyful, little pack of pups in the house!