Big Data seems to be everywhere, and I notice quite a bit of confusion around it. Last week I saw Bill Inmon, the father of data warehousing, speak about his new book “Building the Unstructured Data Warehouse”. It was a good overview of how to combine traditional data warehouses with Big Data and unstructured data sources. It also covered why Big Data is not replacing the data warehouse – it is supplementing it. Big Data technology does a great job of loading and storing large volumes of data, but it is not really practical for the reporting queries, joins and other workloads that data warehouses have been optimized to deliver.
One related aspect that Bill did not cover was event-based processing with Big Data. Most of the groups I talk to that are asking about Big Data are also wondering about capturing events and identifying patterns and anomalies in event streams. Torsten Grabs, a Microsoft Principal Program Manager, wrote an easy-to-understand article on these technologies and on how to combine them. The full article can be read here. It covers three approaches:

Batch-oriented processing: Microsoft’s Hadoop-based services provide the MapReduce technology for scale-out across many machines in order to quickly process large volumes of data.

Event-driven processing: Microsoft StreamInsight™ provides the capabilities to perform rich and expressive analysis over large amounts of continuously arriving data in a highly efficient incremental way.

Complex event processing at scale: Microsoft StreamInsight™ can run in the reducer step of Hadoop MapReduce jobs on Windows Server clusters to detect patterns or calculate trends over time.
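
To make the idea of running event processing inside the reducer step a little more concrete, here is a rough sketch in plain Python. It does not use Hadoop or StreamInsight and is not taken from Torsten's article. The event layout, the function names (map_phase, reduce_windows, run_job) and the 60-second window are all assumptions made purely for illustration. The map step groups events by source, and the reduce step makes a single time-ordered pass over each group to compute an average per time window, which is the kind of trend-over-time calculation described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event layout: (source id, timestamp in seconds, measured value)
Event = Tuple[str, int, float]


def map_phase(events: Iterable[Event]) -> Dict[str, List[Tuple[int, float]]]:
    """Map and shuffle: group events by source id, as a MapReduce
    framework would do before handing each group to a reducer."""
    grouped: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for source, timestamp, value in events:
        grouped[source].append((timestamp, value))
    return grouped


def reduce_windows(
    readings: List[Tuple[int, float]], window_seconds: int
) -> List[Tuple[int, float]]:
    """Reduce: one pass over the time-ordered readings of a single source,
    emitting (window start, average value) for each fixed-size window."""
    results: List[Tuple[int, float]] = []
    current_start = None
    total, count = 0.0, 0
    for timestamp, value in sorted(readings):
        start = timestamp - timestamp % window_seconds
        if current_start is not None and start != current_start:
            # The previous window is complete: emit it and reset the state.
            results.append((current_start, total / count))
            total, count = 0.0, 0
        current_start = start
        total += value
        count += 1
    if count:
        # Emit the final, still-open window.
        results.append((current_start, total / count))
    return results


def run_job(
    events: Iterable[Event], window_seconds: int = 60
) -> Dict[str, List[Tuple[int, float]]]:
    """Run the map step, then apply the windowed reducer to every source."""
    grouped = map_phase(events)
    return {
        source: reduce_windows(readings, window_seconds)
        for source, readings in grouped.items()
    }
```

Calling run_job with a list of events returns, for each source, its list of window averages. In a real deployment the grouping would be done by Hadoop across many machines, and the windowing logic inside the reducer is the part an event-processing engine would take over.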