As I was writing about Ralph Kimball’s excellent, “must-watch” Cloudera webinars on big data warehousing 101, I received a mass email from the Kimball Group, along with a notification from Melissa Coates (aka @SQLChick) on Twitter, announcing that the entire Kimball Group is retiring in December 2015. I am shocked and a bit saddened by this news. The Kimball Group may well be the most respected group in all of traditional data warehousing. I am an avid fan and own a library of Kimball’s Toolkit books that have been invaluable throughout my career. I do hope they will write at least one more book for data warehousing professionals covering the massive changes that big data, cloud and hybrid data environments are bringing.
As the Hadoop ecosystem evolves with improvements like the Apache Spark project and recent ACID support, a handful of bleeding-edge groups are exploring and designing Hadoop big data warehouses that include non-relational key-value stores, graph stores, document stores, columnar stores, XML databases, metadata catalogs and other unstructured formats. If you are attending the Strata Conference on Big Data next week, there will likely be excellent conversations about Hadoop data warehousing innovations, limitations and challenges, as well as current best practices for combining Hadoop with proven MPP relational data warehousing technology.
To get a current pulse on how Hadoop is affecting data warehouse design, I reached out to a few product team experts at Cloudera, Hortonworks and traditional relational database vendors. From what I hear, the landscape has not changed much yet, but the shift is already happening. Today, Hadoop is not quite ready or optimized to adequately service traditional data warehousing and analytics workloads, for a variety of reasons. Even Facebook, an early adopter of Hadoop, shared in a 2013 TDWI keynote that it was bringing in relational database technology to supplement its Hadoop implementation. From what I see and hear right now, modernized, hybrid data warehouse architectures are being designed and used to deliver the best of both worlds. Hadoop is being specified for mass storage, cold storage, large-scale predictive analytics and exploratory data discovery. MPP relational databases are still being used for high-performance reporting and analytic needs where Hadoop struggles. Eventually the two may converge from both sides.
If you are exploring the idea of adding Hadoop to a modernized data warehouse architecture, I highly recommend watching Kimball’s two free big data warehousing webinars. In these sessions, Kimball shares how his widely implemented, trusted techniques for data warehousing Extract-Transform-Load (ETL), dimensional modeling and business intelligence can be adapted to the new world of Hadoop/NoSQL technologies, including details like Sqoop code snippets and data loading nuances for handling slowly changing dimensions (SCDs). He also covers why Hadoop and proven relational data warehousing technologies will eventually become equal partners in hybrid data warehouse environments, each serving different needs.
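For readers new to SCDs, here is a minimal sketch of the classic Type 2 pattern, in which a changed attribute expires the current dimension row and appends a new current version, so history is preserved. This is not code from the webinars; it is a plain, in-memory Python illustration of the general technique. The names `DimRow`, `customer_id`, `city`, `OPEN_END` and `apply_scd2` are hypothetical. On Hadoop, the same logic would typically be expressed in Hive or Spark against tables loaded with a tool like Sqoop.

```python
from dataclasses import dataclass, replace
from datetime import date

# Hypothetical "open-ended" end date marking the current version of a row.
OPEN_END = date(9999, 12, 31)


@dataclass(frozen=True)
class DimRow:
    customer_id: str      # natural (business) key from the source system
    city: str             # attribute tracked with Type 2 history
    effective_date: date  # date this version became valid
    end_date: date        # date this version was superseded (OPEN_END if current)
    is_current: bool


def apply_scd2(dimension: list[DimRow], incoming: dict[str, str],
               load_date: date) -> list[DimRow]:
    """Return a new dimension list with Type 2 changes applied.

    `incoming` maps customer_id -> latest city from the source extract.
    """
    result: list[DimRow] = []
    current_by_key: dict[str, DimRow] = {}

    for row in dimension:
        if row.is_current:
            current_by_key[row.customer_id] = row
        else:
            result.append(row)  # historical versions pass through untouched

    for key, row in current_by_key.items():
        new_city = incoming.get(key)
        if new_city is None or new_city == row.city:
            result.append(row)  # unchanged or absent from extract: keep as-is
        else:
            # Expire the old version, then add a new current version.
            result.append(replace(row, end_date=load_date, is_current=False))
            result.append(DimRow(key, new_city, load_date, OPEN_END, True))

    # Keys never seen before become brand-new current rows.
    for key, city in incoming.items():
        if key not in current_by_key:
            result.append(DimRow(key, city, load_date, OPEN_END, True))

    return result


if __name__ == "__main__":
    dim = [DimRow("C1", "Denver", date(2015, 1, 1), OPEN_END, True)]
    updated = apply_scd2(dim, {"C1": "Boulder", "C2": "Austin"}, date(2015, 6, 1))
    for r in updated:
        print(r.customer_id, r.city, r.effective_date, r.end_date, r.is_current)
    # C1 Denver 2015-01-01 2015-06-01 False
    # C1 Boulder 2015-06-01 9999-12-31 True
    # C2 Austin 2015-06-01 9999-12-31 True
```

Building a new list rather than updating rows in place loosely mirrors how append-oriented stores like HDFS have traditionally favored rewriting a table or partition over row-level updates, though recent ACID support in Hive is changing that.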