To keep up with the speed of business, data warehousing professionals are turning to modern automation solutions to optimize and expedite their processes. We are also beginning to see data warehouse automation combined with elastic cloud architectures to quickly deliver optimal performance at any scale.
Today at AWS re:Invent, Attunity announced a new data warehouse automation solution for Amazon Redshift.
Historically, my peers and I invested significant time designing reporting data models and applying ETL frameworks using Kimball dimensional scripts, SQL templates, and customized command line utilities.
Automation options today are a warmly welcomed innovation. It’s a relief to see that most of the tedious data warehousing work can now be streamlined across the entire life cycle, vastly improving data warehouse time-to-value and ROI.
To be completely transparent, I wish I had had data warehouse automation long ago for proof-of-concept projects and data warehouse engagements, to show my customers immediate, useful value. Once data discovery solutions entered the market, I began skipping data modeling. That was fine for securing funding, but ultimately a data mart or data warehouse was still needed.
Data discovery and self-service BI tools do not replace data warehouses.
Analytics relies on data warehouse design patterns (e.g., slowly changing dimensions) to report accurately over time periods. Those design patterns do not exist within data discovery tools.
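To make the slowly changing dimension pattern concrete, here is a minimal sketch of my own in Python (an illustration of the general SCD Type 2 technique, not Attunity code; the table and function names are hypothetical): rather than overwriting a changed attribute, the current row is expired and a new versioned row is added with effective dates, so reports over past time periods stay accurate.

```python
from datetime import date

# A tiny in-memory customer dimension. Each row carries effective/end dates
# and a current-record flag -- the core of the SCD Type 2 pattern.
dim_customer = [
    {"customer_id": 1, "city": "Boston", "effective_from": date(2015, 1, 1),
     "effective_to": None, "is_current": True},
]

def apply_scd2(dim, customer_id, new_city, change_date):
    """Expire the current row and append a new version (SCD Type 2)."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # attribute unchanged: nothing to version
            row["effective_to"] = change_date
            row["is_current"] = False
    dim.append({"customer_id": customer_id, "city": new_city,
                "effective_from": change_date, "effective_to": None,
                "is_current": True})

# Customer 1 moves to Chicago: the Boston history row is preserved,
# so sales reported against 2015 still roll up to Boston.
apply_scd2(dim_customer, 1, "Chicago", date(2016, 11, 30))
```

An SCD Type 1 update, by contrast, would simply overwrite `city` in place and lose the history.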
Data Warehouse Automation
Modern data warehouse automation solutions reduce up-front analysis, design, and modeling. The approach for scheduling data warehouse projects changes. Previously we used a waterfall methodology and assigned teams to work for weeks or months collecting bus matrices of reporting requirements, experimenting with dimensional designs, and scrambling to populate enough data to see if we got our dimensional designs right.
Today, with data warehouse automation in the mix, we can follow an agile development methodology and quickly create or update a production-worthy data warehouse in days. Automation delivers near-immediate gratification with the actual source data: ETL code is generated automatically, and scheduling incremental loads is a simple point-and-click endeavor.
“Comparing prior EDW efforts is like comparing apples to oranges. We went from taking 10 to 14 days to build a data mart down to just a single day.” – Data Architect, Leading national furniture manufacturer
Don’t believe it can be done? I completely understand that healthy skepticism. I had to reverse engineer Attunity’s data warehouse automation solution to see it for myself before I’d accept that it can be done… and done right. In doing that, I truly appreciated what I found.
Introducing Attunity Compose
Attunity Compose is an agile, model-driven data warehouse automation offering that has been developed and optimized over a 15-year period. It can automatically create, populate, and even deploy changes to data warehouse designs. In addition to saving time and improving code quality and consistency, it can also be used continuously within DevOps application development processes.
A dimensional data model does not have to be prepared ahead of time.
For data warehouse professionals, the tedious, manual, repetitive aspects of data warehouse design, development, testing, deployment, operations, impact analysis, and change management are greatly reduced. In some cases, those tasks are replaced by fully automated processes.
Highlighted Attunity Compose data warehouse automation capabilities include:
- Model-driven data warehouse generation
- Real-time source data integration
- Automated ETL
- Data warehouse schema management
- Lineage and impact analysis
- Migration between environments
How it Works
It all starts with a data source. From there, Attunity Compose can derive a dimensional data warehouse using a model that can be auto-generated or imported from a data modeling tool like CA ERwin. Attunity Compose will generate a schema, create data integration (ETL), apply data quality and integrity rules, and add logging for monitoring and auditing.
For my evaluation, I began by defining source tables in a landing area; the source can be either a schema or a database. Then I added a target data warehouse destination. After the source and target were defined, Attunity Compose auto-generated a logical and physical model based on my data source(s).
In reviewing the process to this point, I noticed that slowly changing dimensions (SCD Types 1 and 2) were created for me so I could accurately report data over time. Attunity Compose apparently supports a variety of slowly changing dimension patterns, including the rarely seen Type 6.
I also reviewed data lineage features and noted that I could optionally extend the automatically-created model with derived attributes.
In addition to making model adjustments, I saw features for fine-tuning data mappings between the tables in the landing area and the staging tables via an “ETL Set”. Digging deeper into data quality, Attunity Compose allowed me to define specific ETL rules with an expression builder, applied before or after data is loaded. I could also define the action taken when a rule rejects data: Reject and Report, Reject Silently, or Reject and Abort.
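Conceptually, those three reject actions behave like the following sketch (my own Python illustration of the general idea, not Attunity’s implementation; the function and policy names are hypothetical): a rule validates each row, and the policy decides whether a bad row is kept for an error report, dropped quietly, or halts the load entirely.

```python
def load_rows(rows, rule, policy):
    """Apply a validation rule to each row under the given reject policy."""
    loaded, rejected = [], []
    for row in rows:
        if rule(row):
            loaded.append(row)
        elif policy == "reject_and_report":
            rejected.append(row)          # keep the bad row for an error report
        elif policy == "reject_silently":
            pass                          # drop the bad row without logging
        elif policy == "reject_and_abort":
            raise ValueError(f"Bad row aborted the load: {row}")
    return loaded, rejected

rows = [{"qty": 5}, {"qty": -2}, {"qty": 7}]
positive_qty = lambda r: r["qty"] > 0

# Report mode: the two valid rows load, and the qty = -2 row is set
# aside for review instead of silently disappearing or failing the job.
ok, bad = load_rows(rows, positive_qty, "reject_and_report")
```

Abort mode is the strictest choice: the same call with `"reject_and_abort"` would raise on the first bad row and stop the load.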
After data mappings and an ETL Set were ready, with a single click I extracted source data and loaded it into my automatically-created data warehouse. To confirm that it really worked, I peeked at the monitoring dashboard and the underlying SQL Server instance. The physical SQL Server tables were indeed there and loaded. Cool and way too easy!
The Attunity Compose monitoring dashboard provided me with detailed statistics about the number of tasks run, current status, start and end times, row counts, and the next scheduled task instance. A log viewer allowed me to see the detailed command syntax that was executed.
That was all there was to automating data warehouse builds with Attunity Compose. It really is a nice solution. Notably, setting up the initial data warehouse project involved minimal manual configuration. After that one-time setup, I could schedule automatic data load tasks to keep incrementally populating my data warehouse with new data.
After running through my hands-on exercise, I understand how automation works and why my peers are embracing it today. The solution’s time-to-value alone is a huge win. The baked-in best practices and slowly changing dimension support for accurate reporting over time periods are also compelling. I’d love to see this type of automated data warehouse capability built into end-to-end data discovery and business intelligence platforms. In the meantime, it makes sense to buy both to deliver production-quality analytics solutions faster than ever imagined.
For More Information or an Evaluation
If you are still hand coding or running scripts from the Kimball data warehouse toolkit, I urge you to check out data warehouse automation solutions. You won’t regret it. There really is a better way to build a data warehouse today.
To delve deeper into Attunity Compose, here are a few recommended resources.
- Attunity website or contact Attunity sales
- Resource library of white papers, videos, and more
- Free Trial of Attunity Compose for Redshift
- 2016 Gartner Magic Quadrant for Data Integration