One of the top challenges for BI and analytics professionals in a cloud-first, big data world is data movement and integration. Reporting across large volumes of heterogeneous data stored in different environments is daunting because of data gravity: large data sets are slow and costly to move, so applications and services tend to be pulled toward wherever the data already lives. Even with fast networks and improved caching, architects need to design solutions with distance, bandwidth, throughput, and latency in mind.
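To put rough numbers on those considerations, here is a minimal back-of-the-envelope sketch in Python. The function names are hypothetical, and the 80% link efficiency is an assumption standing in for protocol overhead and contention; the point is only that bulk volume is bandwidth-bound while chatty, page-by-page access is latency-bound.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Rough hours to move size_tb decimal terabytes over a link_gbps link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved
    (protocol overhead, contention, retransmits)."""
    bits = size_tb * 1e12 * 8                      # TB -> bits
    effective_bits_per_s = link_gbps * 1e9 * efficiency
    return bits / effective_bits_per_s / 3600      # seconds -> hours


def round_trip_seconds(rows: int, page_size: int, rtt_ms: float) -> float:
    """Time spent purely waiting on network round trips when fetching rows page by page."""
    pages = -(-rows // page_size)                  # ceiling division
    return pages * rtt_ms / 1000


# 10 TB over a 1 Gbps link at the assumed 80% efficiency
print(f"{transfer_hours(10, 1):.1f} h")           # prints "27.8 h"
# 1 million rows in 1,000-row pages across an 80 ms round trip
print(f"{round_trip_seconds(1_000_000, 1_000, 80):.0f} s")  # prints "80 s"
```

Even before any query runs, a single 10 TB copy ties up a 1 Gbps link for more than a day, which is why continuous, incremental movement matters.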
Please join me this evening at 7:00 PM EST for an Attunity-sponsored 24 Hours of PASS webcast, where we will showcase architectural design patterns and proven technologies that can solve the data gravity dilemma. You can also enter the associated raffle for a chance to win an Amazon Echo.
In previous articles, I shared a few potential approaches to ease data gravity pains. Those options are viable with small data sets or data located near the app. This time I am going bigger and better! I will explore more robust approaches capable of handling larger enterprise reporting scenarios and massive data volumes with technologies like Kafka.
Data gravity pain
The inevitable need to move data closer to cloud BI solutions is why you see freemium or low-cost offerings such as Microsoft Power BI, Amazon QuickSight, and Google Data Studio disrupting the market. Essentially, the lost on-premises and cloud BI app revenue is offset by selling much more profitable cloud data warehouses: Microsoft Azure SQL Data Warehouse, Amazon Redshift, and Google BigQuery.
“Thus, if you are considering cloud BI, you also need to estimate the cost of a cloud data warehouse. Then you need to figure out how to get your data continuously moved into it.”
Creative data movement offerings like AWS Snowball are great for one-time bulk data transfers, but they do not meet ongoing, diverse requirements. High-performance data replication is often the right answer to defy data gravity while also providing the flexibility to handle a wide variety of analytics scenarios and data sources.
Attunity provides world-class data integration and replication solutions that automate, move, and transform data across many different data sources and cloud environments. Recently, Attunity was named a Challenger in the 2016 Gartner Magic Quadrant for Data Integration Tools, which rates vendors on completeness of vision and ability to execute.
Attunity’s solutions are powerful, easy, and quick to deploy. They work seamlessly across heterogeneous cloud environments and data sources to simplify complex data integration. For BI and analytics professionals, they are well suited to data warehouse/ETL automation, change data capture (CDC), and replication in a hybrid BI architecture.
Attunity Replicate supports a wide range of data integration requirements, including data distribution, migration, query offloading for reporting, and real-time business analytics on premises or in the cloud. Next-generation CDC technology and intelligent in-memory transaction streaming significantly improve replicated data delivery times and data movement efficiency.
Attunity Replicate is simple to deploy yet offers secure, scalable, high-performance replication between mixed data sources regardless of location. It is commonly used for the following analytics scenarios:
- Load data to operational data stores and data warehouses
- Create copies of production data to multiple data sources
- Distribute data across regions or realms to defy data gravity
Unlike native database replication tools, which typically replicate only between instances of the same database engine, Attunity Replicate can execute replication tasks between many enterprise data source types. Its “Click-2-Replicate” design streamlines integration.
With Attunity Replicate, you can move all data or just transfer incremental changes.
- Full Load Replication: Creates files or tables at the target endpoint, automatically defines the metadata that is required at the target, and populates the tables with data from the source
- Change Processing/CDC: Captures changes in the source data or metadata as they occur and applies them to the target endpoint in near-real time
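The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration, not Attunity's implementation: the target is a plain dictionary keyed by primary key, the function names are hypothetical, and treating an update as an upsert is an assumption chosen so that re-applying a change is harmless.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def full_load(source_rows: List[Row], key: str) -> Dict[object, Row]:
    """Full load: build the target table from a complete copy of the source rows."""
    return {row[key]: dict(row) for row in source_rows}


def apply_change(target: Dict[object, Row], op: str, row: Row, key: str) -> None:
    """Change processing: apply one captured insert, update, or delete to the target."""
    if op in ("insert", "update"):
        target[row[key]] = dict(row)      # treated as an upsert so re-applying is harmless
    elif op == "delete":
        target.pop(row[key], None)
    else:
        raise ValueError(f"unknown operation: {op}")


source = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
target = full_load(source, key="id")            # initial copy of everything

changes: List[Tuple[str, Row]] = [              # incremental changes captured afterwards
    ("update", {"id": 2, "name": "Grace H."}),
    ("insert", {"id": 3, "name": "Edsger"}),
    ("delete", {"id": 1}),
]
for op, row in changes:
    apply_change(target, op, row, key="id")
```

After the full load, only the three small change records cross the wire rather than the whole table again.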
How it works
Attunity Replicate is log-based, meaning that it reads only data changes from the source's transaction log instead of repeatedly querying the source tables. This approach reduces replication load on data sources and does not require the source and destination to use the same database flavor. You can easily mix and match data sources and destinations without writing any ETL code.
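A minimal sketch of the log-based idea, under stated assumptions: the log is modeled as a list of records, each carrying a log sequence number (LSN), and the reader remembers a checkpoint so each pass returns only entries written since the last one. The record layout, the `read_changes` name, and the engine-neutral event dictionaries are all hypothetical; a real reader would seek to the checkpoint instead of scanning the whole log.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogRecord:
    lsn: int      # log sequence number: position in the transaction log, always increasing
    table: str
    op: str       # "insert", "update" or "delete"
    data: dict


def read_changes(log: List[LogRecord], checkpoint: int) -> Tuple[List[dict], int]:
    """Return engine-neutral change events logged after `checkpoint`, plus the new checkpoint.

    Only log entries are read; the source tables themselves are never queried."""
    new = [r for r in log if r.lsn > checkpoint]
    events = [{"table": r.table, "op": r.op, "data": r.data} for r in new]
    return events, max((r.lsn for r in new), default=checkpoint)


log = [
    LogRecord(101, "orders", "insert", {"id": 1, "amount": 10}),
    LogRecord(102, "orders", "update", {"id": 1, "amount": 12}),
]
events, checkpoint = read_changes(log, checkpoint=0)            # first pass: both changes
log.append(LogRecord(103, "orders", "delete", {"id": 1}))
later_events, checkpoint = read_changes(log, checkpoint)        # next pass: only the delete
```

Because the emitted events are plain dictionaries rather than engine-specific SQL, any target writer can consume them, which is what lets sources and destinations be mixed.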
An Attunity Replicate transaction log reader is installed on either the source data server or the Attunity Replicate server. Source rows and log records are filtered and compressed on the source or the Attunity Replicate server before changes are sent to the configured destination.
In the initial load process, Attunity Replicate reads a filtered stream of rows and passes them to a transformation process for further filtering and subsequent writing to the target endpoint in the expected output format. Point-and-click replication to different output formats, with no scripting and no syntax errors to chase down, is a key benefit over other solutions.
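The initial load described above is essentially a streaming pipeline: filter, transform, write. Here is a minimal Python sketch of that shape using generators, so rows flow through one at a time instead of being materialized. The step names are hypothetical, CSV is an assumed example output format, and the "transformation" is just a column projection.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def filtered(rows: Iterable[Row], keep: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter step: stream only the rows the target needs."""
    return (r for r in rows if keep(r))


def transformed(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Transformation step: project each row onto the target's columns."""
    for r in rows:
        yield {c: r[c] for c in columns}


def write_csv(rows: Iterable[Row], columns: List[str]) -> str:
    """Write step: render the stream in the target's expected format (CSV here)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


source = [
    {"id": 1, "region": "EU", "amount": 10},
    {"id": 2, "region": "US", "amount": 20},
    {"id": 3, "region": "EU", "amount": 5},
]
cols = ["id", "amount"]
output = write_csv(transformed(filtered(source, lambda r: r["region"] == "EU"), cols), cols)
# output == "id,amount\n1,10\n3,5\n"
```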
The CDC process obtains a stream of filtered events or changes in data or metadata from the transaction log file. It then buffers all changes for a given transaction into a single unit before forwarding them to the target when the transaction commits.
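Buffering per transaction is what keeps half-finished work off the target. The Python sketch below shows the general technique under simple assumptions: events arrive interleaved as `(transaction id, kind, payload)` tuples, a transaction's changes are released together only on commit, and rolled-back transactions are assumed to be discarded (the text above does not say how Attunity handles rollbacks). The event shape and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# (transaction id, event kind, payload); kind is "change", "commit" or "rollback"
Event = Tuple[str, str, object]


def committed_transactions(events: Iterable[Event]) -> Iterator[Tuple[str, List[object]]]:
    """Buffer changes per transaction and release each one as a single unit when it commits."""
    pending: Dict[str, List[object]] = defaultdict(list)
    for txn_id, kind, payload in events:
        if kind == "change":
            pending[txn_id].append(payload)
        elif kind == "commit":
            yield txn_id, pending.pop(txn_id, [])
        elif kind == "rollback":
            pending.pop(txn_id, None)          # assumption: discarded, never sent to the target
        else:
            raise ValueError(f"unknown event kind: {kind}")


# Changes from two transactions arrive interleaved, as they would in a log.
log_events: List[Event] = [
    ("t1", "change", "insert order 1"),
    ("t2", "change", "insert order 2"),
    ("t1", "change", "update order 1"),
    ("t2", "rollback", None),
    ("t1", "commit", None),
]
forwarded = list(committed_transactions(log_events))
# forwarded == [("t1", ["insert order 1", "update order 1"])]
```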
Enterprise-class replication with CDC
CDC is not a new concept. Historically, CDC has suffered from performance issues and from syntax errors in hand-written scripts when moving data to different types of destination data sources. Attunity Replicate handles heterogeneous CDC data movement as follows:
- CDC transactions are applied in real time, in optimized batches, and in order
- CDC works plug-and-play with native data warehouse loaders
- IoT-scale streaming of CDC can be used with Apache Kafka
- Cloud data transfer has been optimized for large scale operations
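"Optimized batches" and "in order" pull in opposite directions: bigger batches mean fewer round trips, but a transaction must never be split or reordered. One common way to reconcile them is sketched below in Python; this is an illustrative technique, not Attunity's documented algorithm, and the `ordered_batches` name and `max_changes` limit are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Txn = Tuple[str, List[str]]   # (transaction id, its changes), already in commit order


def ordered_batches(txns: Iterable[Txn], max_changes: int) -> Iterator[List[Txn]]:
    """Group whole transactions into batches of at most max_changes changes, preserving order.

    A transaction is never split; one larger than max_changes gets a batch of its own."""
    batch: List[Txn] = []
    size = 0
    for txn in txns:
        n = len(txn[1])
        if batch and size + n > max_changes:
            yield batch                        # flush before the batch would overflow
            batch, size = [], 0
        batch.append(txn)
        size += n
    if batch:
        yield batch


txns = [("t1", ["a", "b"]), ("t2", ["c"]), ("t3", ["d", "e", "f"]), ("t4", ["g"])]
plan = [[txn_id for txn_id, _ in b] for b in ordered_batches(txns, max_changes=3)]
# plan == [["t1", "t2"], ["t3"], ["t4"]]
```

Applying each batch in a single round trip cuts per-statement latency, while the flattened sequence of transactions stays exactly in commit order.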
Optimized for Cloud Data Transfer
Attunity is capable of high-speed data transfers across multiple data centers and different cloud platforms. Combined with Attunity Replicate, Attunity CloudBeam accelerates data movement between cloud storage environments and regions. It compresses data to reduce transfer sizes and sends multiple data streams in parallel to speed transfers over the network. Attunity CloudBeam supports Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.
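The two ideas in that paragraph, compression and parallel streams, can be illustrated with Python's standard library. This is a generic sketch, not how CloudBeam works internally: the data is split into chunks that are compressed concurrently and could then each travel over their own connection (the network sending itself is left out). The function names and the 256 KiB chunk size are assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_chunks(data: bytes, chunk_size: int, workers: int = 4) -> List[bytes]:
    """Split data into chunks and compress them in parallel, one task per chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # CPython's zlib releases the GIL while compressing, so threads can overlap.
        # pool.map returns results in input order, which keeps reassembly simple.
        return list(pool.map(zlib.compress, chunks))


def reassemble(compressed: List[bytes]) -> bytes:
    """Receiver side: decompress each chunk and restore the original byte order."""
    return b"".join(zlib.decompress(c) for c in compressed)


payload = b"order_id,customer_id,amount\n" * 50_000      # repetitive, like many table exports
parts = compress_chunks(payload, chunk_size=256 * 1024)
sent = sum(len(p) for p in parts)
print(f"{len(parts)} chunks, {len(payload)} -> {sent} bytes")
```

Compressing per chunk slightly lowers the compression ratio compared with one large stream, but it lets each chunk be compressed, sent, and retried independently.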
Big Data Movement: Attunity + Kafka
Apache Kafka is a durable, fault-tolerant, publish-subscribe messaging system that can handle well over 100,000 messages per second. It powers massive-scale apps at companies like Twitter, LinkedIn, and Netflix. To handle growing IoT data volumes, more organizations are relying on Kafka for streaming data ingestion.
Together, Attunity Replicate and Apache Kafka can handle resource-intensive replication tasks that were previously impractical.
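Kafka's publish-subscribe model rests on a partitioned, append-only log: producers append records, records with the same key land in the same partition (so their order is preserved), and each subscriber tracks its own read offset. The toy in-memory Python model below illustrates that model only; it is not the Kafka client API, the class and method names are hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner uses.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]   # (key, value)


class PartitionedLog:
    """Toy in-memory model of a Kafka-style topic: append-only partitions read by offset."""

    def __init__(self, partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(partitions)]

    def publish(self, key: str, value: str) -> int:
        """Append a record; records with the same key always land in the same partition."""
        # Kafka's default partitioner hashes keys with murmur2; crc32 is a deterministic stand-in.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def read(self, partition: int, offset: int) -> List[Record]:
        """Return records from offset onward; reading does not remove anything."""
        return self.partitions[partition][offset:]


topic = PartitionedLog(partitions=3)
p = topic.publish("customer:42", "insert")
topic.publish("customer:42", "update")      # same key, same partition, order kept
topic.publish("customer:7", "insert")

# Each subscriber tracks its own offset, so both receive every record independently.
offsets = {"warehouse": 0, "data_lake": 0}
for subscriber in offsets:
    records = topic.read(p, offsets[subscriber])
    offsets[subscriber] += len(records)
```

Keying change records by primary key is one way to keep every change to a given row in order while still spreading load across partitions and feeding several destinations from one stream.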
“If you need to move extreme data volumes to one or more big data destinations or a Data Lake, the awesome Attunity + Kafka duo will make your job a lot easier to do.”
How to Get Started with Attunity Replicate
If you are interested in testing Attunity Replicate, you can download and install a free trial. The setup process took me only a few minutes from start to finish. Here are the steps:
- Download and install Attunity Replicate free trial.
- Launch the web-based Attunity Replicate Console.
- Define a Source and Target by entering the data source connection information.
- Add a Replication Task.
- Now assign the Source and Target databases to the Replication Task by simply dragging-and-dropping them into the visual diagram.
- Then select the specific tables that you want to replicate.
- Lastly, run the Replication Task. You can watch data movement progress in the web console.
Hybrid data source reporting is a common requirement today. Despite advances in remote data source connectivity and querying technologies, the physics of data gravity alone dictate a continued need for flexible, diverse data movement options. With award-winning Attunity Replicate, securely and efficiently transferring data between different cloud providers, data source types, and data sizes is quite easy. It is a must-have tool in your analytics arsenal for easing heterogeneous data integration pains and rapidly delivering data for real-time analytics.
For more information on Attunity solutions, please review the following resources.