"Right Time" Data Integration & Trends
I recently attended an "information day" hosted by a leading data integration company. As expected, they talked about integrating huge volumes of data and making the most of the data an organization captures. Also as expected, they talked about Big Data and cloud computing, and shared the company's vision of helping complex businesses with their new and upgraded offerings. While all of this looks fine and interesting from an IT perspective, I kept thinking about what it means for businesses and how it is actually going to help most of them. With all the new advancements and the huge array of offerings in the data integration space, I want to pen down my thoughts on how the data integration industry is shaping up, and on what it takes for businesses to keep doing the right things in an optimal manner without being bogged down by the perplexity of choice.
I somehow tend to analogize the evolution of technologies with the lives of human beings. We are all born small and immature, and slowly, over time, we grow bigger and mature. Then comes a time when, with age, we tend to bend, start losing some of our common sense, and sometimes behave more like a child before we die and a new life is born somewhere else. This is the “circle of life”. Any promising technology also starts small, slowly becomes famous, people start talking about it, and soon it becomes the buzzword of the industry. In most cases, after a while, it becomes redundant and something else somewhere starts becoming the buzz. I call this the “circle of technology trends”. I strongly believe that, in order to survive, technology companies should get back to the basics and support what organizations really need to run their business.
One data integration trend that I believe is going to be around for a while is the concept of "right time" ETL. First we had traditional ETL, and then we had real-time or near-real-time integrations. Which direction to go has always been a classic debate within IT and with the business. At times we built near-real-time batches on traditional ETL platforms, and scheduled loads with complex mediation on real-time platforms. But I think the time has come to realize that for the business there is no such thing as "real time" or "batch". For the business it should always be "right time", driven by the nature of the business. And to support "right time", no single way of integrating will be enough in the future: real time, batches, cached data, and various combinations of these may be used to satisfy the "right time" data needs of the business.
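The "right time" idea can be made concrete as a routing decision: each consumer states how stale its data may be, and the platform picks the cheapest delivery mechanism that still meets that requirement. A minimal sketch, where the function name, mechanisms, and thresholds are all illustrative assumptions, not any vendor's API:

```python
from datetime import timedelta

# Hypothetical sketch: route a data request to the cheapest delivery
# mechanism that still meets the consumer's freshness requirement.
# The mechanism names and thresholds are illustrative assumptions.

def choose_delivery(max_staleness: timedelta) -> str:
    """Pick an integration style based on how fresh the data must be."""
    if max_staleness <= timedelta(seconds=5):
        return "real-time"   # event stream / message bus
    if max_staleness <= timedelta(hours=1):
        return "cached"      # periodically refreshed cache
    return "batch"           # scheduled ETL load

# A fraud check needs data within seconds, so it gets the stream;
# a daily finance report tolerates day-old data, so batch is enough.
assert choose_delivery(timedelta(seconds=1)) == "real-time"
assert choose_delivery(timedelta(minutes=30)) == "cached"
assert choose_delivery(timedelta(days=1)) == "batch"
```

The point is not the thresholds themselves but that the freshness requirement, not the technology, drives the choice.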
The next big trend, which I feel is also going to stay with us for a while, is data virtualization. So far, data virtualization has been most popular in the DW/BI area, giving business and end users direct access to source data and enabling quick prototyping and faster time to market. But today data virtualization is no longer limited to DW/BI domains, and its potential is enormous. The ability to combine disparate sources in a virtual layer, perform on-the-fly transformations, and publish the result as a data service to any consuming application or user can also be leveraged as a potential "right time" data integration option. There is also huge potential in leveraging data virtualization techniques to log and report across disparate middleware platforms, providing an integrated view of your integration landscape, which is currently a challenge for most organizations.
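The essence of a virtual layer is that consumers query a joined view while the data stays in the sources. A toy sketch, assuming two invented in-memory "sources" standing in for, say, a CRM database and a billing system:

```python
# Minimal sketch of a virtual layer: two disparate "sources" are joined
# on the fly and published as a single data service, without first
# copying the data into a warehouse. All names and shapes are invented.

crm_source = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
billing_source = [
    {"cust": 1, "outstanding": 120.0},
    {"cust": 2, "outstanding": 0.0},
]

def customer_view():
    """Federated, on-the-fly join exposed as a 'data service'."""
    balances = {row["cust"]: row["outstanding"] for row in billing_source}
    return [
        {"name": c["name"], "outstanding": balances.get(c["customer_id"], 0.0)}
        for c in crm_source
    ]
```

Because the join runs at query time, a change in either source is visible on the very next call to `customer_view()`, which is exactly what makes virtualization attractive for prototyping.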
I also see our industry moving in a more agile DI and BI direction, which means the traditional ways of doing DI and BI won't be enough in the future. I don't see a challenge in storing or accessing data: roughly 90% of the data in the world today was created in the last three years alone, and that trend will only grow from here. It has even been predicted that through 2015, 85 percent of Fortune 500 organizations will be unable to exploit big data for competitive advantage. The main challenge, and the success of any organization, will depend on what they do with the data they capture. And for that, quick access to data and quick prototyping are a must, which can be readily achieved through data virtualization.
In my view, these are the top five DI trends that are here to stay:
- Cloud Integrations – This space will grow to enable quick prototyping of DI platforms. It will also add immense value for small organizations, letting them use best-of-breed data integration platforms without a huge initial investment in licensing.
- Bigger Data (not just Big Data anymore) – The need to sync data across disparate systems will only grow from here. We already need to churn huge volumes of data to keep systems and applications in sync, and we will need to even more in the future. The digital universe is doubling in size every two years and is expected to grow to 40 trillion gigabytes by 2020.
- “Right Time” Integration Needs – Businesses will increasingly ask for right-time data synchronization, which can mean scheduled batch transfers, real-time transfers, or a combination of both worlds. Integration platforms need to be elastic and scalable as voluminous data pushes against performance limits. I believe event-driven batches, i.e. processing huge volumes of data when certain events fire rather than on a fixed schedule, will define the future of the DI space.
- Data Virtualization – The increasing size and complexity of database environments is straining IT resources at many organizations. As data continues to proliferate, many companies are looking for more efficient ways to store and analyze it, and virtualization is an approach that is coming to the fore. As noted earlier, there is also huge potential in leveraging data virtualization to log and report across disparate middleware platforms, providing an integrated view of your DI landscape.
- Data integration is evolving – Data integration is moving beyond extract, transform and load (ETL). While the basic tasks of data integration – gathering data, transforming it and putting it into a target location – sound like ETL, newer data integration tools offer processes and technologies that extend beyond basic ETL. These technologies help turn data into comprehensive, consistent, clean and current information. The tools support data migration, application consolidation, data profiling, data quality, master data management and operational processing; they let businesses determine the state of the source systems, perform cleansing, ensure consistency, and manage all the processing, including error handling and performance monitoring. Today most organizations have some or most of these capabilities, but in the form of disparate tools and, in some cases, custom-built in-house solutions. I believe DI vendors will soon start providing all the tools you need to manage a DI platform end to end as part of their base offering.
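The "event-driven batch" idea from the right-time trend above deserves a concrete illustration: instead of a clock firing a load every night, records accumulate and the batch fires when an event occurs, here simply the buffer reaching a size threshold. A minimal sketch; the class, the trigger condition, and the load function are all illustrative assumptions:

```python
# Illustrative sketch of an event-driven batch: incoming records buffer
# up, and the batch load fires when an event occurs (here, the buffer
# reaching a threshold), not on a fixed schedule. Names are assumptions.

class EventDrivenLoader:
    def __init__(self, batch_size, load_fn):
        self.batch_size = batch_size
        self.load_fn = load_fn   # the actual ETL load step
        self.buffer = []

    def on_record(self, record):
        """Called for each incoming record; fires a batch when full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.load_fn(self.buffer)
            self.buffer = []     # start a fresh buffer for the next batch

loaded_batches = []
loader = EventDrivenLoader(batch_size=3, load_fn=loaded_batches.append)
for i in range(7):
    loader.on_record(i)

# Two full batches have fired; one record waits for the next trigger.
assert loaded_batches == [[0, 1, 2], [3, 4, 5]]
assert loader.buffer == [6]
```

In practice the trigger could equally be an upstream "file arrived" or "market closed" event; the pattern is the same — the event, not the clock, decides when the batch runs.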