In a digitalized world where everything generates information and businesses are driven by data, handling countless sources of knowledge and extracting valuable insights from them resembles the painstaking work of a barman.
After all, typical enterprise data assets are anything but homogeneous and might be compared, allow us a fanciful metaphor, to “cocktails of data”. Their ingredients come from different bottles (i.e. data sources), require proper shaking and stirring techniques to prepare and combine (i.e. data transformation and data integration), and will be served in a suitable glass (i.e. data warehouses or any other central repository).
In this article, we'll focus on the shaking part and find out how these "data spirits" can be merged into a good drink for business intelligence analysts. We’ll also explain why cloud technologies, specifically cloud data integration platforms, may be the best cocktail shakers for most business scenarios and a potential aspect to prioritize when investing in business intelligence services.
What is cloud data integration?
Cloud data integration represents a set of tools and practices involving cloud technologies to connect multiple systems and enable ongoing data exchange between them for both operational and analytical purposes. It can be used in a variety of scenarios, which include:
- App-to-app integration to share and synchronize data (making it consistent among different data types) across Software-as-a-Service and on-premises applications.
- Platform integration to connect several platforms distributed across multiple environments, including on-premises, hybrid cloud, and multi-cloud.
- B2B integration to set up proper communication protocols and exchange data with business partners, even with different APIs (application programming interfaces).
- Microservices integration to configure APIs which act as a bridge between several small services running independently and handling specific processes.
- IoT integration to direct flows of data from networks of sensors and other devices to cloud platforms.
- Multi-cloud integration to connect multiple public cloud environments provided by different cloud vendors.
- Big data integration to enable the extract-transform-load (ETL) pipeline, which transfers data from selected sources to data warehouses while preparing it for business intelligence and data analytics querying.
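To make the last scenario more concrete, here is a minimal sketch of an extract-transform-load flow in Python. It is an illustration under stated assumptions rather than how any particular platform works: the source records, the `sales` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source records, standing in for rows pulled from a CRM export.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "currency": "usd"},
    {"customer": "Bob", "amount": "80", "currency": "USD"},
    {"customer": "alice", "amount": "30.25", "currency": "usd"},
]


def extract():
    """Extract: return the raw records from the (hypothetical) source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: trim and normalize text fields, convert amounts to floats."""
    return [
        (
            row["customer"].strip().title(),
            float(row["amount"]),
            row["currency"].upper(),
        )
        for row in rows
    ]


def load(conn, rows):
    """Load: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three stages in order against the given connection."""
    load(conn, transform(extract()))


def totals_by_customer(conn):
    """Example analytical query that the prepared data makes possible."""
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    run_pipeline(warehouse)
    print(totals_by_customer(warehouse))
```

Because the transform step normalizes " Alice " and "alice" to the same name, the final query can aggregate both records under one customer, which is the kind of preparation that makes downstream BI querying reliable.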
Cloud data integration benefits
Data integration per se is nothing new, but its cloud-based variant represents a step forward for a variety of reasons.
Synergy with cloud environments
The proliferation of rather complex tech ecosystems, which combine software applications and data stores distributed among cloud-based and on-premises environments (hybrid cloud), private and public clouds, or even different cloud services (multi-cloud), has driven several enterprises towards a more cloud-centric data integration approach.
In this regard, Gartner pointed out that 81% of public cloud users rely on more than one cloud provider.
Another catalyst for this shift towards the cloud as a key data integration enabler, as well as for other major data-related trends such as data warehouse modernization, is its immense scalability.
Considering the sheer number of corporate systems involved in typical business processes, along with the countless external data sources enterprises should interface with to collect relevant information (such as social media, IoT sensors, or financial platforms), managing the increasing volume of data flowing among them may prove challenging. Not to mention seasonal variations in data processing and archiving workloads, which can be difficult to forecast in good time.
Instead of investing in new on-premises hardware, more and more corporations turn to cloud services for flexible capabilities to keep up with continuous market changes and meet their operational and business intelligence requirements.
A comprehensive toolset
Most cloud service providers offer specific data integration tools and pre-built connectors to seamlessly design and perform new data integration flows. But how can you actually tap into this toolbox while successfully implementing the cloud in your corporate scenario?
Well, over the last few years, in their pursuit of a data integration approach that could ensure shorter implementation times, cost optimization, and user-friendliness, several companies have turned to the so-called integration platform as a service (iPaaS) model.
What is iPaaS?
iPaaS involves the adoption of cloud-based platforms licensed by a third-party service provider on a subscription basis and centrally hosted, featuring a set of automated tools to integrate data and software applications distributed across multiple cloud and on-premises environments. Among them, we can usually find:
- Data ingestion tools to automatically gather data from disparate sources and direct this flow to a single data storage.
- ETL tools to design and manage the aforementioned extract-transform-load pipelines.
- Data cleansing tools to detect, replace, modify or remove corrupt data and duplicates.
- Data catalogs to label data assets with metadata, inventory them, and find them via proper search features.
- Data migration tools to transfer data from one storage system to another while ensuring format compatibility with the new location.
- Pre-built data connectors to move, filter, and transform data in a suitable format for querying and analysis.
- Data governance tools to set up procedures and protocols defining how data assets are managed and shared across an organization.
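As an illustration of what one of these tools does under the hood, the following is a minimal data cleansing sketch in Python. The record layout, the `cleanse` function, and the rules it applies (a record is treated as corrupt when a required field is empty or the email lacks an "@", and duplicates are matched on the normalized email) are assumptions made for this example, not the behavior of any specific iPaaS product.

```python
def cleanse(records, required=("id", "email")):
    """Split records into clean and rejected, dropping duplicates.

    Assumed rules for this sketch:
    - corrupt: a required field is None or empty, or the email has no "@";
    - duplicate: same email after trimming and lowercasing (first one wins).
    """
    seen_emails = set()
    clean = []
    rejected = []
    for record in records:
        email = str(record.get("email") or "").strip().lower()
        has_gap = any(record.get(field) in (None, "") for field in required)
        if has_gap or "@" not in email:
            rejected.append(record)  # keep corrupt rows for later inspection
            continue
        if email in seen_emails:
            continue  # silently drop the duplicate
        seen_emails.add(email)
        clean.append({**record, "email": email})  # store the normalized email
    return clean, rejected


if __name__ == "__main__":
    # Hypothetical input mixing valid, duplicate, and corrupt records.
    raw = [
        {"id": 1, "email": "Ann@Example.com "},
        {"id": 2, "email": "ann@example.com"},
        {"id": 3, "email": "not-an-email"},
        {"id": 4, "email": None},
        {"id": 5, "email": "bob@example.com"},
    ]
    clean_rows, rejected_rows = cleanse(raw)
    print(clean_rows)
    print(rejected_rows)
```

Returning the rejected records instead of discarding them is a deliberate choice in this sketch, since corrupt rows often need to be reviewed or repaired at the source rather than simply thrown away.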
This rich selection of functionalities, designed to create a virtual hub connecting corporate apps and resources, unburdens organizations and their IT departments of all those data management and governance tasks typically involved in the data integration workflow by delegating them to the platform vendor.
Furthermore, data integration platforms generally come with solid, built-in security and monitoring features, and the services they offer can easily be scaled up and down on demand to suit your integration requirements without investing in additional on-premises resources.
How to select a cloud data integration platform
After defining the nature of cloud data integration platforms and their core features, let’s better frame their different categories and some selection criteria to help you choose a suitable solution for your business.
Nowadays, the range of data integration platforms available on the market is even wider than the gamut of tools and features they provide. We may split this vast offer of services into three sub-groups:
- Major cloud computing platforms which offer their own integration solutions to connect different applications deployed in their vast ecosystems, including Google Cloud Data Fusion, Azure Logic Apps, and Amazon EventBridge.
- Pre-existing data integration platforms developed by longstanding tech corporations and reimagined to fully embrace cloud technologies, such as Informatica iPaaS, IBM DataStage, SAP Data Hub, and Oracle DIPC.
- Cloud-native data integration platforms created by smaller but dynamic companies and focusing on recent tech trends like real-time and augmented analytics, which include Boomi AtomSphere, Jitterbit Harmony, and Talend Data Integration.
Whether you opt for the comprehensive software ecosystems of the first group of providers, the indisputable stability and experience of the second, or the innovative approach of the third, consider the following parameters to choose the platform that best meets your requirements:
- Full support for your corporate software applications and operating environments, be they SaaS or locally hosted.
- Top data processing performance, especially when handling large data volumes and multiple data integration executions, combined with monitoring tools to supervise and optimize platform resource utilization.
- Ease of use in terms of platform deployment and source-to-target mapping and integration workflow design, generally achieved through an intuitive GUI (graphical user interface).
- Ability to handle and interact with a variety of data types (structured and unstructured), data sources (CRM, ERP, and other corporate or external systems), data stores (OLAP, centralized, relational, NoSQL databases, etc.), and protocols (HTTP, FTP, etc.).
- Support of a full spectrum of data capture (real-time data ingestion, event-driven data acquisition, bulk import, etc.), transformation (data-type conversions, aggregation, etc.), and mapping (STTM, data lookup, etc.) methods.
- Wide set of pre-built data connectors and other integration tools, such as OData, HTTP, and FTP.
- Compliance with all major security standards and data protection regulations applicable to your industry, ensured via solid cybersecurity measures such as access management and data encryption.
Of course, along with such purely technical criteria, you'll need to examine the pricing and licensing conditions of each potential provider. For example, keep in mind that big-league vendors tend to offer longer-term SLAs and more restrictive licensing options, albeit offset by a rock-solid range of services.
Another relevant metric to consider is providers' reputation, although its intangible nature makes it rather difficult to frame. However, peer review platforms and major consulting firms can help us demystify this variable and shed light on the pros and cons of the most important data integration platforms on the market.
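One simple way to weigh the criteria above against each other is a weighted scoring matrix. The Python sketch below shows the arithmetic only: the criterion names are shorthand for the parameters listed earlier, while the weights, the 1-5 ratings, and the candidates "Platform A" and "Platform B" are invented placeholders that you would replace with your own assessment.

```python
# Hypothetical weights expressing how much each criterion matters (sum to 1.0).
WEIGHTS = {
    "compatibility": 0.25,
    "performance": 0.20,
    "ease_of_use": 0.15,
    "connectors": 0.15,
    "security": 0.15,
    "pricing": 0.10,
}

# Placeholder ratings on a 1-5 scale for two fictional candidates.
RATINGS = {
    "Platform A": {
        "compatibility": 4, "performance": 5, "ease_of_use": 3,
        "connectors": 4, "security": 5, "pricing": 2,
    },
    "Platform B": {
        "compatibility": 3, "performance": 3, "ease_of_use": 5,
        "connectors": 4, "security": 4, "pricing": 5,
    },
}


def weighted_score(ratings, weights):
    """Return the sum of rating * weight over every weighted criterion."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


def rank(candidates, weights):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [
        (name, round(weighted_score(ratings, weights), 2))
        for name, ratings in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(RATINGS, WEIGHTS):
        print(f"{name}: {score}")
```

With these placeholder numbers, Platform A scores 4.0 and Platform B scores 3.8. Shifting weight from performance to pricing would narrow or reverse that gap, which is why the weights deserve as much scrutiny as the ratings themselves.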
Top cloud data integration platforms
When it comes to getting insightful feedback on cloud platform providers' reliability and service offerings, an authoritative source to count on is certainly Gartner. Here is a brief overview of the enterprise cloud data integration platforms named market leaders in its 2021 Magic Quadrant.
Informatica
Pros: At present, Informatica represents the undisputed market leader in cloud data integration. Its offering combines AI and ML-powered automation, end-to-end integration in hybrid and multi-cloud environments, scalable data engineering tools, and a cloud integration hub supporting several data integration modalities.
Cons: Its DataOps capabilities aren't particularly brilliant, the new pricing model makes upfront usage forecasting difficult, and workload migration among some of its tools proved challenging.
Boomi
Pros: Boomi's core strength is its wide offer of flexible programs and services that easily adapt to the specific business needs of corporations of different sizes and from different industries, including ad hoc integrators, data preparation functionalities, and vertical accelerators.
Cons: The recent acquisition by Francisco Partners and TPG Capital is raising concerns about Boomi's future roadmap and potential price hikes.
Workato
Pros: Organizations opting for Workato have shown great satisfaction with its ease of use, extensive set of features, and excellent customer support, resulting in relevant revenue and customer growth.
Cons: Workato suffers from a general lack of comprehensive EDI capabilities. Also, the documentation on the platform features and communication on updates and changes can still be improved.
Oracle
Pros: Oracle can boast a diverse product portfolio and brilliant capabilities in terms of B2B and API-focused integration, automation, streaming analysis for operational intelligence, managed file transfer, RPA, and data virtualization, along with solid community support and a wide network of partners.
Cons: Specific support areas require further improvements, its licensing and pricing model is rather complex, and integrating third-party software may be a challenge without relying on Oracle's partner network.
SAP
Pros: SAP's cornerstone is flexibility, as its platform provides multiple data integration options across on-premises and cloud environments and solid API management capabilities, complemented with a rich offer of process-oriented, pre-built integration packages covering industry-specific scenarios.
Cons: Several customers reported setup complexities and delays, compatibility issues with non-SAP solutions, and relatively high licensing costs.
Microsoft
Pros: Companies relying on Microsoft’s Azure Integration Services can benefit from its global network of commercial partners, smooth integration with other Microsoft technologies, solid hybrid deployment capabilities, and support for a variety of integration use cases encompassing apps, data, and APIs.
Cons: Budget projection for Azure-based data integration projects isn't always straightforward and the Microsoft-centric approach embraced by Azure Integration Services may represent a stumbling block for potential customers.
MuleSoft
Pros: The Anypoint Platform from MuleSoft, a company acquired by Salesforce in 2018, is recognized as a broad and efficient toolbox encompassing iPaaS, B2B and event-driven integration, API management, ESB software, and microservices development.
Cons: It's mostly aimed at supporting integration specialists in complex integration use cases and may not be the best option for nontechnical users, not to mention its relatively high price tag compared to other alternatives.
TIBCO
Pros: TIBCO's cloud data integration platform has been praised for its modular use and scalability, performance optimization, bulk and stream data management capabilities, and embedded marketplace offering a vast selection of reusable artifacts.
Cons: Following several acquisitions, TIBCO's portfolio has grown in size but also in complexity, making tool selection harder. Another concern comes from a shortage of experts mastering this platform, further worsened by the lack of self-service capabilities.
Unearthing data from the depths
In recent years, several consulting firms, such as Forrester and Gartner, have highlighted the fact that most corporate data assets lie idle in the darkest corners of sprawling tech ecosystems, guarded by Gollum and other lonely creatures. One of the most plausible reasons behind this condition is poor data integration across multiple applications and repositories.
However, cloud technologies, along with AI-based automation and other innovative tools, may prove capable of acting as fast and capacious mine carts and bringing to light the gold of the 21st century, namely data.