Data warehouse modernization: a how-to guide

Data warehouse modernization: a how-to guide

July 25, 2022

Andrea Di Stefano

Technology Research Analyst

Every morning in Africa, as Thomas Friedman said, a gazelle wakes up and knows it must run faster than the fastest lion, or it will be killed. But since we're talking about data warehouse modernization, let's rephrase.

Every morning, in any enterprise relying on a solid business intelligence ecosystem, a legacy data warehouse doesn't need to wake up, as it's fully operational 24/7. Still, it knows it must ingest data, process it, and provide business insights faster than the fastest decision-makers' queries, or it will be dismantled and replaced.

In a standard business scenario, unlike the African savannah, nobody is supposed to die, even figuratively. But the market, at times, can be as ruthless as wild animals, and the growing competition forces us to optimize the tools at our disposal, including data warehouses. That's why data warehouse modernization may prove to be a wise option for companies investing in data warehousing services to stay one step ahead of their competitors. Let’s find out how.

What is data warehouse modernization?

Data warehouse modernization involves a comprehensive architectural rethinking of traditional, typically on-premises data warehouses to better deal with modern data management challenges and requirements, especially in terms of scalability, data silos, data and use case variety, processing workloads, and cost-efficiency.

This re-engineering procedure can be performed by combining several strategies, which generally encompass:

  • Growing reliance on cloud technologies by migrating a corporate data warehouse (CDW) to platforms such as Amazon Redshift, Azure Synapse Analytics, or Google BigQuery
  • Augmented data management and analytics through data warehouse automation tools that can perform data warehousing processes, including ETL pipelines, faster and more accurately
  • Implementation of operational business intelligence tools and techniques to acquire and process ongoing streams of data in real-time while offering self-service data querying features

Before we delve into these three options, it’s worth saying a few words about the business and technology trends driving data warehouse modernization.

Why modernize your data warehouse?

We'll not investigate the nature and importance of data warehouses here, as we've recently covered this topic in our article on building a data warehouse. What we can say, however, is that these information repositories are an essential business intelligence tool designed to perform a multitude of data-related tasks.

These include extracting data from corporate systems or external sources, transforming and consolidating such data to make it consistent among different data types and suitable for analytical querying, and finally loading it for future use (the so-called extract, transform, load or ETL pipeline).

Enhanced performance

As you might expect, handling such a complex and multifaceted sequence of procedures requires a fair amount of processing power and significant storage capacity. Nowadays, as data volumes in our highly-digitized world are growing at an overwhelming rate, several companies are opting for data warehouse modernization to enhance their system capacity and performance and therefore keep up with these massive workloads.

In this regard, Enterprise Strategy Group's 2019 How to drive speed, scale, and cost savings with data warehouse modernization report highlighted the fact that over one in four organizations considered data warehousing as their greatest data analytics-related challenge, especially in terms of system performance.

Top data warehousing challenges

Flexibility and scalability

While sheer performance and storage capacity matter, modern data warehouses should be designed with flexibility, scalability, and cost optimization in mind. After all, along with the general increase in volumes and types of data to manage (such as structured, semi-structured, and unstructured data), businesses may face seasonal workload peaks and fluctuations while having a limited budget to invest in boosting their on-premises data warehousing capabilities.

A potential answer to this challenge, as Bob Dylan would say, is blowing in the wind, specifically "among the clouds". But we'll talk about it in the next section.

Self-service analytics

Since we mentioned scalability, we shouldn't forget to point out that nowadays, data itself is not the only thing scaling up, as the amount of people requiring access to such data within any enterprise is sharply increasing as well.

This implies that a well-designed data warehouse should be equipped with solid self-service features to foster data sharing and democratization. In fact, BARC's 2019 Modernizing the data warehouse survey reported that limited support of self-service BI represented a major driver for modernization.

Upgrade your data warehouse with Itransition’s guidance

Let’s talk

Top data warehouse modernization strategies

We discussed some of the key data warehousing challenges. Now, let's move to the solutions provided by data warehouse modernization.

1. Cloud data warehousing

Having an on-premises data warehouse managed by your IT team and stored at your own location like a treasure hidden in a secret vault may still sound appealing to several executives. However, the general lack of scalability and flexibility characterizing such solutions makes them less and less attractive in an age where business needs and operational scenarios can change overnight and innovation cycles spin faster than the wheels of a Lambo.

That's why several corporations have chosen or are planning to migrate their data warehousing operations to the cloud, as shown by the Enterprise Management Associates' 2019 Data Warehousing in the Cloud survey, which reported how 87.4% of respondents were already adopting cloud implementation strategies.

Speaking of strategies, there are two primary models of cloud implementation:

  • Bring Your Own License (BYOL): this happens when organizations redeploy their previously purchased on-premises data warehouse platforms to a cloud-based infrastructure, allowing them to keep their favorite functionalities while leveraging the benefits of the cloud. However, not all vendors offer this option, so take it into account when selecting a suitable solution.
  • Data Warehouse as a Service (DWaaS): an alternative option involving the subscription to an off-the-shelf, cloud-native data warehousing platform offered by a service provider, which typically ensures ease of use and a shorter implementation timeframe. On the other hand, this may come with hidden costs to access any additional features and services (such as data backup, recovery and security), as well as with vendor lock-in issues due to a variety of adjacent services which aren't compatible with other cloud or on-premises environments.

The lock-in factor should be considered carefully, as one of the most popular options nowadays is to rely on more "creative" deployment modes known as multi cloud and hybrid cloud. The first one, namely multi cloud, involves the distribution of apps and cloud assets across multiple cloud services offered by different providers to optimize costs, benefit from a wider feature selection, or comply with local data management legislations.

Hybrid cloud, instead, implies a combination of on-premises and cloud resources or public and private cloud environments to scale up cloud computing resources on-demand and get the best of both worlds in terms of applications and data warehousing capabilities. Here’s an example of a hybrid ecosystem ensuring interoperability among applications, a data warehouse, and a data lake (a repository of raw data) both on-premises and in the cloud.

Hybrid cloud data warehouse architecture

Be it DWaaS or BYOL, standard, hybrid or multi cloud, whatever path you choose to take will probably lead your data warehouse towards a cloud environment providing several benefits:

  • Scalability and cost optimization: instead of purchasing new hardware when corporate servers are maxed out, you can easily scale up or down your cloud storage and processing capacity depending on your ever-evolving business requirements.
  • Fast deployment and managed infrastructure: cloud platforms offer out-of-the-box solutions requiring no particular setup efforts and benefitting from automatic updates and ongoing support.
  • Effective data integration: data warehousing cloud-based architectures are generally more efficient at consolidating data from a variety of different sources into a single point of truth, as they can easily connect with other cloud-based applications. For example, Google BigQuery relies on Cloud Fusion to ensure native data integration with over 150 data sources.
  • Security and governance: these key elements of data management usually fall, totally or partially, under the responsibility of cloud service providers, who also deliver a plethora of solutions designed to safeguard your data warehouse and related data assets, such as AWS CloudTrail or Azure Synapse's security features.

2. Data warehouse automation

In recent years, the digital world has undergone a transformation previously experienced by the manufacturing sector and involving an overall automation of its processes. In the realm of big data, this has led to the rise of augmented analytics and augmented business intelligence, which rely on AI, machine learning, and other technologies to streamline the analytical cycle.

Unsurprisingly, this trend has encompassed data warehouses as well, as they represent an essential component of data management and analysis. Indeed, BARC's aforementioned survey reported that automation represents the primary approach to data warehouse modernization. And so, while factories automated assembly lines with robots, data warehousing providers have opted to automate their ETL pipelines with AI-powered algorithms, which autonomously integrate data from multiple sources, transform and consolidate it, and finally load it.

Implementing such enhancements is relatively simple, since all major data warehousing platforms currently offer a comprehensive range of software solutions to help you modernize your system, including:

  • Informatica’s AI-driven Intelligent Data Platform automates the data integration lifecycle and therefore speeds up Azure SQL Data Warehouse project deployment.
  • Autonomous Data Warehouse is Oracle's cloud-based solution to modernize data management via automation and self-service features.
  • AWS Step Functions allows customers to create, test, and modify automated ETL workflows for Amazon Redshift, making them repeatable and easy to scale.
Automated ETL workflow through AWS solutions

These and many other solutions on the market represent a massive plus for managing huge data volumes and sprawling, ever-expanding networks of corporate data systems distributed among several locations. However, data warehouse automation tools can also foster data governance and compliance by automatically performing a variety of reporting tasks faster and more accurately than any human could.

3. Operational data warehouses

Over two millennia ago, Plato warned us that the combination of democracy and ignorance could lead to tyranny. In a corporate scenario, tyranny might not be the most likely outcome. However, sharing advanced software solutions and decision-making responsibilities with your staff without democratizing knowledge and providing timely, data-driven insights would still hamper your corporate performance at an operational level.

That's the assumption behind operational business intelligence, which aims at fixing the typical lack of readiness and elitism of old-school BI with real-time data monitoring capabilities and intuitive self-service data querying functionalities. The so-called operational data warehouses (ODW) embrace this approach by implementing a range of tools to analyze continuous streams of data and shed light on the current status of relevant business processes, even for non-technical staff. Alternatively, it’s possible to integrate a data warehouse with external tools specifically designed for operational BI.

This ensures faster BI queries, superior reactivity to ever-changing business conditions, and greater accessibility of corporate data assets. An example of these potential payoffs comes from two case studies reported by Enterprise Strategy Group's aforementioned research:

  • Sainsbury's, the second largest chain of supermarkets in the UK, adopted a data warehouse based on Snowflake's cloud platform, resulting in enhanced data democratization and data queries shortened from six hours to three seconds.
  • The American cybersecurity company White Ops implemented a Snowflake-based data warehousing solution which reduced fraud detection and review times from 24 hours to less than two hours.

Get your BI solution done by Itransition

Business intelligence services

Business intelligence services

We offer our holistic expertise in BI development and consulting to help you turn business data into your most valuable asset.

New capabilities to face modern challenges

We may consider legacy data warehouses as librarians frantically collecting tomes, having a look at the summary to get an idea of their topic, and carefully cataloging them in the most suitable shelves while providing visitors with apt suggestions regarding a book that might catch their interest. The problem here is that books are infinite, the library is pretty small, and thousands of readers decided to spend a cultural weekend (probably because Facebook and Instagram are down).

Indeed, the data tide unleashed by mass digitalization is paving the way to new, unexpected business opportunities. However, it's also putting a strain on our data management and business intelligence ecosystems, including data warehouses that will certainly require a radical overhaul made possible by cloud technologies, automation, and operational BI.