June 17, 2022
Table of contents
Everything hinges on data these days. What products to sell, how to produce, market and sell them, whom to sell to, and how to make customer experience unique – companies across industries are trying hard to embed data-driven insights into every business decision, customer interaction, and process. In the meantime, analytics and data warehousing are gaining traction as enablers of decision-making across organizations today.
Let’s investigate the concept of an enterprise data warehouse and study its architecture and types. We will also define which DWH capabilities make companies seek BI services to deploy one and outline factors guiding the data warehouse tech stack selection.
An enterprise data warehouse (EDW) is a solution that collects business information from multiple heterogeneous sources and then consolidates and stores it in a centralized repository ready for analytics and reporting.
The fundamental purpose of an enterprise data warehouse is to break down data silos, restructure corporate information and make it easily accessible for companies to make timely evidence-driven decisions and operate efficiently in a rapidly changing environment.
Data marts (sometimes referred to as ‘traditional’ or ‘usual’ data warehouses) are actually subsets of an enterprise data warehouse. They have the same functionality as enterprise data warehouses – collecting data from different sources and making it readily available for analysis. However, they have a narrower scope as they are designed to meet the needs of particular departments or lines of business. This means that they collect not all the data available, but the one that its users (sales, marketing, HR, etc.) need. Data marts are usually created for the sake of speed, since their requests are more specific and their queries run against a smaller amount of information.
Operational data stores contain up-to-date information in its original format consolidated from disparate operational sources to enable real-time reporting. Although it may seem rather similar to what a typical data warehouse does, they have rather significant distinctions in terms of:
Enterprise data warehouses store historical as well as current data, while ODS storage capabilities are limited to maintaining the most recent records, making it a go-to tactical tool.
Enterprise data warehouses host information after it undergoes particular transformations (cleansing, enrichment, reformatting, etc.), while ODS keep information in its raw state in order not to sacrifice the speed of analysis and reporting.
Enterprise data warehouses are built for running complex analytics queries on huge volumes of information, while ODS are used for relatively simpler queries run on real-time data – for example, to quickly identify the reason behind a failed transaction.
As you see, data marts and ODS are not substitutes for a data warehouse, but rather complement the whole enterprise data warehousing environment.
You want to assess your business performance at an enterprise scale, run risk analytics, conduct strategic planning, or perform any other activity, which requires the analysis of cross-functional historical information. However, you cannot do that because the information you have is either scattered across the company’s systems (ERPs, CRMs, etc.) or randomly piled up in a data lake and ungoverned.
Accelerating the speed of analysis is a competitive advantage you need to embrace, but running analytics queries against your OLTP databases is extremely slow and results in system failures, as it involves reading millions of rows of data.
You don’t have a unified approach to data governance and management, which leads to data inconsistencies, data redundancy, varying data quality, data access constraints as a result of security policies, duplicated efforts, etc. All this tangles up analytics workflows and in some cases even compromises the analytics results.
All data management activities (the extraction, cleansing, transformation, etc.) are manual and therefore resource-consuming and error-prone.
Your business users are rather dependent on IT teams for day-to-day reporting. This leads to tech specialists becoming overburdened with routine analytics activities on the one hand, while business users can’t obtain the insights they need.
You need a solid data environment to introduce new analytics initiatives – big data analytics, predictive analytics, self-service BI, ML and AI, etc.
It may seem that an enterprise data warehouse is merely a repository that is connected to your data sources (CRM, IoT devices, SaaS apps, etc.) from one side and to the BI and analytics software from the other. In truth, an enterprise data warehousing solution is the whole data processing and storage environment, which consists of the following components:
Modern enterprise data warehouse solutions are usually built according to the three-tier architecture, which includes:
However, the architectural approach should be dictated by the specific needs of a company, so in some cases other design methods (e.g. a one-tier or a two-tier architecture) may prove to be more suitable.
When setting up an enterprise data warehouse, businesses have to choose whether to do that in the cloud, on-premises, or via a hybrid environment.
This option implies deploying and running an enterprise data warehouse on the server located onsite with the help of an in-house IT team. It is a sound choice for companies, which need comprehensive control over the data warehouse hardware and software infrastructure to:
However, be cautious, as the option requires heavy upfront investments (hardware acquisition, software licenses, IT resources, etc.), comprehensive capacity planning (since on-premises EDWs are not easy and quick to scale), and an experienced IT team to keep the system running efficiently.
When an enterprise data warehouse is hosted and managed in the public cloud, then all hardware-related costs, software licenses, infrastructure audits and maintenance are on the vendor.
This option is beneficial in many ways, and this is why cloud data warehouse adoption rates are constantly increasing. Among the significant benefits are:
Cloud data warehouses may seem an optimal option but for two things: data compliance and hidden costs. Regardless of the fact that major cloud vendors meet multiple compliance requirements, there are still cases when you must not store data in the public cloud (for example, national infrastructure data, industrial secrets, etc.). As for costs, even though cloud vendors do offer calculators and free plans to estimate possible expenditures, pricing still isn’t as straightforward as one imagines. Through the efforts of skillful marketers, such factors as extra fees for moving your data out of the cloud (egress fees), hot data storage, excess compute, geo-redundancy, etc. may pass undetected.
Getting the best of both worlds (on-premises and cloud data warehouses) is a go-to variant for companies that:
As the field of data analytics is being heavily invested in, the diversity of data warehouse software technologies may be overwhelming. A properly selected tech stack can decrease your enterprise data warehouse implementation and maintenance costs, while an unsuitable solution can result in negative ROI. So when assessing enterprise data warehouse technologies, you need to consider multiple factors:
When choosing an appropriate platform, besides targeting your current data volumes, you have to also take into account your annual information growth rates as well as strategic data management and analytics objectives, including plans to support your decision-making with new data sources. It will help you avoid overwhelming storage costs and EDW scaling inflexibility.
Tip: If your current and target data volumes are measured in terabytes, petabytes, etc., you may want to consider cloud data warehouses with separate storage and compute costs or those that charge for cold storage at a discounted price.
Your enterprise data warehousing solution should be able to ingest, consolidate and store your specific information, be it real-time or historical, structured or unstructured, coming in bulk or streams.
Tip: Make sure the software you are considering can integrate with the existing source systems via pre-built connectors or open APIs. In case you have plans to run analysis on huge volumes of unstructured data or enable streaming analytics, you would want to consider a data warehouse platform that can be seamlessly complemented with a data lake and an ODS.
You do not want to build an expensive data warehousing solution that nobody uses because of poor user experience. The two main reasons lying behind an enterprise data warehouse’s slow performance are the storage and computational bottlenecks.
Therefore, make sure that the system you are planning to use can promptly upscale to accommodate the spikes. To choose the technology that suits your use cases (data preparation, data analysis, data reporting, etc.) best, you have to consider:
To ensure the enterprise data warehouse is available for users anytime they need but in case of system failures the data could still be restored within a reasonable time, your data warehousing solution should support such functionality as:
One of the challenges when dealing with the enterprise data warehousing solutions is making data accessible to end-users without sacrificing its security. To ensure this, your data warehousing platform should support:
A data warehouse is only viable when it can be integrated with analytics and BI services; otherwise, it is just an expensive archive. Many vendors offer considerable discounts for product packs (data warehouse services pre-integrated with BI and analytics services from the same vendor) to help you streamline the deployment process and save resources. However, to avoid vendor lock-ins and future-proof your solution, we recommend you choose EDW platforms with rich integration capabilities (open APIs, pre-built connectors, etc.) to make it as vendor-agnostic as possible.
We’ve already mentioned that if you are going for the on-premises data warehouse solution, be ready for heavy upfront investments for hardware acquisition, software licensing, and personnel. Besides that, you would also need to cover the ongoing hardware and software updates, physical space, power consumption, etc.
What concerns the cloud EDWs, most vendors offer on-demand plans (when you pay for the resources you used) and pre-purchase plans (when you reserve storage and compute resources ahead) to choose from. To choose the optimal platform configuration and an optimal pricing option, you have to define:
Tip: Cloud EDW cost efficiency may be achieved with decoupled storage and compute resources, data compressions, flexible indexing, automatic workload management with pause and resume capabilities, materialized view support, and query results caching among others.
When dealing with on-premises solutions, maintenance can be a huge pain point. You have to ensure there is always an IT team in place to make quick adjustments to the data warehouse without any disruptions to your day-to-day business operations.
In the case of cloud solutions, you may find data warehousing services with a high level of self-optimization, but some companies prefer to still maintain the data warehouse manually for the sake of greater control and flexibility.
If you have decided on cloud deployment, keep in mind that cloud vendors do everything possible to make you stick with them no matter what. They may charge you extra for moving data out of their cloud, integration with third party solutions, running multi-cloud, and whatnot. So if you are looking for a truly omnicloud experience or want to go for hybrid data warehousing, the technology choice may be rather limited.
If you want to get more value out of your data, consider building an enterprise data warehouse to support and streamline your data-driven journey. There has never been a better time, with the volume of information is growing, computing powers are increasing, and the technology is already here, getting more advanced and affordable.
Delegate data management to Itransition and turn your data into a unified, clean and secure source of value. Book your consultation now.
Leverage Itransition's big data services and expertise to extract valuable insights from data and turn them into a competitive advantage.
We overview the process of building a data warehouse (DWH), including architectural approaches, key steps, talents required, software and best practices.
Check out key components of the data fabric architecture and learn how the data fabric approach helps ensure data compatibility between heterogeneous sources.
Learn why cloud business intelligence has become an imperative for enterprise success and how businesses can choose the right cloud BI tool for their needs.
Understand the balance between gut feel and data in business through Itransition’s data driven decision making examples.
Learn how Itransition delivered an integration environment that utilize multiple communication protocols in a distributed infrastructure.