Enterprise data warehousing:
architecture, types, best tools, and selection

Enterprise data warehousing: architecture, types, best tools, and selection

June 20, 2023

Enterprise data warehousing market overview

the projected enterprise data warehouse market size by 2030

Market Research Future

Scheme title: The most important data warehousing-related trends
Data source: yellowbrick.com — Key Trends in Hybrid, Multicloud and Distributed Cloud for 2021

What are the most important data warehousing-related trends for your company this year? Select all that apply.

Scheme title: Companies’ investments in analytics infrastructure
Data source: yellowbrick.com — Key Trends in Hybrid, Multicloud and Distributed Cloud for 2021

In the current climate of uncertainty, our investment in analytics infrastructure such as data platforms, data warehouses, etc. is:

6 components of an enterprise data warehouse

An enterprise data warehouse is more than a repository connected to your data sources (CRM, IoT devices, SaaS apps, etc.) on one end and to BI and analytics software on the other. In truth, an enterprise data warehousing solution is a comprehensive data processing and storage environment that consists of the following components:

1 ETL/ELT

Extract, transform, load (ETL) or extract, load, and transform (ELT) tools ingest information from the source systems and process it until it’s suitable for permanent storage. It’s one of the core elements for enterprise-grade analytics because companies have to rely on numerous data sources with different data types, models, and information generation speeds.

2 Staging area

A staging area is a temporary raw data repository between data sources and its permanent storage that hosts the data during the transformation stage. This element is typical for solutions built with the ETL approach but can be omitted if the transformations are performed in the data warehouse database.

3 Data warehouse database

Traditionally, an enterprise data warehouse database is a relational database where the integrated and subject-oriented business information is loaded into data models for analytical querying. This component also includes a metadata repository where an enterprise stores a map of its data for easy access and handling.

4 Data marts

Dimensional data marts are built to satisfy the analytics needs of specific user groups and departments, for example, sales and marketing, production, supply chain management, and finance. Data marts facilitate easier and quicker data access and analysis as they handle smaller, more applicable datasets. 

5 OLAP cubes

Deploying multidimensional OLAP cubes that store data in the pre-aggregated form helps overcome the limitation of relational databases and streamline data analysis. The data in OLAP cubes can be sliced and diced, drilled down, rolled up, and pivoted to handle various analytics requests of business users.

6 Data governance

The data governance component defines processes and policies for managing data quality and security, data modeling, metadata, data retention and backup, data usage, and user activity.

Looking for a trustworthy DWH consultant?

Contact us

Enterprise data warehouse architecture

Traditional enterprise data warehouse solutions are built according to the three-tier architecture, which includes:

  • Data warehouse server (bottom tier)
  • OLAP server (middle tier)
  • Data access layer (top tier)

However, the architectural approach should still be dictated by the company’s needs, so other design methods (e.g. a one-tier or a two-tier architecture) can prove more suitable in some cases.

Operational databasesSaaS applicationsFilesSocial mediaWebsiteData sourcesExtract, transform, and loadStaging areaData integrationExtract, transform, and loadMetadataEDW databaseData martsBottom tierOLAP cubeOLAP cubeMiddle tierReportsDashboardsData miningTop tier

Enterprise data warehousing functionality

An enterprise data warehouse is not a specific software type but an environment combining multiple technologies. Together, they enable the following functionality: 

Connectivity

    • Pre-built connectors to various cloud and on-premises data sources, including databases, business applications, flat files, feeds, web URLs, IoT devices, and ecommerce platforms
    • API libraries for custom connectors creation
    • Integration with business intelligence and analytics software, including big data analytics and ML tools
    • Integration with an operational data store and a data lake

    Data preparation

      • Processing of structured, semi-structured, and unstructured data
      • Batch and streaming data processing
      • Data profiling
      • Automated data standardization, deduplication, removal, cleaning, and transformation with the ETL/ELT process
      • Metadata discovery, cleaning, and updating
      • Data modeling

      Data storage

        • Storing pre-processed business data in the data staging area
        • Storing integrated, subject-oriented, nonvolatile business data in a central database according to a predefined data model(s)
        • Storing data in a relational, columnar, or/and multidimensional format
        • Storing data in an enterprise-wide database and department-level data marts
        • Storing metadata in data catalogs, data dictionaries, and glossaries

        Data security and compliance management

          • Sensitive data discovery and labeling
          • End-to-end data encryption
          • Dynamic data masking
          • Fine-grained access control
          • Configurable data security levels (table, column, raw)
          • Management of compliance configurations (HIPAA, GDPR, PCI, SOC, FedRAMP)
          • User activity auditing
          • Automated data backup and customizable fault tolerance

          Enterprise data warehouse integrations

          To serve the needs of various users across the company, the enterprise data warehouse should integrate data from all sources defined by the established analytics objectives at the required granularity level. Among the most commonly integrated data sources are:

          CRM system
          External data sources
          CSV and flat files
          Project management software
          Corporate website and intranet
          Enterprise data warehouse
          Supply chain management software
          Ecommerce platform
          Accounting and finance software
          Marketing software
          ERP system

          Enterprise data warehouse types

          When setting up an enterprise data warehouse, businesses have to choose between a cloud, on-premises, or hybrid environment.

          On-premises

          Cloud

          Hybrid

          Description

          On-premises

          Cloud

          Hybrid

          An in-house or outsourced IT team on-premises deploys DWH on the local server

          A cloud data warehouse is hosted and managed in the cloud. All hardware-related costs, software setup, infrastructure audits, and maintenance are the provider’s responsibility (if a DWH is delivered as a managed server).

          A hybrid data warehouse is distributed across both cloud and on-premises environments

          Major pros

          On-premises

          Cloud

          Hybrid

          Comprehensive control over the data warehouse hardware and software infrastructure  High availability and security Compliance with data regulations, which require keeping data onsite

          Quick deployment and fast and cost-effective storage and computational resources scaling up and out Minimized upfront costs due to a pay-as-you-go model High fault tolerance and disaster recovery due to the distributed nature of the cloud data warehouses

          Efficient operation in the cloud while meeting the strictest regulatory requirements and addressing data latency issues

          Limitations

          On-premises

          Cloud

          Hybrid

          Heavy upfront investments for hardware acquisition, software licenses, IT resources, etc. Requires comprehensive capacity planning due to complicated scaling Requires an experienced IT team to keep the system running efficiently

          Failure to meet compliance requirements prohibiting cloud data storage   Lack of pricing transparency and complicated pricing structures (e.g., egress fees, extra pay for hot data storage, excess compute, geo-redundancy)

          High price due to purchasing hardware and software and paying for the cloud resources Requires solid expertise in development and maintenance

          Top tools for enterprise data warehouse solutions

          We recommend starting the data warehouse selection process by reviewing the solutions that are recognized leaders in the recent Forrester Wave and Gartner Magic Quadrant reports.

          Features
          • Direct querying of structured, semi-structured, and unstructured data from Amazon S3 for analysis without loading and transformation Seamless integration with the AWS analytics services and select AWS partners to ingest data from Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, etc. Querying live data across Amazon Relational Database Service (RDS), Aurora PostgreSQL, RDS MySQL, and Aurora MySQL databases with the federated query capability Native support of semi-structured data  Native support of advanced analytics
          • Flexible separate payment for compute and storage resources with RA3-type nodes Dynamic concurrency scaling for extra compute power Continuous cluster health monitoring Manual and automated snapshots for disaster recovery Data access permissions applied to tables, multi-factor authentication, manually-enabled data encryption, dynamic data masking Compliance with HIPAA, ISO 27001, PCI DSS, SOC 1 Type II, and SOC 2 Type II
          Software category
          Cloud data warehousing
          Pricing
          On-demand pricing from $0.25/hour Amazon Redshift Serverless $0.36/RPU hour Managed storage pricing $0.024/GB/month Free trial

          Features
          • Pre-built connectors to 95+ data sources SQL-querying real-time operational data without loading and transformation with Azure Synapse Link Ingesting data from both on-premises and cloud source data stores with Azure Data Factory Native integration with Azure Data Factory, Azure Data Lake Storage, Azure Cosmos DB, Azure Machine Learning, Azure Cognitive Services, and Power BI Big data and streaming data ingestion with built-in Apache Spark and Azure Stream Analytics 
          • Separate billing and scaling of computing and storage resources  Manual and automatic workload management Built-in fault tolerance and disaster recovery Geo-backup capability Granular permissions on schemas, tables, views, individual columns, procedures, and other objects Data encryption and multi-factor user authentication Compliance with HIPAA, ISO 27001, PCI DSS, SOC1, SOC2, etc.
          Software category
          Enterprise analytics service
          Pricing
          Serverless: $5/per TB of processed data Dedicated: Pay-as-you-go: from $1.20/hour Data storage $23/TB/month Free trial

          Features
          • Native data integration with 150+ data sources via Cloud Fusion Support of multi-cloud analytics across clouds with BigQuery Omni Native integration with the Google Cloud Analytics ecosystem Real-time analytics with built-in streaming data ingestion with BigQuery Storage Write API or legacy streaming API and query acceleration Analytics querying of structured, semi-structured, and unstructured data
          • Native support for geospatial analytics Built-in ML capabilities Separate billing for storing cold and hot data Replicated storage in multiple locations charge-free by default Granular data access on datasets, tables, and views, multi-factor authentication, and data encryption by default Compliance with HIPAA, ISO 27001, PCI DSS, SOC 1 Type II, and SOC 2 Type II
          Software category
          Multi-cloud data warehouse
          Pricing
          Compute (on-demand) from $5 Compute (standard) $0.04/slot hour Storage from $0.01/GB Data ingestion (streaming inserts) $0.01/200 MB Data extraction (streaming reads) from $1.10/TB read Free tier

          Features
          • Enablement of analytics, BI and ML scenarios on operational stores in SQL Server with Azure Synapse Link for SQL SQL Server integration with S3-compatible object storage Data virtualization for querying different data types on different data sources Integration with Microsoft Purview for greater data discovery, automated metadata capture, and data classification  Built-in query intelligence for performance and scalability automatic enhancement Single view of all SQL Servers deployed on-premises, in Azure, and other clouds
          • Access control permissions and dynamic data masking On-premises data protection with Microsoft Defender for Cloud Tracking all changes made to the database over time with the Ledger for SQL Server feature enabled with blockchain Fully-managed disaster recovery in the cloud with Azure SQL Managed Instance Strengths Easy migration to the cloud with minimized downtime Pay-as-you-go billing with Azure Arc for all SQL Server environments
          Software category
          Microsoft flagship relational DBMS
          Pricing
          Open no-level estimated pricing: Enterprise - $15,123 Standard (per core) - $3,945 Standard (server) - $989 Standard (CAL) - $230 Pay-as-you-go (monthly rate per core): Standard - $73 Enterprise - $274 Free trial

          Features
          • Available on Amazon Web Services, Microsoft Azure, and Google Cloud Support for AWS PrivateLink, Azure Private Link, and Google Cloud Private Service Connect Analytics support through the Snowflake platform and Snowflake’s technology partners Native connectivity with a variety of data integration tools, including Hevo Data, Apache Kafka, and Informatica Cloud Native connectivity with multiple BI tools, including Power BI, Tableau, Looker, and AWS Quicksight
          • Automated database maintenance with built-in performance optimization, materialized view maintenance, automatic clustering, etc. Independent automatic scaling of computing and storage resources Secure data sharing across regions/clouds Always-on data encryption at rest and in transit and dynamic data masking Multi-factor authentication Database replication Compliance with HIPAA, ISO 27001, PCI DSS, SOC 1 Type II, and SOC 2 Type II
          Software category
          Cloud-based data warehouse
          Pricing
          On-demand pricing or pre-purchased storage capacity at a discount with computation time billed separately Pricing is available on request

          Need help choosing an optimal tech stack for your DWH project?

          Get in touch

          A guide to choosing an enterprise data warehouse platform

          As the field of data analytics matures, the diversity of data warehouse software technologies can become overwhelming. So when assessing enterprise data warehouse technologies, a company should consider multiple factors to select the right tech stack.

          Data volume

          To avoid overwhelming storage costs and enterprise data warehouse scaling inflexibility, consider your current data volumes as well as target data volume when choosing the platform. To make estimates, take into account annual information growth rates and your strategic data management and analytics objectives, like plans to support decision-making with new data sources, including big data.

          Data type

          An enterprise data warehousing solution should be able to ingest, consolidate, and store your specific information, be it real-time or historical, structured or unstructured, as well as coming in bulk or streams. Ensure the software you are considering can integrate with the existing source systems via pre-built connectors or open APIs. If you plan to run analysis on voluminous unstructured data or enable streaming analytics, you should consider a data warehouse platform that can be seamlessly complemented with a data lake and an ODS to query data without loading and transformation.

          Platform performance

          The two main reasons behind an enterprise data warehouse’s slow performance are storage and computational bottlenecks. Therefore, make sure that the system you are planning to implement can promptly scale up to accommodate the spikes. You also have to choose the technology that best suits your use cases (data preparation, data analysis, data reporting, etc.) and take into account future data warehouse users, their number, solution frequency usage, and query concurrency.

          Platform availability

          You have to ensure the enterprise data warehouse is available for users anytime, and in case of system failures, the data could still be restored within a reasonable time. Thus, your data warehousing solution should support automatic data backup, data restoration capabilities (for example, from a snapshot taken a day earlier), geo-redundancy, and continuous system health monitoring.

          Security

          One of the challenges when dealing with enterprise data warehousing solutions is making data accessible to end-users without weakening its security. To ensure this, your data warehousing platform should support granting data access to specific users or user groups on row and column levels, multi-factor user authentication, safeguarding sensitive information with encryption, and dynamic data masking for hiding data elements from certain users and groups.

          Integration capabilities

          A data warehouse is only useful when it can be integrated with analytics and BI services; otherwise, it is just an expensive archive. Many vendors offer considerable discounts for product packs (data warehouse services pre-integrated with BI and analytics services from the same vendor) to help streamline the deployment process and save resources. However, to future-proof your solution and avoid vendor lock-ins, we recommend choosing enterprise data warehouse platforms with rich integration capabilities (open APIs, pre-built connectors, partner platforms, etc.) to make it vendor-agnostic.

          Cost

          If you opt for an on-premises data warehouse solution, be ready for heavy upfront investments in hardware acquisition, software licensing, and personnel. Besides, you would also need to cover the ongoing hardware and software updates, physical space, power consumption, etc. In regard to cloud enterprise data warehouses, most vendors offer on-demand plans (when you pay for the resources you used) and pre-purchase plans (when you reserve storage and computing resources ahead) to choose from. To choose an optimal platform configuration and pricing option, you have to define current and target data volumes, the number of users and their goals, data transformation complexity, etc.

          Solution maintenance

          Maintenance can be a huge pain point with on-premises solutions, so you need to ensure there is always an IT team in place to make quick adjustments without disrupting your day-to-day business operations. When it comes to cloud solutions, you may find data warehousing services with a high level of self-optimization, but most companies still prefer to maintain the data warehouse manually for better control and flexibility.

          Enterprise data warehouse cost and timeline

          Enterprise data warehouse costs include various categories of implementation and maintenance costs, such as:

          Data volume, its type, complexity, and quality

          The number of data sources and their disparity

          The number of data flows and data modeling complexity

          Type of workload to support

          Data cleansing and transformation complexity

          Data security and compliance requirements

          Platform scalability, fault tolerance, and velocity

          Real-time reporting requirements

          Software license fees

          Solution management

          Key factors

          Enterprise data warehouse benefits

          Corporate information consolidation

          Integrating enterprise-wide information typically scattered across multiple systems, companies can perform analysis of cross-functional historical data and carry out business performance assessment, risk analytics, or strategic planning.

          Separation of operational and analytics workloads

          Adopting an enterprise data warehouse, companies eliminate running analytics queries against OLTP databases, which are extremely slow and can result in system failures, and improving analytics speed and accuracy. 

          Centralized data governance and management

          Setting up a unified approach to data governance and management alongside the enterprise data warehouse implementation, companies prevent data inconsistencies and redundancy, varying data quality, data access constraints, and compromised analytics results.

          Automated data management

          An enterprise data warehouse helps companies eliminate manual resource-consuming and error-prone data extraction, cleansing, and transformation, all  while streamlining data management workflows and cutting operational costs.

          Facilitated self-service BI

          An enterprise data warehouse allows users to set up self-guiding data management and analytics and free up data teams from routine analytics and reporting activities as well as help business users easily obtain the insights they need. 

          Advanced analytics facilitation

          An enterprise data warehouse serves as a solid and well-governed foundation for new analytics initiatives – big data analytics, predictive analytics, self-service BI, ML, and AI.

          Do not delay your EDW transformation

          Do not delay your EDW transformation

          As companies across industries try hard to embed data-driven insights into every business decision, customer interaction, and business process, data warehousing is gaining traction as a key enabler. There has never been a better time to get more value out of data, with the growing information volume, increasing computing powers, and more advanced and affordable technology. Since implementing an enterprise data warehousing solution requires solid expertise, consider bringing a trustworthy DWH consultant to your project. At Itransition, we offer a full range of DWH services to help companies design and introduce a scalable and future-proof data warehouse and use it to accelerate decision-making and gain a competitive advantage.

          Let us guide you through EDW adoption

          Turn to Itransition

          Enterprise data warehousing FAQs

          Enterprise data warehouse vs data warehouse

          The major difference lies in the data volume stored within the data warehouse and the complexity of logic behind it, inсluding the number of data models. Generally, an enterprise data warehouse houses cross-functional business data and serves the needs of all business departments, whereas a traditional data warehouse serves the needs of a particular department/departments (similar to data marts). However, it’s worth mentioning that these terms can also be used interchangeably.

          Enterprise data warehouse vs data mart

          Data marts (sometimes referred to as ‘traditional’ or ‘usual’ data warehouses) are actually subsets of an enterprise data warehouse. They have the same functionality as enterprise data warehouses – collecting data from different sources and making it available for analysis. However, data marts have a narrower scope as they are designed to meet the needs of particular departments or lines of business. This means that they collect only the data that its users (sales, marketing, HR, etc.) need. All in all, data marts are usually created for the sake of speed, since their requests are more specific and their queries run against a smaller amount of information.

          Enterprise data warehouses vs operational data stores (ODS)

          Operational data stores contain up-to-date information in its original format consolidated from disparate operational sources and enable real-time reporting. ODS are not substitutes for a data warehouse but rather complement the whole enterprise data warehousing environment. They also have some significant differences from EDW:

          • Data timeliness
            Enterprise data warehouses store both historical and current data, while ODS storage capabilities are limited to maintaining the most recent records, making it a go-to tactical tool.
          • Data aggregation
            Enterprise data warehouses host information after it undergoes particular transformations (cleansing, enrichment, reformatting, etc.), while ODS keeps information in its raw state, which ensures superior speed of analysis and reporting.
          • Query complexity
            Enterprise data warehouses are built for running complex analytics queries on huge volumes of information, while ODS is used for relatively simpler queries run on real-time data – for example, to quickly identify the reason behind a failed transaction.

          Enterprise data warehouse vs data lake

          A data lake is a centralized repository that maintains all types of data in its raw or pre-processed format, while enterprise data warehouses store only highly-structured data according to predefined data models. Because of it, data lakes can store a near-infinite volume of data at a relatively low price until this information is used. The use cases of these two repositories also differ - while the enterprise data warehouse serves as a central BI component, the data in the data lake is aimed at data scientists and engineers, who use it to train ML and run predictive and big data analytics. 

          Enterprise data warehouse vs lakehouse

          A data lakehouse is a hybrid storage solution that houses all data types and has strong metadata management capabilities, which allows it to cover both BI use cases (as any DWH) and big data analytics and ML workloads (as a data lake). Data lakehouses are more cost-effective than DWHs, as the data is maintained in cheaper repositories, which makes them a preferable option in many scenarios and for various data volumes.

          What is a virtual data warehouse?

          A virtual data warehouse is an alternative to a traditional enterprise data warehouse, which implies creating a virtual layer on top of multiple databases where the data is stored, thus no data is moved physically. A virtual data warehouse is a good option for companies that store their data in a standardized form, which doesn't require complex transformations.    

          Data management services

          Service

          Data management services

          Delegate data management to Itransition and turn your data into a unified, clean and secure source of value. Book your consultation now.

          Big data services

          Service

          Big data services

          Leverage Itransition's big data services and expertise to extract valuable insights from data and turn them into a competitive advantage.

          Building a data warehouse: a step-by-step guide

          Insights

          Building a data warehouse: a step-by-step guide

          We overview the process of building a data warehouse (DWH), including architectural approaches, key steps, talents required, software and best practices.

          Data fabric architecture: building blocks, use cases, and benefits

          Insights

          Data fabric architecture: building blocks, use cases, and benefits

          Check out key components of the data fabric architecture and learn how the data fabric approach helps ensure data compatibility between heterogeneous sources.

          Cloud business intelligence: the whys and hows

          Insights

          Cloud business intelligence: the whys and hows

          Learn why cloud business intelligence has become an imperative for enterprise success and how businesses can choose the right cloud BI tool for their needs.

          Enterprise business intelligence: capabilities, key components, and best solutions

          Service

          Enterprise business intelligence: capabilities, key components, and best solutions

          Learn about enterprise business intelligence solutions and their key features, architectural components, core integrations, technology options, and more.