Do we need Dimensional Data Warehouses in the Modern Data Stack?

Venu Chakrapani Nair
3 min readApr 14, 2022

I recently read an article that the star schema is dead (link) and that the dimensional model no longer has a relevance in the data and analytics world. I realize a lot has been written on the subject. I wanted to provide another perspective.

Per the author of the article, the need for the star schema data warehouse purely as a performance and capacity optimization construct in the data landscape is drifting away due to the immense availability of both these resources on the cloud. While I am a huge fan of GCP, Snowflake, Azure and AWS Cloud Data Platforms and disruption they have brought to this space, I still strongly believe that there are many reasons for considering dimensional data modeling(or another discipline like Data Vault) as a valuable exercise and a good design discipline in the data architecture and engineering process.

The traditional Data Warehouse Dimensional Model

Data Warehouses were originally defined as subject-oriented, integrated, time variant, non-volatile collection of data in support of management’s decision-making process, by the father of data warehousing, Bill Inmon in 1990.

While one of the main purposes of the data warehouse and the dimensional model (created by Ralph Kimball) was to offload a subset of the data from the operational systems, to generate management, executive reporting and dashboarding . One of the main motivations of doing this was to have a data model that is optimized for performance. The dimensional data model also serves the purpose of integrating the data coming from multiple source systems(internal and external to the organization). For example, customer data may have been coming from several customer master sources across an organization, but the conformed customer dimension serves the purpose of the single unified 360-degree view of the customer. The same goes with the fact tables, where there exists multiple OLTP systems for key processes(e.g. multiple billing systems). The single fact table(e.g. the billing fact) is the one stop shop for storage and analysis of this data.

The dimensional model is also a better way to communicate and demonstrate the analytical needs of the business. Having data modeled as a star schema model makes it incredibly easy to explain to the business stakeholders what the KPIs are, how they align with the business processes and strategy, and what specific dimensions can be used to slice and dice those KPIs.

The dimensional model also helps the report and visualization developer or analyst to spend more time creating a story with the data using powerful visualizations, and not have to worry about modeling it for ease of developing the visualizations.

Those are just some of the benefits I believe will be achieved by using a well modeled consumption data store.

The Cloud Data Platforms (CDPs) simplify the data engineering process

With cloud data platforms like Snowflake, Synapse and Google BigQuery, building pipelines to transform and load data into the well modeled data warehouse is greatly simplified. For example star schemas, can be applied on the read side of the equation versus having to physically build and store and duplicate the data into a star schema. In this approach, the data is landed into the cloud storage location of the preferred cloud service provider(CSP), like S3 bucket, Google storage or Azure Data Lake. Tools like dbt can be used to perform some basic transformations to ensure the data is clean, deduped and ready for consumption. The star schema can then be defined either as a separate model and tables in the Cloud Data Warehouse or as external tables on top of this data during read without having to physically move the data and duplicate it in the cloud data warehouse. The dimensional model can be defined on top of both structured and semi-structured data in this approach.

While, the need for creating monolithic data warehouse structures are giving way for federated data frameworks like the data mesh with the advent of the cloud data platforms. The dimensional modeling practice will continue to provide benefits in spinning up readable, easy to use, best practices based data architectures that can be used across the organization.

I would love to hear about your experiences, and thoughts about the pros and cons of this approach.

--

--