Data warehouse

Data warehouse overview

A data warehouse is a centralized place where data from many different sources can be stored.

An ETL (extract, transform, load) model extracts data from the sources, transforms it, and only then loads it into the warehouse.

ELT-based data warehouse architecture

An ELT (extract, load, transform) model first loads the raw data into the warehouse and transforms it there after it has been loaded.
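The difference between the two models is only the ordering of the steps. A minimal sketch in Python, using a toy in-memory "warehouse" (a plain list) and made-up source rows, both names being illustrative assumptions:

```python
# Toy source rows: untyped, unclean, as they might arrive from a source system.
source_rows = [
    {"name": " Alice ", "amount": "100"},
    {"name": "Bob",     "amount": "250"},
]

def transform(row):
    # Clean and type-convert a raw source row.
    return {"name": row["name"].strip(), "amount": int(row["amount"])}

# ETL: transform each row BEFORE it reaches the warehouse.
etl_warehouse = [transform(row) for row in source_rows]

# ELT: load the raw rows as-is first, then transform inside the warehouse.
elt_warehouse = list(source_rows)                          # load
elt_warehouse = [transform(row) for row in elt_warehouse]  # transform in place

# Both orderings end with the same cleaned data.
assert etl_warehouse == elt_warehouse
```

In practice the "transform in place" step of ELT is run by the warehouse engine itself, which is why ELT suits warehouses with cheap storage and powerful query engines.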

Why it's useful

Building a data warehouse alongside the existing source storage takes extra effort, but it pays off:

  • It lets you view data from multiple sources in one place
  • Queries against it do not slow down the source storage
  • It preserves data history for later inspection

As opposed to a data mart, a data warehouse stores data for the whole company, not just for a single topic within the company.

Difference between a data warehouse and a data mart

  Attribute                  Data warehouse     Data mart
  Scope of the data          enterprise-wide    department-wide
  Number of subject areas    multiple           single
  Difficulty of building     difficult          easy
  Time needed to build       more               less
  Amount of memory           larger             limited

Information storage

Facts

A fact is a value or measurement, typically a quantity that describes an event, such as the amount of a single sale.

Dimensional versus normalized approach for storage of data

There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach.

Dimensional approach

A fact is a concrete piece of data. Dimensions are the metadata that give the fact its context.

Separating the data (facts) from their contextual metadata (dimensions) makes the data warehouse simpler to understand and query.

Some disadvantages of the dimensional approach are the following:

  1. When loading data from different systems, some dimensions may be missing.
  2. If the organization changes its business, both facts and dimensions may change. Moving to the new structure then requires either supporting reads from both the old and the new structure or converting the old structure to the new one.
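The fact/dimension split described above is usually laid out as a star schema: one fact table whose rows hold the measured values, surrounded by dimension tables holding the context. A minimal sketch using Python's built-in sqlite3 module; the table and column names (fact_sales, dim_product, dim_date) are illustrative assumptions, not from the original text:

```python
import sqlite3

# In-memory database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: contextual metadata about each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, day  TEXT);

    -- Fact table: the measured value plus keys into the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL   -- the fact itself: a measured value
    );

    INSERT INTO dim_product VALUES (1, 'Widget');
    INSERT INTO dim_date    VALUES (1, '2024-01-01');
    INSERT INTO fact_sales  VALUES (1, 1, 9.99);
""")

# Reading a fact with its context: one join per dimension.
row = conn.execute("""
    SELECT p.name, d.day, f.amount
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
""").fetchone()
print(row)  # ('Widget', '2024-01-01', 9.99)
```

Disadvantage 1 from the list above shows up here concretely: if a source system delivers a sale without a matching dim_product row, the fact either cannot be loaded or loses part of its context.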

Normalized approach

In the normalized approach, the data in the data warehouse are stored largely according to database normalization rules. Normalization removes data duplication by splitting the data into entities. Each entity (e.g. customer, product, sale) relates to other entities. Reading a piece of data then requires joining these entities back together, which makes reads slower.