Data Warehousing and Dimensional Modeling

Data Warehousing

Informational Database vs. Transactional Database

Warehouse: An informational Database

  • A single repository of organisational data

  • Integrates data from multiple sources

  • Makes data available to managers/users

  • Support analysis and decision-making

Involve a large data store.

Data warehouse supports analytical queries.

  • Numerical aggregations: How many? Average? Total cost?

  • Understanding dimensions: Sales by state by customer type

Characteristics of a Data warehouse

  • Subject oriented Data warehouses are organized around particular subjects (sales, customers, products)

  • Validated, Intergrated data Data from different systems converted to a common format: allowing comparison and consolidation of data from different sources. Data from various sources validated before storing it in a data warehouse

  • Time variant Historical data

  • Non-volatile Users have read access only


A DW Architecture

Dimensional Modeling

Dimensional Analysis: To support business analysis view:

Revenue (Fact) per product (Dimension) per customer (Dimension) per location (Dimension)

A dimensional model consists of:

  • Fact table

  • Serveral dimensional tables

  • Hierarchies in the dimensions

Essentially a simple and restricted type of ER model

A fact table contains the actual business measures(additive, aggregates), called facts. The fact table also contains foreign keys pointing to dimensions.

The actual data for the fact table may look like this:

  • Measures: Dollar sales and Unit Sales

  • Granularity, or level of detail

    Finest level of detail for a fact table is determined by the finest level of each dimension.

    For example, suppose I want to know the hourly rainfall, daily rainfall and monthly rainfall, hourly rainfall is the finest level of detail.

Fact table is an intersection table.

Steps

  • Choose a Business Process

  • Choose the measured facts

  • Choose the granularity of the fact table

  • Choose the dimensions

  • Complete the dimension tables

Normalized or Denomarlized

Normalization:

  • Eliminates redundancy

  • Storage effiency

  • Referential integrity

Denormalization

  • Fewer tables (fewer joins)

  • Fast querying

  • Design is tuned for end-user analysis

Last updated