Star Schema

  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Star Schema as PDF for free.

More details

  • Words: 2,343
  • Pages: 45




Review



Architecture



Characteristics Relational

ROLAP



OLAP

Multidimensional

Today Star Schema



OLAP 

VS. MOLAP 

Implementation techniques for OLAP   

10/15/08

Sudarshan

Fact table Dimensions Drilling Down & Roll up Slicing & Dicing

Bit map indexes Join indexes File org. 1











What is Star Schema?

Star Schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entityrelationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantage of star schema are slicing down, performance increase and easy understanding of2 10/15/08 Sudarshan

Steps in designing Star Schema   





Identify a business process for analysis(like sales). Identify measures or facts (sales dollar). Identify dimensions for facts(product dimension, location dimension, time dimension, organization dimension). List the columns that describe each dimension.(region name, branch name, region name). Determine the lowest level of summary in a fact table(sales dollar). 10/15/08

Sudarshan

3

Important aspects of Star Schema & Snow Flake Schema 









In a star schema every dimension will have a primary key. In a star schema, a dimension table will not have any parent table. Whereas in a snow flake schema, a dimension table will have one or more parent tables. Hierarchies for the dimensions are stored in the dimensional table itself in star schema. Whereas hierarchies are broken into separate tables in snow flake schema. These hierarchies helps to drill down the data from topmost hierarchies to the lowermost hierarchies. 4 10/15/08 Sudarshan

Fact 







Facts are numeric measurements (values) that represent a specific business activity. Example, sales figures are numeric measurements that represent product and/or service sales. Facts are used in business data analysis, are units, cost, prices and revenues. Facts are stored in a FACT table I.e. the center of the star schema. 10/15/08

Sudarshan

5

Fact Table

  



The centralized table in a star schema is called as FACT table, that contains facts and connected to dimensions. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables). A fact table usually contains facts with the same level of aggregation. 10/15/08

Sudarshan

6



Many OLAP applications are based on a fact table



For example, a supermarket application might be based on a table Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)



The table can be viewed as multidimensional 



Market_Id, Product_Id, Time_Id are the dimensions that represent specific supermarkets, products, and time intervals Sales_Amt is a function of the other three 10/15/08

Sudarshan

7

Fact Table (Conclusion) 

Central table   



10/15/08

mostly raw numeric items narrow rows, a few columns at most large number of rows (millions to a billion) Access via dimensions

Sudarshan

8

Dimension 







Qualifying characteristics that provide additional perspective to a given fact. Example, sales might be compared by product from region to region and from one time period to the next. Here sales have product, location and time dimensions. Such dimensions are stored in DIMENSIONAL TABLE. 10/15/08

Sudarshan

9

Dimension Tables 





The dimensions of the fact table are further described with dimension tables Fact table: Sales (Market_id, Product_Id, Time_Id, Sales_Amt) Dimension Tables: Market (Market_Id, City, State, Region) Product (Product_Id, Name, Category, Price) Time (Time_Id, Week, Month, Quarter)

10/15/08

Sudarshan

10

Attributes 

 

Each dimension table contain attributes. Used to search, filter and classify facts. Example, Sales, we can identify some attributes for each dimension: 

 

Product Dimension: product ID, description, product type Location Dimension: region, state, city. Time Dimension: year quarter, month, week and date.

10/15/08

Sudarshan

11

Attributes hierarchy   

 

AH provides a top-down data organization Used for aggregation and drilldown/roll-up data analysis. Example, location dimension attributes can be organized in a hierarchy by region, state and city. AH provides the capability to perform drill-down and roll-up searches. Allows the DW and OLAP systems to to have defined path. 10/15/08

Sudarshan

12

(location) all

all Europe

region country city

Germany

Frankfurt

...

...

...

Spain

10/15/08

Canada

Vancouver ... L. Chan

office

North_America

Sudarshan

...

...

Mexico

Toronto

M. Wind 13

Multidimensional Data Sales volume as a function of product, month, and region Dimensions: Product, Location, Time Hierarchical summarization paths

gi o

n



Re

Industry Region

Year

Product

Category Country Quarter Product

City Office

Month

Week

Day

Month 10/15/08

Sudarshan

14

2Qtr

3Qtr

4Qtr

sum

U.S.A

Pr o

du

TV PC VCR sum

1Qtr

Date

Total annual sales of TV in U.S.A.

Canada Mexico

Country

ct

A Sample Data Cube

sum

10/15/08

Sudarshan

15

Star Schema A single fact table and for each dimension one dimension table  Does not capture hierarchies directly 

T i m

e c u s t 10/15/08

date, custno, prodno, cityname, ...

f a c t

Sudarshan

p r o d c i t y

16



Example of Star Schema: Figure 1.6

10/15/08

Sudarshan

17



 

In the example, sales fact table is connected to dimensions location, product, time and organization. It shows that data can be sliced across all dimensions and again it is possible for the data to be aggregated across multiple dimensions. "Sales dollar" in sales fact table can be calculated across all dimensions independently or in a combined manner which is explained below. Sales dollar value for a particular product Sales dollar value for a product in a location Sales dollar value for a product in a year within a 10/15/08

Sudarshan

18

Example of Star Schema time

item

time_key day day_of_the_week month quarter year

Sales Fact Table time_key item_key branch_key

branch branch_key branch_name branch_type

location_key units_sold dollars_sold avg_sales

item_key item_name brand type supplier_type

location location_key street city province_or_street country

Measures 10/15/08

Sudarshan

19

Aggregation 

Many OLAP queries involve aggregation of the data in the fact table



For example, to find the total sales (over time) of each product in each market, we might use SELECT S.Market_Id, S.Product_Id, SUM (S.Sales_Amt) FROM Sales S GROUP BY S.Market_Id, S.Product_Id



The aggregation is over the entire time dimension and thus produces a twodimensional view of the data 10/15/08

Sudarshan

20

Time 

The output of the previous query Market_Id M2 M3

Product_Id

M1 SUM(Sales_Amt) P1 P2 P3 P4 P5

10/15/08

3003

1503

6003

2402

4503

3

7503

7000





Sudarshan

M4

… … … … …

21

Typical OLAP Operations 

Roll up (drill-up): summarize data 



Drill down (roll down): reverse of roll-up 



project and select

Pivot (rotate): 



from higher level summary to lower level summary or detailed data, or introducing new dimensions

Slice and dice: 



by climbing up hierarchy or by dimension reduction

reorient the cube, visualization, 3D to series of 2D planes.

Other operations 

drill across: involving (across) more than one fact table



drill through: through the bottom level of the cube to its back-end relational tables (using SQL)

10/15/08

Sudarshan

22

Drilling Down and Rolling Up 

Some dimension tables form an aggregation hierarchy Market_Id →



Executing a series of queries that moves down a hierarchy (e.g., from aggregation over regions to that over states) is called drilling down 



City → State → Region

Requires the use of the fact table or information more specific than the requested aggregation (e.g., cities)

Executing a series of queries that moves up the hierarchy (e.g., from states to regions) is called 23 10/15/08rolling up Sudarshan

Drilling Down 

Drilling down on market: from Region to State Sales (Market_Id, Product_Id, Time_Id, Sales_Amt) Market (Market_Id, City, State, Region)



SELECT

S.Product_Id, M.Region, SUM

(S.Sales_Amt) FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id GROUP BY S.Product_Id, M.Region –

SELECT S.Product_Id, M.State, SUM (S.Sales_Amt) FROM Sales S, Market M

10/15/08

Sudarshan

24

Rolling Up 

Rolling up on market, from State to Region 

If we have already created a table, State_Sales, State_Sales using

1.

SELECT S.Product_Id, M.State, SUM (S.Sales_Amt) FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id GROUP BY S.Product_Id, M.State then we can roll up from there to:

2.

SELECT T.Product_Id, M.Region, SUM (T.Sales_Amt) FROM State_Sales T, Market M WHERE M.State = T.State GROUP BY T.Product_Id, M.Region

10/15/08

Sudarshan

25

Roll-up and Drill Down Higher Level of Aggregation

Roll Up

    

Drill-Down



Sales Channel Region Country State Location Address Sales RepresentativeLow-level Details

10/15/08

Sudarshan

26

“Slicing and Dicing” The Telecomm Slice

Product

Household Telecomm Video Audio

i g e

R

s n o Europe

Far East India Retail Direct Special

10/15/08

Sales Channel Sudarshan

27

Snowflake Schema 

A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables. i.e dimension table hierarchies are broken into simpler tables. In star schema example we had 4 dimensions like location, product, time, organization and a fact table (sales)

10/15/08

Sudarshan

28

Snowflake schema Represent dimensional hierarchy directly by normalizing tables. Easy to maintain and saves storage





T i m

e c u s t 10/15/08

p r o d

date, custno, prodno, cityname, ...

f a c t

Sudarshan

c i t y

r e g i o n29

Example of Snowflake Schema 

10/15/08

Sudarshan

30

Schema time time_key day day_of_the_week month quarter year

item Sales Fact Table time_key item_key branch_key

branch

location_key

branch_key branch_name branch_type

units_sold dollars_sold avg_sales

Measures 10/15/08

Sudarshan

item_key item_name brand type supplier_key

supplier

supplier_key supplier_type

location location_key street city_key

city

city_key city province_or_street country

31

Indexing Techniques 

  

Exploiting indexes to reduce scanning of data is of crucial importance Bitmap Indexes Join Indexes Other Issues  

10/15/08

Text indexing Parallelizing and sequencing of index builds and incremental updates Sudarshan

32

Indexing Techniques Bitmap index:



Index on a particular column



Each value in the column has a bit vector: bit-op is fast 

The length of the bit vector: # of records in the base table 

The i-th bit is set if the i-th row of the base table has the value for the indexed column 

not suitable for high cardinality domains 

10/15/08

Sudarshan

33

BitMap Indexes 

Example: the attribute sex has values M and F. A table of 100 million people needs 2 lists of 100 million bits

10/15/08

Sudarshan

34

Bitmap Index

M

Y

F

Y

F

N

M

N

F

Y

F

N

Customer

10/15/08

0

1

0

1

1

1

1

0

0

0

0

0

1

1

1

1

0

0

Query : select * from customer where gender = ‘F’ and vote = ‘Y’ Sudarshan

35

Bit Map Index

Base Table

Cust C1 C2 C3 C4 C5 C6 C7

Region Rating N H S M W L W H S L W L N H

Customers where

10/15/08

Region Index

Row ID N S E W 1 1 0 0 0 2 0 1 0 0 3 0 0 0 1 4 0 0 0 1 5 0 1 0 0 6 0 0 0 1 7 1 0 0 0 Region = W

Sudarshan

And

Rating Index

Row ID H M L 1 1 0 0 2 0 1 0 3 0 0 0 4 0 0 0 5 0 1 0 6 0 0 0 7 1 0 0 Rating = M

36

BitMap Indexes 

  



Comparison, join and aggregation operations are reduced to bit arithmetic with dramatic improvement in processing time Significant reduction in space and I/O (30:1) Adapted for higher cardinality domains as well. Compression (e.g., run-length encoding) exploited Products that support bitmaps: Model 204, TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3

10/15/08

Sudarshan

37

Join Indexes Pre-computed joins  A join index between a fact table and a dimension table correlates a dimension tuple with the fact tuples that have the same value on the common dimensional attribute 

e.g., a join index on city dimension of calls fact table  correlates for each city the calls (in the calls table) from that city 

10/15/08

Sudarshan

38

Join Indexes 

Join indexes can also span multiple dimension tables 

10/15/08

e.g., a join index on city and time dimension of calls fact table

Sudarshan

39

Star Join Processing 

Use join indexes to join dimension and fact table

Calls C+T Time

C+T+L Location Plan

10/15/08

Sudarshan

C+T+L +P

40

Bitmapped Join Processing Bitmaps Time Location Plan

10/15/08

Calls

1 0 1

Calls

0 0 1

AND

1 1 0

Calls

Sudarshan

41

OLAP Is FASMI     

Fast Analysis Shared Multidimensional Information

Nigel Pendse, Richard Creath - The OLAP Report 10/15/08

Sudarshan

42

Warehouse Products



Computer Associates -- CA-Ingres Hewlett-Packard -- Allbase/SQL Informix -- Informix, Informix XPS Microsoft -- SQL Server Oracle -- Oracle7, Oracle Parallel Server Red Brick -- Red Brick Warehouse SAS Institute -- SAS Software AG -- ADABAS



Sybase -- SQL Server, IQ, MPP

      

10/15/08

Sudarshan

43

Products  

Oracle 8 Informix   



Online Dynamic Server XPS --Extended Parallel Server Universal Server for object relational applications

Sybase   

Adaptive Server 11.5 Sybase MPP Sybase IQ

10/15/08

Sudarshan

44

Warehouse Server Products   

Red Brick Warehouse Tandem Nonstop IBM   



DB2 MVS Universal Server DB2 400

Teradata

10/15/08

Sudarshan

45

Related Documents

Star Schema
May 2020 14
Star Schema
November 2019 6
Star
November 2019 41
Star
August 2019 49
Star
July 2020 22
Star
April 2020 35