Imp

  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Imp as PDF for free.

More details

  • Words: 1,289
  • Pages: 5
Lecture:8 Lesson:8 Topic :STAR Schemas and Implementation Techniques for OLAP

STAR Schemas The star schema is a data-modeling technique used to map multidimensional decision support data into a relational database. Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structures on which the operational database is built. The basic star schema has four components: facts, dimensions, attributes, and attribute hierarchies. 1. FACTS

Product Model Type Color

Channel

Region Nation Facts Product Region Time Channel Revenue Expenses Units

District Dealer

Time Week Year

Facts are numeric measurements(values) that represent a specific business aspect or activity. Facts commonly used in business data analysis are units, costs, prices, and revenues. Facts are stored in a fact table that is the center of the star schema. The fact table contains fact that are linked through their dimensions. Facts can be computed or derived at runtime. Such computed or derived facts are sometimes called metrics in order to differentiate them from stored facts. The fact table is updated periodically( daily, weekly, monthly, etc,) with data from operational databases. 2. DIMENSIONS

dimensions are qualifying characteristics that provide additional perspectives to a given fact. The kind of problem typically addressed by DSS might be: “Make a comparison of the sales of unit X by region, for the first quarters of 1990 through 1999.” In this example, sales have product, location, and time dimensions. Such dimensions are normally stored in dimension tables. 3. ATTRIBUTES • Each dimension table contains attributes. Attributes are often used to search, filter, or classify facts. Dimensions provide descriptive characteristics about the facts through their attributes. Therefore, the data warehouse designer must define common business attributes that will be used by the data analyst to narrow a search, group information, or describe dimensions. • A ROLAP engine stores data in an RDBMS and uses its own data analysis logic and the end user GUI to perform multidimensional analysis. • A MOLAP system stores data in an MDBMS, using proprietary matrix and array technology to stimulate this multidimensional cube. • Using multidimensional jargon, the ability to cut slices off the cube to perform a more detailed analysis is known as slice and dice. 4. ATTRIBUTE HIERARCHIES • Attributes within dimensions can be ordered in a well-defined attribute hierarchy. • The attribute hierarchy provides a top-down data organization that is used for two main purposes: aggregation and drill-down/roll-up data analysis. A location attribute hierarchy Region Store

City Store • • • •

The attribute hierarchy provides the DWH with the capability to perform drilldown and roll-up searches. The attributes from different dimensions can be grouped to form a hierarchy. The attribute hierarchies determine how the data in the DWH are extracted and presented. The attribute hierarchy information is stored in the DBMS’s data dictionary and is used by the OLAP tool to properly access the DWH. Once such access is ensured, query tools must be closely integrated with the data warehouses’s metadata, and they must support powerful analytical capabilities.

5. STAR SCHEMA REPRESENTATION • Facts and dimensions are normally represented by physical tables in the DWH database. • The fact table is related to each dimension table in a many-to-one (M:1) relationship. • Many fact rows are related to each dimension row. • Facts and dimension tables are related by foreign keys and are subject to the familiar primary/foreign key constraints. • Fact table is related to many dimension tables, the primary key of the fact table is a composite primary key. • The fact tables contain the actual values used in the decision support process, these values are repeated many times in the fact tables. • Each dimension is related to thousands of fact records. Implementation Techniques for OLAP Traditional SQL systems are evolving to support OLAP-style queries more efficiently, supporting constructs ( CUBE and window functions) and incorporating implementation techniques previously found only in specialized OLAP systems. Bitmap indexes Consider a table. Customers(custid: integer, name: string, gender: Boolean, rating: integer) Rating values : 1 to 5 Gender values: M, F • Columns with few values are called sparse. • We can exploit sparsity to construct a new kind of index that greatly speeds up queries on these columns. • To record values for sparse columns as a sequence of bits, one for each possible value. • The rows in the table are a collection of bit vectors. • The collection of bit vectors for a column is called a bitmap index for that column. Customers table Custid Name Gender Rating 111 Akila F 3 112 Mira F 4 113 Siva M 5 114 Kumaran M 5 Bitmap indexes M F 0 1 0 1

1 1

0 0

1 2 3 4 5 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 Advantages: • They allow the use of efficient bit operations to answer queries. Consider the query: “How many male customers have a rating of 5?”. We can take the first it vector for gender and do a bitwise AND with the fifth vector for rating to obtain a bit vector that has 1 for every male customer with rating 5. We can then count the number of 1s in this bit vector to answer the query. • Bitmap indexes can be much more compact than a traditional B+ tree index and are very amenable to the use of compression techniques. We can combine bit vectors with B+ tree indexes • The index levels allow us to quickly find the list of rids, in a standard list or bit vector representations, for a given value. • The index is more compact. We also have the benefits of fast bit vector processing. • The bit vector representation of an rid list relies on a mapping from a position in the vector to an rid. Join indexes • To create an index designed to speed up specific join queries. • Consider the star schema. One fact table F with two dimension tables D1 and D2. We store a tuple in the join index if r1 is the rid of a tuple in table D1 with value c1 in column C1, r2 is the rid of a tuple in table D2 with value c2 in column C2, and r is the rid of a tuple in the fact table F, and these three tuples join with each other. • Drawback: The number of indexes can grow rapidly if several columns in each dimension table are involved in selections and joins with the fact table. We can avoid this problem. • We now join F with D1 to extend the fields of F with the fields of D1, and index F on the ‘virtual field’ C1: If a tuple of D1 with value c1 in column C1 joins with a tuple of F with rid r, we add a tuple to the join index. • We create one such join index for each column of either D1 or D2 that involves a selection in some join with F. • If we make the new indexes bitmap indexes, the result is called a bitmapped join index. File organizations • Store the relation row-wise, but also store column separately.

• • •

A more radical file organization is to regard the fact table as a large multidimensional array and store it and index it as such. This approach is taken in MOLAP systems. Since the array is much larger than available main memory, it is broken up into contiguous chunks. In addition, traditional B+ tree indexes are created to enable quick retrieval of chunks that contain tuples with values in a given range for one or more dimensions.

Related Documents

Imp
May 2020 34
Imp
October 2019 49
Imp
May 2020 31
Imp
November 2019 21
Imp
May 2020 9
Imp
November 2019 23