Architecture and Implementation of Database Systems, HS 07
Dr. Jens Dittrich, jens.dittrich@inf, www.inf.ethz.ch/~jensdi
Institute of Information Systems
Indexing 1. One-dimensional Index Structures (Part 2)
Motivation

Example queries:
What is the address of the student having ID 424342?
Which students attend less than 2 lectures in this semester?
Which students live in canton Zurich?
Which students do not live in canton Zurich?
How does the DBMS compute the results to these queries?
inspect all records (sequential access pattern)
organize data cleverly such that records may be found fast (index-based access patterns)
Primary versus Secondary Access Paths

Primary access path:
Access using an index that is based on the primary key of a relation
Secondary access path:
Access using an index based on an attribute that is not the primary key
Examples:
Key  fname  lname
77   Frank  Meier
12   Simon  Schmidt
42   Hugo   Müller
11   Hans   Meier
25   Jens   Dittrich
76   Hugo   Schmidt

Index on...
Key   -> TID: primary access path
fname -> TID: secondary access path
lname -> TID: secondary access path
Secondary Access Paths and Inversion

A query using a secondary access path may return more than one result (1:n relationship between keys and results).
Secondary index on fname: Frank, Hans, Hugo, Jens, Simon

TID   Key  fname  lname
1,0   77   Frank  Meier
1,1   12   Simon  Schmidt
1,2   42   Hugo   Müller
1,3   11   Hans   Meier
1,4   25   Jens   Dittrich
1,5   76   Hugo   Schmidt
Indexing in Google (without Ranking)
Assign a unique document ID (DocID) to every document (html, pdf, doc, etc.)
Create a secondary access path indexing the mapping keyword --> {(DocID, {pos})}
{pos} = list of positions of occurrences inside a single document.
This data structure is called an inverted list or inverted file. Example:
...
jens     -> (42,{3,500,900,1000}), (88,{3,300}), (4025,{1,20,5000})
dittrich -> (12,{2,450,600}), (78,{1,4300,7000}), (2123,{30})
...
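A minimal sketch of such an inverted file in Python (the toy document collection, the tokenization, and all names are invented for illustration, not taken from the slides):

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map every keyword to a list of (DocID, [positions]) postings."""
    index = defaultdict(dict)                 # keyword -> {DocID: [positions]}
    for doc_id, text in documents.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    # normalize to the keyword -> {(DocID, {pos})} shape shown on the slide
    return {word: sorted(postings.items()) for word, postings in index.items()}

# hypothetical toy collection
docs = {42: "jens teaches databases", 88: "databases at eth", 4025: "jens dittrich eth"}
index = build_inverted_index(docs)
print(index["jens"])    # [(42, [0]), (4025, [0])]
```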
Keyword Search in Google
How?
- Look up the list of entries for keyword "jens"
- Return the first ten entries
Note: the first ten elements do not have to be the most important ones!
What happens when we query for multiple keywords?
Google: Multiple Keywords (Without Ranking)
Example: Algorithm
- 3 accesses on the secondary access path using keys <eth>, <jens>, ...
- Result: 3 sequences T1', T2', T3' of DocIDs (document IDs)
- Compute the intersection T' = T1' ∩ { t | t ∈ (T2' ∩ T3') ∧ pos(t, T2') = pos(t, T3') - 1 }
- Return the first ten elements as the result
- Note: the first ten elements do not have to be the most important ones!
End.
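A small Python sketch of this intersection step. It assumes each Ti' is a list of (DocID, positions) postings as in the inverted-file example above, and it reads the pos(...) condition as adjacency of the two keywords inside a document; the function and variable names are illustrative:

```python
def intersect_with_adjacency(t1, t2, t3):
    """Keep DocIDs that occur in all three lists and where some position in t2
    is immediately followed by a position in t3 (phrase-style adjacency)."""
    pos2 = {doc: set(positions) for doc, positions in t2}
    pos3 = {doc: set(positions) for doc, positions in t3}
    result = []
    for doc, _ in t1:
        if doc in pos2 and doc in pos3:
            # adjacency check: a hit in t2 at position p and a hit in t3 at p+1
            if any(p + 1 in pos3[doc] for p in pos2[doc]):
                result.append(doc)
    return result[:10]   # return the first ten elements

# hypothetical postings for three keywords
T1 = [(42, [7]), (88, [1]), (4025, [0])]
T2 = [(42, [3]), (4025, [0])]
T3 = [(42, [4]), (4025, [5])]
print(intersect_with_adjacency(T1, T2, T3))   # [42]
```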
Indexing in Google (with Ranking)
Problem: keyword search may return millions of results.
For keyword "jens" Google estimates 43.8 million documents.
Furthermore: the most important documents do not have to be among the top 10 pages.
Solution: try to order documents based on their (query-independent) rank
Many algorithms exist; the most important one is PageRank. More about PageRank in the lecture "Information Retrieval".
However: what is the impact of ranking on indexing?
Indexing in Google (with Ranking)
Core idea:
1. Enumerate documents using docID = rank, i.e., the document having docID=1 is the most important one.
   Note: docIDs still have to be a key! (no duplicates)
2. Sort each result list by docID (i.e., by its rank).
Impact: sort-based intersection still works. Example (DocIDs now reflect the rank):
...
jens     -> (7000,{3,500,900,1000}), (8888,{3,300}), (40251,{1,20,5000})
dittrich -> (12,{2,450,600}), (78,{1,4300,7000}), (2123,{30})
...
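Since all postings lists are sorted by DocID, the intersection can be computed with a single linear merge. A minimal sketch (list contents and names are illustrative):

```python
def merge_intersect(a, b):
    """Intersect two DocID lists that are both sorted ascending (one linear pass)."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

jens = [12, 78, 7000, 8888, 40251]
dittrich = [12, 78, 2123]
print(merge_intersect(jens, dittrich))   # [12, 78]
```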
Google: Multiple Keywords (With Ranking)
Example: Algorithm (corresponds to the algorithm without ranking)
- 3 accesses on the secondary access path using keys <eth>, <jens>, ...
- Result: 3 sequences T1', T2', T3' of DocIDs (document IDs)
- Compute the intersection T' = T1' ∩ { t | t ∈ (T2' ∩ T3') ∧ pos(t, T2') = pos(t, T3') - 1 }
- Return the first 10 DocIDs of T' as the result.
Difference: the first 10 documents are the most important documents (w.r.t. page rank)
Requirements for Index Structures
- efficient usage of memory and disk
- low I/O cost
- low CPU cost
- low response times
- high throughput of operations
- easy to integrate into existing DBMSs
- easy to extend
- easy to maintain
Tree-Structured Indexes: Different Classes

binary trees
not suitable for a DBMS
reason: hard to map nodes to pages
digital trees
important for non-relational data
B-trees
most important index structure for databases
Advantage: extremely versatile, easy to extend
has been researched for more than 30 years, and important extensions are still being proposed
many index structures with similar ideas exist: R-tree, M-tree
B-Trees: Agenda
- Basics (repetition)
- ISAM
- Clustered index
- Indirect vs. direct storage
- Primary versus secondary B-tree
- Bulk-loading
- Prefix B+-tree
- Prefix/suffix compression
- Large index pages
- Cache-conscious B+-trees
B-Tree

Pointer P0 points to the subtree with keys strictly smaller than K1.
For Pi (i=1,...,n-1) it holds: Ki < keys(Pi) < Ki+1.
Pn points to the subtree with keys strictly greater than Kn.

Node structure: P0 | K1 D1 | P1 | K2 D2 | P2 | ... | Kn Dn | Pn
(Pi: pointer, Ki: key, Di: data)
Example: [figure: a B-tree with a root node, inner nodes with keys and data entries, and null pointers at the leaf level]
B-Tree: Definition

A B-tree of type (k, h) has the following invariants:
1. Every path from the root node to any leaf has length h.
2. Every node (except the root node and leaf-level nodes) has at least k+1 children. The root is either a leaf or a node with at least two children.
3. Every node has at most 2k+1 children.

Node structure: P0 | K1 D1 | P1 | K2 D2 | P2 | ... | Kn Dn | Pn
(Pi: pointer, Ki: key, Di: data)
k <= n <= 2k (if neither root nor leaf level)

Storage: B-tree nodes are mapped to pages/blocks.
Literature: R. Bayer, E. M. McCreight. Organization and Maintenance of Large Ordered Indexes. Acta Informatica, 1:4, 1972, 290-306.
B+-Trees

Like B-trees, but:
data entries are only stored in leaves
Impact: higher fan-out of nodes => height of tree decreases
leaves are connected to form a doubly-linked list: ISAM (Index Sequential Access Method)
Example: [figure: a B+-tree with inner nodes and a doubly-linked list of leaves; Pi: pointer, Ki: key, Di: data]
B+-Trees: Definition

A B+-tree of type (k, k*, h) has the following invariants:
1. Every path from the root node to any leaf has length h.
2. Every node has at least k+1 and at most 2k+1 children.
3. Every leaf has at least k* and at most 2k* entries.
4. The root is either a leaf or a node with at least two children.

Node structure: P0 | K1 | P1 | K2 | P2 | ... | Kn | Pn
k <= n <= 2k

Leaf structure: V | K0 D0 | K1 D1 | K2 D2 | ... | Kl Dl | N
k* <= l <= 2k*

(Pi: pointer, Ki: key, Di: data, V: left sibling pointer, N: right sibling pointer)
B+-Trees

P0 points to a subtree with keys <= K1 (smaller or equal).
For Pi (i=1,...,n-1) it holds: Ki < keys(Pi) <= Ki+1.
Pn points to the subtree with keys strictly greater than Kn.
2 different node types: inner nodes and leaves.

Node structure: P0 | K1 | P1 | K2 | P2 | ... | Kn | Pn
(Pi: pointer, Ki: key, Di: data)
Example: [figure: B+-tree with inner nodes, chained leaves, and null pointers at both ends of the leaf list]
B+-Trees: Point Query (find_key)

- Recursive search starting at the root node
- Inside a node or leaf: binary search
- Exactly h-1 nodes and one leaf will be touched
Example: find_key [4]
[figure: search path for find_key [4] through the example B+-tree]
B+-Tree: Range Query (find_range)

Example: find_range [4;12]
Algorithm:
1. Point query with key 4 (i.e., find_key [4])
2. Read leaves sequentially starting at the position of key 4 until the keys are strictly greater than 12 (i.e., ISAM: Index Sequential Access Method)
A sketch of both find_key and find_range follows after the figure below.
[figure: leaf-level scan for find_range [4;12] in the example B+-tree]
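A minimal in-memory sketch of find_key and find_range. The class names, key values, and fan-out are invented for illustration; a real B+-tree maps every node and leaf to a disk page:

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys, data):
        self.keys, self.data, self.next = keys, data, None   # next = right sibling (ISAM chain)

class Node:
    def __init__(self, keys, children):
        # children[i] holds keys <= keys[i]; the last child holds keys > keys[-1]
        self.keys, self.children = keys, children

def find_leaf(node, key):
    """Descend from the root to the leaf that may contain key (touches h-1 nodes)."""
    while isinstance(node, Node):
        node = node.children[bisect_left(node.keys, key)]    # binary search inside the node
    return node

def find_key(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return leaf.data[i] if i < len(leaf.keys) and leaf.keys[i] == key else None

def find_range(root, low, high):
    """Point query for low, then scan the leaf chain until the keys exceed high (ISAM)."""
    leaf, result = find_leaf(root, low), []
    while leaf is not None:
        for k, d in zip(leaf.keys, leaf.data):
            if k > high:
                return result
            if k >= low:
                result.append((k, d))
        leaf = leaf.next
    return result

# hypothetical tiny tree: one root node over two chained leaves
l1, l2 = Leaf([1, 4, 7], ["a", "b", "c"]), Leaf([11, 12, 14], ["d", "e", "f"])
l1.next = l2
root = Node([7], [l1, l2])        # keys <= 7 go to the left leaf, keys > 7 to the right leaf
print(find_key(root, 4))          # b
print(find_range(root, 4, 12))    # [(4, 'b'), (7, 'c'), (11, 'd'), (12, 'e')]
```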
B+-Trees: split

If a leaf has > 2k* entries:
- Create a new leaf
- Distribute the entries over both leaves such that each leaf has >= k* entries
- Insert a pointer and a pivot key into the parent node

Example: [figure: before the split, leaf L1 (keys 12, 14) under node N1 (keys 11, 25) overflows; after inserting [13,42] and splitting, a new leaf is created and the pivot 13 is inserted into node N1]
(The split is similar for inner nodes.)
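A minimal sketch of the leaf split step. The representation of a leaf as a sorted list of (key, data) pairs and the split point are one possible choice, not the only one:

```python
def split_leaf(entries, k_star):
    """Split an overfull, sorted list of (key, data) leaf entries.
    Returns (left_entries, pivot_key, right_entries); each half keeps >= k* entries."""
    assert len(entries) > 2 * k_star
    mid = (len(entries) + 1) // 2
    left, right = entries[:mid], entries[mid:]
    pivot = left[-1][0]        # largest key of the left leaf becomes the pivot in the parent
    return left, pivot, right

# the slide example: leaf (12,33), (14,77) after inserting (13,42), with k* = 1
left, pivot, right = split_leaf([(12, 33), (13, 42), (14, 77)], k_star=1)
print(left, pivot, right)      # [(12, 33), (13, 42)] 13 [(14, 77)]
```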
B+-Tree: insert (Procedural Programming)

insert (42,12):
1. node = root
2. while (node != Leaf): node = choose_subtree(node, 42)
3. node.insert_entry(42,12)
4. If node has more than 2k* entries:
   - Split the leaf into two leaves
   - Distribute the entries over both leaves
   - Insert the new leaf pointer and pivot into the parent node
   - If the parent node has more than 2k+1 children: split the parent node
   - etc. (until the root is reached or no split is necessary anymore)
...
B+-Tree: insert (Object-oriented 1/2)

[class diagram: AbstractNode with subclasses Node and Leaf]

Node.insert (42,12):
1. (split, left, pivot, right) = choose_subtree(42).insert(42,12)
2. If a split occurred:
   this.insert_node(pivot, right)
   If this.entries > 2k+1: return this.split()
3. return (false, this, NULL, NULL)

Leaf.insert (42,12):
1. insert_tuple(42,12)
2. If this.entries > 2k*: return this.split()
   Else: return (false, this, NULL, NULL)

Usage: root.insert(42, 12)
B+-Tree: insert (Object-oriented 2/2)

Node.insert (42,12):
1. (split, left, pivot, right) = choose_subtree(42).insert(42,12)
2. If split:
   this.insert_node(pivot, right)
   If this.entries > 2k+1: return this.split()
3. return (false, this, NULL, NULL)

Usage: root.insert(42, 12)

Question: what happens if the root node has to be split?
B+-Tree: Other insert Strategies

If a node is full:
- Try to redistribute entries to d predecessor and/or d successor nodes
- If the redistribution is successful: insert without a split
- Else: split
Discussion:
improves memory usage of the tree
Disadvantage: redistribution may be costly (adjust parent nodes?)
Requires a doubly-linked list on the nodes (similar to ISAM on the leaves)
B+-Tree: Other split Strategies

If a split occurs:
- Create a new leaf (node)
- Redistribute the entries of m neighboring leaves (nodes) such that every leaf (node) has at least k* entries (k+1 children)
- m=1: normal split
[figure: fill factors after a split; m=1 gives 50%/50%, m=2 gives about 67% per leaf, m=3 gives about 75% per leaf]
B+-Trees: delete

Discussion analogous to the split operation!
If a node/leaf underflows: merge (a small sketch of the merge step follows at the end of this slide)
merge = inverse operation of split
i.e., put entries of m leaves (nodes) into a single leaf (node), all other leaves (nodes) are deleted
Other strategy:
relax B+-tree invariant and allow nodes/leaves to underflow without performing a merge
merge will only be performed if underflow continues for a certain amount of time
or: merge will never be performed (tolerate some dead space)
improves ISAM-access (less defragmentation of leaves)
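A minimal sketch of the merge step for two neighboring leaves, the counterpart of split_leaf above; the names and the simple "merge only if it fits" policy are illustrative:

```python
def merge_leaves(left_entries, right_entries, k_star):
    """Merge two neighboring, sorted leaves into a single leaf.
    Only allowed if the result fits into one leaf (at most 2k* entries)."""
    combined = left_entries + right_entries
    if len(combined) > 2 * k_star:
        return None        # does not fit: redistribute or tolerate the underflow instead
    return combined        # the right leaf can be deleted; the parent loses one pivot/pointer

print(merge_leaves([(12, 33)], [(14, 77)], k_star=2))   # [(12, 33), (14, 77)]
```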
ISAM and Block Fragmentation

The doubly-linked list on the leaves allows efficient sequential access (ISAM).

[figure: example B+-tree with its chain of leaves]
Problem:
inserts and deletes will fragment leaves on disk (leaves not contiguous on disk anymore)
many inserts/updates => high fragmentation => increase of random I/O => decrease of ISAM performance
How to fix:
defragment the tree regularly (analogous to defragmentation in an operating system)
try to preallocate free blocks in block order on disk in regions where the tree might grow in future
B+-Tree Python-demo
Indirect vs. Direct Storage

What is stored inside the leaves?
Either: pointers to records (TIDs), i.e., indirect storage
Or: the actual records, i.e., direct storage

[figure: leaves holding TIDs that point into data pages containing the records]
B+-Trees and Inversion

Queries using a secondary access path may return more than one result (1:n relationship between keys and results).

Secondary index on fname: Frank, Hans, Hugo, Jens, Simon

TID   Key  fname  lname
1,0   77   Frank  Meier
1,1   12   Simon  Schmidt
1,2   42   Hugo   Müller
1,3   11   Hans   Meier
1,4   25   Jens   Dittrich
1,5   76   Hugo   Schmidt
How do we store these lists of results in a B+-tree?
Example: indirect storage in the leaf, with a list of TIDs per key:
Frank -> {1,0}   Hans -> {1,3}   Hugo -> {1,2; 1,5}   ...
Clustered vs. Non-Clustered Index

Non-clustered index: the data blocks do not necessarily have the same order as the keys of the index.
Clustered index: the data blocks have the same order as the keys of the index.

[figure: leaves with TIDs pointing into data pages; for the clustered index the record sequence on the hard disk matches the key order, for the non-clustered index it does not]
Clustered Index

[figure: clustered index; the TIDs in the leaves point to records stored in key order on the hard disk]
at most 1 clustered index possible for each table (usually for the primary key)
Note:
Clustering can also be done by forcing direct storage (assuming that the DBMS will keep leaves in sequential order)
Non-Clustered Index

[figure: non-clustered index; the TIDs in the leaves point to records that are not stored in key order on the hard disk]
- Multiple non-clustered indexes per table are possible
- Implies an indirect index
- Suitable for: selective queries, long records
Bulk-Loading

Problem: how to create an index for an existing table?
Simple algorithm (bottom-up index creation; a sketch follows at the end of this slide):
- Sort the records into a sequence S' (sort key: the key to use for the new index)
- While S' is not empty:
  - take the first F * 2k* records of S' and put them into a new leaf (0.5 < F <= 1, the desired fill factor of the leaf)
  - put a pointer to this leaf into the parent node (create the parent recursively if it does not exist)
Discussion
creates B+-tree from the left to the right and from bottom to top
no node/leaf split will ever occur during the bulk-loading
easy to implement (assuming external sorting operator is available)
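A bottom-up bulk-loading sketch. It reuses the hypothetical Leaf, Node, find_key, and find_range definitions from the B+-tree sketch earlier; the grouping and fill-factor handling are simplified:

```python
def bulk_load(sorted_entries, k_star, k, fill=1.0):
    """Bottom-up bulk loading: pack the sorted (key, data) records into leaves,
    then build the inner levels on top of them, level by level (no splits occur)."""
    cap = max(1, int(fill * 2 * k_star))                  # F * 2k* entries per leaf
    leaves = [Leaf([key for key, _ in sorted_entries[i:i + cap]],
                   [d for _, d in sorted_entries[i:i + cap]])
              for i in range(0, len(sorted_entries), cap)]
    for left, right in zip(leaves, leaves[1:]):           # chain the leaves for ISAM
        left.next = right

    def max_key(subtree):                                 # largest key reachable below subtree
        return subtree.keys[-1] if isinstance(subtree, Leaf) else max_key(subtree.children[-1])

    level, fanout = leaves, 2 * k + 1                     # fanout: children per inner node
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        # separator i = largest key in child i; a real loader would rebalance a too-small last group
        level = [Node([max_key(c) for c in g[:-1]], g) for g in groups]
    return level[0]

root = bulk_load([(i, str(i)) for i in range(1, 11)], k_star=1, k=1)
print(find_key(root, 7), find_range(root, 3, 5))          # 7 [(3, '3'), (4, '4'), (5, '5')]
```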
Bulk-Loading Example (k = k* = 1)

[figure: four snapshots (steps 1.-4.) of the bottom-up bulk load; leaves are created from left to right and parent nodes are built on top as needed]
Bulk-Loading: Discussion

Cost:
1. Cost for (external) sorting
2. Linear cost in the number of leaves for tree creation
Advantage: the algorithm creates a contiguous output of leaves as a side effect (fully defragmented tree)
Many other algorithms exist (e.g. Bulk-loading of an index already having entries)
Literature
Lars Arge: The Buffer Tree: A New Technique for Optimal I/O-Algorithms. WADS 1995: 334-345.
Jochen Van den Bercken, Bernhard Seeger: An Evaluation of Generic Bulk Loading Techniques. VLDB 2001: 461-470.
Performance Optimizations of B+-Trees

1. Problem: height of the tree has a huge impact on performance
Goal: maximize fan-out (number of children)
Approaches: - Prefix B+-trees - Prefix/suffix-compression - Large pages
2. Problem: B+-trees not optimized for main memory and CPU-caches
Goal: optimize cache usage of trees
Approaches: - Cache conscious B+-trees
Prefix B+-Trees

What happens if we want to use very long keys in the B+-tree, e.g., "Architektur und Implementierung von Datenbanksystemen", "Buzzness Desinformation Warehousing", "Design Patterns", ...?
- Considerable space is required for the keys
- Low fan-out, fewer entries per node/leaf
- High tree!
Note: any string key in-between “Archi...” < key <= “Design..” may serve as a pivot in the node.
Solution: modify split-operation such that a shorter pivot is generated in the node.
[figure: after the modified split, the node stores the short pivot "Buzz" instead of a full key]
Prefix-Compression

Previous slide: any string key in-between "Archi..." < key <= "Design.." may serve as a pivot in the node.
Improvement: choose the key such that it is minimal, i.e., key is a prefix for keys Ki and Ki+1, if:
Ki < key <= Ki+1
no other key‘ exists such that Ki < key‘ <= Ki+1 and len(key‘) < len(key)
Impact:
subtrees have a common prefix
for any node N it holds: the children of N do not have to store the prefix anymore
But: for ISAM we require the entire key on the leaf level!
Prefix B+-trees are in fact a generalization of digital trees
Suffix-Compression (aka front coding)

Terminology:
- Prefix-compression: the suffix is omitted
- Suffix-compression: the prefix is omitted
Idea:
For each key Ki only store the difference to its predecessor key Ki-1
Example:
Hugo, Hummel, Hummer, Hund, Hupen
(0, Hugo), (2, mmel), (5, r), (2, nd), (2, pen)
(Each entry is a pair (F, suffix), where F = length of the common prefix with the predecessor key; F requires an appropriate physical representation, e.g., a byte, not an integer. A small sketch follows below.)
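A small sketch of front coding for a sorted list of keys, encode and decode; a real implementation would pack F into a single byte as noted above:

```python
def front_encode(keys):
    """Encode sorted keys as (F, suffix) pairs, F = length of the common prefix
    with the predecessor key."""
    out, prev = [], ""
    for key in keys:
        f = 0
        while f < min(len(prev), len(key)) and prev[f] == key[f]:
            f += 1
        out.append((f, key[f:]))
        prev = key
    return out

def front_decode(pairs):
    keys, prev = [], ""
    for f, suffix in pairs:
        prev = prev[:f] + suffix
        keys.append(prev)
    return keys

keys = ["Hugo", "Hummel", "Hummer", "Hund", "Hupen"]
enc = front_encode(keys)
print(enc)                          # [(0, 'Hugo'), (2, 'mmel'), (5, 'r'), (2, 'nd'), (2, 'pen')]
print(front_decode(enc) == keys)    # True
```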
Suffix-Compression: Discussion

The actual low-level storage layout should be based on statistical information on the data (if not, we may not gain anything but rather inflate the data).
Disadvantage: binary search on the data does not work anymore.
Improvement: partial front coding.
Partial Suffix-Compression (aka partial front coding)

Idea:
same as front coding, but: store every k-th entry uncompressed
Example: (k=4)
Hugo, Hummel, Hummer, Hund, Hupen, Husky, Husten, ...
(0, Hugo), (2, mmel), (5, r), (2, nd), (0, Hupen) , (2, sky), (3, ten), ...
binary search can use these entries, other entries may be reached by scanning from these “hub“-entries
Literature: Witten, Moffat, Bell: Managing Gigabytes.
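A sketch of the partial variant, reproducing the slide example; every k-th entry is stored uncompressed so it can serve as a hub for binary search (names are illustrative):

```python
def partial_front_encode(keys, k=4):
    """Front coding, but every k-th entry is stored uncompressed (F = 0)
    so that it can serve as a 'hub' entry for binary search."""
    out, prev = [], ""
    for i, key in enumerate(keys):
        if i % k == 0:
            out.append((0, key))                 # uncompressed hub entry
        else:
            f = 0
            while f < min(len(prev), len(key)) and prev[f] == key[f]:
                f += 1
            out.append((f, key[f:]))
        prev = key
    return out

keys = ["Hugo", "Hummel", "Hummer", "Hund", "Hupen", "Husky", "Husten"]
print(partial_front_encode(keys, k=4))
# [(0, 'Hugo'), (2, 'mmel'), (5, 'r'), (2, 'nd'), (0, 'Hupen'), (2, 'sky'), (3, 'ten')]
```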
Performance Optimizations of B+-Trees

1. Problem: height of the tree has a huge impact on performance
Goal: maximize fan-out (number of children)
Approaches: - Prefix B+-trees - Prefix/suffix-compression - Large pages
2. Problem: B+-trees not optimized for main memory and CPU-caches
Goal: optimize cache usage of trees
Approaches: - Cache conscious B+-trees
Large Pages

Scenario: double the page size => double the node/leaf size
Advantages:
Fan-out increases
higher probability that the height of the tree decreases
I/O cost does not increase much
Disadvantages:
more clipping due to large pages
more dead data in the DB-buffer
time to perform binary search inside a node/leaf increases (slightly)
How to fix:
"logically large pages/blocks": but these have to be placed adjacently on disk
Large Index Pages

Other idea:
use pages/blocks of different size
Disadvantages:
the DB-buffer typically only supports a fixed page size
Performance Optimizations of B+-Trees

1. Problem: height of the tree has a huge impact on performance
Goal: maximize fan-out (number of children)
Approaches: - Prefix B+-trees - Prefix/suffix-compression - Large pages
2. Problem: B+-trees not optimized for main memory and CPU-caches
Goal: optimize cache usage of trees
Approaches: - Cache conscious B+-trees
Cache-Conscious B+-Trees

Problem: B+-trees were designed for external memory (hard disks).
How can B+-trees be optimized such that they perform well w.r.t. the different caches?

Two major approaches:

Cache-oblivious B+-trees
- no knowledge (obliviousness) of the precise characteristics of the memory hierarchy
- generic approach
- not tailored towards a specific architecture
(The topic "cache-oblivious algorithms" could actually fill a separate lecture.)

Cache-conscious B+-trees
- precise knowledge (consciousness) of the precise characteristics of the memory hierarchy
- requires more fine-tuning
- tailored towards a specific architecture
Cache-Conscious B+-Trees: Agenda
Graefe
CSB+-trees
Prefetching
Fractal Prefetching
Overview of Techniques

Literature: Goetz Graefe, Per-Åke Larson: B-Tree Indexes and CPU Caches. ICDE 2001: 349-358.
CSB+-Trees (= cache-conscious B+-Trees)

Idea:
level-wise layout of nodes
omit all node pointers except the first one
Impact: doubles fan-out of nodes
[figure: B+-tree node with one pointer per child vs. CSB+-tree node with a single pointer to a contiguous group of children]
Search becomes up to 35% faster. But: updates become 30% slower!
Literature: Jun Rao, Kenneth A. Ross: Making B+-Trees Cache Conscious in Main Memory. SIGMOD Conference 2000: 475-486.
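A minimal sketch of the node-layout idea, assuming all children of a node live contiguously in one array so that a single pointer (the index of the first child) is enough; the class and example are invented for illustration:

```python
from bisect import bisect_left

class CSBNode:
    """All children of this node live contiguously in child_block, so the node
    stores only the position of its first child (plus the keys)."""
    def __init__(self, keys, first_child, child_block):
        self.keys = keys
        self.first_child = first_child
        self.child_block = child_block          # shared, contiguous array of child nodes

    def child_for(self, key):
        # child i = first_child + (number of separators strictly smaller than key)
        return self.child_block[self.first_child + bisect_left(self.keys, key)]

# hypothetical example: two leaves stored back to back, the node keeps one offset
leaves = [("leaf", [1, 4, 7]), ("leaf", [11, 12, 14])]
root = CSBNode(keys=[7], first_child=0, child_block=leaves)
print(root.child_for(12))   # ('leaf', [11, 12, 14])
```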
Prefetching: pB+-Trees

Idea:
- node size is a multiple of a cache line
- prefetch all cache lines of a node before starting the binary search inside it
- layout: first store all keys, then all pointers

[figure: timeline of cache misses for a 3-level descent with nodes of size 2 cache lines, without vs. with prefetching]
Performance improvement > 2x for both search and update operations; orthogonal to CSB+-trees.
Literature: Shimin Chen, Phillip B. Gibbons, Todd C. Mowry: Improving Index Performance through Prefetching. SIGMOD Conference 2001.
Fractal Prefetching: fpB+-Trees

Idea:
tree of trees
node external view: disk-optimized B+-tree
node internal view: cache-optimized pB+-tree
analogous to PAX (do not change the outside world)
[figure: disk pages, each internally organized as a small cache-optimized tree]
Literature: Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin: Fractal prefetching B+-Trees: optimizing both cache and disk performance. SIGMOD Conference 2002: 157-168.
Next Week: Indexing 2. Multi-dimensional index structures