Architecture and Implementation of Database Systems, HS 07
Dr. Jens Dittrich, jens.dittrich@inf, www.inf.ethz.ch/~jensdi
Institute of Information Systems
Indexing 1. One-dimensional Index Structures (Part 2)
Motivation

Example queries:
What is the address of the student having ID 424342?
Which students attend less than 2 lectures in this semester?
Which students live in canton Zurich?
Which students do not live in canton Zurich?
How does the DBMS compute the results to these queries?
inspect all records (sequential access pattern)
organize data cleverly such that records may be found fast (index-based access patterns)
Primary versus Secondary Access Paths

Primary access path:
Access using an index that is based on the primary key of a relation
Secondary access path:
Access using an index based on an attribute that is not the primary key
Examples:
Key  fname  lname
77   Frank  Meier
12   Simon  Schmidt
42   Hugo   Müller
11   Hans   Meier
25   Jens   Dittrich
76   Hugo   Schmidt

Index on...
Key   -> TID: primary access path
fname -> TID: secondary access path
lname -> TID: secondary access path
Secondary Access Paths and Inversion

A query using a secondary access path may return more than one result (1:n relationship between keys and results).
Secondary index on fname: Frank, Hans, Hugo, Jens, Simon

TID   Key  fname  lname
1,0   77   Frank  Meier
1,1   12   Simon  Schmidt
1,2   42   Hugo   Müller
1,3   11   Hans   Meier
1,4   25   Jens   Dittrich
1,5   76   Hugo   Schmidt
Indexing in Google (without Ranking)
Assign a unique document ID (DocID) to every document (html, pdf, doc, etc.)
Create a secondary access path indexing the mapping keyword --> {(DocID, {pos})}
{pos} = list of positions of occurrences inside a single document.
This data structure is called an inverted list or inverted file. Example:
...
jens     -> (42,{3,500,900,1000}), (88,{3,300}), (4025,{1,20,5000})
dittrich -> (12,{2,450,600}), (78,{1,4300,7000}), (2123,{30})
...
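A minimal sketch of such an inverted file in Python (the toy document collection, the tokenization, and all names are invented for illustration, not taken from the slides):

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map every keyword to a list of (DocID, [positions]) postings."""
    index = defaultdict(dict)                 # keyword -> {DocID: [positions]}
    for doc_id, text in documents.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    # normalize to the keyword -> {(DocID, {pos})} shape shown on the slide
    return {word: sorted(postings.items()) for word, postings in index.items()}

# hypothetical toy collection
docs = {42: "jens teaches databases", 88: "databases at eth", 4025: "jens dittrich eth"}
index = build_inverted_index(docs)
print(index["jens"])    # [(42, [0]), (4025, [0])]
```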
Keyword Search in Google
How?
- Look up the list of entries for keyword "jens"
- Return the first ten entries
Note: the first ten elements do not have to be the most important ones!
What happens when we query for multiple keywords?
Google: Multiple Keywords (Without Ranking)
Example: Algorithm
- 3 accesses on the secondary access path using keys <eth>, <jens>, ...
- Result: 3 sequences T1', T2', T3' of DocIDs (document IDs)
- Compute the intersection T' = T1' ∩ { t | t ∈ (T2' ∩ T3') ∧ pos(t, T2') = pos(t, T3') - 1 }
- Return the first ten elements as the result
- Note: the first ten elements do not have to be the most important ones!
End.
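A small Python sketch of this intersection step. It assumes each Ti' is a list of (DocID, positions) postings as in the inverted-file example above, and it reads the pos(...) condition as adjacency of the two keywords inside a document; the function and variable names are illustrative:

```python
def intersect_with_adjacency(t1, t2, t3):
    """Keep DocIDs that occur in all three lists and where some position in t2
    is immediately followed by a position in t3 (phrase-style adjacency)."""
    pos2 = {doc: set(positions) for doc, positions in t2}
    pos3 = {doc: set(positions) for doc, positions in t3}
    result = []
    for doc, _ in t1:
        if doc in pos2 and doc in pos3:
            # adjacency check: a hit in t2 at position p and a hit in t3 at p+1
            if any(p + 1 in pos3[doc] for p in pos2[doc]):
                result.append(doc)
    return result[:10]   # return the first ten elements

# hypothetical postings for three keywords
T1 = [(42, [7]), (88, [1]), (4025, [0])]
T2 = [(42, [3]), (4025, [0])]
T3 = [(42, [4]), (4025, [5])]
print(intersect_with_adjacency(T1, T2, T3))   # [42]
```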
Indexing in Google (with Ranking)
Problem: keyword search may return millions of results.
For keyword "jens" Google estimates 43.8 million documents.
Furthermore: the most important documents do not have to be among the top 10 pages.
Solution: try to order documents based on their (query-independent) rank
Many algorithms exist; the most important one is PageRank. More about PageRank in the lecture "Information Retrieval".
However: what is the impact of ranking on indexing?
Indexing in Google (with Ranking)
Core idea:
1. Enumerate documents using docID = rank, i.e., the document having docID=1 is the most important one.
   Note: docIDs still have to be a key! (no duplicates)
2. Sort each result list by docID (i.e., by its rank).
Impact: sort-based intersection still works. Example (DocIDs now reflect the rank):
...
jens     -> (7000,{3,500,900,1000}), (8888,{3,300}), (40251,{1,20,5000})
dittrich -> (12,{2,450,600}), (78,{1,4300,7000}), (2123,{30})
...
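Since all postings lists are sorted by DocID, the intersection can be computed with a single linear merge. A minimal sketch (list contents and names are illustrative):

```python
def merge_intersect(a, b):
    """Intersect two DocID lists that are both sorted ascending (one linear pass)."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

jens = [12, 78, 7000, 8888, 40251]
dittrich = [12, 78, 2123]
print(merge_intersect(jens, dittrich))   # [12, 78]
```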
Google: Multiple Keywords (With Ranking)
Example: Algorithm (corresponds to the algorithm without ranking)
- 3 accesses on the secondary access path using keys <eth>, <jens>, ...
- Result: 3 sequences T1', T2', T3' of DocIDs (document IDs)
- Compute the intersection T' = T1' ∩ { t | t ∈ (T2' ∩ T3') ∧ pos(t, T2') = pos(t, T3') - 1 }
- Return the first 10 DocIDs of T' as the result.
Difference: the first 10 documents are the most important documents (w.r.t. page rank)
Requirements for Index Structures
- efficient usage of memory and disk
- low I/O cost
- low CPU cost
- low response times
- high throughput of operations
- easy to integrate into existing DBMSs
- easy to extend
- easy to maintain
Tree-Structured Indexes: Different Classes

binary trees
not suitable for a DBMS
reason: hard to map nodes to pages
digital trees
important for non-relational data
B-trees
most important index structure for databases
Advantage: extremely versatile, easy to extend
has been researched for more than 30 years, and important extensions are still being proposed
many index structures with similar ideas exist: R-tree, M-tree
B-Trees: Agenda
- Basics (repetition)
- ISAM
- Clustered index
- Indirect vs. direct storage
- Primary versus secondary B-tree
- Bulk-loading
- Prefix B+-tree
- Prefix/suffix compression
- Large index pages
- Cache-conscious B+-trees
B-Tree

Pointer P0 points to the subtree with keys strictly smaller than K1.
For Pi (i=1,...,n-1) it holds: Ki < keys(Pi) < Ki+1.
Pn points to the subtree with keys strictly greater than Kn.

Node structure: P0 | K1 D1 | P1 | K2 D2 | P2 | ... | Kn Dn | Pn
(Pi: pointer, Ki: key, Di: data)
Example: [figure: a B-tree with a root node, inner nodes with keys and data entries, and null pointers at the leaf level]
B-Tree: Definition

A B-tree of type (k, h) has the following invariants:
1. Every path from the root node to any leaf has length h.
2. Every node (except the root node and leaf-level nodes) has at least k+1 children. The root is either a leaf or a node with at least two children.
3. Every node has at most 2k+1 children.

Node structure: P0 | K1 D1 | P1 | K2 D2 | P2 | ... | Kn Dn | Pn
(Pi: pointer, Ki: key, Di: data)
k <= n <= 2k (if neither root nor leaf level)

Storage: B-tree nodes are mapped to pages/blocks.
Literature: R. Bayer, E. M. McCreight. Organization and Maintenance of Large Ordered Indexes. Acta Informatica, 1:4, 1972, 290-306.
B+-Trees

Like B-trees, but:
data entries are only stored in leaves
Impact: higher fan-out of nodes => height of tree decreases
leaves are connected to form a doubly-linked list: ISAM (Index Sequential Access Method)
Example: [figure: a B+-tree with inner nodes and a doubly-linked list of leaves; Pi: pointer, Ki: key, Di: data]
B+-Trees: Definition

A B+-tree of type (k, k*, h) has the following invariants:
1. Every path from the root node to any leaf has length h.
2. Every node has at least k+1 and at most 2k+1 children.
3. Every leaf has at least k* and at most 2k* entries.
4. The root is either a leaf or a node with at least two children.

Node structure: P0 | K1 | P1 | K2 | P2 | ... | Kn | Pn
k <= n <= 2k

Leaf structure: V | K0 D0 | K1 D1 | K2 D2 | ... | Kl Dl | N
k* <= l <= 2k*

(Pi: pointer, Ki: key, Di: data, V: left sibling pointer, N: right sibling pointer)
B+-Trees

P0 points to a subtree with keys <= K1 (smaller or equal).
For Pi (i=1,...,n-1) it holds: Ki < keys(Pi) <= Ki+1.
Pn points to the subtree with keys strictly greater than Kn.
2 different node types: inner nodes and leaves.

Node structure: P0 | K1 | P1 | K2 | P2 | ... | Kn | Pn
(Pi: pointer, Ki: key, Di: data)
Example: [figure: B+-tree with inner nodes, chained leaves, and null pointers at both ends of the leaf list]
B+-Trees: Point Query (find_key)

- Recursive search starting at the root node
- Inside a node or leaf: binary search
- Exactly h-1 nodes and one leaf will be touched
Example: find_key [4]
[figure: search path for find_key [4] through the example B+-tree]
B+-Tree: Range Query (find_range)

Example: find_range [4;12]
Algorithm:
1. Point query with key 4 (i.e., find_key [4])
2. Read leaves sequentially starting at the position of key 4 until the keys are strictly greater than 12 (i.e., ISAM: Index Sequential Access Method)
A sketch of both find_key and find_range follows after the figure below.
[figure: leaf-level scan for find_range [4;12] in the example B+-tree]
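A minimal in-memory sketch of find_key and find_range. The class names, key values, and fan-out are invented for illustration; a real B+-tree maps every node and leaf to a disk page:

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys, data):
        self.keys, self.data, self.next = keys, data, None   # next = right sibling (ISAM chain)

class Node:
    def __init__(self, keys, children):
        # children[i] holds keys <= keys[i]; the last child holds keys > keys[-1]
        self.keys, self.children = keys, children

def find_leaf(node, key):
    """Descend from the root to the leaf that may contain key (touches h-1 nodes)."""
    while isinstance(node, Node):
        node = node.children[bisect_left(node.keys, key)]    # binary search inside the node
    return node

def find_key(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return leaf.data[i] if i < len(leaf.keys) and leaf.keys[i] == key else None

def find_range(root, low, high):
    """Point query for low, then scan the leaf chain until the keys exceed high (ISAM)."""
    leaf, result = find_leaf(root, low), []
    while leaf is not None:
        for k, d in zip(leaf.keys, leaf.data):
            if k > high:
                return result
            if k >= low:
                result.append((k, d))
        leaf = leaf.next
    return result

# hypothetical tiny tree: one root node over two chained leaves
l1, l2 = Leaf([1, 4, 7], ["a", "b", "c"]), Leaf([11, 12, 14], ["d", "e", "f"])
l1.next = l2
root = Node([7], [l1, l2])        # keys <= 7 go to the left leaf, keys > 7 to the right leaf
print(find_key(root, 4))          # b
print(find_range(root, 4, 12))    # [(4, 'b'), (7, 'c'), (11, 'd'), (12, 'e')]
```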
B+-Trees: split

If a leaf has > 2k* entries:
- Create a new leaf
- Distribute the entries over both leaves such that each leaf has >= k* entries
- Insert a pointer and a pivot key into the parent node

Example: [figure: before the split, leaf L1 (keys 12, 14) under node N1 (keys 11, 25) overflows; after inserting [13,42] and splitting, a new leaf is created and the pivot 13 is inserted into node N1]
(The split is similar for inner nodes.)
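A minimal sketch of the leaf split step. The representation of a leaf as a sorted list of (key, data) pairs and the split point are one possible choice, not the only one:

```python
def split_leaf(entries, k_star):
    """Split an overfull, sorted list of (key, data) leaf entries.
    Returns (left_entries, pivot_key, right_entries); each half keeps >= k* entries."""
    assert len(entries) > 2 * k_star
    mid = (len(entries) + 1) // 2
    left, right = entries[:mid], entries[mid:]
    pivot = left[-1][0]        # largest key of the left leaf becomes the pivot in the parent
    return left, pivot, right

# the slide example: leaf (12,33), (14,77) after inserting (13,42), with k* = 1
left, pivot, right = split_leaf([(12, 33), (13, 42), (14, 77)], k_star=1)
print(left, pivot, right)      # [(12, 33), (13, 42)] 13 [(14, 77)]
```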
B+-Tree: insert (Procedural Programming)

insert (42,12):
1. node = root
2. while (node != Leaf): node = choose_subtree(node, 42)
3. node.insert_entry(42,12)
4. If node has more than 2k* entries:
   - Split the leaf into two leaves
   - Distribute the entries over both leaves
   - Insert the new leaf pointer and pivot into the parent node
   - If the parent node has more than 2k+1 children: split the parent node
   - etc. (until the root is reached or no split is necessary anymore)
...
B+-Tree: insert (Object-oriented 1/2)

[class diagram: AbstractNode with subclasses Node and Leaf]

Node.insert (42,12):
1. (split, left, pivot, right) = choose_subtree(42).insert(42,12)
2. If a split occurred:
   this.insert_node(pivot, right)
   If this.entries > 2k+1: return this.split()
3. return (false, this, NULL, NULL)

Leaf.insert (42,12):
1. insert_tuple(42,12)
2. If this.entries > 2k*: return this.split()
   Else: return (false, this, NULL, NULL)

Usage: root.insert(42, 12)
B+-Tree: insert (Object-oriented 2/2)

Node.insert (42,12):
1. (split, left, pivot, right) = choose_subtree(42).insert(42,12)
2. If split:
   this.insert_node(pivot, right)
   If this.entries > 2k+1: return this.split()
3. return (false, this, NULL, NULL)

Usage: root.insert(42, 12)

Question: what happens if the root node has to be split?
B+-Tree: Other insert Strategies

If a node is full:
- Try to redistribute entries to d predecessor and/or d successor nodes
- If the redistribution is successful: insert without a split
- Else: split
Discussion:
improves memory usage of the tree
Disadvantage: redistribution may be costly (adjust parent nodes?)
Requires a doubly-linked list on the nodes (similar to ISAM on the leaves)
B+-Tree: Other split Strategies

If a split occurs:
- Create a new leaf (node)
- Redistribute the entries of m neighboring leaves (nodes) such that every leaf (node) has at least k* entries (k+1 children)
- m=1: normal split
[figure: fill factors after a split; m=1 gives 50%/50%, m=2 gives about 67% per leaf, m=3 gives about 75% per leaf]
B+-Trees: delete

Discussion analogous to the split operation!
If a node/leaf underflows: merge (a small sketch of the merge step follows at the end of this slide)
merge = inverse operation of split
i.e., put entries of m leaves (nodes) into a single leaf (node), all other leaves (nodes) are deleted
Other strategy:
relax B+-tree invariant and allow nodes/leaves to underflow without performing a merge
merge will only be performed if underflow continues for a certain amount of time
or: merge will never be performed (tolerate some dead space)
improves ISAM-access (less defragmentation of leaves)
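A minimal sketch of the merge step for two neighboring leaves, the counterpart of split_leaf above; the names and the simple "merge only if it fits" policy are illustrative:

```python
def merge_leaves(left_entries, right_entries, k_star):
    """Merge two neighboring, sorted leaves into a single leaf.
    Only allowed if the result fits into one leaf (at most 2k* entries)."""
    combined = left_entries + right_entries
    if len(combined) > 2 * k_star:
        return None        # does not fit: redistribute or tolerate the underflow instead
    return combined        # the right leaf can be deleted; the parent loses one pivot/pointer

print(merge_leaves([(12, 33)], [(14, 77)], k_star=2))   # [(12, 33), (14, 77)]
```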
ISAM and Block Fragmentation

The doubly-linked list on the leaves allows efficient sequential access (ISAM).

[figure: example B+-tree with its chain of leaves]
Problem:
inserts and deletes will fragment leaves on disk (leaves not contiguous on disk anymore)
many inserts/updates => high fragmentation => increase of random I/O => decrease of ISAM performance
How to fix:
defragment the tree regularly (analogous to defragmentation in an operating system)
try to preallocate free blocks in block order on disk in regions where the tree might grow in future
B+-Tree Python-demo
Indirect vs. Direct Storage

What is stored inside the leaves?
Either: pointers to records (TIDs), i.e., indirect storage
Or: the actual records, i.e., direct storage

[figure: leaves holding TIDs that point into data pages containing the records]
B+-Trees and Inversion

Queries using a secondary access path may return more than one result (1:n relationship between keys and results).

Secondary index on fname: Frank, Hans, Hugo, Jens, Simon

TID   Key  fname  lname
1,0   77   Frank  Meier
1,1   12   Simon  Schmidt
1,2   42   Hugo   Müller
1,3   11   Hans   Meier
1,4   25   Jens   Dittrich
1,5   76   Hugo   Schmidt
How do we store these lists of results in a B+-tree?
Example: indirect storage in the leaf, with a list of TIDs per key:
Frank -> {1,0}   Hans -> {1,3}   Hugo -> {1,2; 1,5}   ...
Clustered vs. Non-Clustered Index

Non-clustered index: the data blocks do not necessarily have the same order as the keys of the index.
Clustered index: the data blocks have the same order as the keys of the index.

[figure: leaves with TIDs pointing into data pages; for the clustered index the record sequence on the hard disk matches the key order, for the non-clustered index it does not]
Clustered Index

[figure: clustered index; the TIDs in the leaves point to records stored in key order on the hard disk]
at most 1 clustered index possible for each table (usually for the primary key)
Note:
Clustering can also be done by forcing direct storage (assuming that the DBMS will keep leaves in sequential order)
Non-Clustered Index

[figure: non-clustered index; the TIDs in the leaves point to records that are not stored in key order on the hard disk]
- Multiple non-clustered indexes per table are possible
- Implies an indirect index
- Suitable for: selective queries, long records
Bulk-Loading

Problem: how to create an index for an existing table?
Simple algorithm (bottom-up index creation; a sketch follows at the end of this slide):
- Sort the records into a sequence S' (sort key: the key to use for the new index)
- While S' is not empty:
  - take the first F * 2k* records of S' and put them into a new leaf (0.5 < F <= 1, the desired fill factor of the leaf)
  - put a pointer to this leaf into the parent node (create the parent recursively if it does not exist)
Discussion
creates B+-tree from the left to the right and from bottom to top
no node/leaf split will ever occur during the bulk-loading
easy to implement (assuming external sorting operator is available)
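A bottom-up bulk-loading sketch. It reuses the hypothetical Leaf, Node, find_key, and find_range definitions from the B+-tree sketch earlier; the grouping and fill-factor handling are simplified:

```python
def bulk_load(sorted_entries, k_star, k, fill=1.0):
    """Bottom-up bulk loading: pack the sorted (key, data) records into leaves,
    then build the inner levels on top of them, level by level (no splits occur)."""
    cap = max(1, int(fill * 2 * k_star))                  # F * 2k* entries per leaf
    leaves = [Leaf([key for key, _ in sorted_entries[i:i + cap]],
                   [d for _, d in sorted_entries[i:i + cap]])
              for i in range(0, len(sorted_entries), cap)]
    for left, right in zip(leaves, leaves[1:]):           # chain the leaves for ISAM
        left.next = right

    def max_key(subtree):                                 # largest key reachable below subtree
        return subtree.keys[-1] if isinstance(subtree, Leaf) else max_key(subtree.children[-1])

    level, fanout = leaves, 2 * k + 1                     # fanout: children per inner node
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        # separator i = largest key in child i; a real loader would rebalance a too-small last group
        level = [Node([max_key(c) for c in g[:-1]], g) for g in groups]
    return level[0]

root = bulk_load([(i, str(i)) for i in range(1, 11)], k_star=1, k=1)
print(find_key(root, 7), find_range(root, 3, 5))          # 7 [(3, '3'), (4, '4'), (5, '5')]
```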
Bulk-Loading Example (k = k* = 1)

[figure: four snapshots (steps 1.-4.) of the bottom-up bulk load; leaves are created from left to right and parent nodes are built on top as needed]
Bulk-Loading: Discussion

Cost:
1. Cost for (external) sorting
2. Linear cost in the number of leaves for tree creation
Advantage: the algorithm creates a contiguous output of leaves as a side effect (fully defragmented tree)
Many other algorithms exist (e.g. Bulk-loading of an index already having entries)
Literature
Lars Arge: The Buffer Tree: A New Technique for Optimal I/O-Algorithms. WADS 1995: 334-345.
Jochen Van den Bercken, Bernhard Seeger: An Evaluation of Generic Bulk Loading Techniques. VLDB 2001: 461-470.
Performance Optimizations of B+-Trees

1. Problem: height of the tree has a huge impact on performance
Goal: maximize fan-out (number of children)
Approaches: - Prefix B+-trees - Prefix/suffix-compression - Large pages
2. Problem: B+-trees not optimized for main memory and CPU-caches
Goal: optimize cache usage of trees
Approaches: - Cache conscious B+-trees
Prefix B+-Trees

What happens if we want to use very long keys in the B+-tree, e.g., "Architektur und Implementierung von Datenbanksystemen", "Buzzness Desinformation Warehousing", "Design Patterns", ...?
- Considerable space is required for the keys
- Low fan-out, fewer entries per node/leaf
- High tree!
Note: any string key in-between “Archi...” < key <= “Design..” may serve as a pivot in the node.
Solution: modify split-operation such that a shorter pivot is generated in the node.
[figure: after the modified split, the node stores the short pivot "Buzz" instead of a full key]
Prefix-Compression

Previous slide: any string key in-between "Archi..." < key <= "Design.." may serve as a pivot in the node.
Improvement: choose the key such that it is minimal, i.e., key is a prefix for keys Ki and Ki+1, if:
Ki < key <= Ki+1
no other key‘ exists such that Ki < key‘ <= Ki+1 and len(key‘) < len(key)
Impact:
subtrees have a common prefix
for any node N it holds: the children of N do not have to store the prefix anymore
But: for ISAM we require the entire key on the leaf level!
Prefix B+-trees are in fact a generalization of digital trees
Suffix-Compression (aka front coding)

Terminology:
- Prefix-compression: the suffix is omitted
- Suffix-compression: the prefix is omitted
Idea:
For each key Ki only store the difference to its predecessor key Ki-1
Example:
Hugo, Hummel, Hummer, Hund, Hupen
(0, Hugo), (2, mmel), (5, r), (2, nd), (2, pen)
(Each entry is a pair (F, suffix), where F = length of the common prefix with the predecessor key; F requires an appropriate physical representation, e.g., a byte, not an integer. A small sketch follows below.)
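A small sketch of front coding for a sorted list of keys, encode and decode; a real implementation would pack F into a single byte as noted above:

```python
def front_encode(keys):
    """Encode sorted keys as (F, suffix) pairs, F = length of the common prefix
    with the predecessor key."""
    out, prev = [], ""
    for key in keys:
        f = 0
        while f < min(len(prev), len(key)) and prev[f] == key[f]:
            f += 1
        out.append((f, key[f:]))
        prev = key
    return out

def front_decode(pairs):
    keys, prev = [], ""
    for f, suffix in pairs:
        prev = prev[:f] + suffix
        keys.append(prev)
    return keys

keys = ["Hugo", "Hummel", "Hummer", "Hund", "Hupen"]
enc = front_encode(keys)
print(enc)                          # [(0, 'Hugo'), (2, 'mmel'), (5, 'r'), (2, 'nd'), (2, 'pen')]
print(front_decode(enc) == keys)    # True
```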
Suffix-Compression: Discussion

The actual low-level storage layout should be based on statistical information on the data (if not, we may not gain anything but rather inflate the data).
Disadvantage: binary search on the data does not work anymore.
Improvement: partial front coding.
Partial Suffix-Compression (aka partial front coding)

Idea:
same as front coding, but: store every k-th entry uncompressed
Example: (k=4)
Hugo, Hummel, Hummer, Hund, Hupen, Husky, Husten, ...
(0, Hugo), (2, mmel), (5, r), (2, nd), (0, Hupen) , (2, sky), (3, ten), ...
binary search can use these entries, other entries may be reached by scanning from these “hub“-entries
Literature: Witten, Moffat, Bell: Managing Gigabytes.
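A sketch of the partial variant, reproducing the slide example; every k-th entry is stored uncompressed so it can serve as a hub for binary search (names are illustrative):

```python
def partial_front_encode(keys, k=4):
    """Front coding, but every k-th entry is stored uncompressed (F = 0)
    so that it can serve as a 'hub' entry for binary search."""
    out, prev = [], ""
    for i, key in enumerate(keys):
        if i % k == 0:
            out.append((0, key))                 # uncompressed hub entry
        else:
            f = 0
            while f < min(len(prev), len(key)) and prev[f] == key[f]:
                f += 1
            out.append((f, key[f:]))
        prev = key
    return out

keys = ["Hugo", "Hummel", "Hummer", "Hund", "Hupen", "Husky", "Husten"]
print(partial_front_encode(keys, k=4))
# [(0, 'Hugo'), (2, 'mmel'), (5, 'r'), (2, 'nd'), (0, 'Hupen'), (2, 'sky'), (3, 'ten')]
```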
Performance Optimizations of B+-Trees

1. Problem: height of the tree has a huge impact on performance
Goal: maximize fan-out (number of children)
Approaches: - Prefix B+-trees - Prefix/suffix-compression - Large pages
2. Problem: B+-trees not optimized for main memory and CPU-caches
Goal: optimize cache usage of trees
Approaches: - Cache conscious B+-trees
Large Pages

Scenario: double the page size => double the node/leaf size
Advantages:
Fan-out increases
higher probability that the height of the tree decreases
I/O cost does not increase much
Disadvantages:
more clipping due to large pages
more dead data in the DB-buffer
time to perform binary search inside a node/leaf increases (slightly)
How to fix:
"logically large pages/blocks": but these have to be placed adjacently on disk
Large Index Pages

Other idea:
use pages/blocks of different size
Disadvantages:
the DB-buffer typically only supports a fixed page size
Performance Optimizations of B+-Trees

1. Problem: height of the tree has a huge impact on performance
Goal: maximize fan-out (number of children)
Approaches: - Prefix B+-trees - Prefix/suffix-compression - Large pages
2. Problem: B+-trees not optimized for main memory and CPU-caches
Goal: optimize cache usage of trees
Approaches: - Cache conscious B+-trees
Cache-Conscious B+-Trees

Problem: B+-trees were designed for external memory (hard disks).
How can B+-trees be optimized such that they perform well w.r.t. the different caches?

Two major approaches:

Cache-oblivious B+-trees
- no knowledge (obliviousness) of the precise characteristics of the memory hierarchy
- generic approach
- not tailored towards a specific architecture
(The topic "cache-oblivious algorithms" could actually fill a separate lecture.)

Cache-conscious B+-trees
- precise knowledge (consciousness) of the precise characteristics of the memory hierarchy
- requires more fine-tuning
- tailored towards a specific architecture
Cache-Conscious B+-Trees: Agenda
Graefe
CSB+-trees
Prefetching
Fractal Prefetching
Overview of Techniques

Literature: Goetz Graefe, Per-Åke Larson: B-Tree Indexes and CPU Caches. ICDE 2001: 349-358.
CSB+-Trees (= cache-conscious B+-Trees)

Idea:
level-wise layout of nodes
omit all node pointers except the first one
Impact: doubles fan-out of nodes
[figure: B+-tree node with one pointer per child vs. CSB+-tree node with a single pointer to a contiguous group of children]
Search becomes up to 35% faster. But: updates become 30% slower!
Literature: Jun Rao, Kenneth A. Ross: Making B+-Trees Cache Conscious in Main Memory. SIGMOD Conference 2000: 475-486.
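A minimal sketch of the node-layout idea, assuming all children of a node live contiguously in one array so that a single pointer (the index of the first child) is enough; the class and example are invented for illustration:

```python
from bisect import bisect_left

class CSBNode:
    """All children of this node live contiguously in child_block, so the node
    stores only the position of its first child (plus the keys)."""
    def __init__(self, keys, first_child, child_block):
        self.keys = keys
        self.first_child = first_child
        self.child_block = child_block          # shared, contiguous array of child nodes

    def child_for(self, key):
        # child i = first_child + (number of separators strictly smaller than key)
        return self.child_block[self.first_child + bisect_left(self.keys, key)]

# hypothetical example: two leaves stored back to back, the node keeps one offset
leaves = [("leaf", [1, 4, 7]), ("leaf", [11, 12, 14])]
root = CSBNode(keys=[7], first_child=0, child_block=leaves)
print(root.child_for(12))   # ('leaf', [11, 12, 14])
```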
Prefetching: pB+-Trees

Idea:
- node size is a multiple of a cache line
- prefetch all cache lines of a node before starting the binary search inside it
- layout: first store all keys, then all pointers

[figure: timeline of cache misses for a 3-level descent with nodes of size 2 cache lines, without vs. with prefetching]
Performance improvement > 2x for both search and update operations; orthogonal to CSB+-trees.
Literature: Shimin Chen, Phillip B. Gibbons, Todd C. Mowry: Improving Index Performance through Prefetching. SIGMOD Conference 2001.
Fractal Prefetching: fpB+-Trees

Idea:
tree of trees
node external view: disk-optimized B+-tree
node internal view: cache-optimized pB+-tree
analogous to PAX (do not change the outside world)
[figure: disk pages, each internally organized as a small cache-optimized tree]
Literature: Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin: Fractal prefetching B+-Trees: optimizing both cache and disk performance. SIGMOD Conference 2002: 157-168.
Next Week: Indexing 2. Multi-dimensional index structures