B Trees Advanced)

  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View B Trees Advanced) as PDF for free.

More details

  • Words: 3,531
  • Pages: 55
B-Trees & its Variants Advanced Data Structure Spring 2007 Zareen Alamgir

Motivation Yet another Tree! Why do we need another Tree-Structure ?

Data Retrieval from External Storage 

In database programs, the data is too large to fit in memory, therefore, it is stored on secondary storage (disks or tapes).



Disk access is very expensive, the disk I/O operation takes milliseconds while CPU processes data on the order of nanoseconds, one million times faster.



When dealing with external storage the disk accesses dominate the running time.

Balanced Binary Search Trees 

Balanced binary search trees (AVL & Red-Black) have good performance if the entire data can fit in the main memory.



These trees are not optimized for external storage and require many disk accesses, thus give poor performance for very large data.

Reduce Disk Accesses 

Data is transfer to and from the disk in block (typically block are of 512, 2048, 4096 or 8192 bytes).



To reduce disk accesses  Store multiple records in a block on the disk.  Reduce tree height by increasing the number of children of a node.



To achieve above goals we use Multiway (m-way) search tree, which is a generalization of BST, binary search tree. 25 62

12 19

3

5

15 17

32 39

21 23

30 31

34 37

73 84

45 51

69 71

75 79

90 94

Multiway(m-way) Search Trees

Multiway(m-way) Search Trees  

In an m-way tree all the nodes have degree ≤ m. Each node has the following structure: P1

keys
 



K1

P2

Ki-1

Pi

Ki

Ki-1


Ki are the keys, where 1 ≤ i ≤ q-1, q<=m



Pi are pointers to subtrees, where 1 ≤ i ≤ q, q<=m

Kq-1

Pq

Kq-1
The keys in each node are in ascending order K1 ≤ K2 ≤ ... ≤ Ki The key Ki is larger than keys in subtree pointed by Pi and smaller than keys in subtree pointed by Pi+1 . The subtrees are the m-way trees.

Multiway(m-way) Search Trees 

M-way tree is a generalization of BST, its working, benefits & issues are same. 



Benefits 



Problems 

Fast information retrieval. Fast update.





The tree is not balanced. Leaf nodes are on different levels. Bad space usage, tree can become skew.

25 62

12 23

3

5

32

15 21

30 31

17 19

M-way tree

73 84

69 71

90 94

B-Trees

B-Trees 

B-Tree is a balanced m-way tree that is tuned to minimize disk accesses.



The node size of B-Tree is usually equal to the disk block size and the number of keys in a node depends on   



Key size Disk block size Data organization (either key or entire data record is store in a node)

Access paths from root to leaf nodes are small. 25 62

12 19

3

5

15 17

32 39

21 23

30 31

34 37

73 84

45 51

69 71

75 79

90 94

B-Tree: Definition 

A B-tree of order m has following properties  

 



The root has at least two subtrees unless it is a leaf. Each non-root and each non-leaf node holds q-1 keys and q pointers to subtrees, where  m / 2 ≤ q ≤ m . Each leaf node holds q-1 keys where  m / 2 ≤ q ≤ m . All lea are on the same level.

It is clear that B-tree is always at least half full, has fewer levels and is perfectly balanced. P1 Fr

keys
K1

P2

Ki-1

Pi

Ki

Ki-1
Kq-1

Pq

Kq-1
B-Tree 

B-Tree can have a field KeyTally, KT, to indicate the number of keys currently stored in the node.



B-Tree node usually contains key and data pointer pair. The data pointer points to the data record which is not stored in the node, with this scheme we can pack more keys & pointers in a B-Tree node.

KT

P1

keys
K1

D1

Data pointer

Ki-1 DKi-1i-1

Pi

Ki

Ki-1
Di

Data pointer

Kq-1 KDq-1q-1

Pq

Kq-1
Height of B-Tree 

Height of the B-Tree with n keys is important as it bound the number of disk accesses.



The height of the tree is maximum when each node has minimum number of the subtree pointers, q =  m / 2 .

Height of B-Tree 

The height of B-tree is maximum if all nodes have minimum number of keys. 1 key in the root + 2(q-1) keys on the second level +……+ 2qh-2(q-1) keys in the leaves (level h).

1 + 2(q - 1) + 2q(q - 1) + …… + 2q h-2 (q - 1)  h −2 i  = 1 + ( q − 1) ∑ 2q   i =0  Applying the formula of geometric progression  q h −1 − 1   = 1 + 2( q − 1) q − 1   = −1 + 2q h −1 Thus, the number of keys in B - Tree of height h is given as : n ≥ −1 + 2q h −1 h ≤ log q

n +1 +1 2

Height of B-Tree 

The height of B-tree is minimum if all nodes are full, thus we have m-1 keys in the root + m(m-1) keys on the second level +……+ mh-1(m-1) keys in the leaf nodes

(m - 1) + m(m - 1) + m 2 (m - 1) + …… + m h-1 (m - 1) h −1

h −1

= ∑ (m − 1)m = ( m − 1)∑ m i i

i =0

i =0

Applying the formula of geometric progression  mh − 1   = (m − 1) m − 1   = mh − 1 Thus, the number of keys in B - Tree of height h is given as : n ≤ mh − 1 h ≥ logm (n + 1) logm ( n + 1) ≤ h ≤ log q

n +1 +1 2

Height of B-Tree 

If number of nodes in B-tree equal 2,000,000 (2 million) and m=200 then maximum height of B-tree is 3, where as the binary tree would be of height 20.



Note: Order m is chosen so that B-tree node size is nearly equal to the disk block size.

Search in a B-Tree 

Search in a B-tree is similar to the search in BST except that in B-tree we make a multiway branching decision instead of binary branching in BST. 25 62

12 19

3

5

15 17

32 39

21 23

30 31

34 37

73 84

45 51

Search key 71

69 71

75 79

90 94

B-Tree Insert Operation 

Insertion in B-tree is more complicated than in BST.



In BST, the keys are added in top down fashion resulting in an unbalanced tree.



B-tree is built bottom up, the keys are added in the leaf node, if the leaf node is full another node is created, keys are evenly distributed and middle key is promoted to the parent. If parent is full, the process is repeated.



B-tree can also be built in top down fashion using pre-splitting technique.

Basic Idea Find position for the key in the appropriate leaf node

Insert key in order and adjust pointer

No

Is node full ? yes

Split node:

• Create a new node • Move half of the keys from the full node to the new node and adjust pointers • Promote the median key (before split) to the parent ≥  m / 2 − 1 Split guarantees that each node has keys.

If parent is full

Cases in B-Tree Insert Operation 

In B-tree insertion we have the following cases:  



Case 1: The leaf node has room for the new key. Case 2: The leaf in which key is to be placed is full.  This case can lead to the increase in tree height.

Now we explain these cases in detail.

B-Tree Insert Operation 

Case 1: The leaf node has room for the new key. Find appropriate leaf node for key 3

Insert 3 3 10 25

5

8

Insert 3 in order

14 19 20 23

32 38

B-Tree Insert Operation 

Case 2: The leaf in which key is to be placed is full. Find appropriate leaf node for key 16

Insert 16 16

3

5

8

10 19 25

14 19 20 23

32 38 No room for key 16 in leaf node

Insert key 19 in parent node in order

19 14 16

Move median key 19 up and Split node: create a new node and move keys to the new node.

20 23

B-Tree Insert Operation Case 2: The leaf in which key is to be placed is full and this lead to the increase in tree height.



45

13

9

12

14

19

20

2 3

27

3 2

33

38

38

47

51

48

59

75

32

38

55

52

67

57

4 7

51

59

75

32

38

81

61

47

72

51

59

75

32

3 8

77

47

86

51

59

75

32

38

92

47

51

59

75

B-Tree Insert Operation 

Case 2: The height of the tree increases.

Insert 16 55

Insert 27 in parent in order 16

45

5555

67

No room for 27 in parent, Split node 81

No room for 19 in parent, Split parent node

48

13

9

12

14

19

20

23

19 27

29

33

31

52

57

61

72

19

20

23

92

35

36

41

3

3

4

5

5

7

3

3

4

5

5

7

3

3

4

5

5

7

3

3

4

5

5

7

2

8

7

1

9

5

2

8

7

1

9

5

2

8

7

1

9

5

2

8

7

1

9

5

42

No room for key 16, Move median key 19 up & Split node

16

86

38

Insert 19 in parent node in order

14

77

B-Tree Delete Operation 

Deletion is analogous to insertion, but a little more complicated.



Two major cases  

Case 1: Deletion from leaf node Case 2: Deletion from nonleaf node 

Apply delete by copy technique used in BST, this will reduce this case to case 1.



In delete by copy, the key to be deleted is replaced by the largest key in the right subtree or smallest in left subtree (which is always a leaf).

B-Tree Delete Operation 

Leaf node deletion cases:  

After deletion node is at least half full. After deletion underflow occurs  Redistribute: if number of keys in siblings >  Merge nodes if number of keys in siblings <  Merging leads to decrease in tree height.

m  2  − 1 m  2  − 1

. .

B-Tree Delete Operation 

After deletion node is at least half full. (inverse of insertion case 1) Search key 3

10 25

3

5

8

14 19

32 38 40 45

Key found, delete key 3. Move others keys in the node to eliminate the gap.

B-Tree Delete Operation 

Underflow occurs, evenly redistribute the keys if left or right sibling has keys >  m / 2 − 1. Search key 14

Delete 14 10 25

5

8

14 19

32 38 40 45

Underflow occurs, evenly redistribute keys in the underflow node, in its sibling and the separator key.

B-Tree Delete Operation 

Underflow occurs and the keys in the left & right sibling are =  m / 2 − 1

. Merge the underflow node and a sibling. Move separator key down.

Delete 25 10 32

5

8

19 25

Move the keys to underflow node and discard the sibling.

38 40

Underflow occurs, merge nodes.

B-Tree Delete Operation 

Underflow occurs, height decreases after merging. Delete 21 70

Underflow occurs, merge nodes 8

3

5

32

21 27

79 85

47 66

Underflow occurs, merge nodes by moving separator key and the keys in sibling node to the underflow node.

73 75 78

81 83

88 90 92

Issues in B-tree 

In B-tree, accessing data in sequential order is not efficient. In-order traversal of the B-tree requires many disk accesses.



In B-tree, data pointers are stored in each node, thus resulting in less subtree pointers per node and more tree levels. KT

P1

keys
K1

D1

Data pointer

Ki-1

D Ki-1

Pi

Ki

Ki-1
Di

Kq-1 free KD space Pq q-1 q-1

Data pointer

Kq-1
B+-Trees

B+ -Tree- Variant of BTree 

Resolves the issues in B-tree.



In B+ -tree  Pointers to data is stored only in leaf nodes.  Internal nodes contain only keys and subtree pointers  



Can accommodate more keys in internal nodes. Less disk accesses due to fewer levels in the tree.

B+ -tree provides faster sequential access of data.

B+-Tree Structure 

B+-tree consist of two parts  Index Set  



Provides indexes for fast access of data. Consist of internal nodes that store only key & subtree pointers.

Sequence Set  

Consist of leaf nodes that contain data pointers. Provide efficient sequential access of data (using doubly linked list).

Index Set

Sequential Search

Sequence Set

B+-Tree: Index Node Structure 

The basic structure of the B+-tree index node of order m is same as that of B-tree node of order m.  



The root has at least two subtrees unless it is a leaf. Each non-root index node holds q-1 keys and q subtree pointers, where  m / 2 ≤ q ≤ m .

Only difference is that index node do not contain data pointers. P1 Fr

keys
K1

P2

Ki-1

Pi

Ki

Ki-1
Kq-1

Pq

Kq-1
B+-Tree: Sequence Node Structure 

The structure of the B+-tree sequence node is as follows: K1

D1

Ki-1

Di-1

Ki

Di

Kq-1

Dq-1 Pointers to previous and next leaf node in tree

Data pointer

Data pointer Data pointer

Data pointer

Search in a B+-Tree 

The search in B+-tree works similar to B-tree but it always ends at the leaf node because  



Pointer to the data is stored in the leaf node. Existence of the key in index set does not guarantee that the particular record is present in the tree. A key can occur multiple time in the index set (this does not create problem because the key in index set node act only as a separator). Note 62 is not present in the sequence set

30 62

15 21

3

5

15 17

34 45

21 23

30 31

34 37

75 90

45 51

B+-tree of order 5

69 71

75 79

90 94

B+-Tree Insert 

Case: The leaf node has room for the key to be inserted. Find appropriate leaf node for key 3, and insert in order.

Insert 3 3 14 32

5

8

14 19 20 23

32 38

B+-Tree Insert Operation 

Case 2: The leaf in which key is to be placed is full. Find appropriate leaf node for key 16

Insert 16 16

3

5

8

10 19 25

14 19 20 23

No room for key 16, Split node: create a new node and move  m / 2 keys to the new node.

32 38

Insert a copy of the first key of the new node in the parent node in order.

14 16 19

20 23

19 Modify Sequence Set next node links

B+ -Tree Insert Operation 

Case: Only root node exists, and it is full. Find appropriate position for key 18

Insert 18

18 No room for key 18, Split node: create a new sequence set node and move  m / 2 keys to the new node.

18 10 17 20 23

18 Create a new index set node and make it a root node. Insert the first key of the new sequence set node in the new root.

B+-Tree Delete Operation 

B+-tree deletion follows same rules as that of Btree deletion but the separator in index set node is not removed when a key is deleted.



Deletion cases:  

After deletion node is at least half full. After deletion underflow occurs  Redistribute: if number of keys in siblings >  Merge nodes if number of keys in siblings < Merging lead to decrease in tree height.

m  2  − 1 m  2  − 1

. .

B+-Tree Delete 

Case: After deletion node is at least half full.

Search key 14

Delete 14 14 32

3

5

8

14 19 21

Note: key 14 in the parent node is still a valid separator key. It is not modified.

32 38 40

Key found, delete key 14. Move others keys in the node to eliminate the gap.

B+-Tree Delete Operation 

Underflow occurs, evenly redistribute the keys if left or right sibling has keys >  m / 2 − 1. Separator key in the parent is Search key 14 & delete it. no longer valid

Delete 14 14 38 32

5

8

14 19

Insert a copy of the first key of the sibling node in parent in order.

32 38 40

Underflow occurs, evenly redistribute keys in the underflow node and in the sibling node. Note: unlike B-tree, the separator key in the parent node is not included. Why?

B+-Tree Delete Operation 

Underflow occurs and the keys in the left & right sibling are =  m / 2 − 1

. Merge the underflow node and a sibling. Search key 32 & delete it.

Delete 32 14 38

5

8

19 32

38 40

Underflow occurs, merge nodes. Move keys in sibling to underflow node and discard the sibling node.

Efficient Sequential Access in For efficient sequential access, start from the beginning of B+-Tree 

the sequence set and traverse the sequence set using the next pointers in sequence set nodes 45

13

27

33

38

Sequence Set Start

48

52

55

67

57

81

61

72

77

86

92

Comparison B-Tree & B+Tree B-Tree 

 

 

Data pointers are stored in all nodes No redundant keys Search can end at any node Slow sequential access Higher trees

B+-Tree 

Data pointers are stored only in leaf nodes (sequence set)

 

 

Redundant keys may exist Search always ends at leaf node Efficient sequential access Flatter trees (no data pointers in index set nodes)

B* -Trees

B* -Tree -- Variant of BTree 

Each node of a B-tree represents a block of secondary memory, therefore, accessing a node is expensive operation. Thus, the fewer nodes that are created, the better.



In B*-tree is a variant of B-tree introduced by Donald Knuth and named by Douglas Comer.



In a B*-tree, all nodes except the root are required to be at least two-thirds full, not just half full as in B-tree.



The number of keys in all non-root nodes in a B*-tree of order m 2m − 1 is k, where  3  ≤ k ≤ m − 1. 





Average utilization of B*-tree is 81%.

B* -Tree Insert Operation 

In B*-tree, the frequency of node splitting is decreased by delaying a split.



A split in a B*-tree is delayed by attempting to redistribute the keys between node and its sibling when node overflows.



In B*-tree split operation two nodes are split into three instead of one into two as in B-tree.



All three nodes participating in the split are guaranteed to be twothirds full after split.

B*-Tree Insert Operation 

Overflow occurs, evenly redistribute the keys between node and its sibling. Insert 22

22

32

10 12 15 16 21 24 25 29

35 42 47 51 53

Overflow occurs, evenly redistribute keys in the overflow node, in its sibling including the separator key in parent and new key.

B*-Tree Insert Operation 

Overflow occurs, sibling is full, split node.

Insert 72 72

32

10 12 15 16 21 24 25 29

35 42 47 51 53 55 57 59

Overflow occurs, sibling is full, split node

B* -Tree Delete Operation 

B*-tree deletion follows same rules as that of B-tree deletion.



Deletion cases:  After deletion node is at least two third full.  After deletion underflow occurs 2m − 1  Redistribute: if number of keys in siblings > . 3 2m − 1  Merge nodes if number of keys in siblings < . 3



Examples of deletion are omitted as they are similar to B-tree deletion.

Questions ?

References 

Data Structure and Algorithms in C++, Adam Drozek.



Introduction to Algorithms, T.H.Cormen, C.E.Leiserson, R.L.Rivest, and C.Stein.



Fundamentals of Database Systems, Elmasri Navathe.



Fundamentals of Data Structures in C++, E.Horowitz, S.Sahni and D.Mehta



The Ubiquitous B-Tree, DOUGLAS COMER, ACM Computing Surveys (CSUR), Volume 11 ,Issue 2(June 1979).

The End

Related Documents

B Trees Advanced)
November 2019 7
B+ Trees
October 2019 23
Splay Trees Advanced)
November 2019 7
Red-black Trees Advanced)
November 2019 4
Trees
July 2020 23
Trees
October 2019 45