IKI 10100I: Data Structures & Binary Heap, Priority Queues Algorithms & Huffman Coding
Ruli Manurung (acknowledgments to Denny & Ade Azurat)
Fasilkom UI
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
1
Review Complete binary tree: the tree is completely filled, with possible exception of the bottom level, which is filled from left to right.
A B
C
D H
F
E I
G
J
Array Implementation: A B C D E F G H I J 0
1
Ruli Manurung (Fasilkom UI)
2
3
4
5
6
7
8
IKI10100I: Data Structures &
9 10 11 12 Week 13
2
Outline Priority Queue Definition Operations Binary Heap Operations Properties Representation Heap Sort
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
3
A Priority Queue A special kind of queue that features slightly different rules: Enqueue operation doesn’t always add new items to the rear of the queue. Instead, it insert each new item into the queue as determined by its priority. Remember analogy of queuing for tickets, and giving priority to pregnant women and the elderly! Essentially, PQ is an Abstract Data Type where access is restricted to the minimum item only.
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
4
Binary Heap Tree-based data structure where access is restricted to the minimum item only. Basic operations: Find the minimum item in constant time Insert a new item in logarithmic worst-case time Delete the minimum item in logarithmic worstcase time
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
5
Properties Structural Property Data is stored in a complete binary tree • The tree is balanced, so all operations are guaranteed O(log n) worst case • Can be implemented as array or list
Ordering Property Heap Order: • Parent ≤ Child
P X
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
P≤ X
Week 13
6
Which Of These Are Binary Heaps? 1
1
2 2
1
2 3
2
2
2
3 13
1 3
21
2 2
Ruli Manurung (Fasilkom UI)
3
6 65
16 31
19
68
26 32
IKI10100I: Data Structures &
Week 13
7
Heap Representation • Root at -1 location 0 • Left child of i 0 1 is at location 43 3 2 3 2i + 1 • Right child of i 65 58 40 42 4 is at location 2i + 2 = 2(i + 1) • Parent of i is at location (i -1 0 1 43 3 3 2 65 58 40 421) 4/2 0
1
2
3
Ruli Manurung (Fasilkom UI)
4
5
6
7
8
9 10 11 12 13 14
IKI10100I: Data Structures &
Week 13
8
Find Minimum How to access the minimum element? Simple! Return root. Constant time operation 13 21 26 65
16 31
26
Ruli Manurung (Fasilkom UI)
19
68
32
IKI10100I: Data Structures &
Week 13
9
Insertion Create a “hole” in the next available location. Where? Bottom level, rightmost leaf. Move it up towards root to maintain heap order This is called “percolating up”
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
10
Insertion Insert 2 (Percolate Up) -1 0
1
43 65
3
5 58
40
42
8 2
4
-1 0 1 43 5 3 8 65 58 40 42 4 2 0
1
2
Ruli Manurung (Fasilkom UI)
3
4
5
6
7
8
9 10 11 12 13 14
IKI10100I: Data Structures &
Week 13
11
Insertion Insert 2 (Percolate Up) -1 0
1
43 65
2
5 58
40
42
8 3
4
-1 0 1 43 5 2 3 8 65 58 40 42 4 3 0
1
2
Ruli Manurung (Fasilkom UI)
3
4
5
6
7
8
9 10 11 12 13 14
IKI10100I: Data Structures &
Week 13
12
Insertion Insert 14
13 21 24 65
16 19
31 26
Ruli Manurung (Fasilkom UI)
32
68
14
IKI10100I: Data Structures &
Week 13
13
Insertion Insert 14
13 21 24 65
16 19
14 26
Ruli Manurung (Fasilkom UI)
32
68
31
IKI10100I: Data Structures &
Week 13
14
Insertion Insert 14
13 14 24 65
16 19
21 26
Ruli Manurung (Fasilkom UI)
32
68
31
IKI10100I: Data Structures &
Week 13
15
Delete Minimum Only the root can be deleted. Why? Access is restricted to minimum item! Replace “hole” in the root with former last item. “Percolate down” towards leaf to maintain heap order Compare with smallest child, recurse.
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
16
Delete Minimum Percolate Down
-1 0
1
43 65
3
3 58
40
42
2
4
-1 0 1 43 3 3 2 65 58 40 42 4 0
1
2
3
Ruli Manurung (Fasilkom UI)
4
5
6
7
8
9 10 11 12 13 14
IKI10100I: Data Structures &
Week 13
17
Delete Minimum
4 0
1
43 65
3
3 58
40
2
42
4 0 1 43 3 3 2 65 58 40 42 0
1
2
3
Ruli Manurung (Fasilkom UI)
4
5
6
7
8
9 10 11 12 13 14
IKI10100I: Data Structures &
Week 13
18
Delete Minimum: Completed
0 3
1
43 65
3
4 58
40
2
42
0 3 1 43 4 3 2 65 58 40 42 0
1
2
3
Ruli Manurung (Fasilkom UI)
4
5
6
7
8
9 10 11 12 13 14
IKI10100I: Data Structures &
Week 13
19
Delete Min (Alternative)
13 14 19 65
16 19
21 26
Ruli Manurung (Fasilkom UI)
32
68
31
IKI10100I: Data Structures &
Week 13
20
Delete Min (Alternative) Percolate Down
14 19 65
16 19
21 26
Ruli Manurung (Fasilkom UI)
32
68
31
IKI10100I: Data Structures &
Week 13
21
Delete Min (Alternative)
14 16 19 65
19
21 26
Ruli Manurung (Fasilkom UI)
32
68
31
IKI10100I: Data Structures &
Week 13
22
Delete Min (Alternative)
14 19
16 19
21 65
26
Ruli Manurung (Fasilkom UI)
32
68
31
IKI10100I: Data Structures &
Week 13
23
Delete Min (Alternative)
14 19 26 65
Ruli Manurung (Fasilkom UI)
16 19
21 32
68
31
IKI10100I: Data Structures &
Week 13
24
Delete Min (Alternative) Instead of lots of swapping, simply store value at the beginning, move “hole” around, and plug value in at the end. 14 19 26 65
16 21
31
Ruli Manurung (Fasilkom UI)
19
68
32
IKI10100I: Data Structures &
Week 13
25
Heap Operation sequence: Insert: 40, 20, 5, 55, 76, 31, 3 Delete Min Delete Min Insert: 10, 22 Delete Min Delete Min
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
26
Heap Class Definition & Constructor public class MyHeap implements PriorityQueue { // Array implementation private int[] data; int currentSize; // Constructor public MyHeap(int maxSize) { data = new int[maxSize]; currentSize = 0; } } Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
27
Heap Indexing Methods // Returns index of an element’s parent private int parentOf(int i) {return (i - 1) / 2;} // Returns index of an element’s left child private int leftChildOf(int i) {return 2 * i + 1;} // Returns index of an element’s right child private int rightChildOf(int i) {return 2 * i + 2;} // Returns minimum value (i.e. heap root), a.k.a. findMin public int peek() {return data[0];} Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
28
Removal & Insertion public int poll() // a.k.a. remove, dequeue { int minVal = peek(); data[0] = data[--currentSize]; if(currentSize > 1) percolateDown(0); return minVal; } Implementati on? public void add(int value) // a.k.a. offer { data[currentSize++]=value; percolateUp(currentSize-1); } (Actually, still need to check if polling empty array, adding to full array…) Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
29
Percolate Up protected void percolateUp (int leaf) { int parent = parentOf(leaf); int value = data[leaf]; while(leaf>0 && value
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
30
Percolate Down private void percolateDown (int hole) { int value = data[hole]; while(leftChildOf(hole) < size()) { int child = leftChildOf(hole); if(rightChildOf(hole) < size() && data[rightChildOf(hole)] < data[child]) child = rightChildOf(hole); if(data[child] < value) { data[hole] = data[child]; hole = child; } } data[hole] = value; }
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
31
Fix Heap / Heapify The fixHeap operation takes a complete tree that does not have heap order and reinstates it. N insertions could be done in O(n log n) But we can make a fixHeap that takes O(n) time!
92 47
21
20 61
12 17
55
Ruli Manurung (Fasilkom UI)
45 37
25
63 64
IKI10100I: Data Structures &
83
73
Week 13
32
fixHeap Algorithm Inefficient version: fixHeap left subtree fixHeap right subtree percolateDown root However, we don’t need the recursive calls! Why? If we call percolateDown in reverse level order from the bottom (nonleaves) upwards, by the time we call percolateDown on node n, it’s children are already guaranteed to be heaps!
n
92 47 20
21 12
45
63
61 17 55 37 25 64 83 73
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
33
Fix Heap / Heapify Call percolateDown(6)
92 47
21
20 61
12 17
Ruli Manurung (Fasilkom UI)
55
45 37
25
IKI10100I: Data Structures &
63 64
83
73
Week 13
34
Fix Heap / Heapify Call percolateDown(5)
92 47
21
20 61
12 17
Ruli Manurung (Fasilkom UI)
55
45 25 37
25 45
IKI10100I: Data Structures &
63 64
83
73
Week 13
35
Fix Heap / Heapify Call percolateDown(4)
92 47
21
20 61
12 17
Ruli Manurung (Fasilkom UI)
55
25 37
45
IKI10100I: Data Structures &
63 64
83
73
Week 13
36
Fix Heap / Heapify Call percolateDown(3)
92 47
21
20 17 61
12 17 20
Ruli Manurung (Fasilkom UI)
55
25 37
45
IKI10100I: Data Structures &
63 64
83
73
Week 13
37
Fix Heap / Heapify Call percolateDown(2)
92 47
21
17 61
12 20
Ruli Manurung (Fasilkom UI)
55
25 37
45
IKI10100I: Data Structures &
63 64
83
73
Week 13
38
Fix Heap / Heapify Call percolateDown(1)
92 47 12
21
17 61
12 37 20
Ruli Manurung (Fasilkom UI)
55
25 37 47
45
IKI10100I: Data Structures &
63 64
83
73
Week 13
39
Fix Heap / Heapify Call percolateDown(0)
92 12 12 17
21
17 20 61
37 20 92
Ruli Manurung (Fasilkom UI)
55
25 47
45
IKI10100I: Data Structures &
63 64
83
73
Week 13
40
Max Heap Heaps can also store maximum item (instead of min).
97 53
59
26 16
58
41 21
31
36
97 53 59 26 41 58 31 16 21 36 0
1
2
Ruli Manurung (Fasilkom UI)
3
4
5
6
7
8
9 10 11 12 13
IKI10100I: Data Structures &
Week 13
41
Heap after the first deleteMax
59 53 26 16
58 36
41
31
97
21
59 53 58 26 41 36 31 16 21 97 0
1
2
Ruli Manurung (Fasilkom UI)
3
4
5
6
7
8
9 10 11 12 13
IKI10100I: Data Structures &
Week 13
42
Heap after second deleteMax
58 53 26
21
41
59
16
36 31
97
58 53 36 26 41 21 31 16 59 97 0
1
2
Ruli Manurung (Fasilkom UI)
3
4
5
6
7
8
9 10 11 12 13
IKI10100I: Data Structures &
Week 13
43
Heap Sort 1. 2.
Create a heap tree Remove the root elements from the heap one at a time, recomposing the heap
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
44
Summary so far Priority queue can be implemented using binary heap Properties of binary heap structural property: complete binary tree ordering property: Parent ≤ Child Operations of binary heap insertion: O(log n) find min: O(1) delete min: O(log n) heapify: O(n) Binary heap can be used for sorting Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
45
Data Compression Process: Encoding: raw → compressed Decoding: compressed → raw Types of compression Lossy: MP3, JPG Lossless: ZIP, GZ Compression Algorithm: RLE: Run Length Encoding Lempel-Ziv Huffman Encoding Performance of compression depends on file types. Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
46
Huffman Compression “If a woodchuck could chuck wood!” 32 char × 8 bit = 256 bits 13 distinct characters → 4 bit Compressed code: 128 bits Variable length string of bits to further improve compression. Using prefix codes Main idea: Frequently occurring letters: short representation. Infrequent letters: long representations.
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
47
Huffman Encoding: Comparison
a
8
i
5
u
Ruli Manurung (Fasilkom UI)
3
e
3
a = 00
→ 16 bits
i = 01
→ 10 bits
u = 10
→ 6 bits
e = 11
→ 6 bits
Total : 42 bits
IKI10100I: Data Structures &
Week 13
48
Huffman Encoding: Comparison
19 11 6
a
8
i
5
Ruli Manurung (Fasilkom UI)
u
3
3
a=0 → 8 bits i = 10 → 10 bits u = 110 → 9 bits e = 111 → 9 bits Total: 36 bits
e
IKI10100I: Data Structures &
Week 13
49
Huffman Encoding 1
0
0
1
0
1
0 0
1
1
0
u
! 0
1
a
l
Ruli Manurung (Fasilkom UI)
d
0 k
1
0 1
1 c
0
w 0 I
1 0
1
space o
1 h f
IKI10100I: Data Structures &
Week 13
50
Huffman Encoding 32 19
13 9
7
10
6 4
4
3 2 u:3
k:2 w:2 2
!:1 a:1
c:5 space:5 o:5
d:3
l:1
Ruli Manurung (Fasilkom UI)
h:2
I:1 f:1
IKI10100I: Data Structures &
Week 13
51
Huffman Encoding (freq) ! = a = l = u = d = k = w=
0000 (1) 00010 (1) 00011 (1) 001 (3) 010 (3) 0110 (2) 0111 (2)
I = 10000 (1) f = 10001 (1) h = 1001 (2) c = 101 (5) space= 110 (5) o = 111 (5)
Cost: ∑ di * fi = 111 bits = 44% × 256 bits
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
52
Huffman Encoding: steps
o
5 5
c
5
u
3
3
d
Ruli Manurung (Fasilkom UI)
k
2
w
2
2
h
I
2
1
1
f
IKI10100I: Data Structures &
1
!
a
1
l
Week 13
1
53
Huffman Encoding: steps
o
5 5
c
5
u
3
3
d
k
2
w
2
2
h
2
a Ruli Manurung (Fasilkom UI)
2
IKI10100I: Data Structures &
l
I
1
1
f Week 13
1
! 54
Huffman Encoding: steps
o
5 5
c
5
u
3
3
d
k
3
2
w
2
2
h
2
2
I a Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
l
f
1
! Week 13
55
Huffman Encoding: steps
o
5 5
c
5
u
3
3
3
d
4 2
I f Ruli Manurung (Fasilkom UI)
!
a
l
IKI10100I: Data Structures &
k
2
2
h
w
2
Week 13
56
Huffman Encoding: steps
o
5 5 5 4 c
3 4
h
w I f
Ruli Manurung (Fasilkom UI)
u
3
3
2
d
!
k a
IKI10100I: Data Structures &
2
l Week 13
57
Huffman Encoding: steps
o
5 5 5 4 c
h
6
3
4
w
u
Ruli Manurung (Fasilkom UI)
l
3
d
I
k a
3
f
!
IKI10100I: Data Structures &
Week 13
58
Huffman Encoding: steps
6
u
d o
5 5
c
3
4
h Ruli Manurung (Fasilkom UI)
4
5
w
I
k a
l
IKI10100I: Data Structures &
f
! Week 13
59
Huffman Encoding: steps
6
u
d o
7
5 5
c
5 4
h Ruli Manurung (Fasilkom UI)
w
3
4
I
k a
l
IKI10100I: Data Structures &
f
! Week 13
60
Huffman Encoding: steps
7
I
k a
l
f
!
9
6
u
d
o
5
5
c 5
4
h Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
w Week 13
61
Huffman Encoding: steps
9
c 7
w
h
I
k a
l
Ruli Manurung (Fasilkom UI)
f
!
10
6
u
IKI10100I: Data Structures &
d
o
5
5
Week 13
62
Huffman Encoding: steps
10 9
o
13 7
c h
w I
k a Ruli Manurung (Fasilkom UI)
l
IKI10100I: Data Structures &
f
6
u
! Week 13
d 63
Huffman Encoding: steps
13
19 10
o I
k a
l
f
Ruli Manurung (Fasilkom UI)
!
u
d
IKI10100I: Data Structures &
9
c h
Week 13
w
64
Huffman Encoding: steps
32 19
o
13
c h
w
k a
Ruli Manurung (Fasilkom UI)
I
l
f
IKI10100I: Data Structures &
u
! Week 13
d
65
Huffman Encoding: steps
Total: 111 bits
o
c h
w
k a
Ruli Manurung (Fasilkom UI)
I
l
f
IKI10100I: Data Structures &
u
! Week 13
d
66
How do we implement this? Use a priority queue! Maintain a forest of trees Each element in the queue is a root node Order them by character frequency count Algorithm: Add all unique characters as single node tree Call dequeue twice, merge trees, enqueue result Repeat until only 1 tree left!
Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
67
Summary Huffman encoding uses character frequency information to compress a file. The most frequent character gets a shorter prefix code, and vice versa. Priority queue can be implemented using binary heap structural property: complete binary tree ordering property: Parent ≤ Child Operations of binary heap insertion: O(log n) find min: O(1) delete min: O(log n) heapify: O(n) Ruli Manurung (Fasilkom UI)
IKI10100I: Data Structures &
Week 13
68