Sets And Maps

  • May 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Sets And Maps as PDF for free.

More details

  • Words: 5,242
  • Pages: 84
Bilgisayar Mühendisliği Bölümü

DATA STRUCTURES AND ALGORITHMS Lecture Notes 11 Sets and Maps Spring 2008

GIT – Computer Engineering Department

Chapter Outline  The Set and Map containers and how to use them  Hash coding and its use in efficient search & retrieval  Two forms of hash tables: – Open addressing – Chaining – Their relative benefits and performance tradeoffs

 Implementing both hash table forms  Introduction to implementation of Maps and Sets  Applying Maps and Sets to previous problems GIT – Computer Engineering Department

2

Standard Library Containers

GIT – Computer Engineering Department

3

Associative Containers

GIT – Computer Engineering Department

4

The Set Abstraction  A set is a collection containing no duplicate elements  Operations on sets include: – Testing for membership – Adding elements – Removing elements – Union – Intersection – Difference – Subset GIT – Computer Engineering Department

5

Set Implementation  set is a template class in standard library with template parameters – Key_Type : type of the items – Compare : function class to order items

 Comparison is not required for set abstraction – Compare has default value

GIT – Computer Engineering Department

6

The set class Function

Behavior

Construct a set from the sequence of objects represented by the iterator range first .. last. iterator begin() Returns an iterator to the first item in the const iterator begin() const set. iterator end() Returns an iterator to one past the last item const iterator end() const in the set. bool empty() Returns true if the set is empty. size_t size() Returns the number of item in the set. pair insert(const Inserts an item into the set. If the item is not Key_Type& item) in the set, the iterator will reference the inserted item and the bool will be true. If the item is already in the set the iterator will reference the item currently in the set and the bool will be false. template void Inserts the items from the range first .. insert(II first, II last) last into the set. Duplicates are ignored. void erase(const Key_Type& item) Removes the item from the set. iterator find(const Key_Type& Returns an iterator to the item in the set. If template set(II first, II last)

GIT – Computer Engineering Department

7

Implementing the Union Operator /** Construct the union of two sets. */ template std::set operator+( const std::set& left, const std::set& right){ typename std::set result(left); result.insert(right.begin(), right.end()); return result; }

GIT – Computer Engineering Department

8

Implementing the Set Difference /** Construct the difference of two sets. */ template std::set operator-( const std::set& left, const std::set& right) { typename std::set result(left); for (typename std::set::const_iterator itr = right.begin(); itr != right.end(); ++itr) result.erase(*itr); return result; }

GIT – Computer Engineering Department

9

Implementing Intersection /** Construct the intersection of two sets */ template std::set operator*(const std::set& left, const std::set& right) { return left - (left - right); }

GIT – Computer Engineering Department

10

Overloaded ostream extraction operator // Overloading the ostream insertion operator template std::ostream& operator<<(std::ostream& out, const std::set& a_set) { out << "{"; bool first = true; for (typename std::set::const_iterator itr = a_set.begin(); itr != a_set.end(); ++itr) { if (first) out << *itr; else out << ", " << *itr; first = false; } return out << "}"; } GIT – Computer Engineering Department

11

Set Functions in Function

Behavior

template OI set_difference(II1 first1, II1 last1, II2 first2, II2 last2, OI result, Compare less)

Forms the set difference of the elements from the range first1 .. last1 and first2 .. last2. The result is stored in the sequence beginning with result. The end of the result sequence is returned. Forms the set intersection of the elements from the range first1 .. last1 and first2 .. last2. The result is stored in the sequence beginning with result. The end of the result sequence is returned. Forms the set union of the elements from the range first1 .. last1 and first2 .. last2. The result is stored in the sequence beginning with result. The end of the result sequence is returned.

template OI set_intersection(II1 first1, II1 last1, II2 first2, II2 last2, OI result, Compare less) template OI set_union(II1 first1, II1 last1, II2 first2, II2 last2, OI result, Compare less)

GIT – Computer Engineering Department

12

Set Example int main() { set<string> set<string> set<string> set<string> set<string>

set1; set2; set_u; set_d; set_i;

string data1[] = {"Apples", "Oranges", "Pineapples"}; string data2[] = {"Peaches", "Apples", "Grapes"}; set1.insert(data1, set2.insert(data2, cout << "set1 is " cout << "set2 is "

data1+3); data2+3); << set1 << endl; << set2 << endl;

set_union(set1.begin(), set1.end(), set2.begin(), set2.end(), inserter(set_u, set_u.begin())); cout << "set1 + set2 is " << set_u << endl;

GIT – Computer Engineering Department

13

Set Example (2) set_difference(set1.begin(), set1.end(), set2.begin(), set2.end(), inserter(set_d, set_d. begin())); cout << "set1 - set2 is " << set_d << endl; set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), inserter(set_i, set_i. begin())); cout << "set1 * set2 is " << set_i << endl; bool is_member = (set1.find(string("Apples")) != set1.end()); cout << "\"Apples\" is an element of set1 is " << boolalpha << is_member << endl; return 0; }

GIT – Computer Engineering Department

14

vector vs. set  Sets allow no duplicates  Sets do not have positions, so no operator[]  Set iterator can produces elements in sorted order  How to implement a set???

GIT – Computer Engineering Department

15

The multiset  The multiset is the same as the set except that duplicate entries are allowed.  The insert always inserts a new item.  The erase removes all occurrences of an item.  The functions lower_bound and upper_bound may be used to select a range from a set. – In a multiset these functions can also return all occurrences of a given value

 How to implement a multiset?? GIT – Computer Engineering Department

16

Example of a multiset int count_occurences(const multiset<string>& words_set, const string& target) { multiset<string>::const_iterator first_itr = words_set.lower_bound(target); cout << "*first_itr == " << (first_itr != words_set.end() ? *first_itr : "end()") << endl; multiset<string>::const_iterator last_itr = words_set.upper_bound(target); cout << "*last_itr == " << (last_itr != words_set.end() ? *last_itr : "end()") << endl; int count = 0; for (multiset<string>::const_iterator itr = first_itr; itr != last_itr; ++itr) ++count; return count; } GIT – Computer Engineering Department

17

The std::pair  The std::pair is defined in  It is a struct (data members are public) that contains two data items: first and second.  There is a template function make_pair that can be used to construct pair objects.

GIT – Computer Engineering Department

18

Definition of pair template struct pair { Type1 first; Type2 second; pair(const Type1& x, const Type2& y): first(x), second(y) {} pair() : first(Type1()), second(Type2()) {} template pair(const pair& other) { first = other.first; second = other.second; } }; template pair make_pair(const Type1& x, const Type2& y) {return pair(x, y);}

GIT – Computer Engineering Department

19

Maps and the Map Interface  Map is related to Set: it is a set of ordered pairs  Ordered pair: (key, value) – In a given map, there are no duplicate keys – Values may appear more than once

 Can think of key as “mapping to” a particular value  Maps support efficient organization of information in tables  Mathematically, these maps are: – Many-to-one (not necessarily one-to-one) – Onto (every value in the map has a key)

GIT – Computer Engineering Department

20

Map Picture

GIT – Computer Engineering Department

21

The map functions  Template parameters – Key_Type: The type of the keys – Value_Type: The type of the values – Compare: The function class that compares the

keys.

 All functions defined for the set are defined for the map, taking a pair – pair

 If set is defined map can be defined easily  Except the index operator: the index operator (operator[]) is also defined for the map GIT – Computer Engineering Department

22

The map operator[]  The map can be used like an array, except that the Key_Type is the index. – Example: map<string, string> a_map; a_map["J"] = "Jane"; a_map["B"] = "Bill"; a_map["S"] = "Sam"; a_map["B1"] = "Bob"; a_map["B2"] = "Bill";

 Note: – If a mapping exists, assignment will replace it. – If a mapping does not exist, a reference will create one with a default value GIT – Computer Engineering Department

23

EX: Using a map to build an index  Index the words in a text with their line numbers – Use a map (map<string, list>) – Each word (string) is a key – List of line numbers (list) is a value

GIT – Computer Engineering Department

24

Using a map to build an index typedef map<string, list > map_type; void build_index(istream& in, map_type& index) { string next_line; // Each data line int line_num = 0; // Line number // Keep reading lines until done while (getline(in, next_line)) { line_num++; String_Tokenizer tokenizer(next_line, " ,.:-!?/%\'\""); // Insert each token in the index while (tokenizer.has_more_tokens()) { string word = tokenizer.next_token(); to_lower(word); index[word].push_back(line_num); } } }

GIT – Computer Engineering Department

25

Maps  How to implement Maps? – What is the efficiency?

GIT – Computer Engineering Department

26

Chapter Outline  The Set and Map containers and how to use them  Hash coding and its use in efficient search & retrieval  Two forms of hash tables: – Open addressing – Chaining – Their relative benefits and performance tradeoffs

 Implementing both hash table forms  Introduction to implementation of Maps and Sets  Applying Maps and Sets to previous problems GIT – Computer Engineering Department

27

Hash Tables  Goal: access item given its key (not its position)  Therefore, want to locate it directly from the key  In other words, we wish to avoid much searching  Hash tables provide this capability – Constant time in the average case! – Linear time in the worst case

O(1) O(n)

 Searching an array: O(n) Searching BST: O(log n)

GIT – Computer Engineering Department

28

Hash Codes  Suppose we have a table of size N  A hash code is a number in the range 0 to N-1  We compute the hash code from the key – a “default position” when inserting – a “position hint” when looking up

 A hash function is a way of computing a hash code  Desire: The set of keys should spread evenly over the N values  When two keys have the same hash code: collision GIT – Computer Engineering Department

29

EX: Character Frequencies in a Text  To build a Huffman tree, we need the number of occurrences of each character  Hash key: character  Hash value: character frequency  Assume text includes ASCII characters – There are 128 characters in ASCII

 Hash table size is 128 – Character ASCII value is the index in the table • HashTable[65] is frequency of character A

 Hash function: returns the character value  What if the characters are Unicode GIT – Computer Engineering Department

30

EX: Character Frequencies in a Text  There are 216 possible characters, but ... – Maybe only 100 or so occur in a text

 Choosing table size of 216 results in waste of space  Approach: hash characters to range 0-199 – That is, use a hash table of size 200

 A possible hash function for this example: int hash = uni_char % 200;

 Collisions are certainly possible (see later) – Mapping 216 possible characters to 200 indices

GIT – Computer Engineering Department

31

EX: An ideal hash table  Each key is mapped to a different index ! – Not always possible • many keys, finite indices

 Even distribution Considerations :  Devising a hash function  Decide on table size  Decide what to do when collision

GIT – Computer Engineering Department

32

Devising Hash Functions  Simple functions often produce many collisions – ... but complex functions may not be good either!

 It is often an empirical process  If keys are integers – hash function : key % table.size() • Ex: TableSize = 10 Keys = 120, 330, 1000

– TableSize should be prime • Provides even distribution GIT – Computer Engineering Department

33

Devising Hash Functions For strings:  Adding letter values in a string – same hash for strings with same letters in different order – If TableSize is large and number of characters is small • TableSize = 10000 & number of characters in a key = 8 127*8=1016 < 10000

 Use all characters ∑ 32i Key [KeySize -i -1 ] – Early characters does not count • Use only some number of characters • Use characters in odd spaces

 Use first three characters 729*key[2] + 27*key[1] + key[0] – If the keys are not random some part of the table is not used

GIT – Computer Engineering Department

34

Devising Hash Functions For strings:  Better approach: size_t hash = 0; for (size_t i = 0; i < s.size(); ++i) hash = hash * 31 + s[i]; hash = has % table.size()

 This is the hash function used by Java in its String class

GIT – Computer Engineering Department

35

Devising Hash Functions  The String hash is good in that: – Every letter affects the value – The order of the letters affects the value – Hash function distributes the hash values well over the integers • Probability of two strings having the same hash value is quite small • The hash function can be assumed to produce random hash values

 The hash value may not be unique – Too many possible strings

 Collision is still possible GIT – Computer Engineering Department

36

Devising Hash Functions  Guidelines for good hash functions: – Spread values evenly: as if “random” – Cheap to compute

 Generally, number of possible values much greater than table size GIT – Computer Engineering Department

37

Collision Problem  Will consider two ways to organize hash tables – Open addressing – Chaining

GIT – Computer Engineering Department

38

Open Addressing    

Hashed items are in a single array Hash code gives position “hint” Handle collisions by checking multiple positions Each check is called a probe of the table If collision  try an alternate cell h0(x), h1(x), h2(x), … hi(x) = (hash(x) + F(i)) % table.size() F(0) = 0

GIT – Computer Engineering Department

39

Linear Probing 

Probe by incrementing the index –



F(i) = i

If “fall off end”, wrap around to the beginning –

Take care, not to cycle forever!

1. Compute index as hash_fcn() % table.size() 2. if table[index] == NULL, item is not in the table 3. if table[index] matches item, found item (done) 4. Increment index circularly and go to 2  Why must we probe repeatedly? – –

hashCode may produce collisions remainder by table.size may produce collisions GIT – Computer Engineering Department

40

Search Termination Ways to obtain proper termination – Stop when you come back to your starting point – Stop after probing N slots, where N is table size – Stop when you reach the bottom the second time – Ensure table never full • Reallocate when occupancy exceeds threshold

GIT – Computer Engineering Department

41

Hash Table Example      

Table of strings, initial size 5 Add “Tom”, hash 84274  4 Add “Dick”, hash 2129869  4 Add “Harry”, hash 69496448  3 Add “Sam”, hash 82879  4 Add “Pete”, hash 2484038  3

Slot 4 Slot 0 (wraps) Slot 3 Slot 1 (wraps) Slot 2 (wraps)

 Note: many lookups will probe a lot!  Size 11 gives these slots: 3, 5, 10, 56, 7

GIT – Computer Engineering Department

42

Hash Table Example Insert keys {89, 18, 49, 58, 69}  When 49 is inserted collision occurs – Put into the next available index 0  58 collides with 18, 89, and 49 – Inserted at index 1

GIT – Computer Engineering Department

43

Hash Table Considerations  Cannot traverse a hash table – Order of stored values is arbitrary – Can use an iterator to produce in arbitrary order

 When item is deleted, cannot just set its entry to null – Doing so would break probing – Must store a “dummy value” instead – Deleted items waste space and reduce efficiency • Need to go through dummy values during search and even during insertion

– Operations take O(n) time in the worst case

 Higher occupancy causes more collisions GIT – Computer Engineering Department

44

Reducing Collisions By Growing  Choose a new larger size, e.g., doubling  Similar to reallocating a vector – But, elements can move around in reinsertion • Simple copy is not possible

 (Re)insert non-deleted items into new array  Install the new array and drop the old  Rehashing distributes items at least as well  Deleted items (dummy values) are not inserted GIT – Computer Engineering Department

45

Rehashing Example  If 23 is inserted, the table is over 70 percent full.



A new table is created 17 is the first prime twice as large as the old one; so Hnew (X) = X mod 17 GIT – Computer Engineering Department

46

Rehashing  Rehashing is an expensive operation – Running time is O(N)

 Rehashing frees the programmer from worrying about table size  Amortized Analysis: Average over N operations – Operations take: O(1) time

GIT – Computer Engineering Department

47

Quadratic Probing  Linear probing – Tends to form long clusters of keys in the table – This causes longer search chains

 Ex:

GIT – Computer Engineering Department

48

Quadratic Probing Quadratic probing can reduce the effect of primary clustering  Index increments form a quadratic series – F(i) = i2

 Hash values: s, s+12, s+22, s+32, s+42, etc. – (all % TableSize)

 Direct calculation involves multiply, add, remainder – Incremental calculation better Initially: size_t index = ... 1st probe slot ... int k = -1; At each iteration: k += 2; index = (index + k) % table.size(); GIT – Computer Engineering Department

49

Quadratic Probing  When 49 collides with 89, next position attemped is one cell away  58 collides at position 8. The cell one away is tried, another collision occurs. It is inserted into the cell 22=4 away

GIT – Computer Engineering Department

50

Quadratic Probing  Probe sequence may not produce all table slots – A loop around full cells may happen – Hash table not full but empty space not found

 Theorem : If the table size is prime and more than half of the table is empty new element can always be inserted.  Problem : Secondary clustering!... GIT – Computer Engineering Department

51

Double Hashing Double Hashing solves the clustering problem  Use second hash function  F(i) = i * hash2(x)  Poor example : hash2(x) = x mod 9 hash1(x) = x mod 10 TableSize = 10 If x = 99 what happens ? hash2(x) ≠ 0 for any x GIT – Computer Engineering Department

52

Double Hashing  Good choice : hash2(x) = R – (x mod R) R is a prime and < TableSize

GIT – Computer Engineering Department

53

Double Hashing

hash2(x) = 7 – (X mod 7) GIT – Computer Engineering Department

54

Chaining  Alternative to open addressing  Each table slot references a linked list – List contains all items that hash to that slot – The linked list is often called a bucket – So sometimes called bucket hashing

 Examines only items with same hash code – As opposed to open addressing (search chains may overlap)

 Insertion about as complex  Deletion is simpler  Linked list can become long  rehash GIT – Computer Engineering Department

55

Chaining Picture

Two items hashed to bucket 3 Three items hashed to bucket 4

GIT – Computer Engineering Department

56

Performance of Hash Tables  Load factor = # filled cells / table size – Between 0 and 1

 Load factor has greatest effect on performance  Lower load factor  better performance – Reduce collisions in sparsely populated tables

GIT – Computer Engineering Department

57

Performance of Hash Tables  For open addressing, linear probing – Knuth gives expected # probes

1 1 I & U ⇒ 1 + 2  (1 − λ )2 1 1 S ⇒ 1 + 2  (1 − λ ) – As approaches 1

‫ג‬

   

   

• Number of probes increases • insertions might fail

– Rehashing with larger TableSize • if ‫ > ג‬0.5 • if insertion fails GIT – Computer Engineering Department

58

Performance of Hash Tables  For chaining – ‫ ג‬is the avarage length of a list – Successful Find  ‫ג‬/2 comparisons + time to evaluate hash function – Unsuccessful Find & Insert  ‫ ג‬comparisons + time to evaluate hash function

 Good choice ‫ ~ ג‬1  Here ‫ ג‬can be greater than 1 Disadvantage of separate chaining is allocate/deallocate memory ! GIT – Computer Engineering Department

59

Performance of Hash Tables

L 0 0.25 0.5 0.75 0.83 0.9 0.95

GIT – Computer Engineering Department

Number of Probes Linear Probing

Chaining

1.00 1.17 1.50 2.50 3.38 5.50 10.50

1.00 1.13 1.25 1.38 1.43 1.45 1.48

60

Random Collision Resolution  Performance of double hashing is better than linear probing  Assume: Random collision resolution – Probes are independent – No clustering problem  Unsuccessful search and Insert – Number of probes until an empty cell is found (1- ‫ = )ג‬fraction of cells that are empty 1 / (1- ‫ = )ג‬expected number of probes until an empty cell  Successful search P(X)=Number of probes when the element X is inserted 1/N∑ P(X) approximately

1

λ

λ ∫0

1 1 1 dx = ln 1− x λ 1− λ

GIT – Computer Engineering Department

61

Performance of Hash Tables (3)  Hash table: – Insert: average O(1) – Search: average O(1)

 Sorted array: – Insert: average O(n) – Search: average O(log n)

 Binary Search Tree: – Insert: average O(log n) – Search: average O(log n)

 But balanced trees can guarantee O(log n) GIT – Computer Engineering Department

62

Implementing Hash Tables  Class hash_map: used for both implementations  typedef std::pair Entry_Type;

 File Hash_Table_Open.h: implements open addressing  Class Hash_Table_Chain: implements chaining  Further implementation concerns

GIT – Computer Engineering Department

63

Class hash_map Function

Behavior

iterator begin() const_iterator begin() const iterator end() const_iterator end() const bool empty() size_t size() pair insert(const Entry_Type& entry>

Returns an iterator to the first entry in the map. Returns an iterator to one past the last entry in the map. Returns true if the map is empty. Returns the number of entries in the map. Inserts an entry into the map. If the key is not in the map, the returned iterator references the inserted entry, and the bool will be true. If the key is currently in the map, the returned iterator references the existing entry, and the bool will be false. Removes the item from the map. Returns an iterator that references the item in the map. If not present, end() is returned. Returns a reference to the value mapped to the key. If the key is not currently in the map, a default value is inserted, and a reference to the inserted item is returned.

void erase(const Key_Type& key) iterator find(const Key_Type& key) Value_Type& operator[](const Key_Type& key)

GIT – Computer Engineering Department

64

Hash_Table_Open Data Fields and Private Functions Data Field

Attribute

hash hash_fcn

The function object that will compute the hash function for the Key_Type. The number of keys in the table, excluding deleted keys. The number of deleted keys. The vector to hold the hash table. The maximum load factor. The initial capacity

size_t num_keys size_t num_deletes std::vector<Entry_Type*> the_table const double LOAD_THRESHOLD static const size_t INITIAL_CAPACITY const Entry_Type dummy const Entry_Type* DELETED

A dummy entry to represent deleted keys A pointer to the dummy entry.

Function

Behavior

size_t locate(const Key_Type& key)

Returns the index of the specified key if present in the table; otherwise; returns the index of the first free slot. Doubles the capacity of the table and permanently removes deleted items.

void rehash()

GIT – Computer Engineering Department

65

The hash template class  The hash template class is a function class that implements the hash function for its template parameter.  It is defined as follows: template struct hash { size_t operator()(const Key_Type&); };

 We will describe some implementations later

GIT – Computer Engineering Department

66

The locate function size_t locate(const Key_Type& key) { size_t index = hash_fcn(key) % the_table.size (); while (the_table[index] != NULL && (the_table[index] == DELETED || the_table[index]->first != key)) index = (index + 1) % the_table.size (); return index; }

GIT – Computer Engineering Department

67

The insert function std::pair insert(const Entry_Type& entry) { double load_factor = double(num_keys + num_deletes) / the_table.size(); if (load_factor > LOAD_THRESHOLD){ rehash(); // Double the size of the table. } // Find the position in the table. size_t index = locate(entry.first); // See whether it is empty. if (the_table[index] == NULL) { // Create a new entry. the_table[index] = new Entry_Type(entry); num_keys++; return std::make_pair(iterator(this, index), true); } else { // Item is already in the table. return std::make_pair(iterator(this, index), false); } }

GIT – Computer Engineering Department

68

The index operator Value_Type& operator[](const Key_Type& key) { // Try to insert a dummy item. std::pair ret = insert(Entry_Type(key, Value_Type())); // Return a reference to the value found or inserted. return ret.first->second; }

GIT – Computer Engineering Department

69

The rehash function void rehash() { // Create a new table whose size is double the current table. std::vector<Entry_Type*> other_table(the_table.size() * 2, NULL); // Swap this table with the current table. the_table.swap(other_table); // Reinsert all items from old table to new. num_deletes = 0; for (size_t i = 0; i < other_table.size(); i++) { if ((other_table[i] != NULL) && (other_table[i] != DELETED)) { size_t index = locate(other_table[i]->first); the_table[index] = other_table[i]; } } }

GIT – Computer Engineering Department

70

The erase function Algorithm for erase 1. Find the first table element that is empty or that contains the key. 2. if an empty element is found 3. done 4. else 5. Delete the Entry_Type object pointed to. 6. Set the pointer in this entry to DELETED 7. Increment num_deletes 8. Decrement num_keys Implementation is left as an exercise GIT – Computer Engineering Department

71

Copy Constructor, etc.  Because the vector<Entry_Type*> contains pointers to dynamically created objects, we must define the copy constructor, assignment operator, and destructor. hash_map(const hash_map& other) : hash_fcn(hash()), num_keys(0), the_table(other.the_table.size(), NULL), LOAD_THRESHOLD(0.75), num_deletes(0) { for (size_t i = 0; i < other.the_table.size(); i++) { if (other.the_table[i] != NULL && other.the_table[i] != DELETED) insert(Entry_Type(other.the_table[i]->first, other.the_table[i]->second)); } }

GIT – Computer Engineering Department

72

Hash_Table_Chain.h Data Field

Attribute

hash hash_fcn size_t num_keys std::vector<std::list<Entry_Type> > the_buckets static const size_t INITIAL_CAPACITY static double LOAD_THRESHOLD

The hash function object. The number of keys A vector of lists containing the items

GIT – Computer Engineering Department

The initial capacity The maximum load threshold before rehashing.

73

The insert function std::pair insert(const Entry_Type& entry) { // Check for the need to rehash. double load_factor = double(num_keys) / the_buckets.size(); if (load_factor > LOAD_THRESHOLD){rehash();} // Find the position in the table. size_t index = hash_fcn(entry.first) % the_buckets.size(); // Search for the key. typename std::list<Entry_Type>::iterator pos = the_buckets[index].begin(); while (pos != the_buckets[index].end() && pos->first != entry.first) ++pos; if (pos == the_buckets[index].end()) { // Not in table the_buckets[index].push_back(Entry_Type(entry)); num_keys++; return std::make_pair(iterator(this, index, --(the_buckets[index].end())), true); } else { // Already there return std::make_pair(iterator(this, index, pos), false); } } GIT – Computer Engineering Department

74

Copy Constructor, etc.  Since Hash_Table_Chain.h uses a std::vector<std::list<Entry_Type> > to hold the entries: – The vector and list both define the copy constructor, etc. to make a deep copy and to clean up any dynamically allocated objects when destroyed.

 Therefore, no special implementation is required for Hash_Table_Chain.h

GIT – Computer Engineering Department

75

Implementation Considerations  The hash template class defines the hash function, but provides no implementation.  The user of the hash_map (either Hash_Table_Open.h or Hash_Table_Chain.h) must define a specialization for the Key_Type.  Example for string: template<> struct hash<std::string> { size_t operator()(const std::string& s) { size_t result = 0; for (size_t i = 0; i < s.length(); i++) { result = result * 31 + s[i]; } return result; } };

GIT – Computer Engineering Department

76

Specialization for int  One possibility for int is to merely return the integer value, cast to size_t. – This does not produce a “random” distribution of keys.

 An alternative is to multiply by some large prime and take the result modulo the computer’s word size. template<> struct hash { size_t operator()(int i) { return size_t(4262999287U * i); } };

GIT – Computer Engineering Department

77

Defining Your Own hash function  The hash_map class implementations use the hash function to locate the initial search position.  It then they use the objects equality operator (operator ==) to determine if there is a match.  Therefore, your hash function should obey the following constraint: – If obj1 == obj2 then hash()(obj1) == hash()(obj2)

GIT – Computer Engineering Department

78

Applying Maps: Phone Directory string Phone_Directory::add_or_change_entry( const string& name, const string& number) { string old_number = the_directory[name]; the_directory[name] = number; modified = true; return old_number; } string Phone_Directory::lookup_entry(const string& name) const { const_iterator itr = the_directory.find(name); if (itr != the_directory.end()) return itr->second; else return ""; } GIT – Computer Engineering Department

79

Applying Maps: Phone Directory (2) string Phone_Directory::remove_entry(const string& name) { string old_number = the_directory[name]; the_directory.erase(name); modified = old_number != string(); return old_number; }

GIT – Computer Engineering Department

80

Applying Maps: Huffman Coding Algorithm for build_frequency_table 1. while there are more characters in the input file 2. Read a character 3. Increment the entry in the map associated with this character 4. for each entry in the map 5. 6.

Store its data as the weight-symbol pair in the vector Return the vector

GIT – Computer Engineering Department

81

The build_frequency_table function vector Huffman_Tree::build_frequency_table(istream& in) { map frequencies; char c; while (in.get(c)) { frequencies[c]++; } vector result; for (map::iterator itr = frequencies.begin(); itr != frequencies.end(); ++itr) { result.push_back(Huff_Data(itr->second, itr->first)); } return result; }

GIT – Computer Engineering Department

82

build_code 

Additional data field: std::map code_map;



Starter function void build_code() { code_map.clear(); build_code(huff_tree, Bit_String()); }



Recursive function void Huffman_Tree::build_code(const Binary_Tree& tree, const Bit_String& code) { if (tree.is_leaf()) { Huff_Data datum = tree.get_data(); code_map[datum.symbol] = code; } else { Bit_String left_code(code); left_code.append(false); build_code(tree.get_left_subtree(), left_code); Bit_String right_code(code); right_code.append(true); build_code(tree.get_right_subtree(), right_code); } }

GIT – Computer Engineering Department

83

encode void Huffman_Tree::encode(std::istream& in, std::ostream& out) { Bit_String result; char next_char; while (in.get(next_char)) { result += code_map[next_char]; } result.write(out); }

GIT – Computer Engineering Department

84

Related Documents

Sets And Maps
May 2020 2
Sets
May 2020 20
Sets
April 2020 25
Sets
August 2019 51
Maps
October 2019 89
Maps
November 2019 59