Data Structures with C++ Fakhar Lodhi
Software Design Quality • What is good design? – Is it efficient code? – compact implementation? – Most maintainable? • For most large, long life time software systems, maintenance cost normally exceeds development cost by factors ranging from 2 to 3. • Boehm (1975) quotes a pathological case where the development cost of an avionics system was $30 per line of code but the maintenance cost was $4000 per instruction
Maintainable Design • Cost of system changes is minimal • readily adaptable to modify existing functionality and enhance functionality • design is understandable • changes should be local in effect • Design should be modular
Abstraction • Abstraction is a technique in which we construct a model of an entity based upon its essential characteristics while ignoring the inessential details. • The principle of abstraction also helps in handling the inherent complexity of a system by allowing looking at its important external characteristic, and hiding its inner complexity at the same time. • Hiding the internal details is called encapsulation. • Engineers of all fields, including computer science, have been practicing abstraction for mastering complexity. • Code and Data abstraction
void selectionSort(int a[], int size) { int i, j, min, temp; for(i = 0; i < size –1; i++) { // find minimum from i to size - 1 min = i; for (j = i; j < size; j++) { if (a[j] < a[min]) min = j; } // swap element with the ith location // with the minimum value temp = a[i]; a[i] = a[min]; a[min] = temp; } }
void selectionSort(int a[], int size) { int i, j, min, temp; for(i = 0; i < size –1; i++) { // find minimum from i to size - 1 min = i; for (j = i; j < size; j++) { if (a[j] < a[min]) min = j; } // swap element with the ith location // with the minimum value temp = a[i]; a[i] = a[min]; a[min] = temp; } }
int minimum (int a[ ], int from, int to) { int min = from; for (int i = from; i <= to; i++) if (a[i] < a[min]) min = i; return min; }
void swap (int &x, int &y) { int temp = x; x = y; y = temp; }
void selectionSort(int a[ ], int size) { int i, j, min; for (i=0; i < size-1; i++) { min = minimum (a, i, size-1); swap(a[i], a[min]); } }
int minimum (int a[ ], int from, int to) { int min = from; for (int i = from; i <= to; i++) if (a[i] < a[min]) min = i; return min; }
void swap (int &x, int &y) { int temp = x; x = y; y = temp; }
void selectionSort(int a[ ], int size) { int i, j, min; for (i=0; i < size-1; i++) { min = minimum (a, i, size-1); swap(a[i], a[min]); } }
void selectionSort(int a[], int size) { int i, j, min, temp; for(i = 0; i < size –1; i++) { // find minimum from i to size - 1 min = i; for (j = i; j < size; j++) { if (a[j] < a[min]) min = j; } // swap element with the ith location // with the minimum value temp = a[i]; a[i] = a[min]; a[min] = temp; } }
Data Abstraction and Abstract Data Types (ADT) • A data type is a template for objects and a set of operations that define the behavior of the objects (or instances) of that type. • An Abstract Data Type (ADT) is a data type in which the implementation details are hidden and the user is concerned with the properties (or behavior) of that type. • An ADT has two components: – Interface – the behavior – Implementation
• Example: – int, float
Abstract Data Type • The data structures used to implement the data type can only be accessed through the interface. • Any change in the implementation does not change the user programs as long as the interface remains the same. • This is also known as data encapsulation or data abstraction
Interface 1
Interface 2
Implementation Interface 3
Interface 4
An ADT
Abstraction VS. Implementation • x (01000001)2 •x=? • If x is character then x ‘A’ • If x is integer then x 65
int main(int argc, char* argv[]) { int i, *pi; float f, *pf; i = 1024; pi = &i; pf = (float *) pi; f = *pf; cout << " i = " << i
<< "
f = " << f << "\n \n";
<< "
f = " << f << "\n \n";
f = i; cout << " i = " << i return 0; }
WHY?
Abstraction VS. Implementation Two dimensional array 1
5
3
6
3
2
38
64
22
76
82
99
0
106
345
54
User’s view (abstraction) 1
5
3
6
3
2
38
64
22
76
82
System’s view (implementation)
99
0
106
345
54
ADTs and C++ Classes • A class is used to define (and implement) an ADT in C++. • A class consists of data and functions that operate on that data. • A class in C++ has two parts – public and private (let’s ignore the protected members for now). • The data and functions defined in the class are called the members of the class.
ADTs and C++ Classes • Users of the class can only access and manipulate the class state through the public members of the class. • Private members can only be used by other members of the class (let’s also ignore the friends for now). • Data encapsulation is achieved by declaring all data members of a class to be private. • The interface to the class is provided through the use of public member functions.
Data Structures • The primary objective of programming is to efficiently process the input to generate the desired output. • We can achieve this objective in an efficient and neat style if the input data is organized in a way to help us meet our goal. • Data Structures is nothing but ways and means of organizing data so that it can be processed easily and efficiently. • Data structures dictate the manner in which the data can be processed. In other words, the choice of an algorithm depends upon the underlying data organization.
Problem: Determine if a key is present in a collection of 10 integers Organization 1: Data are stored in 10 disjoint (as opposed to composite) variables: A0, A1, A2, A3, …, A9
Algorithm found = false; if (key = = A0) else if (key = = A1) else if (key = = A2) else if (key = = A3) else if (key = = A4) else if (key = = A5) else if (key = = A6) else if (key = = A7) else if (key = = A8) else if (key = = A9)
found = true; found = true; found = true; found = true; found = true; found = true; found = true; found = true; found = true; found = true;
Problem: Determine if a key is present in a collection of 10 integers
Organization 2: Data are stored in an array A of 10 elements Algorithm found = false; for (int i = 0; i < 10; i++) if (A[i] = = key) { found = true; break; }
• Average number of comparisons in both cases is the same so both algorithms are equally efficient (or in efficient) • Organization 2 is better because it yields an algorithms which is more maintainable. For example, if the collection contains 100 elements. In general, number of elements could be N.
Problem: Determine if a key is present in a collection of 10 integers
Organization 3: Data are stored in an array A in ascending order
Algorithm found = false; low = 0; high = N – 1; while (( ! found) && ( low <= high)) { mid = (low + high)/2; if (A[mid] = = key) else if (A[mid] > key) else }
found = true; high = mid – 1; low = mid + 1;
• Average number of comparisons ? • Order of “log(N)” as compared to N. Animate
found = false; low = 0; high = N – 1; while (( ! found) && ( low <= high)) { mid = (low + high)/2; if (A[mid] = = key) else if (A[mid] > key) else }
5
6
7
8
9
found = true; high = mid – 1; low = mid + 1;
0
1
2
3
4
10 11 12 13 14 15
2
3
5
7
10 12 15 22 28 29 32 47 48 50 55 73 key = 29 Back
Performance Comparison • NADRA database: ~80,000,000 records • Computer which can perform 10,000 comparisons per second – Linear search: – Binary search:
~2.22 hours ~0.005 seconds
– Roughly 1.6 million times less
• Why?
Performance Analysis 1. Does the program efficiently use primary and secondary storage? 2. Is the program’s running time acceptable for the task? 3. Space Complexity: •
The space complexity of a program is the measure of the amount of memory that it needs to run to completion.
4. Time Complexity: •
The time complexity of a program is the measure of the amount of computer time it needs to run to completion.
Performance Estimation • How to determine which algorithm is better? • We need some mechanism to predict the performance without actually executing the program. • Mechanism should be independent of the compiler and underlying hardware.
Step Count • •
Program Step: A program step is a meaningful program segment. We can consider each statement as a single step. a = 2; a = 2 * b + c + 3 * c / d – e;
Step Count To count total number of steps, we must determine: • • • •
Step count for each statement – steps/execution or s/e Frequency of each statement Total steps for each statement Finally sum these counts to get the total step count
Example 1 – Summing of a list of numbers Statement float sum (float list[], int n) { float temp = 0; int i; for (i=0; i
Total
S/ e 0 0 1 0 1 1 1 0
Frequency Total Steps 0 0 0 0 1 1 0 0 n+1 n+1 n n 1 1 0 0 2n+3
Step Count table for summing of a list of numbers
Example 2 – Matrix addition Statement void add(int a[][MAX_ SI ZE], int b[][MAX_ SI ZE], int c[][MAX_ SI ZE], int rows, int cols) { int i,j; for (i=0; i
S/ e 0
Frequency 0
Total Steps 0
0 0 1 1 1
0 0 row+1 rows.(cols+1) rows.cols
0 0 row+1 rows.(cols+1) rows.cols
0
0
0 2.rows.(cols+1)+1
Step Count Table for matrix addition
Big Oh (O) • Determining the exact step count of a program can be a very difficult task • Because of the inexactness of the definition of a step, exact step count is not very useful for comparative purposes. e.g., which one is better 45n+3 or 100n+10 • We use some asymptotic notations as measure of growth.
Big Oh (O) Big Oh is defined as: f(n)=O(g(n)) iff there exists positive constants c and n0 such that f(n) ≤ cg(n) for all values of n ≥ n0 No matter what the value of c1, c2, c3, there will be an n beyond which the program with complexity c3n will be faster than the one with complexity c1n2+c2n Theorem: If f(n) = amnm+….+a1n+a0, then f(n) = O(nm)
Big Oh (O) An estimate of how will the running time grow as a function of the problem size.
Growth Functions log(n)
n
n log(n)
n**2
n**3
2**n
2500
growth
2000 1500 1000 500 0 1
2 3
4
5 6
7 8
9 10 11 12 13 14 15 16 17 18 19 20 n
Big Oh (O) • • • •
Summing of list of numbers O(n) Matrix addition O(rows.cols Searching a key in an array ) Binary Search O(n) O(log n)
Big Oh (O) int search_min(int list[], int from, int to) { int i; int min= from; for (i=from; i<=to; i++) If (list[i] < list[min]) min = i;
return min; }
// O(n)
void swap(int &a, int &b) { int temp=a; a = b; b = temp; }
// O(1)
Big Oh (O) void selection_sort(int list[], int size) { int i,j; for (i=0; i<size-1; i++) { // O(Size) or O(n) j= search_min(list, i+1, size-1); //O(n).O(n)=O(n2)
swap (list[i], list[j]); //O(1).O(n)=O(n) }
// O(n)+O(n2)+O(n)+O(n)=O(n2) }
Big Oh (O) • • • • • • •
O(1) O(log n) O(n) O(n log n) O(n2) O(n3) O(2n) -
constant logarithmic linear log linear quadratic cubic exponential