Fundamentals of Computer Engineering 2
Structured Data Types
Bernhard Westfechtel
Summer 2004
Fundamentals of Computer Engineering 2
Summer 2004
Contents of this chapter
Data structures: motivation
Classification of data structures
Declarations of data structures
Accessing and manipulating data structures
Kinds of data structures » Arrays » Records » Pointers
Storage management
Bernhard Westfechtel
2
Fundamentals of Computer Engineering 2
Summer 2004
Motivation
So far, data types have been elementary (integers, reals, booleans, etc.)
However, in many applications data are structured, i.e., elementary data are composed into larger units, e.g., » Student data: first name, last name, student no, address » Address: street, city, ZIP code, country » Mathematics: vectors, matrices » Programs: call stack » Operating systems: job queues
High-level programming languages need to define constructs for » Declaration of data structures » Accessing and manipulating (components of) data structures
Bernhard Westfechtel
3
Fundamentals of Computer Engineering 2
Summer 2004
Classification of data structures
Number of components » Fixed size » Variable size
Types of components » Homogeneous data structure: all components have the same type » Heterogeneous data structure: components are of different types
Organization of components » One-dimensional (e.g., vector) » Multi-dimensional (e.g., matrix)
Bernhard Westfechtel
4
Fundamentals of Computer Engineering 2
Summer 2004
Arrays
An array is a data structure which is characterized as follows » Fixed size » Homogeneous » One or more dimensions Ö Vector: one-dimensional array Ö Matrix: two-dimensional array Ö Multi-dimensional array (more than two dimensions)
Attributes of an array » Number of dimensions » Range of each dimension » Data type of each component
Operations on an array » Component selection: accessing a component via its index » Component assignment: assigning a new value to a selected component
Bernhard Westfechtel
5
Fundamentals of Computer Engineering 2
Summer 2004
General form var variable name : array [l1 : u1, ..., ln : un] of component type variable name[i1, ..., in]
Array declaration
Accessing an array component
The keyword array indicates an array declaration
component type defines the types of components stored in an array
The number and range of indices are enclosed in square brackets [...] » n defines the dimension of the array » ld and ud define lower bounds and upper bounds of indices for dimension d (1 ≤ d ≤ n), i.e., ld ≤ id ≤ ud
Special cases of n » n = 1: vectors (one-dimensional arrays) » n = 2: matrices (two-dimensional arrays)
Bernhard Westfechtel
6
Fundamentals of Computer Engineering 2
Summer 2004
Whole-array operations
Whole-array operations » Operations which operate on arrays as a whole rather than on individual components
Equality check » a1 = a2 ¤ and a2 are declared identically – Component types are equal – Equal number of dimensions – Equal ranges for each dimension Ö All components are equal, i.e., a1[i1, ..., in] = a2[i1, ..., in] for all [i1, ..., in] Ö a1
Assignment » a1 := a2 copies array a2 component-wise into a1 » Same restrictions for the declarations as above
Bernhard Westfechtel
7
Fundamentals of Computer Engineering 2
Summer 2004
Storage representation
v[0]
0
v[1]
1
v[2]
2
v[3]
3
v[4]
4
v[5]
5
v[6]
6
v[7]
7
v[8]
8
v[9]
9
Bernhard Westfechtel
8
Components are stored sequentially in memory
Addressing of component i: addr(v[i]) = b + e*i » b: base address (start) of the vector v » e: size of an element (component)
The following condition must hold: » 0 ≤ i < l (l : length of vector)
Otherwise, the index is out of range
Fundamentals of Computer Engineering 2
Summer 2004
Scalar product function sp(v, w : array [0:9] of real) return real is var result : real; var i : integer; begin result := 0; for i from 0 to 9 do result := result + v[i] * w[i] end; return result end
sp = ∑ vi * wi (0 ≤ i < 10)
Vectors are usually processed with the help of for loops
Bernhard Westfechtel
9
Fundamentals of Computer Engineering 2
Summer 2004
Sorting
Problem: sort a vector of elements (integers) in ascending order » i ≤ j ⇒ v[i] ≤ v[j]
Solution (informal): straight insertion » For each index i (0 < i < l) in ascending order Ö Insert v[i] into v[0..i-1] such that v[0..i] is sorted after the insertion
Example: initial i=1 i=2 i=3 i=4 i=5 i=6 i=7
Bernhard Westfechtel
44 44 12 12 12 12 06 06
55 55 44 42 42 18 12 12
12 12 55 44 44 42 18 18
42 42 42 55 55 44 42 42
94 94 94 94 94 55 44 44
18 18 18 18 18 94 55 55
06 06 06 06 06 06 94 67
67 67 67 67 67 67 67 94
10
Fundamentals of Computer Engineering 2
Summer 2004
Sorting procedure
procedure sort(in out v : array [0:7] of integer) is var i, j, temp : integer; begin for i from 1 to 7 do temp := a[i]; /* Save a[i] */ j := i; while j > 0 and temp < a[j – 1] do a[j] := a[j – 1]; /* Move old element up */ j := j – 1 end; if j < i then a[j] := temp end /* Insert saved value */ end
Bernhard Westfechtel
11
Fundamentals of Computer Engineering 2
Summer 2004
Strings
In many programming languages, strings are represented as vectors of characters, e.g., » var name : array [0:29] of char;
Disadvantages of this solution » Strings have a fixed length » String operations are awkward to program
Our approach » Pre-defined data type string, e.g., Ö var name : string; » Strings may have variable length » Pre-defined operations (see below)
Bernhard Westfechtel
12
Fundamentals of Computer Engineering 2
Summer 2004
String operations Operations
Explanation
Examples s := t; /* Copy t into s */
:=
Assignment
s := "computer"; /*Assign a string constant */ s = "computer science"
=, < , <=, <=, >
Relational operators referring to the lexicographical order
s > "algorithm" s < "computer program"
length
Length of a string
length("computer science") = 16
""
String constant for the empty string (length = 0)
s := "";
&
String concatenation
"computer " & "science" = "computer science"
s[i]
Accessing the i-th character (0 ≤ i ≤ length(s) – 1)
s = "computer science" fi s[0] = "c", s[15] = "e"
Bernhard Westfechtel
13
Fundamentals of Computer Engineering 2
Summer 2004
Matrices
A matrix is a two-dimensional array
A matrix consists of rows and column column j
row i
m[i,j]
Possible storage representations » Row-major order: row by row » Column-major order: column by column
Bernhard Westfechtel
14
Fundamentals of Computer Engineering 2
Summer 2004
Storage representations Matrix m
Row-major order m[0,0]
0
m[0,1]
1
m[0,2]
2
m[0,3]
3
Addressing
m[1,0]
4
m[1,1]
5
m[1,2]
6
m[1,3]
7
m[2,0]
8
m[2,1]
9
m[2,2]
10
m[2,3]
11
Bernhard Westfechtel
Column-major order
0
1
2
3
4
5
6
7
8
9
10
11
Parameters » r: number of rows » c: number of columns » e: size of element » b: base address Row-major order » addr(m[i,j]) = b + e*(c*i + j) Column-major order » addr(m[i,j]) = b + e*(r*j + i)
15
m[0,0]
0
m[1,0]
4
m[2,0]
8
m[0,1]
1
m[1,1]
5
m[2,1]
9
m[0,2]
2
m[1,2]
6
m[2,2]
10
m[0,3]
3
m[1,3]
7
m[2,3]
11
Fundamentals of Computer Engineering 2
Summer 2004
Example: matrix multiplication procedure mp(in m1, m2 : array [0:9, 0:9] of real; out m : array [0:9, 0:9] of real) is var temp : real; var i, j, k : real; begin for i from 0 to 9 do for j from 0 to 9 do temp := 0; for k from 0 to 9 do temp := temp + m1[i, k] * m2[k, j] end; m[i, j] := temp end end end
For the sake of simplicity, we assume square matrices #rows = #columns (= 10)
mij = ∑ m1ik * m2kj (0 ≤ k < 10) (scalar product of row i and column j)
Calculation requires a nested for loop
Bernhard Westfechtel
16
Fundamentals of Computer Engineering 2
Summer 2004
Type declarations
Each programming language comes with a set of pre-defined types (e.g., integer, boolean, char, real)
In addition, it may support user-defined types
Definition of user-defined types » Implicit Ö m : array [0:9, 0:9] of real implicitly defines square matrices of real elements » Explicit with the help of a type declaration Ö type matrix is array [0:9, 0:9] of real explicitly defines a matrix type which has a name Ö The name may be used as an abbreviation of its definition, e.g., var m : matrix
General form of a type declaration » type type name is definition
Benefits » Increased readability » Consistent use
Bernhard Westfechtel
17
Fundamentals of Computer Engineering 2
Summer 2004
Example: matrix multiplication (revisited) type matrix is array [0:9, 0:9] of real; ... procedure mp(in m1, m2 : matrix; out m : matrix) is var temp : real; var i, j, k : real; begin for i from 0 to 9 do for j from 0 to 9 do temp := 0; for k from 0 to 9 do temp := temp + m1[i, k] * m2[k, j] end; m[i, j] := temp end end end
Bernhard Westfechtel
18
Fundamentals of Computer Engineering 2
Summer 2004
Records
A record (or structure) is a data structure which is characterized as follows » Fixed size (apart from variant records, not to be treated here) » Heterogeneous » Named components
Attributes of a record » Number of components » Name of each component » Data type of each component
Operations on a record » Component selection: accessing a component via its name » Component assignment: assigning a new value to a selected component » Whole-record operations: assignment and equality check
Bernhard Westfechtel
19
Fundamentals of Computer Engineering 2
Summer 2004
Record type declaration: general form type type name is record component name : component type ... component name : component type end;
Any type can be used as a component type » Records can have records as components fi nested records
A record type may be used as the component type in an array type declaration
Orthogonal combination of record types and array types
Bernhard Westfechtel
20
Fundamentals of Computer Engineering 2
Summer 2004
Record type declaration: examples type Name is record firstName : string; lastName : string end;
type Student is record name : Name; address : Address; studentNo : integer; studyField : string; semester : integer end;
type Address is record street : string; no : integer; city : string; zipCode : integer end;
Bernhard Westfechtel
type Students is array [1:max] of Student; /* Assuming a suitably defined constant max */
21
Fundamentals of Computer Engineering 2
Summer 2004
Accessing record components var student : Student; var students : Students; var address : Address; ... student.name.firstName := "John"; student.name.lastName := "Smith";
address.street := "Broadway"; address.no := 1567; address.city := "New York"; address.zipCode := 10036;
student.address := address; ... students[1] := student; ...
Bernhard Westfechtel
22
A record component is referenced using the dot notation » variable name . component name denotes the component component name of the variable variable name (which must be defined as a record) In case of nested records, the dot notation is applied repeatedly fi path expressions
Fundamentals of Computer Engineering 2
Summer 2004
Example: points and rectangles type Point is record x, y : integer end; function addpoint(pt1, pt2 : Point) return Point is var pt : Point; begin pt.x := pt1.x + pt2.x; pt.y := pt1.y + pt2.y; return pt end; /* Adds two points by adding their components. */ type Rect is record pt1, pt2 : Point end; /* pt1: lower left corner, pt2: upper right corner */ funtion ptinrect(pt : Point, r : Rect) return boolean is begin return (pt.x >= r.pt1.x and pt.x <= r.pt2.x) and (pt.y >= r.pt1.y and pt.y <= r.pt2.y) end; /* Returns true if the point lies within the rectangle */ pt2 pt pt1 Bernhard Westfechtel
23
Fundamentals of Computer Engineering 2
Summer 2004
Pointers
A pointer is a reference to some data object » Value: address of some data object » Address: location where the pointer is stored in memory Use of pointers » Dynamic data structures (growing and shrinking during program execution) Ö Examples: stacks, trees, etc. » Note Ö A pointer is an elementary data object Ö But it is used to build composite data objects Operations on pointers » Dereferencing: following the reference to the data object » Assignment: assigning a storage address to a pointer » Equality check: checks whether two pointers refer to the same storage address » Memory management (for dynamic data structures) Ö Allocate a new data object and make the pointer refer to it Ö Free the data object referenced by a pointer when it is no longer needed
Bernhard Westfechtel
24
Fundamentals of Computer Engineering 2
Summer 2004
Declaration of pointer types General form type type name is ref referenced type;
Examples type Pinteger is ref integer; /* Pointers referencing integers */
type PAddress is ref Address; /* Pointers to address records */
type PStudents is ref Students; /* Pointers to vectors of students */
Bernhard Westfechtel
25
A pointer type is defined with the keyword ref (reference) The referenced type is the type of the data object to which the pointer refers to It is not allowed to reference a data object of a different type (typed pointers) Any referenced type is permitted
Fundamentals of Computer Engineering 2
Summer 2004
Graphical illustration of pointers
Storage cell for the pointer
Arrow for the pointer
Storage cell for the referenced data object
Each pointer is stored in some memory cell
The memory cell contains the address of the referenced data object
The referenced data object is stored in the memory cell located at the given address
The reference is illustrated graphically by an arrow
Bernhard Westfechtel
26
Fundamentals of Computer Engineering 2
Summer 2004
Example p
10
q
20
r
10
p, q, and r have been declared as pointers to integers: » var p, q, r : Pinteger;
Currently, » p refers to a data object whose current value is 10 » r refers to a different data object whose current value is 10 as well » q refers to a data object whose current value is 20
Bernhard Westfechtel
27
Fundamentals of Computer Engineering 2
Summer 2004
Equality check for pointers p
10
q
20
r
10
General form » p1 = p2 ¤ p1 and p2 refer to the same storage location » p1 = p2 compares the pointers and not the referenced data objects!
Example » p = r is false because p and r refer to different storage locations
Bernhard Westfechtel
28
Fundamentals of Computer Engineering 2
Summer 2004
Assignment for pointers p
10
q
20
r
10
General form » p1 := p2 fi p1 and p2 refer to the same storage location after the assignment fi p1 = p2 holds after the assignment
Example » After p := q, p and q store the same address
Bernhard Westfechtel
29
Fundamentals of Computer Engineering 2
Summer 2004
Dereferencing of pointers p
10
q
20
r
10
Dereferencing: following a pointer to obtain the value of the referenced data object
General form » p-> denotes the data object referenced by the pointer » p = q fi p-> = q->, but the opposite direction does not hold in general
Examples » p-> = r-> (= 10), but p # r » p-> = 10 # q-> = 20
Bernhard Westfechtel
30
Fundamentals of Computer Engineering 2
Summer 2004
Assignment to dereferenced pointers p
10
q
20 p-> := 30
p
10
q
30
General form » p-> := expression fi The data object referenced by p is assigned the value of expression fi The referenced data object rather than the pointer is manipulated fi The assignment has a side effect since it affects all pointers referring to the same storage location
Example » After p-> := 30, q-> = 30
Bernhard Westfechtel
31
Fundamentals of Computer Engineering 2
Summer 2004
Dynamic storage management: the heap
The heap consists of an unordered set of storage chunks (sequence of bytes) The heap is a storage area which is managed by two pre-defined procedures » new(p) allocates a new storage chunk for a data object referenced by p » free(p) disposes the storage chunk referenced by p Heap and stack differ as follows: » Stack Ö Allocation of memory on subprogram call Ö Disposal of memory on subprogram return Ö Managed automatically » Heap Ö Not affected by subprogram calls and returns! Ö Managed manually Problems of heap management » Garbage Ö Storage chunks which cannot be accessed any more » Dangling references Ö Pointers referring to disposed storage chunks
Bernhard Westfechtel
32
Heap
Storage chunk
Fundamentals of Computer Engineering 2
Summer 2004
Allocation and disposal of storage chunks p q r
Dangling reference
new(r); free(p)
p q r
Bernhard Westfechtel
33
Allocation » new(r) allocates storage on the heap » Afterwards, r refers to the new storage chunk Disposal » free(p) allocates the storage chunk referenced by p » Afterwards, p is a dangling reference Ö p “points anywhere”
Fundamentals of Computer Engineering 2
Summer 2004
Garbage program example is var i : integer; procedure f(j : integer) is var p : ref integer; begin new(p); p-> := j; /* See (1) for storage state immediately before returning from the call f(i) */ end; begin i := 5; f(i); /* p is not available any more, thus the data referenced before by p is garbage, see (2). */ ... end
(1)
Global variables i
5
Stack j
5
p
Heap 5
(2)
Global variables i
5
Heap Garbage
Bernhard Westfechtel
34
5
Fundamentals of Computer Engineering 2
Summer 2004
Dangling references program example is var i, j : integer; var p, q : ref integer; begin i := 5; j := 6; new(p); p-> := i; q := p; /* See (1). */ free(p); /* p not needed any more. */ q-> := j; /* Error: dangling reference, see (2). */ end /* This example illustrates a dangling reference, i.e., a pointer to a storage location which is not available any more. */
(1) Global variables i
5
j
6
p q
Heap 5
(2) Global variables i
5
j
6
p q
Heap
Dangling reference Bernhard Westfechtel
35
Fundamentals of Computer Engineering 2
Summer 2004
Dynamic data structure: example
To demonstrate dynamic data structures, one of the most widespread and simplest examples is used: stacks
To avoid confusion: » In the following, we refer to stacks defined by a programmer, i.e., user stacks » In contrast, the system stack is managed by the runtime system of the programming language » User stacks are stored on the system heap!
Bernhard Westfechtel
36
Fundamentals of Computer Engineering 2
Summer 2004
Data structure element
stack
element
element
Variables for the stack » stack pointer to top element
Stack element: record of » number stored number » next pointer to next element
Bernhard Westfechtel
37
number
3
next
number
2
next
number
1
next
nil
Fundamentals of Computer Engineering 2
Summer 2004
The value nil
nil is a special value which may be assigned to any pointer
p := nil fi p “points nowhere”
nil is a defined value, i.e., it may be checked whether p = nil
A pointer p “points anywhere” » Immediately after its declaration and before any assignment to p » After execution of free(p)
p = nil (p “points nowhere”) fi » p may be used in equality checks and assignments » p may not be dereferenced, i.e., p-> is forbidden p “points anywhere” fi » p may not be read, i.e., only assignments to p and calls of new(p) are allowed » In particular, p may not be dereferenced, i.e., p-> is forbidden
Bernhard Westfechtel
38
Fundamentals of Computer Engineering 2
Summer 2004
Data declarations
/* Data declarations for a stack of integers */ type Element is record number : integer; /* Stored number. */ next : Stack /* Pointer to next element. */ end; type Stack is ref Element; /* A stack is represented by a pointer to the top element. */ var stack : Stack; /* A variable for a stack of integers */
Bernhard Westfechtel
39
Fundamentals of Computer Engineering 2
Summer 2004
Functions: informal illustrations isempty(stack)
empty(stack) stack
stack
nil
nil
?
push(stack, 2) number stack
⇒
stack
2
next
number
1
number
1
next
nil
next
nil
top(stack)
number
2
stack
(return number component of top element)
pop(stack) number stack
Bernhard Westfechtel
2
next
⇒
stack
number
1
number
1
next
NULL
next
nil
40
Fundamentals of Computer Engineering 2
Summer 2004
Operations on stacks procedure empty(out s : Stack) is begin s := nil end; /* Create an empty stack by assigning nil to the stack pointer. */ function isempty (s : Stack) return boolean is begin return s = nil end; /* Check whether the stack is empty. */ procedure push(in out s : Stack; n : integer) is var top : Stack; /* Reference to the new element, which is the new top of the stack. */ begin new(top); /* Allocates a new element on the heap. */ top->.number := n; /* Store new number. */ top->.next := s; /* Store pointer to previous top element */ s := top /* The new stack is the reference to the new top element */ end;
Bernhard Westfechtel
41
Fundamentals of Computer Engineering 2
Summer 2004
Operations on stacks (continued) function top(s : Stack) return integer is begin return s->.number end; /* Follow pointer to top element and return number component. Precondition: stack not empty. */ procedure pop(in out s : Stack) is var oldtop : Stack; begin oldtop := s; s := s->.next; free(oldtop) end; /* Adjust pointer to top element to the element below the top element, and free old top element. Precondition: stack not empty. */
Bernhard Westfechtel
42
Fundamentals of Computer Engineering 2
Summer 2004
Literature
Terrence Pratt, Marvin Zelkovitz: Programming Languages: Design and Implementation, Prentice Hall, 2001, Chapter 6
Robert W. Sebesta: Concepts of Programming Languages, Addison-Wesley, 2002, Chapter 6
Bernhard Westfechtel
43