This document was uploaded by user and they confirmed that they have the permission to share
it. If you are author or own the copyright of this book, please report to us by using this DMCA
report form. Report DMCA
Introduction SQL Database Design Query Optimization Recovery and Concurrency Control Integration of Structured Data and Text Distributed Database Systems
Initially, installations wrote separate applications with large amounts of repeated code to implement concurrency control, security, and recovery.
This was a lot of wasted effort that was also very much error prone
(C) Frieder, Grossman, & Goharian 1996, 2002
4
Example of Common Functionality Payroll
Inventory
Marketing
User Interface Business Logic
User Interface Business Logic
User Interface Business Logic
Concurrency Control
Concurrency Control
Concurrency Control
Security
Security
Security
Recovery
Recovery
Recovery
(C) Frieder, Grossman, & Goharian 1996, 2002
5
Database System Example Payroll
Inventory
Marketing
User Interface Business Logic
User Interface Business Logic
User Interface Business Logic
Database System Concurrency Control Security Recovery (C) Frieder, Grossman, & Goharian 1996, 2002
6
Definitions
DBMS - a Database Management System is a set of routines that is capable of providing the following basic functions: » Add » Delete » Update » Retrieve
(C) Frieder, Grossman, & Goharian 1996, 2002
7
Primitive Functionality
Add (X) = Find (X); If not found then insert (X) else return (error_code)
Delete (X) = Find (X); If found then remove (X) else return (error_code) (C) Frieder, Grossman, & Goharian 1996, 2002
8
Primitive Functionality (cont)
Update (X, Y) = Delete (X); If not error_code then Add (Y)
Retrieve (X) = Delete (X); If not error_code then Add (X)
Posting List (C) Frieder, Grossman, & Goharian 1996, 2002
11
Inverted List Example Hank
Record 1
Record 3
Record 5
Query: find all occurrences of the name (value of attribute) is ‘Hank’ in the database: Hash to the value Hank in the “index” Scan the posting list for all occurrences
(C) Frieder, Grossman, & Goharian 1996, 2002
12
Inverted Index
Associates a posting list with each attribute value abacus: abatement: … zoology: 83: 2002:
Inverted because it lists for each attribute the location on disk where the value is stored. (C) Frieder, Grossman, & Goharian 1996, 2002
13
Building an Inverted Index
For each relation r in the collection » For each attribute t in relation r – Find attribute t in the item dictionary – If term t exists, add a new disk location to its posting list – Otherwise,
Add attribute value t to the item dictionary Add a node to the posting list
(C) Frieder, Grossman, & Goharian 1996, 2002
14
File Organizations
Unsorted files (Heap Files) – Good storage efficiency, fast insertion, deletion. – Slow searches
Sorted Files – Good storage efficiency and search of range – Slow insertion and deletion – not practical (files never sorted) => B+ tree data structure
Hashed-Based Files – –
Not efficient storage Fast insertion, deletion, and equality searches
The initial paper on the relational model was written in 1969 by Codd. System R was a relational prototype implemented in the mid 70’s by IBM. Ingres was a relational prototype implemented at UC Berkeley in the mid 70’s. Finally, commercial offerings of relational systems started with Oracle in 1979 and was quickly followed by SQL/DS and DB2 by IBM in the mid 80’s.
(C) Frieder, Grossman, & Goharian 1996, 2002
21
Data Independence
The relational model allows users to simply specify what data they require, not how to get them. This is referred to as data independence and is a key contribution of the relational model. Older models are referred to as navigational as users must navigate through the data and follow pointers from one datum to another.
(C) Frieder, Grossman, & Goharian 1996, 2002
22
Structure of RDBMS User Query Query Optimizer Operators File Manager Buffer Manager Disk Space Manager
Relational algebra is used to specify the operations allowed within the relational model.
Relational algebra is theoretically based on set theory.
A relation can be illustrated as a table of rows and columns. The table is referred to as a relation, rows are referred to as tuples, and columns are attributes.
Relational operators are closed. A relational operation applied to a relation always results in a relation.
Formal Definition IF R is the set of attributes (columns), commonly referred to as schema, then r(R) is a mapping of a set of tuples (rows ), commonly referred to as instance. Since each attribute is restricted to a limited domain, a relation is actually a subset of: dom(A1) x dom(A2) x dom(A3) where dom(X) indicates the domain or set of valid values for attribute X. (C) Frieder, Grossman, & Goharian 1996, 2002
26
Characteristics of Relations
No inherent ordering of tuples. Conceptually, no inherent ordering of attributes. In practice, attributes are ordered based on the initial schema definition. All attribute values within a tuple should be atomic (1NF).
(C) Frieder, Grossman, & Goharian 1996, 2002
27
Key Attributes
Since a relation is merely just a set of tuples, NO DUPLICATE ELEMENTS are theoretically possible. (Unfortunately, some implementations violate this uniqueness definition) A column or set of columns must uniquely identify a tuple. Superkey - any combination of attributes that uniquely identify a tuple.
(C) Frieder, Grossman, & Goharian 1996, 2002
28
Keys (continued)
Key - a superkey from R such that the removal of any attribute results in a set of attributes that is NOT a superkey. Hence, a key is a: MINIMAL SUPERKEY
Ex: (emp#, name, department, salary) is a superkey but not a key, because name, department, or salary could be removed and we would still have the superkey, emp#.
(C) Frieder, Grossman, & Goharian 1996, 2002
29
Candidate Keys
It is possible that more than one set of attributes qualify as a key. These are called candidate keys. Typically, one is chosen and referred to as the primary key. This key is underlined in the description of the relation. EMPLOYEE(emp#, name, department, salary)
Relations are viewed as sets. Relational operations are closed. Any operation on one or more relations yields a relation. No inherent ordering. Unary Operators: SELECT, PROJECT Binary Operators: UNION, INTERSECTION, SET DIFFERENCE, CARTESIAN PRODUCT, JOIN Single Scan: SELECT, PROJECT including key Multiple Scan: All others (C) Frieder, Grossman, & Goharian 1996, 2002
44
Structured Query Language (SQL)
(C) Frieder, Grossman, & Goharian 1996, 2002
45
Structured Query Language
First relational query language, SQUARE was implemented in System R (1975). SQUARE was followed by SEQUEL. First commercial implementation, Oracle (1979), followed closely by SQL/DS in 1982. Major SQL DBMS vendors: Oracle, IBM (DB2), Sybase, Informix, Computer Associates, and Microsoft.
(C) Frieder, Grossman, & Goharian 1996, 2002
46
SQL Overview
Data Manipulation Language (DML) » » » »
SELECT INSERT UPDATE DELETE
Data Definition Language (DDL) » CREATE TABLE » CREATE INDEX » GRANT
(C) Frieder, Grossman, & Goharian 1996, 2002
47
SELECT Overview
Single Relation » SELECT » Boolean Operators » IN » BETWEEN » Aggregate Operators » Calculated Attributes » Sorting » Wildcard Searches » GROUP BY » HAVING » NULLS » Varchar Multiple Relations
(C) Frieder, Grossman, & Goharian 1996, 2002
48
Syntax: SELECT <list of columns> FROM <list of tables> Ex: SELECT p#,name,qty FROM PARTS Ex: SELECT * FROM PARTS
(C) Frieder, Grossman, & Goharian 1996, 2002
PARTS p#
name
qty
1 2 3
Nut Bolt Wheel
42 25 15
49
Select with WHERE SELECT <list of columns> FROM <list of tables> WHERE <list of conditions> <list of conditions> is of the form: [=,<,>,<>,<=,>=] EX:
SELECT * p# FROM PARTS 3 WHERE p# = 3
(C) Frieder, Grossman, & Goharian 1996, 2002
name Wheel
qty 15
50
Use of Boolean Operators Conditions can be separated by Boolean operators: AND, OR, NOT EX: “List information about parts 1 and 2” SELECT * p# name qty FROM PARTS Nut 42 WHERE p# = 1 OR p# = 2 1 2 Bolt 25 EX: “LIST information about all wheels that contain more than 20 in stock” SELECT * FROM PARTS WHERE name = ‘Wheel’ and qty > 20
(C) Frieder, Grossman, & Goharian 1996, 2002
51
Shortcut Number 1: IN To find information for a list of values, the IN operator may be used: Ex: “List the name of all parts whose part # is 1,2, or 5” SELECT name FROM PARTS WHERE p# IN (1, 2, 5)
Name Nut Bolt
instead of: WHERE (p# = 1) OR (p# = 2) OR (p# = 5) (C) Frieder, Grossman, & Goharian 1996, 2002
52
Shortcut Number 2: BETWEEN To find values within a range, it is often easier to use BETWEEN. Ex: “Find all parts where the quantity on hand is greater than or equal to twenty parts, but less than or equal to fifty.” p# name qty SELECT * 1 Nut 42 FROM PARTS 2 Bolt 25 WHERE qty BETWEEN 20 and 50 instead of: WHERE qty >= 20 AND qty <= 50 (C) Frieder, Grossman, & Goharian 1996, 2002
53
Aggregate Operators A common requirement is to compute statistics such as MIN, MAX, and AVERAGE on a range of data. Ex: Find the min, max, and average quantity of all wheels. SELECT MIN(qty), MAX(qty), AVG(qty) FROM PARTS min(qty) max(qty) avg(qty) WHERE name = ‘Wheel’ 15 15 15
(C) Frieder, Grossman, & Goharian 1996, 2002
54
Calculated Attributes A new attribute is obtained by using arithmetic operators (+,-, *, /) on other numeric attributes. All operators follow standard precedence. Ex: List all parts and their quantity given an increase of 20%
p# 1 SELECT p#,name,(qty+(qty*.2)) ‘qty’ 2 3 FROM PARTS
(C) Frieder, Grossman, & Goharian 1996, 2002
name qty Nut 50 Bolt 30 Wheel 18 55
Sorting ORDER BY <list of attributes> [DESC] can be added to SELECT to obtain sorted output. Ex: List all part names in ascending order: SELECT p#, name, qty FROM PARTS ORDER BY name
p# name 2 Bolt 1 Nut 3 Wheel
qty 25 42 15
For descending order change to: ORDER BY name DESC (C) Frieder, Grossman, & Goharian 1996, 2002
56
Sorting Calculated Attributes To refer to a computed attribute in the ORDER BY, use the position in the list of columns following SELECT. Ex: “List all part information in descending order of a projected 20 percent reduction in quantity” SELECT p#,name,(qty-(qty*.2)) ‘qty’ FROM PARTS ORDER BY 3 DESC (C) Frieder, Grossman, & Goharian 1996, 2002
p# 1 2 3
name qty Nut 34 Bolt 20 Wheel 12 57
Wildcard Searches of Strings The LIKE operator is used to search parts of a string. The following wildcard characters are used: % - zero or more characters _ - exactly one character Ex: List all parts whose name starts with a ‘W’ p# name SELECT * 3 Wheel FROM PARTS WHERE name LIKE ‘W%’ (C) Frieder, Grossman, & Goharian 1996, 2002
qty 15
58
More LIKE Examples Ex: List all parts whose name starts with a ‘W’ and ends with an ‘L’ p# name Wheel WHERE name LIKE ‘W%L’ 3 Ex: List all parts whose name is three characters long and starts with a ‘N’ name WHERE name LIKE ‘N__’ p# 1
(C) Frieder, Grossman, & Goharian 1996, 2002
Nut
qty 15
qty 42
59
Review List all parts whose name starts with a ‘W’ or whose part number is either 2,4,8,11,12,13,14,15. Sort the list in descending order by quantity. SELECT * FROM PARTS WHERE _____ LIKE _______ OR _____ IN (_,_,_) OR ______ BETWEEN __ AND __ ORDER BY ____ DESC (C) Frieder, Grossman, & Goharian 1996, 2002
60
Review (Answer) List all parts whose name starts with a ‘W’ or whose part number is either 2,4,8,11,12,13,14,15. Sort the list in descending order by quantity. SELECT * FROM PARTS WHERE name LIKE ‘W%’ OR p# IN (2,4,8) OR p# BETWEEN 11 AND 15 ORDER BY qty DESC (C) Frieder, Grossman, & Goharian 1996, 2002
61
More Review Ex: List the total number of parts that would exist if quantity was increased by 25 percent.
Ex: List all parts that would have a quantity greater than 50, if the quantity was increased by 25 percent.
(C) Frieder, Grossman, & Goharian 1996, 2002
62
More Review (Answer) Ex: List the total number of parts that would exist if the quantity was increased by 25 percent. SELECT SUM(qty + qty * .25) FROM PARTS Sum(qty + qty * .25) 103
Ex: List all parts that would have a quantity greater than 50, if the quantity was increased by 25 percent. SELECT * p# name 1 Nut FROM PARTS WHERE qty + qty * .25 > 50 (C) Frieder, Grossman, & Goharian 1996, 2002
qty 42
63
GROUP BY It is often necessary to review data about groups of related tuples. Consider an employee relation that contains the “Department” attribute. Assume one employee may work in only a single department. DEPARTMENT partitions the EMP set into subsets: Sales Marketing Service Finance (C) Frieder, Grossman, & Goharian 1996, 2002
64
GROUP BY (continued) EMPLOYEE (emp#, name, salary, department) emp# name salary department 1 2 3
Fred Mike Sam
200 300 400
Sales Sales Sales
4 5
Martha Juanita
350 500
Marketing Marketing
6
Steve
800
Finance
7 8
Tom Sue
200 900
Service Service
(C) Frieder, Grossman, & Goharian 1996, 2002
65
GROUP BY (continued) For each department, list the average salary. SELECT department, AVG(salary) FROM EMPLOYEE GROUP BY department department AVG(salary) Sales 300 Marketing 425 Finance 800 Service 550 (C) Frieder, Grossman, & Goharian 1996, 2002
66
Group By (continued) If a WHERE clause exists, it is executed as well. Ex: For each department, list the highest salary, but exclude all employees whose name starts with a ‘S’ SELECT department, MAX(salary) FROM EMPLOYEE WHERE name NOT LIKE ‘S%’ GROUP BY department (C) Frieder, Grossman, & Goharian 1996, 2002
67
GROUP BY (continued) department Sales Marketing Service
(C) Frieder, Grossman, & Goharian 1996, 2002
max(salary) 300 500 200
68
GROUP BY (continued) More refined groups are obtained by using multiple attributes in GROUP BY. Add the attribute “REGION” to the employee relation. Now the department partition may be partitioned into different regions. North West
MARKETING
East
South
(C) Frieder, Grossman, & Goharian 1996, 2002
69
GROUP BY (continued) EMPLOYEE (emp#, name, salary, department,rgn) emp# name salary department rgn 1 2 3
Fred Mike Sam
200 300 400
Sales Sales Sales
north north east
4 5
Martha Juanita
350 500
Marketing Marketing
west west
6
Steve
800
Finance
south
7 8
Tom Sue
200 900
Service Service
north south
(C) Frieder, Grossman, & Goharian 1996, 2002
70
GROUP BY (multiple partitions) To specify more than one partition, simply add more attributes after the GROUP BY: Ex: Compute the average salary for each region within each department. SELECT department, region, AVG(salary) FROM EMPLOYEE GROUP BY department, region (C) Frieder, Grossman, & Goharian 1996, 2002
71
GROUP BY (continued) department Sales Sales Marketing Finance Service Service
(C) Frieder, Grossman, & Goharian 1996, 2002
reg avg(salary) north 250 east 400 west 425 south 800 north 200 south 900
72
HAVING (restricts groups) Syntax: HAVING (list of conditions) Aggregate functions used (SUM, MIN, MAX, COUNT). Ex: List the average salary for all departments that have more than two employees. SELECT department, AVG(salary) FROM EMPLOYEE department avg(salary) GROUP BY department Sales 300 HAVING COUNT(*) > 2 (C) Frieder, Grossman, & Goharian 1996, 2002
73
HAVING (continued) Ex: List the minimum salary in departments sales, marketing and service as long as the department has an average salary greater than 400. SELECT department, MIN(salary) FROM EMPLOYEE WHERE department IN (‘sales’,’marketing’,’service’) GROUP BY department HAVING AVG(salary) > 400 department min(salary) Marketing 350 (C) Frieder, Grossman, & Goharian 1996, 2002
74
Multiple Entity Relationships Multiple tables needed to store multi-entity relationships. One to many: One parent may have many children Many to one: Many people may attend a single meeting Many to Many: Students graduate from multiple colleges Each college graduates by many students (C) Frieder, Grossman, & Goharian 1996, 2002
75
Single Relation Design It is tempting to try and stuff all multi-valued relationships into a single relation: emp# name salary 1 Fred 200 2 Ethel 300 3
Mike
400
(C) Frieder, Grossman, & Goharian 1996, 2002
college1 Harvard IIT
college2 Michigan
MIT
Stanford
Unused
college3 Unused Unused
IIT
76
Problems with Poor Design
Incompleteness » Unable to store more than three colleges per individual. Many to many relationships can have an infinite number of values. Query Complexity » Queries such as “list all colleges attended by Mike” become substantially more difficult. Wasted Storage » Many entries are not used. (C) Frieder, Grossman, & Goharian 1996, 2002
77
Multiple Relations: 2 Relations for Many-Many EMPLOYEE emp# name salary 1 Fred 200 2
Ethel
300
3
Mike
400
emp# 1
COLLEGE name Harvard
2 2
IIT Michigan
3 3 3
MIT Stanford IIT
emp# in EMPLOYEE is a primary key emp# in COLLEGE is a foreign key emp#,name in COLLEGE is a primary key (C) Frieder, Grossman, & Goharian 1996, 2002
78
Preserving Relationships
Joining the two relations restores the original relation assuming a key is part of the partitioning.
Poor partitioning may result in additional spurious tuples being introduced.
(C) Frieder, Grossman, & Goharian 1996, 2002
79
Equi-Join
Explicit indication of the join attribute conditions. Typically involves joining on foreign key attributes. List all colleges attended by “Mike” SELECT b.name FROM EMPLOYEE a, COLLEGE b WHERE a.emp# = b.emp# AND a.name = ‘Mike’ name MIT Stanford IIT
(C) Frieder, Grossman, & Goharian 1996, 2002
80
Problem with only using Two Relations for Many-Many Suppose we need to maintain the college location as well. location must be replicated many times. EMPLOYEE emp# name salary 1 Fred 200 2
Ethel
300
3
Mike
400
(C) Frieder, Grossman, & Goharian 1996, 2002
COLLEGE emp# name 1 Harvard 2 IIT 2 Michigan 3 MIT 3 Stanford 3 IIT
location Boston Chicago Ann Arbor Boston Stanford Chicago
81
Use of 3 Relations To avoid needless repetition, we create a separate table for each of the two entities involved in a many-many relationship, and then a third “linking” relation to contain data about the relationship between the entities. All 1-1 information about employees: EMPLOYEE(emp#, name, salary) All 1-1 information about colleges: COLLEGE(col#, name, location) All data pertaining to a single employee attending a single college: ATTENDS(emp#, col#, gpa)
(C) Frieder, Grossman, & Goharian 1996, 2002
82
Use of Three Relations (continued) COLLEGE
EMPLOYEE emp# 1 2 3
name Fred Ethel Mike
salary 200 300 400
col# 11 22 33 44 55
name Harvard IIT Michigan MIT Stanford
location Boston Chicago Ann Arbor Boston Stanford
ATTENDS emp# 1 2 2 3 3 3
col# 11 22 33 44 55 22
(C) Frieder, Grossman, & Goharian 1996, 2002
gpa 2.45 3.79 3.65 2.85 2.65 4.0 83
Sample Query (3 Relations) Ex: List the names of all colleges attended by “Mike.” SELECT b.name FROM EMPLOYEE a, COLLEGE b, ATTENDS c WHERE a.emp# = c.emp# AND name b.col# = c.col# AND Michigan MIT a.name = ‘Mike’ Stanford
(C) Frieder, Grossman, & Goharian 1996, 2002
84
Sample Query (3 Relations) Ex: List the names of all employees who attended Harvard. SELECT a.name FROM EMPLOYEE a, COLLEGE b, ATTENDS c WHERE a.emp# = c.emp# AND b.col# = c.col# AND name Fred b.name = ‘Harvard’
(C) Frieder, Grossman, & Goharian 1996, 2002
85
Subqueries Instead of hard-coding the list that is used by IN, it is possible to dynamically generate the list using a subquery. Ex:
SELECT * FROM EMPLOYEE WHERE emp# IN (1,2,3,4,5,7)
could be rewritten as; Ex:
SELECT * FROM EMPLOYEE WHERE emp# IN (SELECT num FROM SAMPLE)
Assuming SAMPLE(num) is a relation with tuples: 1,2,3,4,5, and 7. (C) Frieder, Grossman, & Goharian 1996, 2002
86
EXISTS EXISTS prefaces a subquery and evaluate to TRUE if one or more tuples are present in the result set of the subquery. Ex: List all employees who attended at least one college. SELECT * FROM EMPLOYEE a WHERE EXISTS (SELECT c.emp# FROM ATTENDS c WHERE c.emp# = a.emp#) (C) Frieder, Grossman, & Goharian 1996, 2002
87
DISTINCT DISTINCT is used to remove duplicates. Ex: List all distinct salaries. SELECT DISTINCT(salary) FROM EMPLOYEE
(C) Frieder, Grossman, & Goharian 1996, 2002
distinct (salary) 200 300 400
88
UNION The union of two result sets (of the same data type) is obtained via the UNION operator (duplicates are removed). The OR query can be written using UNION: Syntax: <SQL select statement> UNION <SQL select statement> Ex: Obtain billing from 2000 and 2001. SELECT * FROM BILL2000 UNION SELECT * FROM BILL2001 (C) Frieder, Grossman, & Goharian 1996, 2002
89
EXCEPT Syntax: <SQL select statement> EXCEPT <SQL select statement> Ex: Find employees, who attended IIT but not MIT. SELECT e.emp# FROM EMPLOYEE a, COLLEGE b, ATTENDS c WHERE a.emp# = c.emp# AND b.col# = c.col# AND b.name = ‘IIT’ EXCEPT SELECT e.emp# FROM EMPLOYEE a, COLLEGE b, ATTENDS c WHERE a.emp# = c.emp# AND b.col# = c.col# AND b.name = ‘MIT’ (C) Frieder, Grossman, & Goharian 1996, 2002
90
Other Data Manipulation
UPDATE » Modify tuples in a single relation DELETE » Remove tuples from a single relation INSERT » Add tuples to a single relation
(C) Frieder, Grossman, & Goharian 1996, 2002
91
INSERT Syntax: INSERT INTO
[list of columns] VALUES (<list of values>) Ex. For the relation: EMPLOYEE (emp#, name, salary) INSERT INTO EMPLOYEE (emp#, name, salary) VALUES (5, ‘Herbert’, 200) Note: If optional column list is not found, the values must be listed in the order of their initial definition
(C) Frieder, Grossman, & Goharian 1996, 2002
92
INSERT - Format 2 Syntax: INSERT INTO
(select statement) Ex: Copy all tuples in the EMPLOYEE relation and place them in NEW_EMPLOYEE INSERT INTO NEW_EMPLOYEE SELECT * FROM EMPLOYEE
(C) Frieder, Grossman, & Goharian 1996, 2002
93
UPDATE Syntax: UPDATE
<list of assignments separated by commas> WHERE where is of the form: SET = Ex: Modify John’s salary to 150 UPDATE EMPLOYEE SET salary = 150.00 WHERE name = ‘John’ (C) Frieder, Grossman, & Goharian 1996, 2002
94
UPDATE (continued) An assignment statement may contain a numeric expression: Ex: Give all employees a ten percent raise. UPDATE EMPLOYEE SET salary = salary * 1.10
(C) Frieder, Grossman, & Goharian 1996, 2002
95
DELETE Syntax: DELETE FROM
WHERE Ex: Remove all employees who work in department 5 DELETE FROM EMPLOYEE WHERE dept = 5 To remove all employees: DELETE FROM EMPLOYEE
(C) Frieder, Grossman, & Goharian 1996, 2002
96
Data Definition Language (DDL) » » » » » » »
Create Table Drop Table Create Index Drop Index GRANT REVOKE ALTER TABLE
(C) Frieder, Grossman, & Goharian 1996, 2002
97
CREATE TABLE CREATE TABLE
[], .... []
(
) typical data types are: CHAR(x), VARCHAR(x), SMALLINT, INTEGER, DATE, TIME, DECIMAL (x,y)
Varying Length Character VARCHAR(x) indicates that a string will be no longer than x characters. Fixed length strings are padded to fill fixed space. Varying length strings have a Length Indicator. FIXED 200
Hank FILL 252.35
(C) Frieder, Grossman, & Goharian 1996, 2002
VARYING 200
4 Hank 252.35
100
Effect of Varying Length Columns on Performance FIXED tuple 1 tuple 2 tuple 3 tuple 4 tuple 5
A modification to the FIXED table only affects one tuple. A modification to VARCHAR might result in the reshuffling or copying to OVERFLOW of other tuples so that they fit on a single page. (C) Frieder, Grossman, & Goharian 1996, 2002
101
Rule of Thumb (VARCHAR) Avoid VARCHAR when it is not necessary. One “rule of thumb” is to avoid VARCHAR when the maximum savings is less then thirty characters. Advantage: Save storage Disadvantage: Degrades performance of UPDATE
(C) Frieder, Grossman, & Goharian 1996, 2002
102
Nulls
An attribute may be defined as null. This indicates that the value is unknown and avoids the need for user-defined special indicators. CREATE TABLE EMPLOYEE (emp# SMALLINT, name CHAR(20), salary
DECIMAL(5,2) NULL)
(C) Frieder, Grossman, & Goharian 1996, 2002
103
Effect of Nulls on Performance Any data in a tuple that allows nulls is prefaced by a null indicator.
Null Indicator 200
Hank
252.35 Null Indicator (1 byte)
(C) Frieder, Grossman, & Goharian 1996, 2002
No Null 200
Hank 252.35
104
Effect of Nulls on Performance
A table that specifies about NULLS results in one byte added for each tuple stored in the relation. This can be a tremendous waste of storage if no data are ever NULL. Most DBMS default to allow NULLS, while many real world applications do not require NULLS. Performance of retrieval and update is slightly degraded because the null indicator must be examined before checking tuple content.
(C) Frieder, Grossman, & Goharian 1996, 2002
105
Syntax Modifications for Nulls
Allow NULL specification in INSERT » To add an employee whose salary is unknown: – INSERT INTO EMPLOYEE (3,’Hank’, null) Use of NULL in select. SELECT * FROM EMPLOYEE WHERE salary IS NULL
(C) Frieder, Grossman, & Goharian 1996, 2002
106
CREATE INDEX
The relational model does not specify how data should be accessed. To create a separate access path, SQL allows users to use CREATE INDEX to create a separate structure, called access method. Usually B+tree is used. Ex: CREATE UNIQUE INDEX I1 ON EMPLOYEE (num) SELECT * FROM EMPLOYEE WHERE num = 25 will use a B-tree instead of a sequential scan. (C) Frieder, Grossman, & Goharian 1996, 2002
107
DROP INDEX / TABLE
To remove an index use: DROP INDEX
To remove a table use: DROP TABLE
(C) Frieder, Grossman, & Goharian 1996, 2002
108
Referential Integrity
When a primary key is modified, it is often necessary to delete the corresponding foreign keys. A single employee id might be a foreign key in many tables. Employee
Colleges
(C) Frieder, Grossman, & Goharian 1996, 2002
Projects
Dependents
109
Specification of Primary and Foreign Key EMPLOYEE(emp#, name, salary) and COLLEGE (emp#, col#, col_name) emp# is a primary key in the EMPLOYEE relation and emp# is a foreign key of the COLLEGE relation. CREATE TABLE EMPLOYEE CREATE TABLE COLLEGE (emp# SMALLINT, (emp# SMALLINT, name CHAR(20), col# SMALLINT, salary DECIMAL(5,2), col_name CHAR(20), PRIMARY KEY (emp#)) FOREIGN KEY K1 (emp#) REFERENCES EMPLOYEE ON DELETE CASCADE, PRIMARY KEY (emp#, col#)) (C) Frieder, Grossman, & Goharian 1996, 2002
110
Referential Integrity (continued)
ON DELETE: » [CASCADE, SET NULL, RESTRICT] CASCADE » A delete to a primary key results in a delete of all corresponding tuples that contain the foreign key. SET NULL » A delete to a primary key results in null values placed in all corresponding foreign keys. RESTRICT » A delete to primary key results in an error if a matching foreign key exists.
(C) Frieder, Grossman, & Goharian 1996, 2002
111
Views A view is a logical relation. It is defined as a subset of tuples and attributes of a physical (base) relation. Syntax: CREATE VIEW as <sql select> Ex: Create a view on the EMPLOYEE relation such that the salary attribute is omitted. CREATE VIEW V1 AS (SELECT num, name FROM EMPLOYEE) Now a user may be given access to only V1 without access to the base relation: EMPLOYEE. (C) Frieder, Grossman, & Goharian 1996, 2002
112
Views (continued) The tuples in the base relation may be restricted by adding a WHERE condition to the view definition. EX: Create a view that contains information only about employees in named ‘Steve’ CREATE VIEW V2 as (SELECT * FROM EMPLOYEE WHERE name = ‘Steve’) Any queries against V2 are executed by merging the view definition with the query to ensure the result set only accesses data allowed by the view. (C) Frieder, Grossman, & Goharian 1996, 2002
113
View Insert An insert to a view that does not contain all of the attributes in the base relation results in the additional attributes being set to NULL. This is only valid if nulls are permitted. Ex: Consider the view V1 that omits the SALARY attribute. INSERT INTO V1 (4, ‘Hank’) is equivalent to: INSERT INTO EMPLOYEE (4,’Hank’, null) A null value will be placed in the salary attribute.
(C) Frieder, Grossman, & Goharian 1996, 2002
114
Security and Authorization Access to relations and views is controlled by the GRANT and REVOKE statements. GRANT [ALL, SELECT, INSERT, UPDATE, DELETE] ON