Database design with a relational model Functional dependencies Normalization
Functional Dependencies
Definition:
– An integrity constraint between two sets of attributes
Think of it as a relationship between two sets of attributes Generalizes the concept of a key, where both X and Y represent sets of attributes X→Y – X functionally determines Y – Y is functionally determined by X FD {X → Y}
i.e., when two tuples agree on their values for the X domains, they agree on their values for the Y domains {X} {Y} 1,2,a,0 m 1,2,a,0 m
Functional Dependencies
Legal instances of a relation must satisfy all ICs, including FDs FD are a property of the intension, not the extension, but can be “ruled out” using the extension Primary key is a particular case of a FD (but not all FDs are keys) F+ denotes the closure of F – The set of all functional dependencies implied by a given set of functional dependencies
Functional Dependency
A
B
C
D
…
A1
B1
C1
D1
…
A1
B1
C1
D2
…
A1
B2
C2
D1
…
A2
B1
C3
D1
…
Determinant
AB → C Dependent
Functional Dependency Closure
Set F of Functional Dependencies (given) Relation: – –
EmpProj: SSN, Pnumber, Hours, Ename, Pname, Plocation FDs F:
{SSN → Ename} Pnumber → {Pname, Plocation} {SSN, Pnumber} → Hours}
Closures: – {SSN}+ = {SSN, Ename} – {Pnumber}+ = {Pnumber, Pname, Plocation}
F+ – {SSN, Pnumber}+ = {SSN, Pnumber, Ename, Pname, Plocation, Hours}
Attribute Closure
Determine FDs that can easily be identified (given) To see if a given FD is in the closure of a set F of FDs, use Armstrong’s Axioms to infer additional dependencies Armstrong’s Axioms – 3 rules of inference can generate all FDs:
Reflexivity:
If X ⊇Y, then X → Y
Augmentation:
If X → Y, then XZ → YZ for any Z
Transitivity:
If X → Y and Y → Z, then X → Z
But keep in mind 2 other rules that are useful:
Union:
If X → Y and X → Z, then X → YZ
Decomposition:
If X → YZ, then X → Y and X → Z
Minimal Set
Requirements: – Every dependency has a single attribute for its right side – No X → A in F can be replaced with Y → A where Y is a subset of X and still have dependencies equivalent to F – No dependency can be removed and still have a set of dependency equivalent to F
Definition: – If the right-hand (dependent) side of every FD in S involves just one attribute, and – if the left-hand (determinant) side of every FD in S is irreducible (can’t discard another attribute without making a new set not equivalent to S - aka, changing the closure of S+), and – no FD in S can be discarded without changing the closure of S+, then the set S of FDs is irreducible (minimal).
Example
Given: A → B, ABCD → E, EF → G, EF → H, and ACDF → EG
First rewrite: ACDF → E and ACDF → G Then consider ACDF → G (implied by: A → B, ABCD → E, and EF → G) Delete it (and, in the same way, ACD → E replaces ABCD → E) Minimal Set: A → B, ACD → E, ACDF → G, and ACDF → H. Single attribute on right, Minimize left, Delete redundant FDs
Another example… A = employee number, B = department number, C = manager’s employee number, D = project number for a project unique to a manager, E = department name, F = time specified for a project
Given: A -> BC, B -> E, CD -> EF Determine: Does AD->F hold? A-> BC (given), A->C (decomposition), AD->CD (augmentation), CD->EF (given), AD->EF (transitivity), AD->F (decomposition)
Cover Let S1 and S2 be two sets of FDs. If every FD implied by S1 is implied by S2 (S1+⊆S2+), S2 is a cover for S1. So, if FDs are enforced for S2, they are enforced for S1. If S1 is also a cover for S2, they are equivalent.
From another direction
For P, the following FDs are part of the closure: P# ->Pname, P# -> Color, P# -> Weight, P# -> City The following FDs are not irreducible:
P# -> {Pname, Color} P# -> Weight P# -> City –
{P#, Pname} -> Color P# -> Pname P# -> Weight P# -> City –
(right-hand side is not a single attribute)
(left-hand side can be further reduced without changing closure
P# -> P# P# -> Pname P# -> Color P# -> Weight P# -> City –
(first FD can be dropped without changing closure)
Functional Dependency from the SQL perspective
An SQL consideration: If X functionally determines Y, then Y is functionally dependent on X. SQL Translation: In a legal instance with one or more determinant columns and one dependent columns, if you create a SELECT statement with a WHERE clause containing “=“ for some value in the determinant columns, any tuples returned by the dependent column in your FROM clause must always be the same for all tuples. So the SELECT contains the right side and where or group by contains the left side of a Functional Dependency statement SELECT DependentColumn FROM Table WHERE DeterminantColumn1 = ‘value1’…AND DeterminantColumnN = ‘valueN’
Normalization Product Support Coverage ProductNumber
Date
Time
CR76 CR56 CR74 CR56
05/13/0 2 05/13/0 2 05/13/0 2 07/04/0 2
13:30 12:00 12:00 10:30
EmployeeI D SG5 SG5 SG37 SG5
PhoneNumber
PayRate
Withholding
3651101 3651101 3651102 3651102
12 12 12 18
1.80 1.80 1.80 2.40
ProductNumber,Date -> Time, EmployeeID, PhoneNumber EmployeeID, Date, Time -> ProductNumber PhoneNumber, Date, Time -> EmployeeID, ProductNumber EmployeeID, Date -> PhoneNumber Date -> PayRate PayRate -> Withholding
Product Coverage Support
Illustrate FDs
ProductNumb er
D ate
T ime
ProductNumber,Date -> Time, EmployeeID, PhoneNumber EmployeeID, Date, Time -> ProductNumber PhoneNumber, Date, Time -> EmployeeID, ProductNumber EmployeeID, Date -> PhoneNumber Date -> PayRate PayRate -> Withholding
EmployeeI D
PhoneNumbe r
PayRate
Withholding
Decompose Product Support Coverage
1NF
Pay
Coverage
Product Coverage
Wages
Phone Assignment
2NF
Taxes
3NF
BCNF
Sample Set in First Normal Form Ssn
Name
Department
Rank
Wage
Hours
Bonded
123223666
Amsden
Human Resource
8
10
40
Y
231515368
Skaggs
Management Information Systems
8
10
30
131243560
Sider
Marketing & Sales
5
7
30
434263751
Gallow
Marketing & Sales
5
7
32
612674134
Moore
Marketing & Sales
8
10
40
Normal Forms
The key, the whole key, and nothing but the key Assumptions: – FDs exist – A designated primary key can be found Remember, a key is a FD, but a FD is not always a key A property of a relation schema indicating the type of redundancy in the relation schema The goal is to remove redundancy based on dependencies The desire is to minimize additional integrity constraints
Normal Forms
Considerations: – Relational design by analysis – Normal forms are based on functional dependencies (FDs) – Intuitive, perhaps, but identifying a strictly controlled procedure allows a programmatic process – Should consider 2 additional properties Lossless join (nonadditive join property) – required
Dependency preservation property – use when possible
Lossless Joins and Dependency Preservation
If relation R and FDs F hold over R, then decomposing R into R1 and R2 is lossless if the closure of F contains either: – FD R1 ∩ R2 -> R1 or – FD R1 ∩ R2 -> R2
If the closure of the attributes in R1, independent of those attributes in R2, unioned with the closure of attributes of R2, independent of those attributes in R1, are equivalent to the closure F, then dependency is preserved
2NF to 3NF
PN
PM
D
AD3111
20
A00
MA2100
10
D11
OP1000
10
D11
IF1000
20
A00
PN
PM
AD3111
20
IF1000
20
MA2100
10
OP1000
10
PM
D
20
A00
10
D11
Sample Set Ssn
Name
Department
Rank
Wage
Hours
Bonded
123223666
Amsden
Human Resource
8
10
40
Y
231515368
Skaggs
Management Information Systems
8
10
30
131243560
Sider
Marketing & Sales
5
7
30
434263751
Gallow
Marketing & Sales
5
7
32
612674134
Moore
Marketing & Sales
8
10
40
Normal Forms
The key, the whole key, and nothing but the key Assumptions: – FDs exist – A designated primary key can be found Remember, a key is a FD, but a FD is not always a key A property of a relation schema indicating the type of redundancy in the relation schema The goal is to remove redundancy based on dependencies The desire is to minimize additional integrity constraints
Normal Forms
Considerations: – Relational design by analysis – Normal forms are based on functional dependencies (FDs) – Intuitive, perhaps, but identifying a strictly controlled procedure allows a programmatic process – Should consider 2 additional properties Lossless join (nonadditive join property) – required
Dependency preservation property – use when possible
Lossless Joins and Dependency Preservation
If relation R and FDs F hold over R, then decomposing R into R1 and R2 is lossless if the closure of F contains either: – FD R1 ∩ R2 -> R1 or – FD R1 ∩ R2 -> R2
If the closure of the attributes in R1, independent of those attributes in R2, unioned with the closure of attributes of R2, independent of those attributes in R1, are equivalent to the closure F, then dependency is preserved
Normal Forms
Definitions: – Superkey –
R={A1,A2,…,An} A set of attributes S ⊆ R No 2 tuples (t1,t2) in any legal relation state r of R has t1[S]=t2[S]
– Key –
A superkey that is disqualified by any attribute being removed, & Must be minimal (can’t remove another attribute without disqualifying as key of R)
– Candidate Key –
A set of attributes in a relation satisfying key definition
– Primary Key –
Arbitrarily designated
– Primary Attribute –
A member of a candidate key
Levels of Normalization
BCNF 3 NF 2 NF 1 NF
BCNF table EmployeeNumb er 100
Employee
HireDate
Salary
Mauser
DepartmentNumb er E21
6/19/1980
26150
110
Roberts
A00
5/16/1958
38170
120
O’Connell
A00
12/5/1963
37950
130
Clark
C01
7/28/1971
26150
140
Ross
C01
12/15/1976
35420
150
Adamson
D11
2/12/1972
30280
160
Clark
D11
10/11/1977
26150
290
Parker
D31
5/30/1980
15340
310
Skaggs
D31
9/12/1964
15900
First Normal Form (1NF)
Requirements: – 1NF disallows multivalued attributes, or composite attributes, or their combinations, by requiring only single atomic (indivisible) values in the domain of an attribute – In other words, no “relation within relations”
Business Rules Example 1NF
Staffing hours (S) are on a per project activity (activities within projects) basis - AN Managers (PM) and their departments (D) are assigned to projects (PN) – –
A department is assigned to a project managers A project manager is assigned to projects
PN
AN
PM
D
S
AD3111
20
20
A00
0.8
AD3111
30
20
A00
1.5
AD3111
40
20
A00
1
AD3111
50
20
A00
1.25
AD3111
60
20
A00
0.75
AD3111
70
20
A00
0.35
MA2100
20
10
D11
0.5
MA2100
30
10
D11
1
OP1000
30
10
D11
0.25
IF1000
30
20
A00
1
IF1000
50
20
A00
0.5
IF1000
60
20
A00
0.5
Second Normal Form (2NF)
2NF requires full functional dependency on the whole key If FD X→Y, removal of A eliminates the FD For any attribute A∈X, (X-{A})→Y If A can be removed and FD remains, X→Y is a partial functional dependency (a violation of 2NF)
\
2NF violations
Partial dependencies – X is a subset of some key K
K
X
A
1NF
Staffing is on a per project activity ( and activities within projects) basis Managers and their departments are assigned to projects
PN
AN
PM
D
S
AD3111
20
20
A00
0.8
AD3111
30
20
A00
AD3111
40
20
AD3111
50
AD3111
{PN,AN→STAFF} PN →{PM,DN}
2NF
PN
AN
AD3111
20
STAF F 0.8
1.5
AD3111
30
1.5
A00
1
AD3111
40
1
20
A00
1.25
AD3111
50
1.25
60
20
A00
0.75
AD3111
60
.75
AD3111
70
20
A00
0.35
AD3111
70
.35
MA2100
20
10
D11
0.5
MA2100
30
10
D11
1
OP1000
30
10
D11
0.25
IF1000
30
20
A00
1
IF1000
50
20
A00
0.5
IF1000
60
20
A00
0.5
PN
PM
D
AD3111
20
MA210 0 OP1000
10
A0 0 D1
IF1000
20
10
1 D1 1 A0 0
MA2100 20
.5
MA2100 30
1
OP1000
30
0.25
IF1000
30
1
IF1000
50
.5
IF1000
60
.5
Third Normal Form (3NF)
3NF requires elimination of transitive dependency of nonprime (non-key) attributes on another nonprime (non-key) attribute (i.e., dependency is directly on the primary key) {X→Y,Y→Z}⊨X→Z – If a functional dependency X→A holds in R, one of these must be true:
X is a superkey of R, or A ∈X (i.e., each attribute in A - X is part of some candidate key for R)
3NF violations
Transitive dependencies – X is not a subset of the key
K
X
A X
2NF
A department is assigned to a project manager A project manager is assigned to projects
PN
PM D
AD311 1 MA210
20
0 OP100 0 IF1000
10 10 20
PN→PM →D
A0 0 D1 1 D1 1 A0 0 PM D 20 10
A0 0 D1 1
3NF
PN
PM
AD311 1 MA210
20
0 OP100 0 IF1000
10
10 20