Normal Is At Ion

May 2020
PDF

Download

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA

Overview

Download & View Normal Is At Ion as PDF for free.

More details

Words: 3,081
Pages: 12

Preview
Full text

Normalisation

The Normalisation Anomalies

process,

Functional

dependency

and

The normalisation process, as first proposed by Codd (1972), takes a relation schema through a series of tests to certify whether it satisfies a certain normal form. It proceeds in a top-down fashion and evaluates each relation against the criteria for normal forms, decomposing relations as necessary. The first three normal forms are based on the functional dependencies among the attributes of a relation. Attribute B is functionally dependent on attribute A if, at any moment in time, there is no more than one value of attribute B associated with a given value of attribute A. Normalisation of data can be looked upon as relational design by analysis the process of analysing the given relation schemas based on their functional dependencies and primary keys to achieve the desirable properties of: minimising redundancy improving data handling efficiency improving application development capabilities minimising the insert, delete and update anomalies presented by unnormalised relations: Insert anomaly - refers to a situation wherein one cannot insert a new tuple (row) into a relation because of an artificial dependency on another relation. Delete anomaly - the opposite of the Insert anomaly: refers to a situation wherein a deletion of data about one particular entry causes unintended loss of data that characterises another entity. Update anomaly - refers to a situation where an update of a single data value requires multiple tuples (rows) of data to be updated. Data modification becomes problematic. Database normalization refers to a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—insertion, update, and deletion anomalies – that could lead to a loss of data integrity. Unsatisfactory relation schemas that do not meet the normal form test conditions are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties. An idea is to write the SQL that reproduces the original views after completing the normalisation process. Full normalisation might introduce additional complexity in: Table and query design

1

Query optimisation Query processing At times, some degree of de-normalisation is required for performance purposes. Although most successful databases are normalised to some degree, there is one substantial drawback of a normalised database: reduced database performance. A normalised database requires much more CPU, memory, and I/O to process transactions and database queries than does a de-normalised database. A normalised database must locate the requested tables and then join the data from the tables to either get the requested information or to process the desired data. More JOINS are required thus: Increasing costs Decreasing performance There are costs to denormalisation, however. Data redundancy is increased in a denormalised database, which can improve performance but requires more extraneous efforts to keep track of related data. Application coding renders more complications, because the data has been spread across various tables and may be more difficult to locate. In addition, referential integrity is more of a chore; related data has been divided among a number of tables. There is a happy medium in both normalization and denormalisation, but both require a thorough knowledge of the actual data and the specific business requirements of the pertinent company.

Unique identifiers, Candidate keys, Prime attributes and Primary keys The first step in normalisation is to choose a primary key from among the unique identifiers we find in the relation. A unique identifier is an attribute or collection of attributes that uniquely identifies each occurrence of a relation. In a few cases, there is no reasonable set of attributes in a relation that can be used as a unique identifier. When this occurs, we must invent a unique identifier, often with values assigned sequentially or randomly as we add entity occurrences to the database. We call unique identifiers that have real world meaning (e.g. ID number) natural identifiers, and those that do not (invented ones) surrogate or artificial identifiers. Sometimes a relation will have more than one possible unique identifier. When this occurs, we call each possibility a candidate. If a relation schema has more than one key, each key is called a candidate key.

Normalisation

An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. An attribute is called nonprime if it is not a prime attribute, that is, not a member of any candidate key. Once we have identified all the possible candidates for a relation, we must choose one of them to be the primary key for the relation. Each relation schema must have a primary key. We can use a preceding hash mark (#) to denote the attribute name(s) of the primary key. Choosing a primary key is essential to the normalisation process because all the normalisation rules reference the primary key. The criteria for choosing the primary key from among the candidates is as follows (in order of precedence): If there is only one candidate, choose it. Choose the candidate least likely to have its value change. (Surrogate keys are always less likely to change compared with natural keys.) Choose the simplest candidate - composed of the fewest attributes. Choose the shortest candidate - for efficiency. After arbitrarily designating one of the candidate keys to be the primary key, the others are called secondary keys.

3

1NF: First Normal Form relation Move repeating and multivalued attributes to a new relation. A relation is said to be in first normal form when it contains no multivalued attributes. The first normal form is now considered to be part of the formal definition of a relation in the basic (flat) relational model. It is defined to disallow multivalued attributes, composite attributes and their combinations. It states that: the domain of an attribute must include only atomic (simple, indivisible) values the value of any attribute in a tuple must be a single value from the domain of that attribute. We can denote repeating groups and multivalued attributes by enclosing them in pairs of parentheses. To transform unnormalised relations into first normal form, we must move multivalued attributes and repeating groups to new relations: Create a new relation with a meaningful name. Copy the primary key from the original relation to the new one. The data depended on this primary key in the original relation, so it must still depend on this key in the new relation. This copied primary key now becomes a foreign key to the original relation. Move the repeating group or multivalued attribute to the new relation. Make the primary key copied from the original relation unique by adding attributes from the repeating group to it or replace the primary key with a single surrogate key attribute.

Normalisation

2NF: Second Normal Form relation Move partially dependent attributes to a new relation. The Second Normal Form is based on the concept of full functional dependency. A functional dependency X®Y is a full functional dependency if removal of any attribute A from X means the dependency does not hold any more, that is, for any attribute AÎX, (X - {A}) does not functionally determine Y). A functional dependency X®Y is a partial dependency if removal of any attribute A from X does not determine Y, that is, if some attribute AÎX can be removed from X and the dependency still holds (or, for any attribute AÎX, (X - {A}) ®Y. If the primary key is composite, a nonprime attribute cannot depend on only part of the key for 2NF to be satisfied. A relation schema R is in the second normal form if every nonprime attribute A in R is fully functionally dependent on the primary key of R. (Remember, an attribute of relation schema R is called nonprime if it is not a member of any candidate key of R.) Put another way, a relation is said to be in the second normal form if it meets both the following criteria: The relation is in the first normal form. All nonprime attributes are functionally dependent on the entire primary key. The second normal form only applies to relations where we have concatenated primary keys (composed of multiple attributes). If we have a primary key composed of only a single attribute and the primary key is atomic (has no subparts that make sense by themselves, as all attributes should be), then it is simply not possible for anything to depend on part of the primary key. It follows then, that any first normal form relation that has only a single attribute for its primary key is automatically in second normal form. The test for 2NF involves testing for functional dependencies whose left-hand side attributes are part of the primary key. Once we find a second normal form violation, the solution is to move the attribute(s) that is (are) partially dependent to a new relation where it depends on the entire key instead of part of the key. A more general definition of 2NF will disallow partial dependencies not only on the primary key but on ANY candidate key of a relation. In this case, a general definition of prime attribute will come to mean an attribute that is part of any candidate key. Now, a relational schema R is in second normal

5

form if every nonprime attribute A is not partially dependent on any key of R.

Normalisation

3NF: Third Normal Form relation Move transitively dependent attributes to a new relation. The third normal form is based on the concept of transitive dependency. An attribute that depends on another attribute that is not the primary key of the relation is said to be transitively dependent. A functional dependency X®Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X® and Z®Y hold. A relation is said to be in third normal form if it meets both the following criteria: The relation is in second normal form. There is no transitive dependence - that is, all nonprime attributes depend only on the primary key. (Remember, an attribute of relation schema R is called nonprime if it is not a member of any candidate key of R.) If any attribute depends on a nonprime attribute (which depends on the primary key) then 3NF is not satisfied. To transform a second normal form relation into third normal form, we simply move any transitively dependent attributes to relations where they depend only on the primary key, whilst being careful to leave the attribute on which they depend in the relation as a foreign key. (This will be needed to reconstruct the original user view via a join.) A more general definition of 3NF will disallow transitive dependencies not only on the primary key but on ANY candidate key of a relation. In this case, a general definition of prime attribute will come to mean an attribute that is part of any candidate key. A general definition of 3NF will imply that the relation schema R is in third normal form if, whenever a nontrivial functional dependency X®A holds in R, either: 1. X is superkey of R 2. A is a prime attribute of R A relation schema R violates the general definition of 3NF if a functional dependency X® holds in R that violates both conditions of 3NF. Violating condition 1 means that X is not a superset of any key of R; hence X could be: o nonprime (we have a typical transitive dependency that violates

7

3NF) o a proper subset of a key of R (we have a partial dependency that violates both 2NF and 3NF) Violating condition 2 means that A is a nonprime attribute. A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions: It is fully functionally dependent on every key of R. It is nontransitively dependent on every key of R.

Summary First Normal Form (1NF) A relational database is in First Normal Form (1NF) if each attribute is singlevalued with atomic values. Second Normal Form (2NF) To discuss Second Normal Form, we need to understand functional dependency. Attribute (or set of attributes) B is functionally dependent on attribute (or set of attributes) A, written A --> B, if for each value of A, there corresponds exactly one value of B. Attribute (or set of attributes) B is fully functionally dependent on attribute (or set of attributes) A, if B is functionally dependent on A but is not functionally dependent on any proper subset of A. The entirety of A must be specified for B to be determined. A relational database is in Second Normal Form (2NF) if it is in 1NF and each attribute that is not a primary key is fully functionally dependent on the entity's primary key; that is, non-primary key attributes are dependent on the entity's entire primary key. Third Normal Form (3NF) A relational database is in Third Normal Form (3NF) if it is in 2NF and no nonprimary key attribute is functionally dependent on another non-primary key. To normalize, we move both attributes into a new entity. Normal Form

Characteristics

Conversion Process

1NF

Single-valued attributes

Create new entity from multivalued attribute

Normalisation

2NF

3NF

Atomic-valued attributes

Break composite attribute into component attributes

1NF

Convert to 1NF

Non-primary key attributes fully functionally dependent on primary key

Attached non-primary key attribute to correct (perhaps new) entity

2NF

Convert to 2NF

No non-primary key attribute is functionally dependent on another non-primary key

Move both attributes into a new entity

Example The First Normal Form For a table to be in first normal form, data must be broken up into the smallest units possible. For example, the following table is not in first normal form. Name

Address

Phone

Sally Singer

123 Broadway New York, NY, 11234

(111) 2223345

Jason Jumper

456 Jolly Jumper St. Trenton NJ, 11547

(222) 3345566

To conform to first normal form, this table would require additional fields. The name field should be divided into first and last name and the address should be divided by street, city, state, and zip like this. ID

Firs t

Last

Street

City

56 4

Sall y

Singe r

123 Broadway

New York

56 5

Jaso n

Jump er

456 Jolly Jumper St.

Trenton

S tat e NY

Zip

Phone

1123 4

(111) 2223345

NJ

1154 7

(222) 3345566

In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repetitions groups of fields such as in the following table.

9

Rep ID

Representat ive

Client 1

Time 1

Client 2

Time 2

Client 3

Time 3

TS89 RK56

Gilroy Gladstone Mary Mayhem

US Corp. Italian a

14 hrs

Taggar ts Linker s

26 hrs 2 hrs

Kilroy Inc.

9 hrs

67 hrs

The problem here is that each representative can have multiple clients not all will have three. Some may have less as is the case in the second record, tying up storage space in your database that is not being used, and some may have more, in which case there are not enough fields. The solution to this is to add a record for each new piece of information. Rep ID

Rep First Name

Rep Last Name

Client

Time With Client

TS-89

Gilroy

Gladstone

US Corp

14 hrs

TS-89

Gilroy

Gladstone

Taggarts

26 hrs

TS-89

Gilroy

Gladstone

Kilroy Inc.

9 hrs

RK-56

Mary

Mayhem

Italiana

67 hrs

RK-56

Mary

Mayhem

Linkers

2 hrs

Notice the splitting of the first and last name fields again. This table is now in first normal form. Note that by avoiding repeating groups of fields, we have created a new problem in that there are identical values in the primary key field, violating the rules of the primary key. In order to remedy this, we need to have some other way of identifying each record. This can be done with the creation of a new key called client ID.

Normalisation

Rep ID*

Rep First Name

Rep Last Name

Client ID*

Client

Time With Client

TS-89

Gilroy

Gladstone

978

US Corp

14 hrs

TS-89

Gilroy

Gladstone

665

26 hrs

TS-89

Gilroy

Gladstone

782

RK-56

Mary

Mayhem

221

Taggart s Kilroy Inc. Italiana

RK-56

Mary

Mayhem

982

Linkers

2 hrs

9 hrs 67 hrs

This new field can now be used in conjunction with the Rep ID field to create a multiple field primary key. This will prevent confusion if ever more than one Representative were to serve a single client.

The Second Normal Form The second normal form states that each field in a multiple field primary key table must be directly related to the entire primary key. Or in other words, each non-key field should be a fact about all the fields in the primary key. Only fields that are absolutely necessary should show up in our table, all other fields should reside in different tables. In our example, we are devoting three fields to the identification of the employee and two to the identification of the client. We could identify them with only one field each - the primary key. We can then take out the extraneous fields and put them in their own table: Rep ID* TS-89

Client ID* 978

Time With Client 14 hrs

Client ID* 978

Client Name

TS-89

665

26 hrs

665

Taggarts

TS-89

782

9 hrs

782

Kilroy Inc.

RK-56

221

67 hrs

221

Italiana

RK-56

982

2 hrs

982

Linkers

RK-56

665

4 hrs

This table contains time card information. Rep ID* TS-89

First Name Gilroy

Last Name

RK-56

Mary

Mayhem

US Corp

This table contains Client information

Gladstone

11

This table contains Rep information. These tables are now in 2NF. By splitting off the unnecessary information and putting it in its own tables, we have eliminated redundancy and put our first table in second normal form.

The Third Normal Form Third normal form is the same as second normal form except that it only refers to tables that have a single field as their primary key. In other words, each non-key field in the table should be a fact about the primary key. Any of the preceding tables acts as an example of third normal form since all the fields in each table are necessary to describe the primary key. Once all the tables in a database have been taken through the third normal form, we can begin to set up relationships.

Normal Is At Ion

Overview

More details

Related Documents

Normal Is At Ion

Normal Is At Ion Tutorial

Normal Is At Ion 1

Normal Is At Ion Des Procedures

Global Is At Ion

Organ Is At Ion