Database Designing Concepts Data Base A Data Base is shared collection of logical interrelated data. A Data Base System Every organization needs to record data relevant to their every day activates. The organization can choose from the data collected & stored some of these data in an electronic Data Base. Eg: A university needs to record data to help in the activates of teaching & learning therefore three record what kind of students & lectures they have what are the courses & modules they are running which lecturers are teaching what modules. One the data is entered in to a database it could be utilization to get complete & accrete information’s such as list of students who have enrolled for each modules. this can help to make decision on room utilization .etc.. Data vs. Information The Word data refers to facts concerning things such as people, objects or events. Information is data that have been processed & presented in a from suitable for human interpretation. Disadvantages of manual system 1. A Constant stream inter company paper work 2. Telephone calls or faxes are required to communicate changes & keep the file synchronize. 3. These systems cannot provide “what if” type of questions. 4. Managers cannot easily obtain summarized information required for decision making. 5. Duplicate data can exist throughout organization resulting in lack of consistency. Disadvantages of file processing system 1. There is limited data sharing 2. Inconsistency of data 3. Duplication of data can occur. 4. Excessive program maintenance 5. Poor enforcement of standard. Advantages of Database 1. Better data sharing 2. Better security & integrity 3. Reduced redundancy 4. The program & data are independent 5. Multiple views of data.
Disadvantages of Database 1. New specialized persons are required. 2. Need to get 3. Problems in data sharing explicit backup. Database Approaches Database approach emphasizes the integration & sharing of data. There are 2 designing approaches. 1-process Driven Design Requirement Analysis
Process Design
Data Design
Implementation File processing system a process driven approached was traditional being used with this approche organization process such as order processins, inventory control, pay roll are identified & analyzed. Process and data flows between processed are described using tools such as DED. Designers work backwards from the required outputs of the system to determine the requirement unput. They can use flow chart into outputs. Finally they design data file as a wide product of product of process divan design. 2- Data Driven Design Requirement Analysis
Data design Process Design
Implentation
It forcuses on entities person, Plese, events or concept about which an organization choose to record data. This approche ideenti files the attributes of the properties of these entities & the relation ship among them. It also identities the business rules that speifiese how the entities are managed or used. After creating the suitable module of the data structure and related business rules the designer developes the application reqired to manage the data. Designers now have discovered a balance approche to combine to different applications approprialely suitable. Database development Enterprise Data module
Requirement Analysis
Conceptual model
Logical Data modle
Implementation
Physical Data Model
Conceptual data model This involves building a “real world” model expressed interms of data requirements. The initiaset of information & processing reqirements are gathered from the users. The developments of the large databases conception modeline may consis of view integration (combining the user views into onces schema) The output from this stage is the entity relationship digram. (3) Logical data Model This involves the building of the real world expressed interms of a data model. Eg. Hierarchical, Network ,Relational. This involves determind the concept of the database using the conception model as input & transforming it into the architectural data model.This is the mapping of the ER model.to the relational model & caring the process of normalization. The output of this stage is relational schema up or set of table. Physical data model. This involves building a model of the real world expressed interms of data structures & Acces mechanisum available in a choosen Database management system.
This involved in a transformation of a logical data model suitable for a specific software /Hardware specification. This is usuall expressed in data dominion language & theoutput of this stage is implementation plan. Entity An entity is the thing we modeled & it is something about which we wish to store information we name the entities with the singular noun. An entity can be 1-Aperson – Employer, student, customer etc. 2-A place – It could be a state country , region etc.3-An object – chair,Table, building etc. 4- An Event – sales, Rigistration, Renewal. 5- A concept – Acount ,Course The organization. ER models are usually constructed during the analysis phase of the DB development process. Output of this stage is a coceptval data model expressed in the form of a detailed ER digram. The ER digram consist of entities relationships & attribules.
(4) Basic Systems of an ER digram.
Entity
Relationship
Attributes Registration Primary key Attributes. This is the property for entity.
Primary key
Primary key
Eg: Student Relationshipjhhgvf Rel Student No
Attributes Name Add Tel No
ationship Association between entities. Can be specified using relationship. There are 3 basic kind of relationship. 1- 1:1 (one – to – one) 2.- M (one – to – Many) / (1:*) 3 – M:M (Many – to – Many ) / (*:*) 1:1 1.
2.
(5)
Country
H
Capital
1
Lock
1 H
1 3
1 H
Husband
1:M/ / 1:*
Key
Wife
*
1 B
Invoice
1.
Customer
2.
Mother
1
* H ave
. 1
3 Shop
M: M /*:* PE . Student
Child
Sales Man
has
*
* Sales Man
Regis ter
Cannot allow M:M for introduced “Link Entity” with two 1:M relationship.
(6)
Student
Student OR
1 *** * Enrolment
Course
Course Student
Link Entity
Course
*
* Communi cate
Person
Country
Country C 1
*
Language
* 1 Person
(7) British counsil library Library items – Book / CD – DVD / Enayclopedia Reception Membership Readers Libraraian
British Councial Library
H ave
Reception
Regi ster
Reader
CD DVD Library Item
H ave
books
H ave
Encyclopedia Copy A
D eati
1.* Librararian
Lending Collection
Accomerdation of ER model to a relational Data model. After drawing a ER digram it has to be converted into relational schema it can be done as follows. Convert each entity into a relational table in your relational data model the identifier of the entity becomes primary key of the table, all other attributes become nonkey attributes of the table. (8) Eg. Student (Entity) (RegNo,Name,Add) Primary Key Table Name
1: M Relationships are represented as relational schema by taking the key field of the table at one end gide as a foreign key to the table at the many end side. Eg: Customer Customer (CNo Primary key, Name, Billno,goods-------) Foreign Key Order Order (Order No,Name---------(No)
1:1 relationship can represented as one table A turnary relationship is respresented as 1 table with the compersit primary key which consist of all & identifiers of the 3 tables. 1:M Recursive relationship is represented in1 with a forigenkey which is effectively primary key.
Manage
Employee
** Employee (EmpId , Name, Add.DoB ------
If there are M:M relationship they should be avoided by breaking them into 2, 1:M relationship for that we have to introduced additional entity call link entity. LE is added which cross references between instances of 1 entity & instances of other entity. The many ends of the relationship upper at the LE. Thhe PK of the 2 entities are in corperated into the linkentity. (9)
There can be two types of entities avatable in ER digram 1- Super type entity 2. Sub type entity Super type entity This is an entity that stores attributes commen to 1 or more entity sub types Sub type entities This is an entity that inherites some common attributes from an entity super type.
Attributes
These are the charatristics of entities can be used to disscribe entities and relationship in the ER digram. An entity can have many attributes but should have an identification key. A key is an attribute or group of attributes which can identify an entity uniquely. A key attribute can be underlined in the ER digram. There are single valued attribute is 1 that can have sigle valued attribute Is 1 that can have sigle value. Eg.A persons 1D ,Subject,Mark ,Grade, etc. Multy valued attributes they can have many values. Eg.An office can have different officeno.A person have sevelan qualification. Cardinaler & optionarity Cardinality we already discussed. Relationship partiapation Praticipation in an entity relationship can either optional or mandertary Optional relationship Here the participation is optional if one entity occurrence doesnot require the corresponding entity occurrence, Entity. Mandertary relationship Relationship participation is mandertary (must) if one entity occurrence in a pertionship indicate that the minimum cardinality is one mandertary entity (10)
Relationship degree The relationship degree indicates the no.of associated entities or types with the relationship.There are 3 of relationship 1- unary (Reairsive) This is the relationship between the same entity class or type. Binary Relationship This is a relationship between 2 different entity types.
Ternary Relationship This is a relationship between 3 entity type & it is a simaltion rishipamong 3 entity types at the same time. Find 5 types of unary , Binary, ternary relationship Unary
Person
Child
Talk
Individual
Cons ult
Turnary 1.
Company
Deal s
Customer
11 Note book
pen write
Play
moni tor
person
2.
Student
Text book Study
Teacher
(12) 3.
Company
Neeal
Staff
Customer
4.
Bank
Individual Need
Money
(13)
Binary 1.
2
Person
Individual
* Communicat
* eat
Language
Food
3.
*
Country Have
*
Company
4
Have 5,
*
Student
District
Staff
Subject
Study
Weak Entity Weak entity is one that needs e following conditions. 1- It cannot exist without e entity with witch have e relationship. 2- - It has a primary key that is partially or to fally dependent rom e parent entity relationship. Data models The evaluation of data model. Request for better data management has lot to do with several different models.That attempt to resolve & file system problems. The mager data models are 1- hierarchical model 2- - Network model 3- Relational model (14) Hierchocl model. This was e 1st database model interduced in mid 1960’s & was based on e hierarchical model. Which assumes that all data relationship can be structured as hierarchical. The hierchieal data model uses a simple approche where e relationship between entities are always 1:M, This forms a simple one parent record abcord above it & many child record below it. * Customer
* Invoice
Invoice line
This digram illustrate how can make a hierrchi showing e relationship between customer invoice & Showing e relationship between customer more invoice line A Cuctomer can own one or more invoice lines. In hierarchical database model these file would be tide together to e physical point. Al Pointer is a physical address which identify where a rewrd can be found on a disribed. Disadvantages. - Large amount of data is stored - There is only one path to access data item - Difficult in the real world. - Difficult to perform ad-hos quries Network Model The Network model was created to represent complex data relationship more effectively than e hierarchical model. In e network model any record main have many immediate parent records as well as many child records. The network model is able to model greater number of real world situation. The network model build around e concepts of “Set”. In a network Data base model a relationship is called set (There are 1:M relationship between e owner & e member) Each set is composed of atleaset 2 record type whwre e owner = hierarchical parent record & e member record = The hierarchical child record. The differents between e night inelude a than one set. In other words a member can have several owners. (15)
Supplier m 1 Ubrary
m Staff 1
m Library item
m m
1 Member
1 1
m Fine
m Reterence
Disadvantages - When compare to hierarchical structure the network strueture is more complicated by the set concepts - The hos quries are difficult to execute - The programmer should know the structure as well as set type which makes of the networking before processing is done. Relational Model Relational model user the table from of data collection, which is similler to hles. Tables within the relational model are known as relations. The records in the table are known as “tuples” in a relational model a relational or table consist of a series of rows the columns. A column is known as field where a row is to tuple STUDENT Field= Couumns Primary key
Reg No
St Name
St Add
Tel
00106
Krishi
Col-06
2361742 Rg.No
Record/ Tuple/ Row Stude Course code HND
Payment Amt
Reg No
10,000
00106
Forein key Relations can be related to each other by sharing common entity caracteristies. The common link between student & payment tables enable us to connect the student to the payment table. Even though their details are stored in seperrate tables this is done by the use of payment key – foreign key concept. This relational model contains 3 relation type normally 1-1:1 2-1:M 3- M:M A relational schema is a visul representation of the relational entities if attritutes of those entity & relationship. Advantages of relational model.
Paym
-
Essy to understand the concept users are able to interaet with the relational DB very easily since the familier to everyone. No physical pointers when compared to previous model. Easy setup & change can be done with minimal effort. Logical & physical independent.
Relational Data Base Model Keys 1- Primary Key- A Primary key is a attribute that can uniquely inentity given row. 2- Eg.In student table primary key can be taken as “Reg No” 2.- Composit key – Aprimary key can a single attribute or commpersision of several attributes. Which can be used to uniquely identify a row or tuple. A unique identifier cabn be made up of mony attributes & that is call a composite primarykey. Compsit key STUDENT
Eg: Reg No, Course Code
Course code Reg No 0001 0001 0001
HND Multi C++
3- Foreign key – An attribute (or combination of attributes) in one table whose values match the primary key in another table is called as foreign key. Foreign key is an attribute whose values match primary key values in the related table. STUDENT Reg No
PAYMENT RegNo**
4- Candidate key – We can identify a candidate key an attribute or combination of attributes that cmiquely identify a new in a table. A relation can have more than one candidate key among, among the one will be choosen as the primary key. Eg. Person ID
Project ID
Supervisor ID
Time spent
0001 0001
01 02
0002 0002
50 hrs
Candidate key – 1 – Person ID, Project ID 2- Person ID, supervisor ID Reg no
Name
Dob
001 002 003
Krishi Krishi Renu
22.10.96 19.11.96 22.10.96
Candidate key- 1- Reg No, DOB 2- Name , Reg No Relational Data Base Integrity Rules. 1. Entity Tntegrity Specially all primary key entries are unique & no part of a primary key may be null. This is to gurante that each entity will have a unique identity and ensure that foreign key values can be properly identified . Eg: No invoice can have a duplicate number as well as it cannot be null. All invoices uniquely identified by their invoice number. 11. Referential Integrity A foregn key may have either a null value as long as it is not part of its tables primery key or an entry that matches a primary key value in a table to Which it is related. This is to make. It possible for a attributes not to have a corresponding value but it should be impossible to have invalid entry . Eg: 1 M Teacher Student Teach TEACHER * Techer ID
TName
Qualification
Course
001 002
Mary Ann
Bsc Msc
Dip.Ict HND
STUDENT StReg No
Name
DOB
Course
Teacher ID
PG 0001 PG 0002
Jack Tom
25.02.80 08.11.86
HND Dip.ICT
001
1 1-
*
Employee
Superviser
Supervision
Supervisor
SupID
SName
Add
Department
Tel
001 002
Amal Kamal
Col-3 Col-4
1A 2A
256742 232786
EmployeeNo
EName
Department
SupID
001 002
Rohan Shovon
1A 2A
Null 002
We don’t need details about supervisor ID in a Employee table. 2.
Bank Account
Customer
Have
Bank Account
3-
Bank Type
Intrest Rate
Saving Current
0.05 0.15
Movie
Star Movie
MID 123 456
MName Star Moon
Language Tamil Tamil
Date 1/1/2000 22/2/2000
Star StarID
SName
Salary
MName
MID
001 002
Ananda Nalanda
10,000 150,00
Star Moon
123 456
We Need to identify about the movie ID in a star table iii.- Domain Integrity This concerns the values which are stored in a particular columns of a table. Relational Database operators (Relational Algebra) manupilahng tables and its contents using 8 relational operators. 1- Union 2- Intersect 3- Difference
4- Product 5- Select 6- Project 7- Join 8- Divide Union This combines all the rows from 2 tables. The tables must have the some attributes to be used in the union. Stno
1111 3333 5555
Name
Add
Tel
AA BB CC
Col-3 Col-5 Col-7
234567 543211 2113441
+
Stno
Name
Add
Tell
2222 4444 6666
DD EE FF
Col-4 Col-6 Col-7
2234415 5432167 5762134
StNo
Name
Add
Tel
1111 3333 5555 2222 4444 6666
AA BB CC DD EE FF
Col-3 Col-5 Col-7 Col-4 Col-6 Col-7
234567 543211 21`13441 2234415 5432167 5762134
Intersect Intersct output only the rows apperar in both tables. In this case the tables must be union compatible to produce a valid result. You cannot use intersect if one of the ottributes is numeric & the other is character base. Difference Production Soap Noodles Oil
Production Soap Morgarine Bread
Product Noodles oil Margarine Bread
Product Product outputs all possible pairs of rows from 2 tables. If one table has four rows & the other table has 3 rows a product output is a list of components of 4x3=12 rows. Itcode
Itdes
ItQty X
001 002
AA BB
10 20
ItPrice
Itrecord
OrdNo
100 50
2 5
111 222
Itcode
Itdes
Itprice
Itrecord
Ord No
001 001 002 002
AA AA BB BB
10 10 20 20
100 50 100 50
111 222 111 222
Select Select output values for nall rowsfound in table. Select can be used to list either all of the row values or it can output only. Those row values that match a specified criteria, In other words select outputs a horizontal substraction of a table.
Name
Item ID
Descriptin
Qty
0001 0002 003
AA BB CC
Shoe Noodles Bread
20 30 400
Output will be same as Item Table
Eg: 1- Select * from Item Eg: 2- Select * from Item where ItemID =”0001” ItemID 0001
Name AA
Description Qty Shoe 20
Project Project outputs all values for selected fields. In other words project outputs a vertical substraction of a table. Item 1 ItemID Name Description Qty 0001 AA Shoe 20 0002 BB Noodle 30 0003 CC Bread 40 Item 1 Item ID 0001 0002 0003
Name AA BB CC
Description Shoe Noodle Bread
Qty 20 30 40
Eg: 2- Project ItemID, Name from Item Item ID 0001 0002 0003
Item 2 Name AA BB CC
Divide Divide requires the use of one single column. Table 1 two column table A
B
ITcode
Description
0001 0002 0003 0004
AA BB CC DD
IT ode 0001 0003 0007 0008
Divide A/B
Description AA CC
Eg:2 Code A A B B B
Loc No 5 7 5 6 3
Q Table-1 Reg No 0001 0002 0003 0004
Name AA BB CC DD
Divide
Code A B
Loc No 5
Table - 2 Add Col-3 Col-4 Col-5 Col-6
Tel No 1111 2222 3333 4444
Reg No 0005 0006 0007 0008 0003
Name EF GH HI JK CC
Add Nuge Maha Kirula Katta Col-5
1- Union Reg No 0001 0002 0003 0004 0005 0006 0007 0008 Reg No 0003
Name AA BB CC DD EF EH HI JK 2- Intersect Name Cc 3- Product
Add Col-3 Col-4 Col-5 Col-6 Nuge Maha Kirula Kotta Add Col-5
Tel 3333
Tel No 1111 2222 3333 4444 5555 6666 7777 8888
Tel No 5555 6666 7777 8888 3333
Course Name HND NCC BIT ICT HND
Reg No
Reg No
Name
Add
TelNo
Name
Add
Tel No
0001 0001 0001 0001 0001 0002 0002 0002 0002 0002 0003 0003 0003 0003 0003 0004 0004 0004 0004 0004
AA AA AA AA AA BB BB BB BB BB CC CC CC CC CC DD DD DD DD DD
1111 1111 1111 1111 1111 2222 2222 2222 2222 2222 3333 3333 3333 3333 3333 4444 4444 4444 4444 4444
00005 00006 00007 00008 0003 0005 0006 0007 0008 0003 0005 0006 0007 0008 0003 0005 0006 0007 0008 0003
EF GH HI JK CC EF GH HI JK CC EF GH HI JK CE EF GH HI JK CC
Nuge Maha Kirula Kotta Col-5 Nuge Maha Kirula Katta Col-5 Nuge Maha Kirula Katta Col-5 Nuge Maha Kirula Katta Col-5
5555 6666 7777 8888 3333 5555 6666 7777 8888 3333 5555 6666 7777 8888 3333 5555 6666 7777 8888 3333
4-Difference Reg No 0005 0006 0007 0008
Name EF GH HI JK
Add Nuge Maha Kirula Katta
5-Select Name from Table 2 Name EF GH HI JK
Tel No 5555 6666 7777 8888
Course Name HND NCC BIT ICT HND HND NCC BIT ICT HND HND NCC BIT ICT HND HND NCC BIT ICT HND
6-Select * (All) From Table 2 Where Add ≠ Col-5 Reg.No 0005 0006 0007 0008
Name EF GH HI JK
Add Nuge Maha Kirula Katta
Tel No 5555 6666 7777 8888
7- Project Table over Reg No, Name Reg No 0001 0002 0003 0004
Name AA BB CC DD
Join Join allows us to combine information from 2 or more tables. Join has power for relational Databases alliwing to use of independent tabales to linked by the common attributes. Join is a result of 3 stages. Stage 1- The product operation is applied stage 2- select operation is applied stage 3- A Prosect operation is applied. CUSTOMER Code CName 0001 A 0002 B 0003 C 0005 D
AGENT CAdd Col-3 Col-4 Col-5 Col-6
ACode 2002 1001 2002 3003
A Code 1001 2002 3003
AArea Kirula Moha Kohu
ATel 2222 3333 4444
Product C code 0001 0001 0001 0002
Cname A A A B
CAdd Col-3 Col-3 Col-3 Col-4
Acode 2002 2002 2002 1001
Acode 1001 2002 3003 1001
Aarea Kirula Maha Kohu Kirula
ATel 2222 3333 4444 2222
0002 0002 0003 0003 0003 0004 0004 0004
B B C C C D D D
Col-4 Col-4 Col-5 Col-5 Col-5 Col-6 Col-6 Col-6
1001 1001 2002 2002 2002 3003 3003 3003
2002 3003 1001 2002 3003 1001 2002 3003
Moha Kohu Kirula Maha Kohu Kirula Maha Kohu
3333 4444 2222 3333 4444 2222 3333 4444
Per form the select condition where customer Acode = Agent Acod TABVLE – 1 Ccode 0001 0002 0003 0004
Cname A B C D
CAdd Col-3 Col-4 Col-5 Col-6
Project Table-1 over Acode Ccode CName 0001 A 2222 B 3333 C 4444 D Q. SNo 1111 2222 3333 4444
Product Sno 1111 1111 1111
SName AA BB CC DD
Sname AA AA AA
Acode 2002 1001 2002 3003
CAdd Col-3 Col-4 Col-5 Col-6
Course HND BIT VOW Multi Model
Course HND HND HND
Acode 2002 1001 2002 3003
ACode 2002 1001 2002 3003
CAdd L11 L12 L11 L12
Lecid L11 L11 L11
Area Maha Kirula Maha Kohu
A Add Maha Kirula Maha Kohu
ACode L11 L12 L13 L15
Lecid L11 L12 L13
Atel 3333 2222 3333 4444
A Tel 3333 2222 3333 4444
AAdd AB BC DE FG
ATel A B C D
Lecname AB BC DE
LecQual A B C
1111 2222 2222 2222 2222 3333 3333 3333 3333 4444 4444 4444 4444
AA BB BB BB BB CC CC CC CC DD DD DD DD
HND BIT BIT BIT BIT VOW VOW VOW VOW Multi Mid Multi Mid Multi Mid Multi Mid
L11 L12 L12 L12 L12 L11 L11 L11 L11 L12 L12 L12 L12
L15 L11 L12 L13 L15 L11 L12 L13 L15 L11 L12 L13 L15
FG AB BC DE FG AB BC DE FG AB BC DE FG
D A B C D A B C D A B C D
LECID L11 L12 L11 L12
LECNAME AB BC AB BC
Select Table .LECID = TABLE 2. LECID Table -3 SNO 1111 2222 3333 4444
SNAME AA BB CC DD
COURSE HND BIT VOW Multi Mid
Project Table 3 over LECID Sno Sname 1111 AA 2222 BB 3333 CC 4444 DD
CECID L11 L12 L11 L12
Course HND BIT Vow Multi mid
Lecid L11 L12 L11 L12
Lecname AB BC AB BC
LECQUAC A B A B
LecQual A B A B
Relational Analysis There are 2 ways of describing relational model. One method uses the tearms relation, attributes & tuples. The other method describes the model using more familier tearms that is table, field, record/ row. The relational model describes the data as a set of relations & table. Each relational table has a table name with set a attributes & each attribute having a unique name. The relation attributes are same as table columns or fields. Each relation has set of tuples/ records in a table.
There is a close crospondence between the ER model & Relational model. ER View Employee
Project
*
Employee
Project Assignment
Project
EmpNo
Employee Empno 1111 2222 3333 4444
Name AA BB CC DD
ProNo
Add Col-3 Col-4 Col-5 Col-6
Tel 1234 1235 1236 1237
Project Prono 001 002 003 004 Emp no
Protitle
Duration 6 month 12 Month 8 Month 6 Month Pro no
In relational analysis the notation we would use is as follows. Employee (Empno) Name, Add, Tel No) Project ( Prono, Pro title, Duration, Budget) Project Assignment (prono) ,Empno, Time spent)
Budget 20,000/= 80,000/= 35,000/= 25,000/= Time spent
In this notation each relation is represented by one line. Each relation is represented by one line. Each line starts with the relational name & it is followed by the names of the relational attributes with in the brackets. The under- lined lined attribute 9s) is the relations,s keys (primary keeys) If you can organize data into set of such relation then we can gurantee a good DB design. A Number of normal forms have been defined to eliminate duplications in relations. They are commonly known as - 1st Normal Form (INF) - 2nd Normal Form (2NF) - 3rd Normal Form (3NF) - Boyce code Normal Form (BCNF) Normalization What is Normalization is a formal process for designg which attributes should be grouped together in a relation. Before proceding with the physical design we need a method to validate the logical design to this point. Normalization is a tool to validate & improve a logical design. It needs to satisfy some constrain. Such avoide unnecessary duplicateting data normalizations is a process of decomposing relations with anomalies to produce small well structured relations. Normal form are the ruled for structuring relations these were indroduced by E.F.Cood. Later Dr.Cood proposed another normal form call boyce could normal form (BCNF) Reason for normalization data structure to avoid data redundancy – If a creting fields occurs in DB several times errors can are 3 types of anomalies exist 1- Insertion Anomaly 2- Deletion Anomaly 3- Updation Anomaly Insertion Anomaly Employee Empno E100 E150 E300 E350
Name Smith James Ann David
Add Col-1 Col-2 Col-4 Kandy
-------
Empno C-ID
CName Dura
Price
E100 E100 E300 E100
DICS IAD BSC MCSE
65000 10900 12500 15000
D-100 D-1100 D-25 M-100
5m 1yr 7yr 5yr
Supose that we need to add new course details to EMP_ course table where the PK is EMPNo, C-ID. To insert a new row the user must supply the values for both empNo & C_ID. This is an anomaly since the user should be able toenter the course details without supplying the EMPNo. Deletion Anomaly Suporse that the data for EMPNO E100 is deleted from the table this will result in loosing the information to the course name,IAD ,DICS,MCSE To voide these anomalies we can apply normalization process for the data. Unnrmalization form
Tables with repeating groups Remove repeation 1st normal form Remove partial dependencies 2nd Normal form Remove transitive dependencies 3rd Normal form Remove remaining anomalies
Boyce code
Determinency & Dependency During normalization logical assouations between data items are identified & represented in the DB Design without any file maintenance anomalise. Identitifycation of logical associations between data items is important in normalization & DB design. Such associations between data items are calle determinency & dependency relationship . Eg: If we can difermine a specific valie of data item “V” If we know the data item determinant & Data item V set to be the dependent. Therw are 2 types of deteminency dependency relatitionship 1- Functional dependenty 2- Non functional dependency Functional dependency Normalization is based on the analysis of functional dependency, A functional dependency is a relationship between 2 attributes Eg: - A relation “R” has an attribute A for every value instance of A this is value of a uniquely determines the value of B for a give. Non Funtional dependency For a given data item if we can find several value of other data item then there is a non functional dependency between 2 data items Eg::- A student No can have several subjects related. Candidate key This is a determinant that can uniquy identified non key attribute are functionally dependent on part of the primary key. Eg:- Emp Course (EmpID,Name CName, CDuration) Course name & Duration depends on parts of the primary key that is empid the other part will be linked by CID. Moltivalued dependency This is a type of dependence of exist where there are atleast 3 attributes in a relation Eg: A,B,C,Relation
Each value of a there is a well defined value for B & well defined value for C But the set of values of B indepennent of C Eg:
Subject DBD DBD
Teacher AA BB
Tex Book Text 1 Text 1
Transitive dependency This is funtionnal dependency between 2 or more “non key” “Attributes in a relation. Eg: Consider thye following relation Eg: Consider the following relation Sales (CNo,Name, Sales person, region) In the above relation each sales person is asigened to a unique region We can identify that functional dependence exist in tne sales relation i- Customer No
2- Sales person
Name Sales person Region region
The region is functionally dependent on sales person & sales person is functionally depend on CNo, AS a result there are update anoma lies in the sales relation. Eg: We can,t add new sales person to a new region without a customer. If we delete a customer we,ll loose the sales person information aswell. 3- If a salesperson is transferred to a new region then the several rows must be changed. Rules of Data Normalization. 1- INF- Elemunate repeating groupsMake a Elemunate repreating groupsMake a separate table for each set of related attributes & give each table a primary key. 2- 3NF- Eleminate columns not depend on key- If attribute do not contribute to the description of key remove that to a separate table. 3- BCNF- This is Boyce codd Normal dependency between candidate key attribute sdperte them to different table.
Eliminate repesting Groups. In the original member list,each member name is followed by any database that the mwmber has experience with . some might know many, & others might not know any. To answer the
question, who knows DB2? “we need to perform an awkward DB2?” we need to perform an awkward scan of the list looking for reerence to BB2.This is inefficient & extremely. Untidy way to store information. Moving the known DB into a separate table helps a lot. Sepatating the repeating groups of DBS from the member information results in INF. The member ID in the Database table matches The PK in the 2 tables with a join operation. Now we can answer the question by looking in the database table for “DB2” & getting the list of member.
DataBase Table
Member list 1. Jojn SmithAccess DB2 FoxPro 2. Dave jones dBase,Clipper 3. Mike Beach 4. Jerry Miller DB2, Oracle 5. Benstuary Oracle, Sybase 6. Fred Flint Informix 7. Joe Blow 8. Greg brown Access , Missq Server 9. Doughope
2. Eliminate Redundant Data
Member table MID Name 1. Johnsmith 2. David Jones 3Mile Beach 4.Jerry Miller 5. Benstuart 6. Fred Flint 7. Joe Blow 8. Gre Brows 9. Dougltope
DID MID Database 1 1 Access 2 1 DB2 3 1 Foxpro 4 2 dBase 5 2 clipper 6 4 DB2 7 4 Oracle 8 5 Oracle 9 5 Sybase 10 6 Infomix 11 8 Access 12 8 Mssqlserver
3. In the DB Table, & PK is made up of the member ID & Databade ID. This makes scnse for other attributes like “ where learned” & “Skill level” attributes, since they will be different for every member/ database combination. But the database name depends only on the Database. The same database under different IDs. This is an update anomaly. Or suppose the last member listing a particular database leaves the group. This records will be removed from the system, & the data base will not be stored anywhere . This is a delete anomaly. To avoid these proplems, we need 2NF. To Achieve this ,separate the attributes deperching on both parts of the key from those depending only on the Database ID. This results in 2 tables: “ database” Which gives the name for each database ID , and “Member database” which list the databases for each member. Now we can recassfica database in a single operation: look up The database ID in the ‘ Database’ table & change its name. The result will instantly be available throught the application. Database Table DID 1 2 3 4 5 6 7 8 9 10 11 12
MID Database 1 Access 1 DB2 1 Foxpro 2 dbase 2 Clipper 4 DB2 4 Oracle 5 Oracle 5 Sybase 6 Informax 8 Access 8 Mssqlserren
Member Table MID Name 1 2 3 4 5 6 7 8 9
Database Table DID Database 1 Access 2 DB2 3 FoxPro 5 dBase 6 Clipper 7 8 9
Member Database Table MID DID 1 1 1 2 1 3 2 4 2 5 4 2 4 6 5 6 5 7 6 8 8 1 8 9
3 Elimnate columns not Dependent on key The member table satisfies INF – It contains no repeating groups. It satisfies NF – since it does’t have a multivalued key. But the key is member ID,& the company name & location describe only a company, not a member. To achieve 3 NF, They must be moved in to a separate table. Since they describe a company, company code becomes the key for the new “company” table. The motivation for this is the same for 1NE : we want to avoid update & the IBM were currently stored no members from the previous design, there would be no record to its existence, even though 20 past members were from IBM:
Member Table MID Name Company 1 Johnsmrth ABC 2 Dave Jones MC1 3 Mike Beach IBM 4 Jerry Miller MCI 5 Ben Stuart AIC 6 Fred Flint ABC 7 Joe Blow RuNuts 8 Gerg Brown XYZ 9 Doug Hope IBM
Member Table MID Name 1 2 3 4 5 6 7 8 9
CID 1 2 3 2 4 1 5 6 3
Com Loc Alabama Florida Delaware Flnda Nebraska Alahama Iowa Now York Delawre
Company table CID Name Loc 1 2 3 4 5 6
BCNF Boyce – code Normal 1 form states mathematically that A relation R is said to be in BCNF if whenever X->A holds in R ,and A is not in X, then X is candidate key for R. BCNF covers very specific situations where 3 NF misses inter- dependencies between nonkey (but candidate key) attributes. Typically, any relation that is in 3 NF is also in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes & (c) there are common attributes between the keys. Eg: Classcode 503 540
Class Enrolment Class Discription Mgt Info Maths
Student No 0001 0003 0005 0002 0004
Name AA BB CC DD EE
1- Convert into INF 2- Repealing group exist within this data each class code have any no. of student init, so the student information makes the repeating groups. Data cannot be stored or processed in a database when it this from. What we musthave is one record containing all the data for each student. INF Class code 503 503 503 540 540
Class discription Mgt Info Mgt Info Mgt Info Maths Maths
Student No 0001 0003 0005 0002 0004
Name AA BB CC DD EE
Enrolled in a class, there can be no “gaps” in the data when stored in a file The above table is in the INF no repeating data & no gaps. To take the table to the 2NF we would have to remove partial dependency. Class Information Student Info
Class code 503 540
Class Discription Mgt Info Maths
Class code 503 503 503 540 540
Student No 0001 0003 0005 0002 0004
Name AA BB CC CC EE
Non loss decomposition This is the process of transforming an unnormalized data set in to a fully normalized database. Relational/ Bracketing nitation If we are going to represent all normalized table within the system it’ll become very complexed, to avoid this , we can use the bracketingnotalion which can represent table by on statement. It’ll start with a table name followed by the key (s) attributes are separated with commas as well as primary keys are underlined & the foreign key shown with** Emergence of the DBMS Enter the database management system the simple approohe open data file for input soon became fraught to be addressed. Computer system vendors needed to be able to support the critical needd of a growing & evolving market palce that supported the data processing needs of organization in all fields of human endevaour. Thhe first DBMS appeared during the 1960’s at a time in human history where projects of momentous scale were being contemplated planned engineeed never before had such large data problems on the flow where identified & solutions were researched & develop ofen in real time The BBMS became necessare because the data was for more volatile than had earlier been planed & because there was still major limiting factors in the cost associated with data storage media. Data grew as a detailed transations by transation levels In the 1980’s all the major vendors of hardware system large enough to support the e volving need of evolving computerized . record keeping systems of large oranizations bundle some from of DBMS with their system solution. The 1st DBMS species were thus very much vendor specific. IBM as usual let the field but there were a growing no of competes and cloneds whose DB solution offered varing entry points in to the bandwagon of computerized record keeping system. Through this time the specifc nature of the problems being resolved & were around from the perspective of IT management were evolving with the teahnology & the no of main frame & mini computer hardware vendors in creased & the no of pheriperal types & their vendors also
increased. The bundeling of DB operational sevices such as the ability to perform & Schedule data backup became routine. The IT operational environment at this time was categorized with a collection of system house keeping task with major emhersis on organization produetion DB backup. File recoranization & reindex ation of data were also standard but mainly manual system house keeping system task. Other tasks incude storage media arehival management of document & records or business perticilarly those relste to finance record keeping. MedPix medical image database uses helthy dos of foss MedPix is a sprawling online medicimages DB & diagnostic tool that’s used around The wirld by radiologist, nurses, physicians & medical students & the whole system is powered by linux & open source software. MedPix is hosted by the US federal government’s helth sciences university, the uniformed services university in Bethesda, Maryland. It ‘ s the brainchild of JamesG. Smirniotopoulos, M.D., a USU professor of Radiology, Neurology & Biomedical informatics & clinical sciences chair of it’s Department of Radiology & Radiological sciences. Exploiting the DBMS (Selecting a DBMS) The DSMS is a software application system that used to create, maintain & provide control to create, maintain & provide control to user database Eg: It provides a user interface to the data base system. Data management layer The ANS1 / SPARC architechture A DBMS works as an interface between application progras of the users & the data base. The American National standard institute standard planning & requirements comitte propose a 3 level architecture for this interface such as. External/ User view This level describes The user application programa viwew of the database several users can share the same view. Conceptual/ l0gical view This describes overall data requirementd as a detailed technology ideperdent speafication of the over all structure of a database.Entity- Relatiorship modeling & object – Oriented modeling are 2 diffw\erent graphical notations used for presenting the comceptual notation used for presenting the conceptual mobbed. Physical / Internal view This describes in which the data is stored & data is accessed
The above 3 layer of architecture create independence of data at 2 level. Logical Data independence This refers to the ability of changes to the external Without Eg:- Phys any influence to the external cchema. Physical Data independence This refers to the ability of introduang changes to the physical sohema without any influence to the logical schema Eg: Chaging the storage structure of data without any influence to the wonceptial Schema.
Toolkit Interface
The DBMS Consist of 3 parts. 1. kernel (core- central point0 2. Interface 3. toolket Kernel. The DBMS kernel is the central ergine which handles the rain data management functions. Most DBMS are installed on thetop of some operating systems. Therefore DBMS needs to interact with operating systems to implement & access the data & system calalogece which are usually stored on harddisk. Therefore they interact with some of the operting system elements such as i.- file manager which translate between the data structures manipulated by the DBMS & the files on the harddisk. ii. acces mechanism – the file manager doesnal manage the physical input & output of data directly . It interact with appropriate access mechnisum established for different physical structures.
System buffer – The reading & writing data is normaly stored in the system buffer of the operating system. Functions of DBMS. - the DBMS must be able to create, retrieve, update, delete the data from the data base (CRUD Function) -
Data Integrity & data seairity
-
Data Communication
-
Conarrency cCntrol
-
Data recovery & backups
-
Query optimization
-
Managing transaction
-
Scheduline transaction
-
Data Dictionary
Software components available in DBMS. -
DDL Compilers DDL Compiler
Data dictioraiyment
Query processor & runtime processor User Queries
Query
Runtime Processor
Pre- compiler Application programs
Precompiler
DML Compiler
Host lang
DML Compiler Host Language Compiler Data dictionary manager Data base features Data independence The srperation of data descriptions/ definitions from the applications that uses the data In a relational model the logical independence & physical independence is achieved by allowing changes to the physical storage or the application programs to take place without the user being aware of the changes. Data abstraction Data abstraction is the process by which a data base attempts to represent properties of objects in the real world. A database records. The relevant details required to support some organizational activities & it does not record all the details. The database is an abstraction of the real world. The abstraction level varies according to the different commercial databases. Data Security Data security is the process of protecting the data from external thrects. Data is key resource for organization. Loss of data , Privacy , integrity, availability etc are issues that cost an organization in financial terms. Data Interation
A data base should be a collection of data which at last has no redundant data. The aim is to hare 1 database storing 1 logical item of data in one place which can be accessed by a range of information systems. Eg: Replacing several departmental data base in one company with one data base system that can be nade accessible to several departments. Physical Data base design. The process of physical database design is to translate the logical description of data in to the technical specification for retieving & storing data, the goal is to create the required performance & ensure the database integrity, security & recoverability. Physical database design must be performed carefully since the decisions made during this stage have major impact on data accessibility , response time , security , user friendliness , ect. Physical database design doses not include implementing database but it produice a technical specification that programmers & others involved in information system construction There are 3 major inputs to physical database design Logical database structure that were developed during logical design. This structure may be expressed in itierarachical, Network, relational object oriented & definition for each attribute. User processing requirements that were identified during requirement definition including sign , when data is used & where response time , time , data ,security , backup & recovery , integrity & retention of data. Description of technology used for omplemnting the databases (DBMS0 There are several critical decisions that ‘ll affect the integrity & performance of the applications system. These key decisions are - data volume analysis - transaction / usage analysis - Contro & security analiysis - Destribution analysis - Integrity analysis
Integrity Analysis Tntegrity defines the business rules or constraints that should apply on the data. There are 4 basic types of integrity rules. Entity Integrity This define that each and every entity should have a unique identifier a primary key and it cannot be null.
Referential Tntegrity This refers to the rules concerining the relationship between the 2 entity types (1e) by defining foregn key relational data model. Domain integrity This refers to the eonstraction valid values for attributes. We can achieve the domain integrity for an attribute by specifying the following. - The type of the attribute - The length - Format range - Allowed values - Null support or Not - Check constraints Triggerring Operations All the other business rules that protect the validity of an attribute value defined as a tribber A trigger is a program / proceduire which is aulmatically executed due to an event. An event can be either Insevetion , deletion / updation. User Rule: check with drawal amount
Setn up integrity constant expressed in some data definition language and set of a additional interity constraints expressed in some data interity language A set of distribution strategies A set of defined users A plan for securing data Database Implementation decision 5 options are available to the relationan DB developers in fine tuning the application to increase the performance of the application. Establishing the storage structure the associated mechanisms Adding Indexes Denormalization Exploiting the DBMS Implementing integrity constraints Establishing the storage structure & Access mechanisms. The criteria that are normally important in selecting the storage structure are Fast access for retrieval High throughput for processing transcaction Effective use of storage space Protection from failures Minimizing the need for reorganization Accommodating a growth Securing from unaithorised MSE. The designeres must select the file organization tha provides areasonable balancing among the criteria withing the resource available Mechanisms available for storing data in a relational system are Sequeritial files Hash files Clustering files Indexed sequential files. Adding Index Index is a table or other data structure that is used to determine the location of the row in a table that satisfies the condition. Indeex may be defined on both primary key value or non key attribute value.(Secondary Index) Indexing improves the access performance but if reduces the update performance because every time we add a record to a table, each index on that table has to be update, we always index on primary key & foreign key but sometimes we index on other nonkey attributes of a table when they are extremely important in the running of some reports.
De – Normarisation The main problem with fully normalized database is it usually made up of many tables. To perform queries , such tables have to be reconstructed using join operations. There fore some database developers occasionally choose to implement a relational database that purposely violation the principles of normalization. Their goal is to reduce the no.of physical database tables that must be accessed to retrieve the desired data by needd. But denormalization leade & it increases the change of errors and inconsistercies. Situations in which you should consider to denormalise are 2 Entities with a 1-1 relationship. If one of the entity’s is an optional participant. If the matching entity exist most of the time, then it may be advisable to combine the 2 entities in to one table. Many – many relationshop with a non key attribute – rather than combining 3 tabless to extract the data from the 2 entities in a relationship , if may be advisable to combine the columns from 1 of the entities in to the tables representing the M-M relationship which avoids 1 joins in many queries. Reference Data – If exists in an entity on the 1 side of 1 – M relationship & this entity participates in no other data relationship , then wer can consider of mering these two entities. Integrity constraints There are 3 main ways of implementing constraints. Inherently – We should at leaset be able to specify entity , referentoral & domain integrity. Procedural – Constraints occus outside the data model. Most existing data model. Most existing application programs that interface with database implements the constraints procedurally. Nonm – procedural – constructing the constraints by implementing triggers & stored procedures. Transaction management & concurrency control what is transaction. Transaction management & concurrency control what is transaction. In database term a transaction is any action that read from db or write on db. Transaction may consist of a simple select statement to gwnerate list of table contains. If may consist of a serious of insert statements to add rows to , or moretables or if may consist of combination of select , update , insert statements. Transaction management with SQL Amaerican national standard Institude (ANSI) has defined standards that Govern SQL DB tranactions. The transaction support is provide by the SQL statement commit & roll back.
Commit A commit statement is reached in which case all changes are permanently recored with the DB> The commit statement automatically ends the SQL statement. Roll back A roll back statement is reached in which case all the changes are dborted & DB is rolled back to its previous constant state Trasaction Lock The DBMS uses a transaction lock to keep track of all transaction to update the DB. Information stored in this lock’ll be used by the DBMS for recovery requirements though the roll back statements. 1- A transaction lock stores a record for beginning of transaction 2- Each transaction component 3- The pending of the transaction TRA TRA PRE TRA NEXT OPERA TABLES POW ATTR BE FOR AFTER ID NO TRA TION ID IBUTES Value VALUE 341 354 361 365
101 101 101 101
Null 341 352 303
352 363 365 Null
Start Uptate Update Commit
Start Product 1558 Customer 1001 End of T/A
Product 25 Customer 525
23 615
Concurrency Control The condition of the simultaneous execution of transaction in multiuser DB system is known as concurrency control. The objective of concurrency control is to ensure the serializability of transaction in a multiuser DB environment conairrency control is important because the simultanius execution of trasaction over shared DB can create several integrity & consistency problems. The 3 main problems are last update Uncommitted dependency Inconsistance retrieval. Last update Most common problem that is encounted where multiple users attempt to update DB without proper concurrency control. Transation A Check balance (10,000) Take (2000) Check balance (8000)
Transaction B 10.00 10.05 Check balance (10,000) 10,10 10,20 Take (7000) 10,30 10,40
The effect of transation is lost due to interference during the transation. Uncommitted dependency problem Uncommitted dependency problem arise If 1 transation is allowed to retrieve a record that has been update by another transaction but has not being committed by other transaction. TransationA (Check balance (500)
Transation B 10,00 10.05 transafer 200 10.10 10,15 10,20
Take (1000) Check balance (200)
Transaction A depends on an uncommitted transaction at 10,05 there fore the end result transaction A is incorrect because an uncommitted value may be changed due to some problem in the other transactions. Inconsistency Analyst (inconsistence refrieval) Inconsistency retrieval occur when transaction calculate some summary functions ahile other transactions are updation the data. The problem is that the transaction might read some data before they are changed & another data after they have changed this will result some inconsistency data. Trasaction A Sam = 0
Transation B 10,30
Open A C1 (Sum = sum +500 Open AC 2 (Sum = Sum +100)
10,30 10,36 transfer 50 from AC2 to AC 3
Open AC 3 (Sum = Sum + 200) AC1
AC2
50
100
AC3 250
Trasaction A should end up with an answer as 300 but the produced answer is 35. this is due to transaction A has considered an inconsistent situation of the DB. There fore the transaction has performed an inconsistent analyst. Concurrency control with locking methods Locking mechanisms are the common types of any data that is retrieved by a user for updating must be locked or denied to other users until the update is completed. Types of locks Shared lock (slock/ Read Lock) A transaction should place a shared lock on record who it’ll only read that record. Shared lock allows other transactions to read a record plaung an slock a record prevents another user from plaung an exolusn lock (updating, write) on the record. A
S
B
Exclusne lock (Write/ Lock) Exculusive lock prevents another transaction from reading & there fore updating a record until it is unlocked. A transaction should place an exlusive lock on a record when it is above to update that record. A Take B Chek Money money Data Read Data lock When 2 or more transaction have locked a common resource & each waits for other to unlock that resource. Database security DB security refer to a process of protecting a ab against accidental or international losses or misuse. A database administrator is responsible for the security of the db. The follwing are the different types of threats available for the db. 1- possibilities of the it & frauds – the data related to an organization may be illegally extracted without any permission by a hacker.
2- Possibility of loss of confidentiality an unauthorises person or a hacker try to access the corporate date available on the db. 3- Loss of Privaoy – An organization keeps data related to individual persons & viewing The personal data by others,ll lead to loss of privay. 4- Loss of Integrity – The integrity may be lost due to not implementing the interity rules properly due to hardware or software problems. 5- Loss of availability – The availability of the database may be lost due to different reasons such as matural disasters floads, fire, bomb attack, virus attack. The DBA is responsible for database security they secuirity measures on database can be defined in 2 different ways Computer based measurements Non computer based. Computer based Measures Haring an authorization on the db system by the operating system on which it is running (username, password os + DB) Having an authorization strategy to grant privell ages to the userson the DB tables. - creating views on db tables & giving access to user to those views. -
taking backups of data in DB maintaining done on the db or to trace any changes
-
Applying ercription strategy on more important data.
Non computer based measures. - sperate the duties of the users on the D, - Destroying the oopies of data without discarding them. - Use the fire – pro of storage mechanisms for storing lockeyp oppies - Apply physical access a secure environment. - Eg- locking , Assigind asecirity quard Database Administration It is a technical function that is responsible for physical DB design & for decurity enforcement, DB performance, backup & recovery In a large database system, DBA functions is carried out bythe DB administration or DBA team The DBA have the following core responsibilities. Administration of the DB The main activities of DBA in administrating the database the are physical design of the implementation. -
Data standards & documentation Monitering data usage & tuning DB structures.
-
Data Archiving
-
Data backup & recovery
Administration of the DBMS The following are the main activities of a DBMS - Installation of the DBMS - Configuring the system - Monitering the DBMS usage - Turing the DBMS The DB environment Administration The following activities are invoived with the DB environment administration. -
Access control 1:- This is dealine with or ceting users, assigning passwords, gran giving access to DBMS facilities Impact Assessment Privancy, security, Integrity control Education & training.