Overview Of Rdbms Technology

  • Uploaded by: api-3706175
  • 0
  • 0
  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Overview Of Rdbms Technology as PDF for free.

More details

  • Words: 3,979
  • Pages: 72
Overview of Relational Database Technology Hanu [email protected]

Introductions

2

Agenda • • • • • • • • •

Introduction to files systems Introduction to Access methods--VSAM Introduction Data Base Systems- Architecture Introduction to Hierarchical databases--IMS Introduction to Network Database –IDMS Introduction to Relational Databases Introduction to OLTP New trends in DB technology Basics of Data Warehousing

3

Objectives • • • • •

Evolution of Database Technology Limitations of Legacy Access mechanism Limitations of Hierarchy and Network Databases Emergence of Relational database management system Latest trends in Database technology – OODBMS – OORDBMS – XML Integration

4

What this course does not cover • • • •

Data Modelling OO Design SQL Syntax Commercial database like – – – –



DB2 Oracle SQL Server 2000 and Sybase

Design of Data warehousing or Data Mining system

5

File Based Data Management •

Flat file systems are first attempt of computerization of manual book keeping system



Retrieval of data was possible only by sequential reading Updating and deleting the existing record was almost impossible



The only way to delete Sequential file records is to create a new file which does not contain them.



The only way to update records in a Sequential File is to create a new file which contains the updated records

6

Disadvantages of File based system • • • •

Data Redundancy - the same data might be stored in different places Poor Data Control - redundant data might be slightly different example Hanu’s data may be stored in Telephone, Payana and PSWeb Inability to Easily Manipulate Data - it was a tedious and error prone activity to modify files by hand Cryptic Work Flows - accessing the data could take excessive programming effort and was too difficult for real-users

7

VSAM Based Systems • • • • • •

Designed and Developed by IBM in early 60s First time introduced concept of Unique Key To be able to locate a given record, based on its key, and fetch it with minimal I/O (ideally a single read) Can define Secondary keys Made up of Multiple Control Areas (CA) Each control area is made up of Control Intervals and Free space

8

Disadvantages of VSAM • • • • • •

Complex Structure and access mechanism No querying facility Security Issues No concept of Referencing keys Application dependent Redundancy of the data

9

Hierarchical Database - IMS • • • •

Designed in Mid of 60s again by IBM Based on two tier client server architecture VENDOR Looks data only as ‘Hierarchical’ 1 Accessing child element only through parent node ITEM3

VENDOR ITEM LOCATION

ITEM2 ITEM1

L L OC3 L OC2 OC1 10

L OC1

L OC3

Disadvantages of IMS • • • • • • •

Accessing child nodes only through parent node Child record can not be inserted without a parent One child record can have only one parent record No querying facility No referential and constraints concept Redundancy of the data Very cryptic Macros

11

Network Database - IDMS • • • • •

Designed by Conference on Data Systems Languages (CODASYL) in late 60s Introduced to overcome Hierarchical DB limitations Data elements are linked through only pointers Eliminated redundancy completely Super set of Hierarchical Database with child and multi parent relationship

12

Disadvantages of IDMS • • •

Difficult to access the system using cumbersome pointer concept It was useful for Programmers than real users Difficult to represent many to many relationship

13

Relational Data Base Management System (RDBMS) •

• • • • • • • •

The Relational Model developed out of the work done by Dr. E. F. Codd at IBM in the early 70s who was looking for ways to solve the problems with the existing models First time introduced in his famous paper “A Relational Model of Data for Large Shared Databanks “ Based on Mathematical model of Relational Algebra At the core of the relational model is the concept of a table (also called a relation) in which all data is stored Each table is made up of records (horizontal rows also known as tuples) and fields (vertical columns also known as attributes) Data is separated from application – No more application dependent/centric In the relational model, operations that manipulate data do so on the basis of the data values themselves. Extremely easy meta data management Query Language Interface

14

Application Programs using DBMS Services

Application Programs

DBMS

File System Storage 15

Example of RDBMS Table SNO SNAME S1 S2 S3 S4 S5

Smith Jones Blake Clark Adams

STATUS 20 10 30 20 30

CITY London Paris Paris London Athens

16

ER modeling •

ER modeling : A graphical technique for understanding and organizing the data independent of the actual database implementation



Entity: Any thing that may have an independent existence and about which we intend to collect data. Also known as Entity type.



Entity instance: a particular member of the entity type e.g. a particular student



Attributes: Properties/characteristics that describe entities



Relationships: Associations between entities

17

Steps in ER Modeling •

Identify the Entities



Find relationships



Identify the key attributes for every Entity



Identify other relevant attributes



Draw complete E-R diagram with all attributes including Primary Key



Review your results with your Business users

18

Case Study – ER Model For a college DB Assumptions : • • • • • • • • • •

A college contains many departments Each department can offer any number of courses Many instructors can work in a department An instructor can work only in one department For each department there is a Head An instructor can be head of only one department Each instructor can take any number of courses A course can be taken by only one instructor A student can enroll for any number of courses Each course can have any number of students

19

Step 1 : Identify the Entities : • • • •

DEPARTMENT STUDENT COURSE INSTRUCTOR

Step 2 : Find the relationships •

One course is enrolled by multiple students and one student enrolls for multiple courses, hence the cardinality between course and student is Many to Many.



The department offers many courses and each course belongs to only one department, hence the cardinality between department and course is One to Many.



One department has multiple instructors and one instructor belongs to one and only one department , hence the cardinality between department and instructor is one to Many.



Each department there is a “Head of department” and one instructor is “Head of department “,hence the cardinality is one to one .



One course is taught by only one instructor, but the instructor teaches many courses, hence the cardinality between course and instructor is many to one.

20

Step 3: Identify the key attributes • • • •

Deptname is the key attribute for the Entity “Department”, as it identifies the Department uniquely. Course# (CourseId) is the key attribute for “Course” Entity. Student# (Student Number) is the key attribute for “Student” Entity. Instructor Name is the key attribute for “Instructor” Entity.

Step 4: Identify other relevant attributes For the department entity, the relevant attribute is location For course entity, course name,duration,prerequisite For instructor entity, room#, telephone# For student entity, student name, date of birth

21

Step 5: Draw complete E-R diagram with all attributes including Primary Key Department Name

Location

Department

Pre Requisite

1

1

1 Headed by

Offers

Has

Course# 1

N Duration

Course Name

N Course

N

Is taught by

1 Instructor

Instructor Name

Enrolled by

Room# Telephone#

M Student Date of Birth

Student#

N

Student Name

22

What is Normalization? •

Database designed based on the E-R model may have some amount of – Inconsistency – Uncertainty – Redundancy

To eliminate these draw backs some refinement has to be done on the database. – Refinement process is called Normalization – Defined as a step-by-step process of decomposing a complex relation into a simple and stable data structure. – The formal process that can be followed to achieve a good database design – Also used to check that an existing design is of good quality – The different stages of normalization are known as “normal forms” – To accomplish normalization we need to understand the concept of Functional Dependencies.

23

Need for Normalization Student_Course_Result Table

Student_Details

Course_Details

Result_Details

101

Davis

11/4/1986

M4

Applied Mathematics

Basic Mathematics

7

11/11/2004

82

A

102

Daniel

11/6/1987

M4

Applied Mathematics

Basic Mathematics

7

11/11/2004

62

C

101

Davis

11/4/1986

H6

American History

4

11/22/2004

79

B

103

Sandra

10/2/1988

C3

Bio Chemistry

11

11/16/2004

65

B

104

Evelyn

2/22/1986

B3

Botany

8

11/26/2004

77

B

102

Daniel

11/6/1987

P3

Nuclear Physics

Basic Physics

13

11/12/2004

68

B

105

Susan

8/31/1985

P3

Nuclear Physics

Basic Physics

13

11/12/2004

89

A

103

Sandra

10/2/1988

B4

Zoology

5

11/27/2004

54

D

105

Susan

8/31/1985

H6

American History

4

11/22/2004

87

A

104

Evelyn

2/22/1986

M4

Applied Mathematics

7

11/11/2004

65

B

Basic Chemistry

Basic Mathematics

Insert Anomaly Delete Anomaly Update Anomaly Data Duplication

24

Functional dependency •

In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines EXACTLY ONE value of Y, which is represented as X -> Y (X can be composite in nature).



We say here “x determines y” or “y is functionally dependent on x” X→Y does not imply Y→X



If the value of an attribute “Marks” is known then the value of an attribute “Grade” is determined since Marks→ Grade



Types of functional dependencies: – Full Functional dependency – Partial Functional dependency – Transitive dependency

25

Functional Dependencies Consider the following Relation REPORT (STUDENT#,COURSE#, CourseName, IName, Room#, Marks, Grade) • • • • • • •

STUDENT# - Student Number COURSE# - Course Number CourseName - Course Name IName - Name of the Instructor who delivered the course Room# - Room number which is assigned to respective Instructor Marks - Scored in Course COURSE# by Student STUDENT# Grade - obtained by Student STUDENT# in Course COURSE#

26

Functional Dependencies- From the previous example •

STUDENT# COURSE#  Marks



COURSE#  CourseName,



COURSE#  IName (Assuming one course is taught by one and only one Instructor)



IName  Room# (Assuming each Instructor has his/her own and nonshared room)



Marks  Grade

27

Full dependencies X and Y are attributes. X Functionally determines Y Note: Subset of X should not functionally determine Y

Student#

Marks Course#

28

Partial dependencies X and Y are attributes. Attribute Y is partially dependent on the attribute X only if it is dependent on a sub-set of attribute X.

Student#

Course#

CourseName

 

IName

Room#

29

Transitive dependencies X Y and Z are three attributes. X -> Y Y-> Z => X -> Z

Course#

IName

Room#

30

First normal form: 1NF •

A relation schema is in 1NF : –

if and only if all the attributes of the relation R are atomic in nature.



Atomic: the smallest level to which data may be broken down and remain meaningful

31

Student_Course_Result Table Student_Details 101

Davis

11/4/1986

Course_Details

Results

M4

Applied Mathematics

Basic Mathematics

7

11/11/2004

82

A

Basic Mathematics

7

11/11/2004

62

C

4

11/22/2004

79

B

11

11/16/2004

65

B

8

11/26/2004

77

B

102

Daniel

11/6/1987

M4

Applied Mathematics

101

Davis

11/4/1986

H6

American History

103

Sandra

10/2/1988

C3

Bio Chemistry

104

Evelyn

2/22/1986

B3

Botany

102

Daniel

11/6/1987

P3

Nuclear Physics

Basic Physics

13

11/12/2004

68

B

105

Susan

8/31/1985

P3

Nuclear Physics

Basic Physics

13

11/12/2004

89

A

103

Sandra

10/2/1988

B4

Zoology

5

11/27/2004

54

D

105

Susan

8/31/1985

H6

American History

4

11/22/2004

87

A

104

Evelyn

2/22/1986

M4

Applied Mathematics

7

11/11/2004

65

B

Basic Chemistry

Basic Mathematics

32

Table in 1NF Student#

Student Name

Dateof Birth

Student_Course_Result Table Cour

CourseName

Pre Requisite

s e #

Dura

DateOf Exam

Marks

Grade

InDa

101

Davis

04-Nov-1986

M4

Applied Mathematics

Basic Mathematics

7

11-Nov-2004

82

A

102

Daniel

06-Nov-1986

M4

Applied Mathematics

Basic Mathematics

7

11-Nov-2004

62

C

101

Davis

04-Nov-1986

H6

American History

4

22-Nov-2004

79

B

103

Sandra

02-Oct-1988

C3

Bio Chemistry

11

16-Nov-2004

65

B

104

Evelyn

22-Feb-1986

B3

Botany

8

26-Nov-2004

77

B

102

Daniel

06-Nov-1986

P3

Nuclear Physics

Basic Physics

13

12-Nov-2004

68

B

105

Susan

31-Aug-1985

P3

Nuclear Physics

Basic Physics

13

12-Nov-2004

89

A

103

Sandra

02-Oct-1988

B4

Zoology

5

27-Nov-2004

54

D

105

Susan

31-Aug-1985

H6

American History

4

22-Nov-2004

87

A

104

Evelyn

22-Feb-1986

M4

Applied Mathematics

7

11-Nov-2004

65

B

Basic Chemistry

Basic Mathematics

33

Second normal form: 2NF •

A Relation is said to be in Second Normal Form if and only if : – It is in the First normal form, and – No partial dependency exists between non-key attributes and key attributes.

An attribute of a relation R that belongs to any key of R is said to be a prime attribute and that which doesn’t is a non-prime attribute

34

Second Normal Form • • • •

STUDENT# is key attribute for Student, COURSE# is key attribute for Course STUDENT# COURSE# together form the composite key attributes for Results relationship. Other attributes like StudentName (Student Name), DateofBirth, CourseName, PreRequisite, DurationInDays, DateofExam, Marks and Grade are non-key attributes.

To make this table 2NF compliant, we have to remove all the partial dependencies. Student #, Course# -> Marks, Grade Student# -> StudentName, DOB, Course# -> CourseName, Prerequiste, DurationInDays Course# -> Date of Exam

35

Second Normal Form - Tables in 2 NF STUDENT TABLE Student#

StudentName

DateofBirth

101

Davis

04-Nov-1986

102

Daniel

06-Nov-1987

COURSE TABLE Course#

Course Name

M1

Basic Mathematics

Pre Requisite

Duration InDays

11

103

Sandra

02-Oct-1988

M4

Applied Mathematics

104

Evelyn

22-Feb-1986

H6

American History

4

105

Susan

31-Aug-1985

C1

Basic Chemistry

5

106

Mike

04-Feb-1987

C3

Bio Chemistry

107

Juliet

09-Nov-1986

B3

Botany

8

108

Tom

07-Oct-1986

P1

Basic Physics

8

109

Catherine

06-Jun-1984

P3

Nuclear Physics

B4

Zoology 36

M1

C1

P1

7

11

13 5

Second Normal form – Tables in 2 NF Student#

Course#

Marks

Grade

101 M4

82 A

102 M4

62 C

101 H6

79 B

103 C3

65 B

104 B3

77 B

102 P3

68 B

105 P3

89 A

103 B4

54 D

105 H6

87 A

104 M4

65 B 37

Second Normal form – Tables in 2 NF Exam_Date Table Course#

DateOfExam

M4

11-Nov-04

H6

22-Nov-04

C3

16-Nov-04

B3

26-Nov-04

P3

12-Nov-04

B4

27-Nov-04

38

Third normal form:3 NF A relation R is said to be in the Third Normal Form (3NF) if and only if - It is in 2NF and

- No transitive dependency exists between non-key attributes and key attributes.

• STUDENT# and COURSE# are the key attributes. • All other attributes, except grade are non-partially, non-transitively dependent on key attributes.



Student#, Course# - > Marks

• Marks -> Grade

39

3NF Tables Student#

Course#

Marks

101 M4

82

102 M4

62

101 H6

79

103 C3

65

104 B3

77

102 P3

68

105 P3

89

103 B4

54

105 H6

87

104 M4

65 40

Third Normal Form – Tables in 3rd NF MARKSGRADE TABLE UpperBound

LowerBound

Grade

100

95 A+

94

85 A

84

70 B

69

65 B-

64

55 C

54

45 D

44

0 E 41

Boyce-Codd normal form - BCNF A relation is said to be in Boyce Codd Normal Form (BCNF) - if and only if all the determinants are candidate keys. BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.

42

Consider this Result Table Student#

EmailID

Course#

Marks

101 [email protected]

M4

82

102 [email protected]

M4

62

101 [email protected]

H6

79

103 [email protected]

C3

65

104 [email protected]

B3

77

102 [email protected]

P3

68

105 [email protected]

P3

89

103 [email protected]

B4

54

105 [email protected]

H6

87

104 [email protected]

M4

65

43

BCNF S#

C#

Candidate Keys for the relation are - STUDENT# COURSE# and COURSE# EmailID Since Course # is overlapping, it is referred as Overlapping Candidate Key.

Valid Functional Dependendencies are Student# - > EmailID ( Non Key Determinant) EmailID - > Student# ( Non Key Determinant) Student#, Course#

- > Marks

Course# , EmailID - > Student#

44

EmailID

BCNF

 

STUDENT TABLE Student#

EmailID

101 [email protected] 102 [email protected]

103 [email protected] 104 [email protected] 105 [email protected]

45

BCNF Tables Student#

Course#

Marks

101 M4

82

102 M4

62

101 H6

79

103 C3

65

104 B3

77

102 P3

68

105 P3

89

103 B4

54

105 H6

87

104 M4

65 46

Merits of Normalization •

Normalization is based on a mathematical foundation.



Removes the redundancy to a greater extent. After 3NF, data redundancy is minimized to the extent of foreign keys.



Removes the anomalies present in INSERTs, UPDATEs and DELETEs.

47

Demerits of Normalization •

Data retrieval or SELECT operation performance will be severely affected.



Normalization might not always represent real world scenarios.

48

SQL - Background •

• • • • • •

Conceived in mid-1970’s as a database language for the relational model Developed by IBM First standardized in 1986 by ANSI Enhanced in 1989 Revised again in ‘92 Non Procedural language Number of commercial products

49

SQL Statements •

Data Definition Language (DDL) – CREATE TABLE – ALTER TABLE – DROP TABLE



Data Manipulation Language (DML) – – – –



SELECT INSERT UPDATE DELETE

Data Control Language (DCL) – GRANT – REVOKE

50

SQL - Some ANSI/ISO Keywords ALL AND AVG BETWEEN CHAR COMMIT COUNT CREATE DECIMAL DELETE

DISTINCT DROP FETCH GRANT GROUP BY HAVING IN INSERT MAX MIN NOT 51

NULL OR PRIVILEGE REFERENCES REVOKE SELECT SUM TABLE UPDATE VIEW WHERE

Commercial Products •

• • • •

• •

In 2004, the RDBMS market grew 10%, rising from just under $7.1 billion to nearly $7.8 billion in new license sales DB2 UDB – Market Leader Oracle – Fastest growing Database on Unix Boxes SQL Server 2000 – Leader in Windows Platform Teradata – Most efficient, Self tuning, Costliest DB for Data Warehousing application Sybase –May be target for acquisition MySQL – Open Source Database

Company

2004 Market Share%

2003-04 Growth%

IBM

34.1

5.8

Oracle

33.7

14.6

Microsoft

20

18

NCR Teradata

2.9

17.2

Sybase

2.3

0.5

52

On Line Transaction Processing (OLTP) System Handle • Several concurrent transactions from – – – –

• •

Spatially Distributed M/cs Execution of Instructions and Queries across LAN/WAN Geographically distributed processors Spatially Distributed Databases

Transaction is defined as logical unit of program execution that takes a system from one consistent state to another consistent state OLTP system should adhere to ACID Properties - Atomicity - Consistency - Isolation - Durability

53

State diagram of a transaction While executing

active l a n fi as e t h nt h ed r t e te Af tem xecu sta en e be

When normal execution can’t proceed

partially completed

failed After rollback and restoration to prev state

A suc fter com cessf ple ul tion

aborted

committed

54

Concurrency Vs Consistency in OLTP • • •

Concurrency and Consistency are inversely proportional to each other Multiple transactions accessing same resource simultaneously Problems associated with OLTP applications are – – – –

Lost update Dirty read Incorrect Summary Phantom records

55

Serialization Techniques • • •

Locking Time stamping Ensures consistency of the database while allowing concurrent access of the resources

56

Locking •

A lock is a variable associated with each data item in a database.



When updated by a transaction, DBMS locks the data item



serializability could be maintained by this.



Lock could be Shared or Exclusive Deadlock is most common problem with locking mechanism



57

Timestamping • • • •

Occurs when an older transaction tries to read a value that is written by a younger transaction. Or when an older transaction tries to modify(Write) a value already read or written by a younger transaction Both of these attempts signify that the older transaction was “too late” in performing the required operation Commercially not a viable option because of too much rollback

58

New Trends in DB Technology • • •

OODBMS OORDBMS XML Integration

59

OODBMS • • • •

Any user-defined data structures Any user-defined operations Any user-defined relationships Useful for – – – – –

Manufacturing Telecommunication CAD/CAM Multimedia products Aerospace and Flight simulations

60

Relationship in OODBMS • • • • •

One - Many Many - Many Is A Extends Whole-part

61

Commercial Packages • • • • • •

Objectivity Poet Jasmine Gemstone Itasca ObjectStore

More details log on to http://www.geocities.com/SiliconValley/2139/products.html

62

Limitations of OODBMS • • • •

procedural navigation No querying as it breaks encapsulation No mathematical foundation Not suitable for adhoc reporting system

63

OORDBMS • • • • • •

Marrying Relational and Object Oriented concepts Still data is stored in Relational manner Object wrapper for application Performance is the major concern Still under development stage Commercial Products – – – – –

Informix Universal Server (Illustra) ( Merged with IBM ) Oracle Oracle 10g IBM DB2 UDB UniSQL UniSQL/X Unisys OSMOS

64

XML in DB • •

Data-centric to Document-centric Simpler integration between Database and other tools like – – – –

• •

Middlewares EAI tools ERP tools Other Databases

Introduction of Native XML data type XML Query Language

65

What is OLAP or DW or BI? •

An organization’s success also depends on its ability to analyze data (through views and reports) and make intelligent decisions that potentially affect its future. Systems that facilitate such analyses are called On Line Analytical Processing (OLAP) systems or Data Warehousing System



Why not OLTP?





OLTP databases do not contain historical data



OLTP databases contain small subsets of organizational data



OLTP databases are heterogeneous in nature and geographically distributed systems

OLTP systems are – Fragmented –

Not integrated.



Difficult to access.



Disparate sources.



Disparate platforms.



Poor data quality.



Redundant data.



Difficult to understand

66

Data warehouse / Business Intelligence •

A Data Warehouse is a copy of the enterprise operational data, suitably modified to support the needs of analytical processes and stored outside the operational database.



According to Bill Inmon, known as the father of Data Warehousing, a data warehouse is a –

Subject oriented,



Integrated,



Time-variant,



Nonvolatile



Collection of data in support of management decisions.

67

Data warehouse architecture Data Warehouse Server (Tier 1)

OLAP Servers (Tier 2)

Clients (Tier 3)

e.g., MOLAP Semistructured Sources

Data Warehouse

Analysis serve

extract transform load refresh etc.

Query/Reporting

serve e.g., ROLAP

Operational DB’s

serve

Data Marts 68

Data Mining

Components of DW •

Extraction Transformation and Loading (ETL) – – – –



Informatica Power Center Data Stage AbInitio WebFOCUS

Data Warehouse – Teradata – DB2 UDB – Oracle 10g OLAP – Business Object – COGNOS – Hyperion – Power Analyzer



Data Mining – Intelligent Miner – Darwin – SAS Miner

69

Complementing Technology •

How many Infy shares sold yesterday in NASDAQ? What was the highest and lowest Price? – OLTP System



How Infy shares are doing in NASDAQ with respect to NSE India in last 5 Years? What’s the volume? P/E Ratio? Highest and Lowest Price? – DW System



What will be the Infy earnings in second quarter of next year? What will be the share price during that period? – Data Mining System

70

References • • • • • • • • • • • •

E&R VSAM Presentation E&R IMS Presentation E&R IDMS Presentation E&R RDBMS Presentation E&R OODBMS Presentation E&R OORDBMS Presentations E&R DW and BI Presentations http://mngktrmerweb/techportals/db/hanu_bank.htm www.oracle.com www.ibm.com www.mssqlserver.com www.sybase.com

71

Thank You

72

Related Documents

Technology Overview
June 2020 6
Technology Overview
May 2020 7
Rdbms
November 2019 19
Rdbms
July 2020 8
Rdbms
November 2019 13