File and Database Design Data modeling with Entity Relationship Diagrams 2. File Access and Organization 1.
CSC 2202 Systems Analysis and Design
1
File and Database Design Objectives Define entity, file, record, and attribute and discuss the various types of keys Explain the concept of entity relationships diagram Define cardinality and its notation. Explain normalization Describe various types of files, including master, transaction, table, work, history, and security. CSC 2202 Systems Analysis and Design
2
File and Database Design Objectives Evaluate methods of file organization, including sequential, direct, and indexed Calculate file sizes and storage requirements Discuss file and database control measures
CSC 2202 Systems Analysis and Design
3
Introduction Data terminology and concepts Relationships among data objects Entity-relationship diagrams (ERDs) Normalization File organization and access methods File types File media, sizing, and controls CSC 2202 Systems Analysis and Design
4
Data Modeling with Entity Relationship Diagrams
CSC 2202 Systems Analysis and Design
5
Key Definitions Data model A formal way of representing the data that are used and created by a business system Shows the people, places and things about which data is captured and the relationships among them.
Logical data model
shows the organization of data without indicating how it is stored, created, or manipulated. CSC 2202 Systems Analysis and Design
6
Key Definitions Physical data model shows
how the data will actually be stored in databases or files.
Normalization is the process analysts use to validate data models. Data models should balance with process models CSC 2202 Systems Analysis and Design
7
What Is an ERD? A picture showing the information created, stored, and used by a business system. Entities generally represent similar kinds of information Lines drawn between entities show relationships among the data High level business rules are also shown
CSC 2202 Systems Analysis and Design
8
Using the ERD to Show Business Rules Business rules are constraints that are followed when the system is in operation. ERD symbols can show when one instance of an entity must exist for an instance of another to exist A
doctor must exist before appointments the doctor can be made CSC 2202 Systems Analysis and Design
9
Using the ERD to Show Business Rules ERD symbols can show when one instance of an entity can be related to only one or many instances of another entity
One doctor can have many patients; each patient may have only one primary doctor
ERD symbols show when the existence of an entity instance is optional for a related entity instance
A patient may or may not have insurance coverage CSC 2202 Systems Analysis and Design
10
An ERD Example
CSC 2202 Systems Analysis and Design
Entity A person, place, event, or thing about which data is collected Must be multiple occurrences to be an entity
Example: If a firm has only one warehouse, the warehouse is not an entity. However, if the firm has several warehouses, the warehouse could be an entity if the firm wants to store data about each warehouse instance.
CSC 2202 Systems Analysis and Design
12
Entities and Instances
CSC 2202 Systems Analysis and Design
13
Attributes Information captured about an entity Only those used by the organization should be included in the model Attribute names are nouns Sometimes entity name is added at the beginning of the attribute name for clarity CSC 2202 Systems Analysis and Design
14
Identifiers One or more attributes can serve as the entity identifier, uniquely identifying each entity instance Concatenated identifier consists of several attributes An identifier may be ‘artificial,’ such as creating an ID number Identifiers may not be developed until the Design Phase CSC 2202 Systems Analysis and Design
15
Identifier Types
CSC 2202 Systems Analysis and Design
16
Relationships Associations between entities The first entity in the relationship is the parent entity; the second entity in the relationship is the child entity Relationships should have active verb names Relationships go in both directions CSC 2202 Systems Analysis and Design
17
Cardinality Cardinality refers
to the number of times instances in one entity can be related to instances in another entity One instance in an entity refers to one and only one instance in the related entity (1:1) One instance in an entity refers to one or more instances in the related entity (1:N) One or more instances in an entity refer to one or more instances in the related entity (M:N)
CSC 2202 Systems Analysis and Design
18
Modality Modality Refers
to whether or not an instance of a child entity can exist without a related instance in the parent entity Not
Null means that an instance in the related entity must exist for an instance in another entity to be valid Null means that no instance in the related entity is necessary for an instance in another entity to be valid CSC 2202 Systems Analysis and Design
19
The Data Dictionary and Metadata Metadata is information stored about components of the data model Metadata is stored in the data dictionary so it can be shared by developers and users throughout the SDLC A complete, shareable data dictionary helps improve the quality of the system under development CSC 2202 Systems Analysis and Design
20
Metadata Examples An Entity can be described by:
An Attribute can be described by:
A Relationship can be described by:
Name Definition Special Notes User Contact Analyst Contact
Name Description Alias Sample Values Acceptable Values Format Type Special Notes
Verb phrase Parent entity Child entity Definition Cardinality Modality Notes
CSC 2202 Systems Analysis and Design
21
CREATING AN ENTITYRELATIONSHIP DIAGRAM
CSC 2202 Systems Analysis and Design
22
ERD Basics Drawing the ERD is an iterative process of trial and revision ERDs can become quite complex
CSC 2202 Systems Analysis and Design
23
Steps in Building ERDs Identify the entities Add appropriate attributes for each entity Draw the relationships that connect associated entities
CSC 2202 Systems Analysis and Design
24
Identify the Entities Identify major categories of information If available, check the process models for data stores, external entities, and data flows Check the major inputs and outputs from the use cases
Verify that there is more than one instance of the entity that occurs in the system
CSC 2202 Systems Analysis and Design
25
Add Appropriate Attributes Identify attributes of the entity that are relevant to the system under development Check the process model repository entries for details on data flows and data stores Check the data requirements of the requirements definition Interview knowledgeable users Perform document analysis on existing forms and reports
Select the entity’s identifier CSC 2202 Systems Analysis and Design
26
Draw the Relationships Start with an entity and identify all entities with which it shares relationships Describe the relationship with the appropriate verb phrase Determine the cardinality and modality by discussing the business rules with knowledgeable users
CSC 2202 Systems Analysis and Design
27
ERD Building Tips Data stores of the DFD should correspond to entities Only include entities with more than one instance of information Don’t include entities associated with implementation of the system (they will be added later) CSC 2202 Systems Analysis and Design
28
Data Terminology and Concepts
Data design includes Entities Fields Records Files Keys
CSC 2202 Systems Analysis and Design
29
Data Terminology and Concepts
Definitions Entity:
a person, place, thing, or event for which data is collected and maintained
Field
(attribute): a single characteristic or fact about an entity
Record:
a collection of fields that describes one instance of an entity
File:
a set of records that contains data about a specific entity CSC 2202 Systems Analysis and Design
30
Data Terminology and Concepts
Definitions Key
field: a field used to locate, retrieve, or identify a specific record
Primary
key: a key that uniquely identifies each record
CSC 2202 Systems Analysis and Design
31
Data Terminology and Concepts
Key fields Primary
keys
A
field or combination of fields that uniquely and minimally identifies each member of an entity A primary key composed of more than one field is called a multivalued key Candidate
keys
Any
field that could serve as primary key Any field that is not a primary key or candidate key is called a nonkey field CSC 2202 Systems Analysis and Design
32
Data Terminology and Concepts
Key fields Foreign A
keys
field in one file that matches a primary key value in another file Example: the advisor number is a foreign key in the STUDENT file that matches a primary key value in the ADVISOR file A foreign key need not be unique A combination of two or more foreign keys can form a unique primary key value Referential integrity ensures that a foreign key value cannot be entered unless it matches a primary key value in another file CSC 2202 Systems Analysis and Design
33
Data Terminology and Concepts
Key fields
Secondary
keys
A
field or combination of fields that can be used to access or retrieve records Secondary keys do not need to be unique
CSC 2202 Systems Analysis and Design
34
The goal of data Modeling To determine the required data contents of the main computer files in the new system OR (more likely) to determine the contents of data base tables i.e. to eliminate redundancy Data model forms the basis of the physical storage CSC 2202 Systems Analysis and Design
35
Entity-Relationship Diagrams Entity-relationship diagrams (ERDs) An
ERD is a graphical model that shows relationships among system entities Each entity is a rectangle, labeled with a noun Each relationship is a diamond, labeled with a verb
CSC 2202 Systems Analysis and Design
36
Entity-Relationship Diagrams Entity-relationship diagrams (ERDs) Types
of relationships
One-to-one
(1:1) One-to-many (1:M) Many-to-many (M:N) A
full ERD shows all system relationships
CSC 2202 Systems Analysis and Design
37
The need for a data model Relationships and data structures in the system may be very complex
Need to examine them independently of the processing.
The entity-relationship data model was first proposed by P. Chen (1976) - it has become widely used in database design. The resulting E-R diagram provides a graphical notation for Modeling the data CSC 2202 Systems Analysis and Design
38
Entity-relationship diagram components There are two main components in an E-R diagram: entities
(object types) relationships.
Entity is an object about which you want to store data Entities are represented by a rectangle CSC 2202 Systems Analysis and Design
39
Examples of ER Diagrams
CSC 2202 Systems Analysis and Design
40
Examples of ER Diagrams (cont)
CSC 2202 Systems Analysis and Design
41
Data Relationships and ERDs One-to-one (1:1) relationship Exists
when exactly one of the second entity occurs for each instance of the first entity Examples One
office manager heads one office One vehicle ID number is assigned to one vehicle One driver drives one delivery truck One faculty member is chairperson of one department
CSC 2202 Systems Analysis and Design
42
SC Fig 8.5 (1:1)
CSC 2202 Systems Analysis and Design
43
Data Relationships and ERDs One-to-many (1:M) relationship Exists
when one occurrence of the first entity can be related to many occurrences of the second entity, but each occurrence of the second entity can be associated with only one occurrence of the first entity- Examples: One
individual owns many automobiles One customer places many orders One department employs many employees One faculty advisor advises many students CSC 2202 Systems Analysis and Design
44
SC Fig 8.6 (1:M)
CSC 2202 Systems Analysis and Design
45
Data Relationships and ERDs Many-to-many (M:N) relationship Exists
when one instance of the first entity can be related to many instances of the second entity, and one instance of the second entity can be related to many instances of the first Examples A
student enrolls in one or more classes, and each class has one or more students registered A passenger buys tickets for one or more flights, and each flight has one or more passengers An order lists one or more products, and each product is listed on one or more orders CSC 2202 Systems Analysis and Design
46
(M:N) *
CSC 2202 Systems Analysis and Design
47
Data Relationships and ERDs A full ERD shows all system relationships Examples A
sales rep serves one or more customers, but each customer has only one sales rep A customer places one or more orders, but each order has only one customer An order lists one or more products, and each product can be listed in one or more orders A warehouse stores one or more products, and each product can be stored in one or more warehouses CSC 2202 Systems Analysis and Design
48
SC Fig 8.8 Example
CSC 2202 Systems Analysis and Design
49
Data Relationships and ERDs Cardinality Describes
how instances of one entity relate
to another Mandatory vs. optional relationships Crow’s foot notation is one method of showing cardinality Most CASE products support the drawing of ERDs
CSC 2202 Systems Analysis and Design
50
SC Fig 8.9 Crow’s foot notation
CSC 2202 Systems Analysis and Design
51
ERD from CASE tool
CSC 2202 Systems Analysis and Design
52
Example E-R diagrams Assignment management system example Alternative method for showing relationships ... replace single and double headed arrows, or 0, 1, “crows foot” with the labels: (1) for one (M) or (N) for many
CSC 2202 Systems Analysis and Design
53
Example E-R diagram
CSC 2202 Systems Analysis and Design
54
Data Relationships and ER Diagrams Creating an ERD 1. Identify the entities 2. Determine all significant events or activities for two or more entities 3. Analyze the nature of the interaction 4. Draw the ERD
CSC 2202 Systems Analysis and Design
55
Connection between DFD and E-R Consider the following example Inventory data store on DFD E-R diagram showing how the Inventory entity is related to other entities Same E-R diagram showing attributes (data elements)
CSC 2202 Systems Analysis and Design
56
VALIDATING AN ERD
CSC 2202 Systems Analysis and Design
57
Design Guidelines Best practices rather than rules Entities should have many occurrences Avoid unnecessary attributes Clearly label all components Apply correct cardinality and modality Break attributes into lowest level needed Labels should reflect common business terms Assumptions should be clearly stated CSC 2202 Systems Analysis and Design
58
Normalization Technique used to validate data models Series of rules applied to logical data model to improve its organization Three normalization rules are common CSC 2202 Systems Analysis and Design
59
Normalization Steps
CSC 2202 Systems Analysis and Design
60
Unnormalized Entity Begin with an entity from the logical data model
CSC 2202 Systems Analysis and Design
61
First Normal Form (1NF) Look for repeating groups of attributes and remove them into separate entities
CSC 2202 Systems Analysis and Design
62
Second Normal Form (2NF)
entity has a concatenated identifier, look for attributes that nd only on part of the identifier. If found, remove to new ent
CSC 2202 Systems Analysis and Design
63
Third Normal Form (3NF)
or attributes that depend only on another nonidentifying att nd, remove to new entity. Also remove any calculated attribu
CSC 2202 Systems Analysis and Design
64
ERD after normalisation *
CSC 2202 Systems Analysis and Design
65
Balancing ERDs with DFDs All analysis activities are interrelated Process models contain two data components Data flows and data stores The DFD data components need to balance the ERD’s data stores (entities) and data elements (attributes) Many CASE tools provide features to check for imbalance Check that all data stores and elements correspond between models
Data that is not used is unnecessary Data that has been omitted results in an incomplete system
Do not follow thoughtlessly -- check that the models make sense! CSC 2202 Systems Analysis and Design
66
Steps in Database Design Four steps in database design 1. Create the initial ERD 2. Assign all data elements to entities 3. Create 3NF designs for all records, taking care to identify all primary, secondary, and foreign keys 4. Verify all data dictionary entries
CSC 2202 Systems Analysis and Design
67
Database Management Potential problems in a file processing environment Data
redundancy Inconsistent data Inefficiency
Database management design A
database is a structure that can store data about many entities and the relationships among them CSC 2202 Systems Analysis and Design
68
Database Management Elements of database management systems A
database management system (DBMS) is used to create, access, and control a database Data definition language (DDL) Data manipulation language (DML) Query language Data dictionary Utility programs CSC 2202 Systems Analysis and Design
69
Database Management Characteristics of database management A
DBMS is more powerful, complex, and expensive than a file processing system Effective security, backup, and recovery procedures are essential Data is a company-wide resource Data mining enables users to extract information from anywhere in the organization CSC 2202 Systems Analysis and Design
70
File Access and Organization Types of files A file is a set of logical records that contains data about an entity Master
files Table files Transaction files Work files Security files History files CSC 2202 Systems Analysis and Design
71
File Access The file access method determines how programs read and write records Two main methods are used Sequential
access method
Program
reads all records in order until the end of the file is reached
Random
access method
Program
can read any logical record without having to access all preceding records CSC 2202 Systems Analysis and Design
72
File Media Types Technology has enabled many alternatives, each with advantages and disadvantages Magnetic
tape Floppy disk Hard disk (fixed disk, direct access storage device or DASD) RAID (Redundant Array of Inexpensive Disks)
CSC 2202 Systems Analysis and Design
73
File Media Types Technology has enabled many alternatives, each with advantages and disadvantages Removable
disk cartridge Optical disc (CD-ROM) WORM Mass storage system
CSC 2202 Systems Analysis and Design
74
File and Database Control Control must include measures to ensure that data is correct, complete, and secure Passwords Encryption Backup
and recovery procedures Audit trail files Audit fields
CSC 2202 Systems Analysis and Design
75
Summary The ERD is the most common technique for drawing data models. The building blocks of the ERD are: Entities describe people, places, or things Attributes capture information about the entity Relationships associate data across entities
The ERD must be balanced with the DFD.
CSC 2202 Systems Analysis and Design
76
Summary (cont) E-R Modeling is used extensively in database design. Data analysis is often considered more important than functional analysis - in a Computer Based Information System it is easier to change a program (which represents a process) than to change the underlying data structures (i.e. file or database) CSC 2202 Systems Analysis and Design
77
Review Questions 1. What is an entity-relationship diagram and what is the role of this diagram in the systems development process? 2. “Data is a valuable organisational resource and needs to carefully analysed and stored commensurate with organisational needs.” Discuss this statement. 3. What is normalisation and what role does it play in the data design process? 4. Describe the attributes of a data structure which is in 3rd Normal Form (3NF).
CSC 2202 Systems Analysis and Design
78