Database Concepts
What is a Database
Personal address book in a Word document
Collection of Word documents
Collection of Excel Spreadsheets
Data collected, maintained, & used in airline reservation
Data used to support launch of a space shuttle Structured Set of Data
Models of Reality DML DATABASE SYSTEM
REALITY • structures • processes
DDL
DATABASE
Database is a model of structures of reality
Use of a database reflect processes of reality
Database system is a software system which supports definition & use of a database
DDL: Data Definition Language
DML: Data Manipulation Language
When does DBMS is used
Persistent Storage of Data Centralized Control of Data Control of Redundancy Control of Consistency & Integrity Multiple User Support Sharing of Data Data Independence Control of Access & Security Backup & Recovery
Data Modeling REALITY • structures • processes
DATABASE SYSTEM data modeling
MODEL
Model represents a perception of structures of reality Data modeling process is to fix a perception of structures of reality & represent this perception In data modeling process we select aspects & we abstract
Database Design purpose of database design is to create a database which
is a model of structures of reality
supports queries & updates
runs efficiently
Database Terminology
Data Models
Keys
Integrity
Triggers & Stored Procedures
Null Values
Surrogates
Normalization
Data Model A data model consists of notations for
expressing:
Data Structures Integrity Constraints Operations
Data Model - Data Structures All data models have notation for defining: entity types attribute types relationship types
FLIGHT-SCHEDULE
DEPT-AIRPORT AIRPORT-CODE
AIRLINE
WEEKDAY
101
delta
mo
156
101
atl
545
american
we
110
912
cph
912
sc&inavian
fr
450
545
lax
mo
231
242
usair
PRICE
FLIGHT#
FLIGHT#
Data Model Constraints Rules that cannot be expressed by Data Structures
Static constraints apply to database state Dynamic constraints apply to change of database state E.g., “All FLIGHT-SCHEDULE entities must have precisely one DEPT-AIRPORT relationship FLIGHT-SCHEDULE
DEPT-AIRPORT AIRPORT-CODE
AIRLINE
WEEKDAY
101
delta
mo
156
101
atl
545
american
we
110
912
cph
912
sc&inavian
fr
450
545
lax
mo
231
242
bos
242
usair
PRICE
FLIGHT#
FLIGHT#
Data Model Operations Operations support change & retrieval of data: insert FLIGHT-SCHEDULE(97, delta, tu, 258); insert DEPT-AIRPORT(97, atl); select FLIGHT#, WEEKDAY from FLIGHT-SCHEDULE where AIRLINE=‘delta’;
FLIGHT-SCHEDULE
DEPT-AIRPORT PRICE
FLIGHT#
AIRPORT-CODE
FLIGHT#
AIRLINE
WEEKDAY
101
delta
mo
156
101
atl
545
american
we
110
912
cph
912
sc&inavian
fr
450
545
lax
242
usair
mo
231
242
bos
97
delta
tu
258
97
atl
Keys Keys are uniqueness constraints
A key on FLIGHT# in FLIGHT-SCHEDULE will force all FLIGHT#’s to be unique in FLIGHT-SCHEDULE Consider following keys on DEPT-AIRPORT:
FLIGHT-SCHEDULE
DEPT-AIRPORT
FLIGHT#
AIRLINE
WEEKDAY
101
delta
mo
156
101
atl
545
american
we
110
912
cph
912
sc&inavian
fr
450
545
lax
mo
231
242
bos
242
usair
PRICE
FLIGHT#
AIRPORT-CODE
Integrity
Integrity: – does model reflect reality well? – is model without internal conflicts? a FLIGHT# in FLIGHT-SCHEDULE cannot be null because it models existence of an entity in real world a FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE because it doesn’t make sense for a non-existing FLIGHT-SCHEDULE entity to have a DEPT-AIRPORT DEPT-AIRPORT
FLIGHT-SCHEDULE
AIRPORT-CODE
AIRLINE
WEEKDAY
101
delta
mo
156
101
atl
545
american
we
110
912
cph
912
sc&inavian
fr
450
545
lax
mo
231
242
bos
242
usair
PRICE
FLIGHT#
FLIGHT#
Triggers & Stored Procedures
Triggers can be defined to enforce constraints on a database, e.g.,
DEFINE TRIGGER DELETE-FLIGHT-SCHEDULE ON DELETE FROM FLIGHT-SCHEDULE WHERE FLIGHT#=‘X’ ACTION DELETE FROM DEPT-AIRPORT WHERE FLIGHT#=‘X’;
FLIGHT-SCHEDULE
DEPT-AIRPORT
FLIGHT#
AIRLINE
WEEKDAY
101
delta
mo
156
101
atl
545
american
we
110
912
cph
912
sc&inavian
fr
450
545
lax
mo
231
242
bos
242
usair
PRICE
FLIGHT#
AIRPORT-CODE
Null Values CUSTOMER#
123-45-6789 234-56-7890 345-67-8901
NAME
MAIDEN NAME
Lisa Smith Lisa Jones George Foreman inapplicable unknown Mary Blake
DRAFT STATUS
inapplicable drafted inapplicable
Null-value unknown reflects that attribute does apply, but value is currently unknown. That’s ok! Null-value inapplicable indicates that attribute does not apply. That’s bad! Null-value inapplicable results from direct use of “catch all forms” in database design “Catch all forms” are ok in reality, but detrimental in database design
Surrogates reality name
custom#
customer customer custom# name addr
customer
addr
surrogate-based representation
surrogates are system-generated, unique, internal identifiers
DATA MODELS
ER-Model Hierarchical Model Relational Model Object-Oriented Model
ER Model airport
airport
name
code
visa required dept
domestic
international
time
flight
flight
dept airport
1 airport addr
city
zip
p
weekdays
flight
airport
schedule 1
street
n
arriv
n
airport
1
arriv instance
time
of customer#
date n
customer name
customer
n
reserva-
n
tion
seat#
flight instance
flight#
Hierarchical Model flight-sched flight# flight-inst date customer customer#
dept-airp airport-code
arriv-airp airport-code
customer name
parent-child relationship types (1:n only!!): (flight-sched,flight-inst), (flight-inst,customer) one record type is root, all or record types is a child of one parent record type only substantial duplication of customer instances
Relational Model
Commercial systems include: ORACLE, DB2, SYBASE, INFORMIX, INGRES, SQL Server
Dominates database market on all platforms
Relational Model Data Structures
domains attributes relations
relation name attribute names flight-schedule flight#: airline:
weekday:
price:
integer
char(2)
dec(6,2)
domain names
char(20)
Relational Model Integrity Constraints
Primary Keys Entity Integrity Referential Integrity
flight-schedule
customer
flight# p
customer# p reservation flight#
date
customer#
customer name
Relational Model Operations
Powerful query languages
Procedural; describes how to compute a query; operators like JOIN, SELECT, PROJECT
Declarative; describes desired result, e.g. SQL, QBE
insert, delete, & update capabilities
Object-Oriented Model based on object-oriented paradigm, e.g., Simula, Smalltalk, C++, Java area is in a state of flux object-oriented commercial systems include GemStone, Ontos, Orion-2, Statice, Versant, O2 object-relational model has relational repository model; adds object-oriented features; object-relational commercial systems include Starburst, POSTGRES
Clinical Database
Data from Clinical Trial stored in some computer system – Manual – Electronic
Complex Data Cleaning, Review & Reporting Excel, Access, SAS, Oracle
Clinical Db Design Always follows Protocol Creation Accuracy Clarity Ease & Speed of Data Entry Creation of Analysis Datasets Source Data Transfer formats DBMS Requirements
Normalization
Step-by-step decomposition of complex records into simple records Reduces redundancy Non-loss decomposition – To avoid inconsistencies – Update anomalies
Clinical Db Normalization
Tall Skinny Vs Short Fat Normalized Vs Non-normalized Fewer Columns & More Rows Efficient Access & Storage Reducing the Size of Data Groupings or Records Does not apply to CRF Image Db
Short Fat
Patient Visit ID
BP_DI A_1
BP_SY S_1
BP_DI A_2
BP_SY S_2
BP_DI A_3
BP_SY S_3
1001
120
72
118
70
117
68
2
Tall Skinny Patien Visi Measureme t ID t nt
BP_DI A
BP_SY S
1001
2
1
120
72
1001
2
2
118
70
1001
2
3
117
68
Short Fat Vs Tall Skinny
Short Fat – Data Cleaning Checks for variables within a single visit is easy – Missing Values easily detected
Tall Skinny – Easy Creation of Structures & Associated Checks – Data Querying is easier
Thank You