2
Introduction to Database Management Systems (DBMS)
Database Management System (DBMS) Definitions:
Data: Known facts that can be recorded and that have implicit meaning Database: Collection of related data
Ex. the names, telephone numbers and addresses of all the people you know
Database Management System:
4
DBMS (Contd.)
Goals of a Database Management System:
To provide an efficient as well as convenient environment for accessing data in a database
Enforce information security: database security, concurrence control, crash recovery
It is a general purpose facility 5
Benefits of database approach
Redundancy can be reduced Inconsistency can be avoided Data can be shared Standards can be enforced Security restrictions can be applied Integrity can be maintained Data independence can be provided
6
DBMS Functions
Data Definition Data Manipulation Data Security and Integrity Data Recovery and Concurrency Data Dictionary Performance
7
Database System Users
DATABASE
Application Programs/Queries
SYSTEM DBMS Software
Software to process queries/programs Software to access stored data
Stored Data Defn. (META-DATA).
Stored Database
8
Database System Application program query Q2
Compiled query Q2
user query Q1
Database scheme
Query processor
DDL compiler
Database manager
Database description
File manager
Physical database
9
Data Model
A set of concepts used to describe the structure of a database By structure, we mean the data types, relationships, and constraints that should holds for theofdata Categories Data Models
Conceptua l
Physical
Representation al 10
Database Architecture External level (individual user views) Conceptual level (community user view)
Internal level (storage view) Database
11
An example of the three levels SNo FName LName
Age
Salary
Conceptual View SNo FName LName
Age
External View1
SNo LName BranchNo External View2
Salary
BranchNo struct STAFF { Internal int staffNo; View int branchNo; char fName[15]; char lName[15]; struct date dateOfBirth; float salary; struct STAFF *next; /* pointer to next Staff record */ }; index staffNo; index branchNo; /* define indexes for staff */
12
Schema
Schema: Description of data in terms of a data model Three-level DB Architecture defines following schemas:
External Schema (or sub-schema)
Conceptual Schema (or schema)
Written using external DDL Written using conceptual DDL
Internal Schema
Written using internal DDL or storage structure definition
13
Data Independence
Change the schema at one level of a database system without a need to change the schema at the next higher level
Logical data independence: Refers to the immunity of the external schemas to changes in the conceptual schema e.g., add new record or field
Physical data independence: Refers to the immunity of the conceptual schema
14
TYPES OF DATABASE MODELS HIERARCHICAL
NETWORK
COLUMN
ROW
VALUE
TABLE
RELATIONAL 15
DATABASE DESIGN PHASES DATA ANALYSIS
Entities - Attributes - Relationships - Integrity Rules
LOGICAL DESIGN Tables - Columns - Primary Keys - Foreign Keys
PHYSICAL DESIGN DDL for Tablespaces, Tables, Indexes
16
Introduction to Relational Databases: RDBMS
Definition : RDBMS
It is a system in which, at a minimum :
The data is perceived by the user as tables ( and nothing but tables ); and
The operators at the user’s disposal e.g., for data retrieval - are operators that generate new tables from old, and those include at least SELECT, PROJECT,18
Features of an RDBMS
The ability to create multiple relations (tables) and enter data into them An interactive query language Retrieval of information stored in more than one table Provides a Catalog or Dictionary, which itself consists of tables ( called system tables ) 19
Some Important Terms
Relation : a table
Tuple : a row in a table
Attribute : a Column in a table
Degree : number of attributes
Cardinality : number of tuples
Primary Key : a unique identifier for the table
Domain : a pool of values from which
20
Properties of Relations (Tables)
There are no duplicate rows (tuples)
Tuples are unordered, top to bottom
Attributes are unordered, left to right
All attribute values are atomic ( or scalar )
Relational databases do not allow repeating groups
21
Keys
Key
Super Key
Candidate Keys
Primary Key
Alternate Key
Secondary Keys 22
Keys and Referential Integrity Enrolled sid 53666 53688 53650 53666
cid grade carnatic101 C reggae203 B topology112 A history105 B
Foreign key referring to sid of STUDENT relation
Student sid
name
login
age
gpa
53666 Jones
Jones@cs
18
3.4
53688 Smith
Smith@eecs
18
3.2
53650 Smith
Smith@math
19
3.8
Primary key
23
24
Relational Algebra
Relational Query Languages
Query languages: Allow manipulation and retrieval of data from a database. Relational model supports simple, powerful QLs: Strong formal foundation based on logic. Allows for much optimization. Query Languages != programming languages!
26
Example Instances R1
sid
bid
day
22
101
10/10/99
58
103
11/12/99
S1
sid
sname
rating
age
22
Deepa
7
45.0
31
Laxmi
8
55.5
58
Roopa
10
35.0
S2
sid
sname
rating
age
28
Yamuna
9
35.0
31
Laxmi
8
55.5
44
Geeta
5
35.0
58
Roopa
10
35.0 27
Relational Algebra
Basic operations:
Selection (σ ) Projection (π) Cross- product ( Χ) Set- difference ( –) Union (∪ )
28
Projection sname
rating
Yamuna
9
Laxmi
8
Geeta
5
Roopa
10
π sname, rating(S2)
age 35.0 55.5
π age(S2) 29
Selection sid
sname
rating
age
28
Yamuna
9
35.0
58
Roopa
10
35.0
sname
rating
Yamuna
9
Roopa
10
σ rating > 8(S2)
π sname, rating(S2) (σ rating > 8(S2))
30
Union, Intersection, Set Difference sid
sname
rating
age
22
Deepa
7
45.0
31
Laxmi
8
55.5
58
Roopa
10
35.0
44
Geeta
5
35.0
28
Yamuna
9
35.0
sid
sname
rating
age
31
Laxmi
8
55.5
58
Roopa
10
35.0
sid
sname
rating
age
22
Deepa
7
45.0
S1 ∪ S2
S1 ∩ S2 S1 − S2 31
Cross- Product (sid)
sname
rating
age
(sid)
bid
day
22
Deepa
7
45.0
22
101
10/10/99
22
Deepa
7
45.0
58
103
11/12/99
31
Laxmi
8
55.5
22
101
10/10/99
31
Laxmi
8
55.5
58
103
11/12/99
58
Roopa
10
35.0
22
101
10/10/99
58
Roopa
10
35.0
58
103
11/12/99
32
Joins Condition Join : (sid)
sname
rating
age
(sid)
bid
day
22
Deepa
7
45.0
22
101
10/10/99
31
Laxmi
8
55.5
58
103
11/12/99
33
Equi-Join
(sid)
sname
rating
age
bid
day
22
Deepa
7
45.0
101
10/10/99
58
Roopa
10
35.0
103
11/12/99
34
Division •Not supported as a primitive operator, but useful for expressing queries like: •Find sailors who have reserved all boats . A
sno s1 s1 s1 s1 s2 s2 s3 s4 s4
pno p1 p2 p3 p4 p1 p2 p2 p2 p4
pno p2 B1 sno s1 s2 s3 s4 A/B1
pno p2 p4
pno
B2
p4
sno
p1 p2
B3
s1
sno
s4
s1
A/B2
A/B3 35
36
Introduction to Query Optimization
Processing A Highlevel Query Query in a high level language SCANING, PARSING AND VALIDATING Intermediate form of query QUERY OPTIMIZER Execution plan QUERY CODE GENERATOR
Typical steps when processing a high level query.
Code to execute the query RUNTIME DATABASE PROCESSOR
Result of query
38
Two Main Techniques for Query Optimization Heuristic Rules: A heuristic is a rule that works well in most of cases, but not always. General Idea: Many different relational algebra expressions (and thus query trees) are equivalent. Transform the initial query tree of a query into an equivalent final query tree that is efficient to execute.
Cost based query optimization 39
Motivating Example select * from R1, R2, R3 where R1.r2no=R2.r2no and R2.r3no=R3.r3no and R1.a=5000
SS(R2)
NLJ
NLJ
SS(R1, “a=5000”)
SS(R3)
40
Alternative Plans 1(No Indexes) select * from R1, R2, R3 where R1.r2no=R2.r2no and R2.r3no=R3.r3no and R1.a=5000
SS(R1, “a=5000”)
NLJ
NLJ
SS(R3)
SS(R2)
41
Alternative Plans 2 (With Indexes) select * from R1, R2, R3 where R1.r2no=R2.r2no and R2.r3no=R3.r3no and R1.a=5000
IS(R1, “a=5000”)
NLJ
NLJ
SS(R3)
SS(R2)
42
43
Conceptual Design Using the EntityRelationship Model
Overview of Database Design
Conceptual design : (ER Model is used at this stage.)
Schema Refinement : (Normalization)
Physical Database Design and Tuning
45
E R Modeling
Conceptual Schema Design Relational Calculus - Formal Language for Relational D/B. Relational Calculus
Predicate Calculus
Domain Calculus
SQL / Tuple Based
Query By Examples 46
Design Phases… Requirements Collection & Analysis Data Requirements
Functional Requirements User Defined Operations Data Flow Diagrams Sequence Diagrams, Scenarios
Conceptual Design Entity Types, Constraints , Relationships No Implementation Details.
Logical Design
Ensures Requirements Meets the Design
Data Model Mapping – Type of Database is identified Physical Design Internal Storage Structures / Access Path / File Organizations 47
E-R Modeling
Entity
Entity Set
a group of similar entities
Attribute
is anything that exists and is distinguishable
properties that describe an entity
Relationship
an association between entities
48
Notations ENTITY TYPE ( REGULAR )
WEAK ENTITY TYPE RELATIONSHIP TYPE
WEAK RELATIONSHIP TYPE
49
Entity Attributes ssn
name
Employee
lot
SSN NAME 123- 22- 3666Attishoo 231- 31- 5368Smiley 131- 24- 3650Smethurst
LOT 48 22 35
Entity Set CREATE TABLE Employees (ssn CHAR (11), name CHAR (20), lot INTEGER, PRIMARY KEY (ssn)) 50
Types of Relationships 1
1:1
student
1:M
students
M
M:M
students
M
Is issued
enrols in
take
1
ID card
1
course
M
tests
51
ER Model ssn
name
Employee supervisor
lot
since
Works_in
did
dname
budget
Department
Subordinate
Reports_To
52
ER Model (Contd.) Works_ In SSN 123-22-3666 123-22-3666 231-31-5368
DID 51 56 51
SINCE 1/1/91 3/3/93 2/2/92
CREATE TABLE Works_ In( ssn CHAR (11), did INTEGER, since DATE, PRIMARY KEY (ssn, did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments) 53
Key Constraints
ssn
name
Employee
lot
since
Manages
did
dname
budget
Department
54
Key Constraints for Ternary Relationships
ssn
lot
name
Employee
since
Works_in
did
dname budget Department
Location address
capacity
55
Participation Constraints ssn
name
Employee
lot
since
Manages
did
dname
budget
Department
Works_in since 56
Weak Entities ssn
name
Employee
lot
cost
policy
pname
age
Dependent
57
ISA (‘is a’) Hierarchies ssn
name
lot
Employee Hrly_wages Hrs_worked
Hourly_Emp
IsA
contractid Contract_Emp 58
Aggregation ssn
name
lot
Employee monitors
pid
pbudget
project
Started on
sponsors
until
did
dname budget
department 59
Entity vs. Attribute Works_ In does not allow an employee to work in a department for two or more periods (why?)
ssn
name
Employee
lot
from
to
Works_in
did
dname
budget
Department
60
Entity vs. Attribute (Contd.)
ssn
lot
name
Employee
from
did
Works_in
Duration
dname
budget
Department
to
61
Entity vs. Relationship
ssn
name
lot
Employee
since
DB
manages
did
dname
budget
Department
DB - Dbudget
62
Entity vs. Relationship ssn
name
Employee
lot
did
manages
dname
budget
Department
since Appt num
Mgr_appt DBudget 63
Binary vs. Ternary Relationships
ssn
lot
name
Employee
pname
age
Dependent
covers Policy
policyid
cost 64
Binary vs. Ternary Relationships Better Design ssn
name
lot
pname
Dependent
Employee
Beneficiary
purchaser
policyid
age
Policy
cost 65
Constraints Beyond the ER Model • Some constraints cannot be captured in ER diagrams: • Functional dependencies • Inclusion dependencies • General constraints
66
E-R Diagram DEPARTMENT 1 SUPPLIER DEPT_ EMP M
M M
PROJ_ WORK
M PROJECT
EMPLOYEE 1
M
M
1
PROJ_ MGR
M
DEPENDENT
SUPP_ PART
M
EMP_ DEP M
SUPP_ PART_ PROJ
PART M
M M
PART_ STRUC TURE
67
Example to Start with ….
An Example Database Application called COMPANY which serves to illustrate the ER Model concepts and their schema design. The following are collection from the Client.
68
Analysis…
Company : Organized into Departments, Each Department has a name, no and manager who manages the department. The Company keeps track of the date that employee managing the department. A Department may have a Several locations. 69
Analysis…
Department : A Department controls a number of Projects each of which has a unique name , no and a single Location. Employee : Name, Age, Gender, BirthDate, SSN, Address, Salary. An Employee is assigned to one department, may work on several projects which are not controlled by the department. Track of the number of hours per week is also controlled. 70
Analysis….
Keep track of the dependents of each employee for insurance policies : We keep each dependant first name, gender, Date of birth and relationship to the employee.
71
Now to our Company… DEPARTMENT ( Name , Number , { Locations } , Manager, Start Date ) PROJECT ( Name, Number, Location , Controlling Department ) EMPLOYEE (Name (Fname, Lname) , SSN , Gender, Address, Salary Birthdate, Department , Supervisor , (Workson ( Project , Hrs)) DEPENDENT ( Employee, Name, Gender, Birthdate , Relationship )
72
Example …
Manage:
Department and Employee Partial Participation Relation Attribute : StartDate.
Works For:
Department and Employee Total Participation
73
Example…
Control : Department , Project Partial Participation from Department Total Participation from Project Control Department is a RKA.
Supervisor : Employee, Employee Partial and Recursive
74
Example …
Works – On : Project , Employee Total Participation Hours Worked is a RKA.
Dependants of: Employee , Dependant Dependant is a Weaker Dependant is Total , Employee is Partial.
75
One Possible mapping of the Problem Statement Name No
Lname Fname
Work s For
Sal Sex
Loc
Department
SSN
Name Employee
Sdate
Address
Control s
manage s
Bdate Hours
Project
Work sOn
Supe rvise s
Name
Loc
No
Depend On Dependent Name
Sex
Bdate
Relationship 76
77
78
79
80
Schema Refinement and Normalization
Normalization and Normal Forms
Normalization:
Decomposing a larger, complex table into several smaller, simpler ones. Move from a lower normal form to a higher Normal form.
Normal Forms: First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) *Higher Normal Forms (BCNF, 4NF, 5NF ....)
In practice, 3NF is often good enough. 82
Why Normal Forms
The first question to ask is whether any refinement is needed!
If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of problems are avoided/ minimized. This can be used to help us decide whether decomposing the relation will help. 83
The Evils of Redundancy
Redundancy is at the root of several problems associated with relational schemas More seriously, data redundancy causes several anomalies: insert, update, delete Wastage of storage. Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
84
Refining an ER Diagram Before ssn
name
Employee
lot
since
Works_in
did
dname
budget
Department
85
Refining an ER Diagram After ssn
name
since
did
dname
budget lot
Employee
Works_in
Department
86
First Normal Form
A table is in 1NF, if every row contains exactly one value for each attribute.
Disallow multivalued attributes, composite attributes and their combinations.
1NF states that :
domains of attributes must include only atomic (simple, indivisible) values and that value of any attribute in a tuple must be a single value from the domain of that
87
Functional Dependencies (FDs)
Provide a formal mechanism to express constraints between attributes
Given a relation R, attribute Y of R is functionally dependent on the attribute X of R if & only if each Xvalue in R has associated with it precisely one Y-value in R. 88
Full Dependency
Concept of full functional dependency
A FD x → y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
89
Partial Dependency
An F.D. x → y is a partial dependency if there is some attribute A ∈ X that can be removed from X and the dependency will still hold.
90
Example: Constraints on Entity Set S N 123- 22- 3666 Attishoo 231- 31- 5368 Smiley 131- 24- 3650 Smethurst 434- 26- 3751 Guldu 612- 67- 4134 Madayan
S N 123- 22- 3666 Attishoo 231- 31- 5368 Smiley 131- 24- 3650 Smethurst 434- 26- 3751 Guldu 612- 67- 4134 Madayan
L 48 22 35 35 35
H 40 30 30 32 40
L 48 22 35 35 35
R 8 8 5 5 8
R 8 8 5 5 8
W 10 10 7 7 10
H 40 30 30 32 40
R W 5 7 8 10
91
Second Normal Form (2NF)
A relation schema R is in 2NF if:
it is in 1NF and
every non-prime attribute A in R is fully functionally dependent on the primary key of R.
2NF prohibits partial dependencies.
92
2NF: An Example
Emp{Eno, Dept, ProjCode, Hours}
Primary key: {Eno, ProjCode}
{Eno} -> {Dept}, {Eno, ProjCode} -> {Hours}
Test of 2NF
{Eno} -> {Dept}: partial dependency.
Emp is in 1NF, but not in 2NF.
Decomposition:
Emp {Eno, Dept}
93
Transitive Dependency
An FD X → Y in a relation schema R is a transitive dependency if
there is a set of attributes Z that is not a subset of any key of R, and
both X → Z and Z → Y hold.
94
Third Normal Form
A relation schema R is in 3NF if
It is in 2NF and
No nonprime attribute of R is transitively dependent on the primary key.
3NF means that each non-key attribute value in any tuple is truly dependent on the Primary Key and not even partially on other attributes.
3NF prohibits transitive
95
3NF: An Example
Emp{Eno, Dept, Dept_Head} Primary key: {Eno} {Eno} -> {Dept}, {Dept} -> {Dept_Head}
Test of 3NF {Eno} -> {Dept} -> {Dept_Head}: Transitive dependency. Emp is in 2NF, but not in 3NF.
Decomposition: Emp {Eno, Dept} Dept {Dept, Dept_Head}
96
Boyce –Codd Normal Form
The intention of BCNF is that- 3NF does not satisfactorily handle the case of a relation processing two or more composite or overlapping candidate keys
97
BCNF ( Boyce Codd Normal Form)
A Relation is said to be in Boyce Codd Normal Form (BCNF) if and only if every determinant is a candidate key.
98
Decomposition of a Relation Scheme
Suppose that relation R contains attributes A1 ... An. A decomposition of R consists of replacing R by two or more relations such that: Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and Every attribute of R appears as an attribute of one of the new relations.
99
100
101
102
103
104
105
106
Transaction, Concurrency Control and Recovery
Transaction
A sequence of many actions which are considered to be one atomic unit of work.
Governed by four ACID properties:
Read, write, commit, abort Atomicity, Consistency, Isolation, Durability
Has a unique starting point, some actions and one end point 108
The ACID Properties
A tomicity: All actions in the transaction happen, or none happen. C onsistency: If each transaction is consistent, and the DB starts consistent, it ends up consistent. I solation: Execution of one transaction is isolated from that of other transactions. D urability: If a transaction commits, its effects persist. 109
Automicity
All-or-nothing, no partial results. An event either happens and is committed or fails and is rolled back. e.g. in a money transfer, debit one account, credit the other. Either both debiting and crediting operations succeed, or neither of them do. Transaction failure is called Abort Commit and abort are irrevocable actions. There is no undo for these actions. An Abort undoes operations that have already been executed For database operations, restore the data’s previous value from before the transaction (Rollback-it); a Rollback command will undo all actions taken since the last commit for that110
Consistency
Every transaction should maintain DB consistency Referential integrity - e.g. each order references an existing customer number and existing part numbers The books balance (debits = credits, assets = liabilities) Consistency preservation is a property of a transaction, not of the database mechanisms for controlling it (unlike the A, I, and D of ACID) If each transaction maintains consistency,111
Isolation
Intuitively, the effect of a set of transactions should be the same as if they ran independently. Formally, an interleaved execution of transactions is serializable if its effect is equivalent to a serial one. Implies a user view where the system runs each user’s transaction stand-alone. Of course, transactions in fact run with lots of concurrency, to use device parallelism – this will be covered later. Transactions can use common data (shared data) They can use the same data processing 112
Durability
When a transaction commits, its results will survive failures (e.g. of the application, OS, DB system … even of the disk). Makes it possible for a transaction to be a legal contract. Implementation is usually via a log DB system writes all transaction updates to a log file to commit, it adds a record “commit(Ti)” to the log when the commit record is on disk, the transaction is committed. 113
Transaction processing Can be automatic (controlled by the RDBMS) or programmatic (programmed using SQL or other supported programming languages, like PL/SQL)
114
Why Have Concurrent Processes?
Better transaction throughput Improved response time Done via better utilization of resources:
While one processes is doing a disk read, another can be using the CPU or reading another disk.
115
Typical situations requiring concurrency control
Exclusive access to an external device or shared service (e.g., managing printer queues) Coordination of applications which process parallel data (e.g. parallel DB servers) Disabling or enabling execution of the client programs in a specific moment (typically for database administration - e.g. database backups, enforcing resource occupation, etc.)
116
Problems with Concurrency (in absence of locking) Lost Update problem - losing values due to intervention of write operation from other overlapping transactions Temporary Update problem discarding previous changes made by overlapping transaction after rollback Incorrect Summary problem overwriting of certain values used for calculation by write operations from other transactions
117
Lost Update Problem Transaction A Start A
Value
T1
Read Value (6)
6
Start B
T2
Add 2 (6+2=8)
6
Read Value (6)
T3
Write Value (8) End A
8
Add 3 (6+3=9) Write Value (9) End B
Time T0
T4 T5
Transaction B
6
9 9
What should the final Order Value be?
Which Update has been lost? 118
Temporary Update Problem Time
Transaction A
Value
T0
Start A
6
T1
Read Value (6)
6
T2
Add 2 (8)
6
T3
8
T5
Write Value (8) Failure: Rollback! Write Value (6)
T6
End A
11
T4
T5
8 6
11
Transaction B
Start B Read Value (8) Add 3 (8+3=11) Write Value (11) End B
What should the final Order Value be? Where is the temporary update? 119
Incorrect Summary Problem Time T0
Transaction A st Read 1 Value (6)
Values
Transaction B
6 3
T1
Add 2 (6+2=8)
6 3
T2
Write 1st Value (8)
8 3
T3
Read 2nd Value (3)
8 3
Read 1st Value (8)
T4
Add 2 (3+2 = 5)
8 3
Read 2nd Value (3)
T5
Write 2 Value should(5) the total
8 5
Total Sum = 11
nd
What should the total Order Value be? Which order was accumulated before update, and which after? 120
t1
t2
3.1 Database State and Changes
State D1
State D2
T
D1, D2
- Logically consistent states of the database data
TTransaction for changing the database t1, t2 - Absolute time before and after the transaction
121
Progress A transaction reaches its commit point when all operations accessing the database are completed and the result has been recorded in the log. It then writes a [commit, ] and terminates. BEGIN
END active
READ , WRITE
partially committed ROLLBACK
COMMIT committed
ROLLBACK
aborted
terminated
When a system failure occurs, search the log file for entries [start, ] and if there are no logged entries [commit, ] then undo all operations that have logged entries [write, , X, old_value, new_value] 122
Schedules • Schedule: Actions of transactions as seen by the DBMS T1 R(A) W(A)
T2
R(B) W(B) R(C) W(C) 123
Serializable Schedule
A schedule whose effect on the DB “state” is the same as that of some serial schedule
All serial schedules are serializable
But the reverse may not be true
124
Serializability Violations Transfer Rs.10,000 from A to B
T1 R(A) W(A)
Database is inconsistent!
R(B) W(B) commit
Add 6% interest to A&B
T2
R(A) W(A) R(B) W(B) commit
125
Cascading Aborts T1 R(A) W(A)
T2
R(A) W(A) abort
126
Recoverable Schedules Unrecoverable Schedule
T1 R(A) W(A)
T2
R(A) W(A) commit abort
Recoverable Schedule
T1 R(A) W(A)
T2
R(A) W(A) commit commit 127
Locking
The concept of locking data items is one of the main techniques for controlling the concurrent execution of transactions. A lock is a variable associated with a data item in the database. Generally there is a lock for each data item in the database. A lock describes the status of the data item with respect to possible operations that can be applied to that item used for synchronising the access by concurrent transactions to the database items. A transaction locks an object before using it When an object is locked by another transaction,128
Locking Granularity
A database item which can be locked could be a database record a field value of a database record the whole database Trade-offs coarse granularity the larger the data item size, the lower the degree of concurrency fine granularity the smaller the data item size, the more locks to be managed and stored, and the more lock/unlock operations needed. 129
Locking: A Technique for Concurrency Control
•Locks are automatically obtained by DBMS. •Guarantees serializability! Compatibility matrix for lock types X and S
-S X
-√ √ √
S √ √
X √
S: Shared lock X: Exclusive lock -- No lock
130
Two- Phase Locking (2PL) Strict 2PL: – If T wants to read an object, first obtains an S lock. – If T wants to modify an object, first obtains X lock. – Hold all locks until end of transaction. – Guarantees serializability, and recoverable schedule, too! also avoids WW problems! 2PL: – Slight variant of strict 2PL – transactions can release locks before the end (commit or abort) But after releasing any lock it can acquire no new locks
– Guarantees serializability 131
Handling a Lock Request Lock Request (XID, OID, Mode) Mode==X
Mode==S
Empty Wait Queue?
Currently Locked? No
Yes
Yes
Currently X-locked? No
Put on Queue
Yes No
Grant Lock 132
133
Recovery
Occurs in case of transaction failures.
Database (DB) is restored to the most recent consistent state just before the time of failure.
To do this, the DB system needs information about changes applied by various transactions. It is the system log. 134
Recovery: Motivation crash T1 T2 T3 T4 T5 •Atomicity: Undoing actions of transaction that do not commit •Durability: Making sure all actions of committed transactions survive system crashes •The Recovery Manager guarantees Atomicity & Durability. 135
Recovery Outline
Restore to most recent “consistent” state just before time of failure
Use data in the log file
Catastrophic Failure Restore database from backup Replay transactions from log file
Database becomes inconsistent (noncatastrophic errors)
Undo or Redo last transactions until consistent state is restored
136
Logging
Record REDO and UNDO information, for every update, in a log. – Sequential writes to log (put it on a separate disk). – Minimal info (diff) written to log, so multiple updates fit in a single log page.
137
Handling the Buffer Pool • When is buffer written back to disk?
• Steal/No-steal Can it be written before commit? (steal) Or does it have to wait till after commit? (no-steal) • Force/No-force Is it written “immediately” after commit? (force) Or can it remain in memory? (no-force) NoSteal Steal Force Trivial NoForce
Desired 138
Write- Ahead Logging (WAL)
The Write- Ahead Logging Protocol:
Must force the log record for an update before the corresponding data page gets to disk.
Must write all log records for a transaction before commit .
What goes into log:
BFIM needed for UNDO type algorithms
AFIM needed for REDO type algorithms
139
Checkpoints in the System Log
Checkpoint record written in log when all updated DB buffers written out to disk Any committed transaction occurring before checkpoint in log can be considered permanent (won’t have to be redone after crash) Actions suspend execution of all transactions force-write all modified buffers to disk write checkpoint entry in log and force write log resume transactions 140 Fuzzy checkpointing
141
142