Chapter 22 Distributed DBMS Concepts and Design
CS 157B
Edward Chen
Introduction
Distributed databases change the way data is shared, conceptually moving from centralization to decentralization. The development of computer networks promotes a decentralized mode of work. Distributed systems should improve the shareability of data and the efficiency of data access, and should help resolve the "islands of information" problem.
Concepts
Distributed database: a logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
Distributed DBMS: the software system that permits the management of the distributed database and makes the distribution transparent to users.
Concepts (cont’d)
In a distributed DBMS, a single logical database is split into a number of fragments. Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a network. Each site is capable of independently processing user requests that require access to local data, and is also capable of processing data stored on other computers in the network.
Concepts (cont’d)
There are two types of application:
1) Local applications, which do not require data from other sites.
2) Global applications, which do require data from other sites.
A DDBMS must have at least one global application.
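The local/global distinction can be sketched as a check on where the data an application needs is stored. A minimal sketch, assuming a hypothetical catalog `SITE_FRAGMENTS` that maps each site to the fragments it holds; the site and fragment names are illustrative only.

```python
# Hypothetical catalog: which fragments are stored at each site.
SITE_FRAGMENTS = {
    "London": {"Staff_London", "Property_London"},
    "Glasgow": {"Staff_Glasgow", "Property_Glasgow"},
}


def classify_application(site: str, fragments_needed: set[str]) -> str:
    """Return 'local' if every fragment the application needs is stored at
    its own site, otherwise 'global' (it needs data from other sites)."""
    local = SITE_FRAGMENTS.get(site, set())
    return "local" if fragments_needed <= local else "global"


print(classify_application("London", {"Staff_London"}))                   # local
print(classify_application("London", {"Staff_London", "Staff_Glasgow"}))  # global
```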
Concepts (cont’d)
A DDBMS has the following characteristics:
· A collection of logically related shared data.
· The data is split into a number of fragments.
· Fragments may be replicated.
· Fragments/replicas are allocated to sites.
· The sites are linked by a computer network.
· The data at each site is under the control of a DBMS.
· The DBMS at each site can handle local applications autonomously.
· Each DBMS participates in at least one global application.
Distributed Database Management System
It is not necessary for every site in the system to have its own local database, as shown.
The system is expected to make the distribution transparent to the user.
A distributed database is split into fragments that can be stored on different computers and perhaps replicated.
The objective of transparency is to make the distributed system appear like a centralized system.
Distributed Processing
The system consists of data that is physically distributed across the network. If the data is centralized, even though users may access it over the network, the system is not considered a distributed DBMS; it is simply distributed processing.
Advantages
· Reflects organizational structure
· Improved shareability and local autonomy
· Improved availability
· Improved reliability
· Improved performance
· Modular growth
· Less danger of a single point of failure
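One way a DDBMS achieves improved availability and reliability is replication: if a site holding a fragment is down, a copy at another site can still serve reads. A minimal sketch, assuming hypothetical `REPLICAS` and `UP` tables that simulate replica placement and a site outage. It covers reads only; updates to replicated data must also be propagated to every copy, which this sketch ignores.

```python
class SiteDown(Exception):
    """Raised when a (simulated) site cannot be reached."""


# Hypothetical replica placement: fragment -> sites holding a copy.
REPLICAS = {"Staff_London": ["London", "Glasgow"]}
# Simulated site status: London is down.
UP = {"London": False, "Glasgow": True}


def read_from_site(site: str, fragment: str) -> str:
    if not UP[site]:
        raise SiteDown(site)
    return f"{fragment} served by {site}"


def read_fragment(fragment: str) -> str:
    """Try each replica in turn; fail only if no copy is reachable."""
    for site in REPLICAS[fragment]:
        try:
            return read_from_site(site, fragment)
        except SiteDown:
            continue  # fall through to the next replica
    raise SiteDown(f"no replica of {fragment} is reachable")


print(read_fragment("Staff_London"))  # Staff_London served by Glasgow
```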
Disadvantages
· Complexity
· Cost
· Security
· Integrity control more difficult
· Lack of standards
· Lack of experience
· Database design more complex
· Possible slow response
Homogeneous and Heterogeneous DDBMSs
Homogeneous DDBMS
In a homogeneous DDBMS, all sites use the same DBMS product. Such a system is much easier to design and manage. This design provides incremental growth by making it easy to add new sites to the DDBMS, and allows increased performance by exploiting the parallel processing capability of multiple sites.
Homogeneous and Heterogeneous DDBMSs (cont’d)
Heterogeneous DDBMS
In a heterogeneous DDBMS, sites may run different DBMS products, which need not be based on the same underlying data model, so the system may be composed of RDBMS, ORDBMS, and OODBMS products.
In a heterogeneous system, translations are required to allow communication between different DBMSs.
In order to provide DBMS transparency, users must be able to make requests in the language of the DBMS at their local site.
Data at other sites may reside on different hardware, under different DBMS products, or a combination of both.
Locating that data and performing any necessary translation are the responsibilities of the heterogeneous DDBMS.
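One small piece of this translation work can be illustrated with SQL dialects: the same request for "the first n rows of a table" is written differently in different products. A minimal sketch; the `translate_top_n` function and the `SITE_DBMS` mapping are hypothetical, and a real heterogeneous DDBMS must also translate between data models, not just between SQL syntaxes. Table names are interpolated directly for brevity, which is unsafe for untrusted input.

```python
# Hypothetical mapping from each site to the DBMS product it runs.
SITE_DBMS = {"London": "postgresql", "Glasgow": "sqlserver", "Aberdeen": "oracle"}


def translate_top_n(dbms: str, table: str, n: int) -> str:
    """Rewrite the generic request 'first n rows of table' into one product's dialect."""
    if dbms in ("postgresql", "mysql", "sqlite"):
        return f"SELECT * FROM {table} LIMIT {n}"
    if dbms == "sqlserver":
        return f"SELECT TOP {n} * FROM {table}"
    if dbms in ("oracle", "db2"):
        # SQL standard row-limiting clause (Oracle supports it from 12c).
        return f"SELECT * FROM {table} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"no translation rule for DBMS {dbms!r}")


# The same logical request, translated for each site's local DBMS.
for site, dbms in SITE_DBMS.items():
    print(site, "->", translate_top_n(dbms, "Staff", 5))
```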
Component Architecture of a DDBMS
· Local DBMS (LDBMS) component: has its own local system catalog that stores information about the data held at that site.
· Data communications (DC) component: the software that enables all sites to communicate with each other.
· Global system catalog (GSC): holds information specific to the distributed nature of the system, such as the fragmentation and allocation schemas.
· Distributed DBMS (DDBMS) component: the controlling unit of the entire system.
Component Architecture of a DDBMS (cont’d)
Fig. Components of a DDBMS
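How the four components cooperate can be sketched in a few classes. A minimal sketch under stated assumptions: the class names are hypothetical, each relation is assumed to be horizontally fragmented (so it is rebuilt by concatenating its fragments), and a direct method call stands in for the DC component.

```python
from dataclasses import dataclass, field


@dataclass
class LocalDBMS:
    """LDBMS: manages the data stored at one site."""
    site: str
    tables: dict[str, list[dict]] = field(default_factory=dict)

    def local_catalog(self) -> list[str]:
        # The local catalog describes only the data held at this site.
        return sorted(self.tables)

    def scan(self, fragment: str) -> list[dict]:
        return list(self.tables[fragment])


@dataclass
class GlobalSystemCatalog:
    """GSC: fragmentation schema and allocation schema."""
    fragmentation: dict[str, list[str]]  # relation -> its fragments
    allocation: dict[str, str]           # fragment -> site storing it


class DistributedDBMS:
    """DDBMS component: uses the GSC to route requests to each site's LDBMS."""

    def __init__(self, gsc: GlobalSystemCatalog, sites: list[LocalDBMS]):
        self.gsc = gsc
        self.sites = {s.site: s for s in sites}

    def read_relation(self, relation: str) -> list[dict]:
        rows = []
        for fragment in self.gsc.fragmentation[relation]:
            site = self.gsc.allocation[fragment]
            # A direct call stands in for a request sent through the DC component.
            rows.extend(self.sites[site].scan(fragment))
        return rows


london = LocalDBMS("London", {"Staff_L": [{"staffNo": "SL21"}]})
glasgow = LocalDBMS("Glasgow", {"Staff_G": [{"staffNo": "SG37"}, {"staffNo": "SG14"}]})
gsc = GlobalSystemCatalog(
    fragmentation={"Staff": ["Staff_L", "Staff_G"]},
    allocation={"Staff_L": "London", "Staff_G": "Glasgow"},
)
ddbms = DistributedDBMS(gsc, [london, glasgow])
print([r["staffNo"] for r in ddbms.read_relation("Staff")])  # ['SL21', 'SG37', 'SG14']
```

The user asks for `Staff` without naming any site; only the GSC knows where its fragments live, which is the distribution transparency described earlier.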
FRAGMENTATION
Why fragmentation?
· Usage: Applications work with views rather than entire relations.
· Efficiency: Data is stored close to where it is most frequently used.
· Parallelism: With fragments as the unit of distribution, a transaction can be divided into several subqueries that operate on fragments.
· Security: Data not required by local applications is not stored, and consequently is not available to unauthorized users.
· Performance: A drawback is that global applications requiring data from several fragments located at different sites may be slower.
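The parallelism point can be sketched by splitting one global query into one subquery per fragment and running the subqueries concurrently. A minimal sketch, assuming hypothetical in-memory fragments in `FRAGMENTS`; threads stand in for independent sites, so this shows the decomposition rather than a real multi-site speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of a Staff relation: (staffNo, salary).
FRAGMENTS = {
    "Staff_London": [("SL21", 30000), ("SL41", 9000)],
    "Staff_Glasgow": [("SG37", 12000), ("SG14", 18000), ("SG5", 24000)],
}


def subquery_count_over(fragment: str, threshold: int) -> int:
    """Subquery on one fragment (in a real DDBMS, run at that fragment's site)."""
    return sum(1 for _, salary in FRAGMENTS[fragment] if salary > threshold)


def count_staff_over(threshold: int) -> int:
    """Global query: one subquery per fragment, run concurrently, partial counts summed."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda f: subquery_count_over(f, threshold), FRAGMENTS)
        return sum(partials)


print(count_staff_over(10000))  # 4
```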
FRAGMENTATION (cont’d)
Types of fragmentation
· Horizontal fragmentation: a subset of the tuples of a relation, defined as $\sigma_p(R)$, where $p$ is a predicate based on one or more attributes of the relation.
· Vertical fragmentation: a subset of the attributes of a relation, defined as $\Pi_{a_1, a_2, \ldots, a_n}(R)$, where $a_1, a_2, \ldots, a_n$ are attributes of the relation $R$.
· Mixed fragmentation: a horizontal fragment that is subsequently vertically fragmented, or a vertical fragment that is then horizontally fragmented.
FRAGMENTATION (cont’d)
Selection $\sigma_p(R)$ defines a relation that contains only those tuples of $R$ that satisfy the specified condition (predicate $p$). A horizontal fragment is defined by a selection.
Projection $\Pi_{a_1, a_2, \ldots, a_n}(R)$ defines a relation that contains a vertical subset of $R$, extracting the values of the specified attributes and eliminating duplicates. A vertical fragment is defined by a projection.
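The two operators, and the three kinds of fragment built from them, can be sketched with relations modelled as Python sets of tuples, which gives the duplicate elimination that projection requires. The `STAFF` relation, its attributes, and the helper names `select` and `project` are assumptions for illustration.

```python
# Illustrative Staff relation: a set of tuples with these attributes.
ATTRS = ("staffNo", "name", "branchNo", "salary")
STAFF = {
    ("SL21", "White", "B005", 30000),
    ("SG37", "Beech", "B003", 12000),
    ("SG14", "Ford", "B003", 18000),
}


def select(rel: set, attrs: tuple, p) -> set:
    """sigma_p(R): the tuples of R that satisfy predicate p."""
    return {t for t in rel if p(dict(zip(attrs, t)))}


def project(rel: set, attrs: tuple, keep: tuple) -> set:
    """Pi_keep(R): only the listed attributes; building a set drops duplicates."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}


# Horizontal fragment: staff at branch B003.
H_B003 = select(STAFF, ATTRS, lambda t: t["branchNo"] == "B003")
# Vertical fragment: payroll attributes, keeping the key staffNo.
V_pay = project(STAFF, ATTRS, ("staffNo", "salary"))
# Mixed fragment: the horizontal fragment above, then projected.
M_B003_pay = project(H_B003, ATTRS, ("staffNo", "salary"))

# Two tuples share branchNo B003, so projection returns only two rows.
print(sorted(project(STAFF, ATTRS, ("branchNo",))))  # [('B003',), ('B005',)]
```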
Summary
A distributed database is a logically interrelated collection of shared data that is physically distributed over a computer network.
A DDBMS is different from a client-server system, although the client-server architecture can be used to provide distributed DBMSs.
Both top-down and bottom-up design approaches can be used to design a DDBMS.
Summary (cont’d)
A relation may be divided into a number of subrelations called fragments, which may be horizontal, vertical, or mixed.
The three correctness rules of fragmentation are completeness, reconstruction, and disjointness.
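For horizontal fragments the three rules reduce to set checks: completeness (every tuple of R is in some fragment), reconstruction (the union of the fragments equals R), and disjointness (no tuple is in more than one fragment). A minimal sketch, assuming relations are Python sets of tuples and a hypothetical helper `check_horizontal`; vertical fragments would instead be reconstructed by a join, and their disjointness excludes the primary key that every vertical fragment repeats.

```python
def check_horizontal(relation: set, fragments: list[set]) -> dict[str, bool]:
    """Check the three correctness rules for a horizontal fragmentation."""
    union = set().union(*fragments)
    return {
        # Completeness: every tuple of R appears in at least one fragment.
        "completeness": relation <= union,
        # Reconstruction: R = F1 ∪ F2 ∪ ... ∪ Fn.
        "reconstruction": union == relation,
        # Disjointness: sizes add up only if no tuple is in two fragments.
        "disjointness": sum(len(f) for f in fragments) == len(union),
    }


R = {(1, "B003"), (2, "B003"), (3, "B005")}
good = [{(1, "B003"), (2, "B003")}, {(3, "B005")}]
overlapping = [{(1, "B003"), (2, "B003")}, {(2, "B003"), (3, "B005")}]
missing = [{(1, "B003")}, {(3, "B005")}]

print(check_horizontal(R, good))         # all three True
print(check_horizontal(R, overlapping))  # disjointness False
print(check_horizontal(R, missing))      # completeness and reconstruction False
```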
There are four allocation strategies for the placement of data: centralized, partitioned, complete replication, and selective replication.
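The four strategies differ in how many copies of each fragment exist and where they are placed. A minimal sketch, assuming a hypothetical `allocate` helper that returns a fragment-to-sites map. Centralized is modelled as all data at a single site, and the round-robin placement in the partitioned case is a placeholder for a choice that a real design would base on where each fragment is used.

```python
def allocate(strategy: str, fragments: list[str], sites: list[str],
             replicate: frozenset = frozenset()) -> dict[str, list[str]]:
    """Return a map fragment -> sites holding a copy, for one allocation strategy."""
    if strategy == "centralized":
        # All data at one site: nothing is distributed.
        return {f: [sites[0]] for f in fragments}
    if strategy == "partitioned":
        # Each fragment at exactly one site, no replication (round-robin placeholder).
        return {f: [sites[i % len(sites)]] for i, f in enumerate(fragments)}
    if strategy == "complete_replication":
        # A full copy of every fragment at every site.
        return {f: list(sites) for f in fragments}
    if strategy == "selective_replication":
        # Replicate only the chosen fragments; partition the rest.
        base = allocate("partitioned", fragments, sites)
        return {f: list(sites) if f in replicate else base[f] for f in fragments}
    raise ValueError(f"unknown strategy {strategy!r}")


frags, sites = ["F1", "F2", "F3"], ["S1", "S2"]
print(allocate("partitioned", frags, sites))
# {'F1': ['S1'], 'F2': ['S2'], 'F3': ['S1']}
print(allocate("selective_replication", frags, sites, frozenset({"F2"})))
# {'F1': ['S1'], 'F2': ['S1', 'S2'], 'F3': ['S1']}
```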
The End