Distributed Databases
Aakanksha Kshatriya MBA – (IT) Div: A 060341009
What is a Distributed Database?
A distributed database is a single logical database that is spread physically across computers in multiple locations connected by a data communications network. The network must allow users to share the data; that is, a user at location A must be able to access data stored at location B.
Objective
A major objective of distributed databases is to provide ease of access to the data for users at many different locations. To meet this objective, the database must provide what is called location transparency. Location transparency is a design goal under which a user requesting data need not know where the data is physically stored; the system locates the data and routes the request to the right site.
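A minimal sketch of location transparency, assuming a simple global catalog that records which site holds each table. The site names, table names, and the `fetch` function are hypothetical and only illustrate the idea; a real distributed DBMS does this inside its query processor.

```python
# Hypothetical sites, each holding some tables (names are illustrative only).
SITES = {
    "site_a": {"customers": [{"id": 1, "name": "Asha"}]},
    "site_b": {"orders": [{"id": 10, "customer_id": 1, "total": 250}]},
}

# Global catalog: records which site stores each table.
CATALOG = {"customers": "site_a", "orders": "site_b"}


def fetch(table):
    """Return the rows of `table`; the caller never names a site."""
    site = CATALOG[table]       # the system, not the user, resolves the location
    return SITES[site][table]


# The user asks for "orders" without knowing it lives at site_b.
print(fetch("orders"))
```

If the table later moves to another site, only the catalog entry changes; code that calls `fetch` stays the same.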
Advantages of distributed databases
- Increased reliability and availability
- Local control
- Modular growth
- Lower communication costs
- Faster response
Disadvantages of distributed databases
- Software cost and complexity
- Processing overhead
- Data integrity
- Slow response (when data are not distributed to match how they are used)
Options for distributing the database
There are four basic strategies for distributing databases:
- Data replication
- Horizontal partitioning
- Vertical partitioning
- Combinations of the above
Data Replication
One option for data distribution is to store a separate copy of the database at each of two or more sites. In other words, the data is replicated at those sites. If a copy is stored at every site, we have the case of full replication.
Pros and Cons
Pros:
- Reliability
- Fast response
Cons:
- Storage requirements
- Complexity and cost of updating
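The trade-off in replication can be sketched in a few lines, assuming full replication with every write applied to every copy at once. The `ReplicatedStore` class and the site names are hypothetical, and real systems must also handle failures and concurrent updates, which this sketch ignores.

```python
class ReplicatedStore:
    """Toy fully replicated store: every site keeps a complete copy."""

    def __init__(self, site_names):
        self.replicas = {name: {} for name in site_names}
        self.writes_sent = 0  # counts per-site updates, to show the update cost

    def read(self, site, key):
        # Reads are served from the local copy: fast, no network hop.
        return self.replicas[site].get(key)

    def write(self, key, value):
        # A single logical write must reach every replica.
        for copy in self.replicas.values():
            copy[key] = value
            self.writes_sent += 1


store = ReplicatedStore(["mumbai", "pune", "delhi"])
store.write("price:widget", 120)
print(store.read("pune", "price:widget"))  # 120
print(store.writes_sent)                   # 3
```

One logical update turned into three physical updates, which is where the storage and update costs come from, while any site can answer reads locally.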
Horizontal Partitioning
Horizontal Partitioning means distributing the rows of a table into several separate tables. Here some of the rows of a table are put into a base relation at one site and other rows are put into other base relations at other sites. Generally, the distribution is done based on geography.
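As a rough illustration, the sketch below splits the rows of a small table by a `region` column and then rebuilds the original table as the union of the fragments. The table contents, the column names, and the `partition_rows` helper are assumptions made for the example.

```python
# Hypothetical customer table; "region" decides which site stores each row.
customers = [
    {"id": 1, "name": "Asha", "region": "west"},
    {"id": 2, "name": "Ravi", "region": "north"},
    {"id": 3, "name": "Meera", "region": "west"},
]


def partition_rows(rows, key):
    """Group whole rows into fragments, one fragment per value of `key`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


fragments = partition_rows(customers, "region")

# The original table is the union of all fragments.
rebuilt = sorted(
    (row for fragment in fragments.values() for row in fragment),
    key=lambda row: row["id"],
)
print(len(fragments["west"]), len(fragments["north"]))  # 2 1
```

Every fragment has the same columns as the original table; only the set of rows differs from site to site.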
Pros and Cons
Pros:
- Efficiency
- Local optimization
- Security
Cons:
- Inconsistent access speed
- Backup vulnerability
Vertical Partitioning
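Vertical Partitioning means distributing the columns of a table into several separate tables. With this approach, some of the columns are projected into a base relation at one site and other columns are projected into base relations at other sites. Each of these relations must include the primary key so that the original table can be reconstructed by joining them.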
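A small sketch of the idea, assuming each fragment keeps the primary key (`id` here) so the original rows can be rebuilt with a join. The employee data, the fragment names, and the `project` helper are hypothetical.

```python
# Hypothetical employee table with primary key "id".
employees = [
    {"id": 1, "name": "Asha", "salary": 50000},
    {"id": 2, "name": "Ravi", "salary": 62000},
]


def project(rows, columns):
    """Keep only the listed columns of every row (a relational projection)."""
    return [{col: row[col] for col in columns} for row in rows]


# Each fragment carries the key plus the columns its site needs.
hr_fragment = project(employees, ["id", "name"])
payroll_fragment = project(employees, ["id", "salary"])

# Rebuild the original table by joining the fragments on the key.
payroll_by_id = {row["id"]: row for row in payroll_fragment}
rebuilt = [{**row, **payroll_by_id[row["id"]]} for row in hr_fragment]
print(rebuilt == employees)  # True
```

Here the HR site never stores salaries and the payroll site never stores names, yet the complete row is still recoverable when both fragments are reachable.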
Pros and Cons
The advantages and disadvantages of vertical partitions are identical to those for horizontal partitions. However, horizontal partitions support an organizational design in which functions are replicated on a regional basis while vertical partitions are applied across organizational functions with reasonably separate data requirements.
Combinations of the above
There are almost unlimited combinations of the three preceding strategies. Some data may be stored centrally, while other data may be replicated at various sites. Also, for a given relation, both horizontal and vertical partitions may be desirable.
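One way to picture a mixed design is a placement map that assigns a strategy to each table. The sketch below is only an illustration: the table names, site names, strategy labels, and the `sites_to_update` helper are all assumptions, not part of any particular product.

```python
SITES = ["mumbai", "pune", "delhi"]  # hypothetical sites

# Hypothetical placement plan: a different strategy per table.
PLACEMENT = {
    "price_list": {"strategy": "replicated", "sites": SITES},
    "audit_log": {"strategy": "central", "sites": ["mumbai"]},
    "customers": {"strategy": "horizontal", "sites": SITES},
}


def sites_to_update(table, row_site=None):
    """Return the sites that must be touched when one row of `table` changes."""
    plan = PLACEMENT[table]
    if plan["strategy"] == "horizontal":
        # A row lives in exactly one fragment, so only that site is updated.
        return [row_site]
    # Central: the single home site. Replicated: every copy.
    return list(plan["sites"])


print(sites_to_update("price_list"))         # ['mumbai', 'pune', 'delhi']
print(sites_to_update("customers", "pune"))  # ['pune']
print(sites_to_update("audit_log"))          # ['mumbai']
```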
Conclusion
The overriding principle in distributed database design is that data should be stored at the sites where they will be accessed most frequently. The database administrator plays a critical and central role in organizing a distributed database so that it remains a single, coordinated distributed database rather than a set of independent, decentralized databases.
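The placement principle can be sketched as a simple count over an access log, assuming such a log is available. The log entries, site names, and the `best_site` function are hypothetical, and a real design would also weigh update traffic, storage, and network cost.

```python
from collections import Counter

# Hypothetical access log: (table, site that issued the request).
access_log = [
    ("orders", "pune"),
    ("orders", "pune"),
    ("orders", "delhi"),
    ("customers", "mumbai"),
    ("orders", "pune"),
]


def best_site(log, table):
    """Pick the site that accesses `table` most often."""
    counts = Counter(site for name, site in log if name == table)
    return counts.most_common(1)[0][0]


print(best_site(access_log, "orders"))     # pune
print(best_site(access_log, "customers"))  # mumbai
```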