Neutrino: Improved Scalability of a Distributed System

Objective:
• Enabling mobile database access with good performance over low-bandwidth networks.
• Maintaining consistency in mobile replicas.
• Employing partial template-based replication in the mobile client.
Motivation: The World Wide Web has taken on significant importance in almost all fields, ranging from personal to enterprise services, and has become their primary communication medium. Relational databases lie at the core of many such enterprises. This drives the need for hosting architectures that support arbitrary levels of load with acceptable performance. A common technique used to improve the performance of a database is replication. Preserving consistency alongside replication, with acceptable performance under conditions of weak connectivity, is a difficult challenge. A fundamental trade-off always exists between consistency, performance, and poor network quality, leading to various architectures. Some of these relax consistency; however, failing to maintain it can undermine the very purpose of using databases for enterprise applications. The absence of such a replication method, and the inconsistency challenges faced with mobile replicas, motivated this project.
Abstract: Data replication brings with it the important issue of consistency. Much effort has been put into developing different consistency models over the past decade. Though strong consistency models seem like the obvious choice for replica consistency management, there is an argument that weak consistency models are important and have a definite edge over strong ones in mobile and WAN environments. However, today's replication systems are not "mobile-ready". Instead of improving the mobile user's environment, the replication system actually hinders mobility and complicates mobile operation. Designed for stationary environments, these replication services do not and cannot provide mobile users with the capabilities they require. Replication in mobile environments requires fundamentally different solutions than those previously proposed, because mobility presents a fundamentally new and different computing paradigm. Even with replication, replicating the whole database at every replica creates problems. First, a mobile device cannot be expected to have large storage or memory. Second, whenever the replicas must update themselves from the master copy, storing the whole database at the clients creates a bandwidth bottleneck at the server. To avoid such problems, the mobile client partially replicates the database. Generic database replication algorithms do not scale linearly in throughput, as all update, deletion and insertion (UDI) queries must be applied to every database replica. Throughput is therefore limited to the point where the number of UDI queries alone is sufficient to overload one server. In such scenarios, partial replication of a database can help, as UDI queries are executed only by a subset of all servers. This system employs partial replication to improve database throughput, exploiting the fact that a Web application's query workload is composed of a small set of read and write templates. Using knowledge of these templates and their respective execution costs, it derives database table placements that produce significant improvements in database throughput.
Replication: Replication is one of the oldest and most important topics in the overall area of distributed systems. Whether one replicates data or computation, the objective is to have some group of processes that handle incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data,
reply to read requests, and apply updates. When we replicate computation, the usual goal is to provide fault-tolerance. For example, a replicated service might be used to control a telephone switch, with the objective of ensuring that even if the primary controller fails, the backup can take over its functions. But the underlying needs are the same in both cases: by ensuring that the replicas see the same events in equivalent orders, they stay in consistent states and hence any replica can respond to queries. Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. Each slave outputs a message stating that it has received an update successfully, thus allowing the sending (and potentially re-sending, until successfully applied) of subsequent updates. When these replicas are placed on mobile clients, a consistency problem arises: any mobile device can move outside the coverage area, and in a wireless network clients often suffer from poor signal strength. In such situations the client cannot be expected to update its replica from the master copy.
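To make the master/slave scheme concrete, here is a minimal sketch in Java of sequence-numbered log shipping with acknowledgements, in the spirit of the description above; the classes and the in-memory log are illustrative assumptions, not Neutrino's actual implementation.

import java.util.ArrayList;
import java.util.List;

// Minimal master/slave log-shipping sketch. The master assigns each update a
// monotonically increasing sequence number; a slave acknowledges the last
// sequence number it applied, and the master (re)sends anything newer.
class Master {
    private final List<String> updateLog = new ArrayList<>();

    // Log an update (e.g. an SQL UDI statement) and return its sequence number.
    synchronized int logUpdate(String update) {
        updateLog.add(update);
        return updateLog.size() - 1;
    }

    // Return all updates a slave has not yet acknowledged.
    synchronized List<String> updatesSince(int lastAcked) {
        return new ArrayList<>(updateLog.subList(lastAcked + 1, updateLog.size()));
    }
}

class Slave {
    private int lastApplied = -1;
    private final List<String> data = new ArrayList<>();

    // Apply pending updates in order and return the new acknowledgement point.
    int sync(Master master) {
        for (String update : master.updatesSince(lastApplied)) {
            data.add(update);   // stand-in for executing the statement locally
            lastApplied++;
        }
        return lastApplied;     // the master may resend until this advances
    }
}

public class LogShippingDemo {
    public static void main(String[] args) {
        Master master = new Master();
        Slave slave = new Slave();
        master.logUpdate("INSERT INTO Students VALUES (1234, 'Xyz', 98, 78, 176)");
        System.out.println("slave acked up to: " + slave.sync(master));  // 0
    }
}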
Partial Replication: Replicating the whole database keeps the design of the distributed system simple, but it places the entire data set on every mobile device, and laptops and PDAs cannot be expected to offer large storage. To avoid this storage problem, a partial replication scheme can be employed. Partial replication is achieved by partitioning the database logically, using one of two schemes: 1) Horizontal fragmentation 2) Vertical fragmentation. Both are illustrated below on sample student data, and equivalent SQL definitions are sketched after the tables.

Sample Student Data:

Roll No.   Name   Mark1   Mark2   Total
1234       Xyz    98      78      176
1235       Yyy    89      78      167
1534       Abc    78      67      148
2342       Xxx    67      56      123

Horizontal Fragmentation:

Replication1:
Roll No.   Name   Mark1   Mark2   Total
1234       Xyz    98      78      176
1235       Yyy    89      78      167

Replication2:
Roll No.   Name   Mark1   Mark2   Total
1534       Abc    78      67      148
2342       Xxx    67      56      123

Vertical Fragmentation:

Replication1:
Roll No.   Name
1234       Xyz
1235       Yyy
1534       Abc
2342       Xxx

Replication2:
Roll No.   Mark1   Mark2   Total
1234       98      78      176
1235       89      78      167
1534       78      67      148
2342       67      56      123
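The two schemes can also be expressed directly in SQL. The following sketch assumes a Students table with the columns shown above; the fragment names and roll-number cutoff are hypothetical, chosen only to match the sample data.

// Hypothetical SQL defining the two fragmentation schemes over the sample
// Students table. Horizontal fragments select disjoint rows; vertical
// fragments project disjoint column sets, each keeping the key (Roll No.)
// so the original rows can be reassembled with a join.
public class FragmentationSql {
    // Horizontal fragmentation: split by row ranges.
    static final String HORIZONTAL_FRAGMENT_1 =
        "SELECT * FROM Students WHERE roll_no < 1500";
    static final String HORIZONTAL_FRAGMENT_2 =
        "SELECT * FROM Students WHERE roll_no >= 1500";

    // Vertical fragmentation: split by columns, repeating the key.
    static final String VERTICAL_FRAGMENT_1 =
        "SELECT roll_no, name FROM Students";
    static final String VERTICAL_FRAGMENT_2 =
        "SELECT roll_no, mark1, mark2, total FROM Students";

    // Reconstructing the full table from the vertical fragments.
    static final String REASSEMBLE =
        "SELECT f1.roll_no, f1.name, f2.mark1, f2.mark2, f2.total " +
        "FROM Fragment1 f1 JOIN Fragment2 f2 ON f1.roll_no = f2.roll_no";
}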
Template-based replication: This is a database replication system that exploits the fact that the database queries issued by typical Web applications belong to a relatively small number of query templates. A query template is a parameterized SQL query whose parameter values are passed to the system at runtime. Prior knowledge of these templates allows one to select database table placements such that each query template can be treated locally by at least one server. Careful table placements based on the data span and the relative execution costs of different templates can provide major scalability gains in terms of sustained throughput. This technique can also easily be combined with any existing template-based database query caching system, thereby obtaining reduced access latency and further throughput scalability.
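A minimal sketch of the idea, assuming an in-memory registry that maps each template to the servers hosting all of its tables; the names and structure are illustrative, not the actual system's API.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of template-based routing: each query template (a parameterized SQL
// string, as a JDBC PreparedStatement would see it) is known in advance, so
// table placement can guarantee that at least one server holds every table
// the template touches.
public class TemplateRegistry {
    // Template id -> parameterized SQL.
    private final Map<String, String> templates = new HashMap<>();
    // Template id -> servers that host all tables this template reads/writes.
    private final Map<String, List<String>> placement = new HashMap<>();

    void register(String id, String sql, List<String> servers) {
        templates.put(id, sql);
        placement.put(id, servers);
    }

    List<String> candidateServers(String id) {
        return placement.get(id);
    }

    public static void main(String[] args) {
        TemplateRegistry registry = new TemplateRegistry();
        registry.register("getMarks",
            "SELECT mark1, mark2 FROM Students WHERE roll_no = ?",
            List.of("replica1", "replica2"));
        System.out.println(registry.candidateServers("getMarks"));
    }
}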
Poor consistency with mobile clients: Weak consistency models are based on the fact that replicas can tolerate inconsistencies for at least some time period. The replicas perform a process of reconciliation to make their copies consistent, and conflicts are resolved as and when they arise. The user is allowed to perform optimistic updates on the data. Conflict resolution is, however, application dependent, and some work is needed on the part of application developers to handle conflicts. Such models scale very well because background update propagation demands less coordination among sites.

The Internet has played a big role in the development and popularity of distributed applications. Some of the most popular applications, like Napster, Gnutella and Freenet, have created tremendous interest in the user community. Deploying these distributed/peer-to-peer applications over the Internet has posed several challenges. With the number of users being so large, distributed systems deployed across the Internet today need to exhibit several important properties: high availability, scalability across millions of users, and the ability to tolerate network failures, network partitioning, high rates of packet loss, and varied bandwidth connections. Despite all these harsh conditions, a distributed application should perform well enough to provide satisfactory user response times. The number of mobile devices in use is exploding with advances in mobile computing, and mobile users are becoming more and more interconnected. Since mobile users increasingly want to share data and demand access to collaborative applications, the weak consistency model has emerged as a natural way to let mobile users share data among themselves. Since mobile clients have limited processing power and memory, and are often in a disconnected state, maintaining strong consistency for mobile replicas is not feasible. Hence, many weak consistency solutions for mobile applications have been suggested.
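One common building block behind such optimistic updates is version-based conflict detection at reconciliation time. The following sketch is illustrative only, not Neutrino's actual protocol: a disconnected client edits its local copy, and at reconciliation the update is accepted only if the master's version has not moved, otherwise an application-level resolver must run.

// Minimal sketch of optimistic updates with version-based conflict detection.
class VersionedValue {
    int version;
    String value;
    VersionedValue(int version, String value) { this.version = version; this.value = value; }
}

class Reconciler {
    // Returns true if the tentative update applied cleanly.
    static boolean reconcile(VersionedValue master, int baseVersion, String tentative) {
        if (master.version == baseVersion) {     // no concurrent update: apply
            master.value = tentative;
            master.version++;
            return true;
        }
        // Conflict: a concurrent update happened. Resolution is application
        // dependent; here we keep the master copy and report the conflict.
        return false;
    }
}

public class ReconcileDemo {
    public static void main(String[] args) {
        VersionedValue master = new VersionedValue(3, "old");
        System.out.println(Reconciler.reconcile(master, 3, "new"));    // true: applied
        System.out.println(Reconciler.reconcile(master, 3, "other"));  // false: conflict
    }
}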
Scalability in Large-Scale Systems: Weakly consistent replication protocols normally disseminate information through methods like anti-entropy or gossip. These are examples of epidemic protocols in which updates are propagated lazily. Such protocols make probabilistic guarantees that all replicas will eventually become consistent. This kind of approach has been around for quite a while and has the advantage that, in a large-scale loosely coupled environment, it scales and performs better than other approaches. Lazy replication is used in peer-to-peer systems like Gnutella. The highly scalable storage system OceanStore, designed to scale across billions of users, provides an API supporting weak consistency for increased availability and performance.
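A toy illustration of anti-entropy, assuming per-key version numbers and pull-style exchanges between peers; the data structures are assumptions made for the sketch.

import java.util.HashMap;
import java.util.Map;

// Toy anti-entropy sketch: each replica holds a key -> (version, value) map
// and periodically pairs with a peer, pulling any entries with newer
// versions. Repeated rounds with random peers drive all replicas toward
// eventual consistency.
class Replica {
    final Map<String, Integer> versions = new HashMap<>();
    final Map<String, String> values = new HashMap<>();

    void put(String key, int version, String value) {
        versions.put(key, version);
        values.put(key, value);
    }

    // One anti-entropy exchange: pull entries the peer has newer versions of.
    void antiEntropyWith(Replica peer) {
        for (Map.Entry<String, Integer> e : peer.versions.entrySet()) {
            if (versions.getOrDefault(e.getKey(), -1) < e.getValue()) {
                put(e.getKey(), e.getValue(), peer.values.get(e.getKey()));
            }
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica(), b = new Replica();
        a.put("x", 1, "hello");
        b.antiEntropyWith(a);           // b pulls x from a
        System.out.println(b.values);   // {x=hello}
    }
}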
Wireless Wide Area Networks: Millions of users today own personal computing devices that can access, query, and update data stores and databases over wireless networks. The increasing availability of wide-area connectivity options such as cellular GPRS/EDGE, EVDO, and WiMAX has encouraged the notion of access to data anywhere and anytime. However, many of these wireless technologies exhibit high variability in peak theoretical throughput, as shown in Table 1. End-to-end latency is also highly variable. To make matters worse, recent studies have shown that achievable throughput is often only 25–65% of the theoretical maximum, and that large drops in throughput are seen during peak usage periods.

Wireless Wide-Area Network (WWAN) technologies remain important for the foreseeable future in spite of the growing popularity of WiFi (802.11) technology. First, WiFi coverage is limited to compact areas where dense base station infrastructure can be economically sustained. In contrast, WWAN technologies require much less dense infrastructure that can often be piggybacked on existing cell phone infrastructure; hence they are economically sustainable over much larger geographic areas. Second, organizational and business considerations may preclude the use of WiFi. For example, a user may not subscribe to the wireless service provider of a hotspot; as another example, corporate security guidelines may prohibit a salesman from using the WiFi coverage available at a customer site, so unplanned use of WiFi might not be possible. Third, existing WiFi infrastructure may be damaged or unusable in situations such as disaster recovery and military operations. Rapid setup of new wireless infrastructure is feasible only if it is sparse, which typically implies reliance on WWAN technologies.
Content Addressable Storage (CAS): Neutrino improves performance by discovering commonality across tentative and authoritative database results. Since these results can be large, eliding commonality can lead to a win on slow networks. Based on its success in networking, distributed file systems and enterprise-scale storage systems, we use Content Addressable Storage (CAS), induced by cryptographic hashing, to discover commonality, and we have customized this technique for improved performance in our context. Like previous CAS-based efforts, we assume that real-world data is collision-resistant with respect to the cryptographic hash function being used; in other words, it is computationally intractable to find two inputs that hash to the same output. Trusting in collision resistance, CAS-based systems treat the hash of a data item as its unique identifier or tag. Data then becomes content-addressable, with tags serving as code words for the much larger data items in network transmission. Although concerns have been expressed about the collision-resistance assumption of CAS, the case for relying on it remains compelling. If Neutrino's hash function is broken, replacing it would be simple, since Neutrino only uses hashing on volatile data and never on permanent storage. While a much stronger function such as SHA-256 would increase computational effort at the client and server, the new hashes would still be much smaller than the data items they represent.
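A minimal CAS sketch, using SHA-256 purely for illustration: items are stored and addressed by the hash of their content, so a short tag can stand in for a previously transmitted item. The class and method names are hypothetical.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Minimal CAS sketch: a data item is addressed by the cryptographic hash of
// its content. If the recipient already holds an item with the same tag, the
// sender can transmit just the tag instead of the full item.
public class CasStore {
    private final Map<String, byte[]> byTag = new HashMap<>();

    static String tagOf(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Store an item and return its tag (its content address).
    String put(byte[] data) throws NoSuchAlgorithmException {
        String tag = tagOf(data);
        byTag.put(tag, data);
        return tag;
    }

    boolean contains(String tag) { return byTag.containsKey(tag); }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        CasStore store = new CasStore();
        String tag = store.put("SELECT result row".getBytes(StandardCharsets.UTF_8));
        // A later transmission of the same row can be replaced by its tag.
        System.out.println(store.contains(tag));   // true
    }
}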
Database Access API: Remote database access is widely supported today through Java Database Connectivity (JDBC) and its antecedent, Open Database Connectivity (ODBC). JDBC defines a Java API that enables vendor independent access to databases. In other words, an application written to that API can be confident of working with any database that supports JDBC. Each vendor provides a client component called a JDBC driver that is typically implemented as a dynamically linked library layered on top of the TCP socket interface. The wire protocol between the JDBC driver and its database is vendor‐specific, and typically embodies proprietary optimizations for efficiency.
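A minimal example of coding against the vendor-neutral java.sql API; the JDBC URL, credentials and schema below are placeholders, and the vendor's driver handles the proprietary wire protocol underneath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal JDBC usage: the application codes against java.sql, while the
// driver registered for the URL's scheme speaks the vendor-specific protocol.
public class JdbcExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:somevendor://dbhost:1521/inventory";  // hypothetical
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT name, total FROM Students WHERE roll_no = ?")) {
            stmt.setInt(1, 1234);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + " " + rs.getInt("total"));
                }
            }
        }
    }
}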
Basic Architecture of Neutrino:

[Architecture diagram: client requests arrive at edge servers, each with a request queue; a query router dispatches queued requests to application servers, which access data through Neutrino JDBC drivers and client proxies with local caches; the partial replicas (holding, e.g., tables {1, 2}, {3, 2} and {3, 1}) connect over the network, via a server proxy and its JDBC driver, to the master copy.]

Architecture Explained:

Master Copy & Server Proxy: The master copy of the database is present somewhere in the network and is connected to the server proxy through a JDBC driver. The master database is usually a sophisticated database management system such as Oracle. The server proxy maintains the list of clients connected to the network; whenever a client contacts the server, it has to go through the server proxy. The proxy also maintains transparency by hiding each replica from the others, and it maintains the list of updates. The server proxy may be present on the master itself or elsewhere in the network.
Edge Servers: These are the endpoints of the system. Whenever a client has to get data from this content distribution network, it contacts an edge server. The edge servers in turn query the replicas for data, and requests to the replicas are queued. Clients contacting the edge servers are unaware of the technology being used to distribute the data.
Query Router: The query router is a major module of Neutrino which helps in achieving partial replication. It can be thought of as something very similar to a directory service that holds information about the placement of replicas and their load. The router finds a feasible replica that can meet the client request and serves the content to the client.
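A sketch of this directory-plus-load idea, assuming a table-to-replica map and a simple least-loaded choice among replicas that host every table a query touches; all names are hypothetical.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the query-router idea: a directory mapping each table to the
// replicas that host it, plus a load figure per replica. A query is routed
// to the least-loaded replica that holds every table it touches.
public class QueryRouter {
    private final Map<String, List<String>> tableToReplicas = new HashMap<>();
    private final Map<String, Integer> load = new HashMap<>();

    void place(String table, List<String> replicas) { tableToReplicas.put(table, replicas); }

    String route(List<String> tablesTouched) {
        String best = null;
        for (String replica : tableToReplicas.getOrDefault(tablesTouched.get(0), List.of())) {
            // Candidate must host every table the query touches.
            boolean hostsAll = tablesTouched.stream()
                .allMatch(t -> tableToReplicas.getOrDefault(t, List.of()).contains(replica));
            if (hostsAll && (best == null ||
                    load.getOrDefault(replica, 0) < load.getOrDefault(best, 0))) {
                best = replica;
            }
        }
        if (best != null) load.merge(best, 1, Integer::sum);  // account for new request
        return best;
    }

    public static void main(String[] args) {
        QueryRouter router = new QueryRouter();
        router.place("table1", List.of("replica1", "replica2"));
        router.place("table2", List.of("replica1"));
        System.out.println(router.route(List.of("table1", "table2")));  // replica1
    }
}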
Neutrino JDBC Driver: This module is responsible for contacting the server proxy and regularly checks the signal strength. Whenever there is connectivity to the network, the server proxy is contacted. The application program always talks to the JDBC driver.
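A sketch of the driver's connectivity loop, assuming a polling monitor for signal strength and an opportunistic sync with the server proxy; the interfaces are stand-ins, not the actual driver API, and a real driver would likely be event-driven rather than polling.

// Sketch of the driver's connectivity loop: the application always talks to
// the local driver; the driver opportunistically syncs with the server proxy
// whenever connectivity is available. ConnectivityMonitor is a stand-in for
// whatever signal-strength source the platform provides.
public class NeutrinoDriverSketch implements Runnable {
    interface ConnectivityMonitor { boolean isConnected(); }
    interface ServerProxyClient { void syncUpdates(); }

    private final ConnectivityMonitor monitor;
    private final ServerProxyClient proxy;

    NeutrinoDriverSketch(ConnectivityMonitor monitor, ServerProxyClient proxy) {
        this.monitor = monitor;
        this.proxy = proxy;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            if (monitor.isConnected()) {
                proxy.syncUpdates();   // pull/push pending updates while online
            }
            try {
                Thread.sleep(5_000);   // poll signal strength periodically
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}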
Design Challenges with Distributed Systems:

Heterogeneity: In a distributed system there may be systems with various architectures and database servers. Hence there must be a common communication medium between these systems. To address this challenge, a common database access API such as Java DataBase Connectivity (JDBC) or Open DataBase Connectivity (ODBC) can be used. Various database vendors provide their own drivers, which can be plugged into these APIs. Heterogeneity of architecture need not be a concern, since the communication is between the JDBC API and the vendor-supplied drivers alone.
Consistency: The consistency of the data across the various replicas is essential in any distributed system, to ensure that the content delivered to the client is current. With Neutrino this is the greatest challenge, since it targets mobile clients: mobile devices often move out of the coverage area, and even the weather can cause loss of signal. Neutrino manages this problem with the help of the server proxy and Neutrino's native driver. The server proxy keeps the list of clients that are currently connected to the server, and it also maintains the list of updates that have taken place in the database.
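A minimal sketch of this bookkeeping, assuming an in-memory update list and a per-client progress counter; the names and structures are hypothetical, not Neutrino's actual data structures.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of how a server proxy could bring a reconnecting client up to date:
// it keeps the global update list and, per client, the index of the last
// update that client has seen.
public class ServerProxySketch {
    private final List<String> updates = new ArrayList<>();      // global update list
    private final Map<String, Integer> clientProgress = new HashMap<>();

    synchronized void recordUpdate(String update) { updates.add(update); }

    // Called when a client regains connectivity: returns the updates it missed.
    synchronized List<String> onReconnect(String clientId) {
        int seen = clientProgress.getOrDefault(clientId, 0);
        List<String> missed = new ArrayList<>(updates.subList(seen, updates.size()));
        clientProgress.put(clientId, updates.size());
        return missed;
    }

    public static void main(String[] args) {
        ServerProxySketch proxy = new ServerProxySketch();
        proxy.recordUpdate("UPDATE Students SET total = 180 WHERE roll_no = 1234");
        System.out.println(proxy.onReconnect("client-7"));  // the missed update
    }
}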
Availability: This distributed system is highly available due to the presence of a number of replicas of the same database. There is also an assumption that at least one replica is always within the coverage area.
Scalability: For improved scalability, partial replication schemes are used. The query router routes each client to an appropriate replica, so that the other replicas are not crowded with requests for the same data. By deploying more than one query router, the bottleneck of a single query router is also avoided. This mechanism also reduces the hardware requirements of a replica, because only a partial database is stored in a client. Throughput is high since, if a replica is overloaded, the request is redirected to another replica holding the same data.
Fault Tolerance: Tolerance of failures is high since there are many replicas. Whenever the master fails, it is restored with the most recent content from the replicas. The failure is also notified to the clients so that they can request verification of data that may not be recent.
Transparency: The client is aware of the server proxy alone; it does not know where the replicas are located or how the data is updated. The server proxy takes care of all these activities.
Output Metrics: The project is planned to be evaluated using:
• Bandwidth usage
• Throughput
• Server load
• Bottleneck due to the Neutrino driver
• Ability of the server proxy
• Bottleneck due to consistency enforcement
• Increase in server load for one client
Benchmark: The project is evaluated in two different environments: RUBBoS and TPC-W.
References:
• GlobeTP: A template-based replication service. Swaminathan Sivasubramanian, Tobias Groothuyse, Guillaume Pierre. Vrije Universiteit, The Netherlands.
• GlobeDB: Autonomic data replication for Web applications. Swaminathan Sivasubramanian et al.
• Fingerprinting by Random Polynomials. Michael O. Rabin. Dept. of Mathematics, The Hebrew University of Jerusalem.
• Opportunistic Use of Content Addressable Storage for Distributed File Systems. Niraj Tolia et al. Intel Research Pittsburgh.
• Replication for Web hosting systems. S. Sivasubramanian, M. Szymaniak, G. Pierre, and M. van Steen. ACM Computing Surveys.
• A case for dynamic selection of replication and caching strategies. S. Sivasubramanian, G. Pierre, and M. van Steen. In Proceedings of the Eighth International Workshop on Web Content Caching and Distribution.
• Akamai EdgeSuite. http://www.akamai.com/en/html/services/edgesuite.html.
• DBProxy: A dynamic data cache for Web applications. K. Amiri, S. Park, R. Tewari, and S. Padmanabhan. In Proc. Intl. Conf. on Data Engineering.
• Characterizing the scalability of a large Web-based shopping system. M. Arlitt, D. Krishnamurthy, and J. Rolia. ACM Transactions on Internet Technology.
• Adaptive database caching with DBCache. C. Bornhövd, M. Altinel, C. Mohan, H. Pirahesh, and B. Reinwald. Data Engineering.
• Towards robust distributed systems. E. A. Brewer. Proc. ACM Symp. on Principles of Distributed Computing.