SOFTWARE REQUIREMENTS SPECIFICATION FOR NEUTRINO
PREPARED BY
Balakrishnan L M
Karthik C
Kumaran V
Senthil Kumar V
Batch - II
Software Requirements Specification for Neutrino
1. Introduction
1.1 Purpose
• Enabling mobile database access with good performance over low-bandwidth networks.
• Maintaining consistency among mobile replicas over a stale network.
• Employing partial, template-based replication in the mobile client.
• Deploying a dynamically replicated database.
• Utilizing the processing power of low-power clients.
1.2 Document Conventions
• Italicized words represent the terminology used in this project.
• Bold words represent the novelties involved in this project.
1.3 Intended Audience and Reading Suggestions
This document is intended for the following types of readers:
• Developers
• Project Managers
• Testers
• Documentation Writers
• Users
Readers are strongly encouraged to start with the Introduction to get a better idea of Neutrino.
1.4 Project Scope
A distributed system in a wireless network suffers primarily from inconsistency and poor performance. A trade-off always exists between the consistency enforced and the performance of the distributed system. This project focuses on developing a distributed system that operates over a stale network without degrading consistency, and that improves performance and scalability through partial replication.
1.5 References
• GlobeTP: A template based replication service. Dr. Swaminathan Sivasubramanian, Tobias Groothuyse, Guillaume Pierre. Vrije Universiteit, The Netherlands.
• GlobeDB: Autonomic Data Replication for Web Applications. Dr. Swaminathan Sivasubramanian et al.
• Fingerprinting by Random Polynomials. Michael O. Rabin, Dept. of Mathematics, The Hebrew University of Jerusalem.
• Opportunistic Use of Content Addressable Storage for Distributed File Systems. Niraj Tolia et al. Intel Research Pittsburgh.
• Replication for web hosting systems. ACM Computing Surveys. S. Sivasubramanian, M. Szymaniak, G. Pierre, and M. van Steen.
• A case for dynamic selection of replication and caching strategies. In Proceedings of the Eighth International Workshop on Web Content Caching and Distribution. S. Sivasubramanian, G. Pierre, and M. van Steen.
• Akamai EdgeSuite. http://www.akamai.com/en/html/services/edgesuite.html.
• DBProxy: A dynamic data cache for Web applications. In Proc. Intl. Conf. on Data Engineering. K. Amiri, S. Park, R. Tewari, and S. Padmanabhan.
• Characterizing the scalability of a large web-based shopping system. ACM Transactions on Internet Technology. M. Arlitt, D. Krishnamurthy, and J. Rolia.
• Adaptive database caching with DBCache. Data Engineering. C. Bornhövd, M. Altinel, C. Mohan, H. Pirahesh, and B. Reinwald.
• Towards robust distributed systems. Proc. ACM Symp. on Principles of Distributed Computing. E. A. Brewer.
2. Overall Description
2.1 Product Perspective
Wireless wide area networks have become common in large enterprises that work with relational databases. Various consistency models have been developed to keep the available replicas consistent. This system is developed to provide good consistency to enterprises that use a wireless medium as their primary communication network. It also aims to improve the scalability of these systems through a partial replication scheme.
2.2 Product Features
This product provides good consistency even over a stale network. With this scheme, laptops and PDAs can also be used as replicas. The use of small-scale database management systems such as MySQL, Apache Derby, or PostgreSQL keeps the cost of deployment minimal. Scalability is achieved through partial replication. The replication scheme is dynamic, so the system adapts easily to changing conditions: data that is frequently accessed is replicated dynamically, which provides high scalability and availability.
2.3 Operating Environment
This system is designed to operate in any environment. It is to be implemented in Java, which provides platform independence and portability. The native libraries that are needed are written in Visual C++ for Windows and as a BSD C implementation for all other systems. The master copy of the database resides in a high-end database management system such as Oracle or SQL Server. Replicas are held in a lightweight database management system that fits easily on mobile phones, PDAs, or laptops. The choice of operating system is constrained only by the database being used.
2.4 Design and Implementation Constraints
The system is designed so that consistency is never relaxed, whatever the degree of network irregularity. Heterogeneity in areas such as the database or the operating system is handled by the proxy. Synchronization among the replicas and the other parts of the system must be maintained carefully.
2.5 Assumptions and Dependencies
The system is developed on the assumption that at least one replica is always available. The query router is aware of all the database management systems used in the system and is capable of converting a query into a form understood by the target database. Each database management system in use provides its own JDBC driver.
3. System Features
Neutrino, the distributed system for efficient replication, has the following features:
3.1 Partial Replication
3.1.1 Description and Priority
This feature enables the system to replicate the data that is frequently used. The replication strategy is chosen dynamically by the system, which monitors the data being accessed. Data that is frequently fetched is replicated, replacing some of the currently replicated data according to a suitable replacement algorithm. This module is of primary concern since it improves scalability.
3.1.2 Stimulus/Response Sequences
The user requests some data from the server. The query router routes the request to the appropriate replica. If the number of requests for that data is high, the least frequently used data in the replica is replaced by the new data. The change log in the query router is also updated.
3.1.3 Functional Requirements
The system must have an intelligent algorithm for choosing the type of fragmentation used to replicate data partially. The system must also choose a replacement algorithm that suits the current scenario.
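The stimulus/response sequence in 3.1.2 can be sketched as follows. This is a minimal illustration, not the project's actual design: the class name PartialReplica, the fixed capacity, the hotThreshold that decides when data counts as "frequently requested", and the in-memory change log (standing in for the change log kept by the query router) are all assumptions made for the example. Least-frequently-used replacement is the only policy shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical bookkeeping for one partially replicated node (illustrative only). */
class PartialReplica {
    private final int capacity;      // assumed: maximum number of fragments held locally
    private final int hotThreshold;  // assumed: request count at which a fragment is "hot"
    private final Map<String, Integer> requestCount = new HashMap<>();
    private final Set<String> replicated = new LinkedHashSet<>();
    private final List<String> changeLog = new ArrayList<>();

    PartialReplica(int capacity, int hotThreshold) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.hotThreshold = hotThreshold;
    }

    /** Records one request; returns the fragment evicted to make room, or null. */
    String recordRequest(String fragmentId) {
        int count = requestCount.merge(fragmentId, 1, Integer::sum);
        if (replicated.contains(fragmentId) || count < hotThreshold) {
            return null; // already held locally, or not requested often enough yet
        }
        String evicted = null;
        if (replicated.size() >= capacity) {
            evicted = leastFrequentlyUsed();
            replicated.remove(evicted);
            changeLog.add("EVICT " + evicted);
        }
        replicated.add(fragmentId);
        changeLog.add("REPLICATE " + fragmentId);
        return evicted;
    }

    /** Returns the held fragment with the lowest request count (ties: oldest first). */
    private String leastFrequentlyUsed() {
        String victim = null;
        int min = Integer.MAX_VALUE;
        for (String id : replicated) {
            int c = requestCount.getOrDefault(id, 0);
            if (c < min) {
                min = c;
                victim = id;
            }
        }
        return victim;
    }

    boolean holds(String fragmentId) {
        return replicated.contains(fragmentId);
    }

    List<String> changeLog() {
        return changeLog;
    }
}
```

With a capacity of 2 and a threshold of 2, requesting A twice, B twice, A once more and then C twice replicates C and evicts B, because B has the lower request count of the two held fragments.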
3.2 Consistency Enforcement
3.2.1 Description and Priority
Maintaining consistency and synchronization among the replicas is a difficult task. A replica that enters the network must be synchronized with the server immediately. Until then, the replica is not allowed to serve any client requests. A proper versioning system must also be implemented so that the client knows how old its data is. When the fingerprints of the replica and the master differ only slightly, only the changes are applied. When they differ greatly, the whole data set in the replica is replaced. This is also a primary task.
3.2.2 Stimulus/Response Sequences
There is an update in the master copy of the database. The clients that are currently connected are updated immediately. Consider a new client entering the network after the update. The version of the client's database is first checked against the version of the data in the master copy. If there is a mismatch, the fingerprint of the data in the client is obtained and compared with the fingerprint of the master. If the variation is small, only the changes are applied. If the variation is large, the whole database is replaced.
3.2.3 Functional Requirements
A suitable fingerprinting system, versioning scheme, and change log must be chosen. A timestamp-based versioning system is a good choice since timestamp values do not repeat.
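The decision described in 3.2.2 (check the version first, then compare fingerprints, then either apply changes or replace everything) can be sketched as below. Several details are assumptions made for illustration: the data is treated as a list of chunks, each chunk is fingerprinted with CRC32 from the Java standard library as a stand-in for whatever fingerprinting scheme is finally chosen, and "small variation" is taken to mean that at most half of the chunks differ. The names SyncPlanner, Action and FULL_REPLACE_THRESHOLD are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical planner that decides how a joining replica is brought up to date. */
class SyncPlanner {
    enum Action { UP_TO_DATE, APPLY_CHANGES, FULL_REPLACE }

    // Assumed cut-off: above this fraction of differing chunks, replace everything.
    static final double FULL_REPLACE_THRESHOLD = 0.5;

    /** CRC32 stands in for the project's fingerprint function (an assumption). */
    static long fingerprint(String chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    static Action plan(long replicaVersion, long masterVersion,
                       List<String> replicaChunks, List<String> masterChunks) {
        if (replicaVersion == masterVersion) {
            return Action.UP_TO_DATE; // versions match, nothing to transfer
        }
        int total = Math.max(replicaChunks.size(), masterChunks.size());
        int differing = 0;
        for (int i = 0; i < total; i++) {
            boolean missing = i >= replicaChunks.size() || i >= masterChunks.size();
            if (missing
                    || fingerprint(replicaChunks.get(i)) != fingerprint(masterChunks.get(i))) {
                differing++;
            }
        }
        // Small variation: ship only the changes. Large variation: replace the data set.
        return differing <= FULL_REPLACE_THRESHOLD * total
                ? Action.APPLY_CHANGES
                : Action.FULL_REPLACE;
    }
}
```

Comparing per-chunk fingerprints rather than one fingerprint over the whole database is a design choice of this sketch: it gives a measurable notion of "how much" the replica has drifted, which a single whole-database fingerprint cannot provide.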
3.3 Adaptation Triggering
3.3.1 Description and Priority
The system must be capable of serving any number of clients and should perform well even with a stale network and low bandwidth. The dynamic replication strategy itself addresses this problem, but special care should be taken over the number of clients connected, bandwidth utilization, and CPU load. Adaptation triggering also deals with maintaining the performance of the system when some of the nodes fail. Whenever a node fails, whether it is a query router, the master copy, or one of the replicas, the system has to be fault tolerant. Failure symptoms should not cross the end point of the system, which is the edge server. This is an additional feature.
3.3.2 Stimulus/Response Sequences
Whenever there is a failure in the system, the controller server learns of it. The controller then informs the query router that the particular node has failed. The query router redirects the affected clients to another replica and does not allow any new client to be served by the failed replica until it recovers. A separate log of connected and disconnected nodes is maintained, which helps to prevent inconsistency. When the master copy fails, the replicas receive the updates and propagate them to all other replicas; when the master server is alive again, it is brought up to date.
3.3.3 Functional Requirements
This module requires an algorithm that makes intelligent decisions at the time of failure. A messaging system among the nodes has to be developed. Isolation of system failures is also necessary. Consistency among the logs must also be maintained.
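The replica-failure part of 3.3.2 can be sketched as follows. The class FailoverRouter and its method names are hypothetical, the controller's notification is modelled as a plain method call rather than a network message, and picking the first live replica is only a placeholder for the real routing policy. The sketch relies on the assumption stated in 2.5 that at least one replica is always available.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical query-router state for handling replica failures (illustrative only). */
class FailoverRouter {
    private final Map<String, Boolean> alive = new LinkedHashMap<>();  // replica -> is alive
    private final Map<String, String> assignment = new HashMap<>();    // client -> replica
    private final List<String> nodeLog = new ArrayList<>();            // connect/disconnect log

    void addReplica(String replicaId) {
        alive.put(replicaId, true);
        nodeLog.add("CONNECTED " + replicaId);
    }

    /** Assigns a client to a live replica; failed replicas never receive new clients. */
    String assign(String clientId) {
        String replica = firstAliveReplica();
        assignment.put(clientId, replica);
        return replica;
    }

    /** Called when the controller reports a failure: redirect the affected clients. */
    void onNodeFailed(String replicaId) {
        alive.put(replicaId, false);
        nodeLog.add("DISCONNECTED " + replicaId);
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (e.getValue().equals(replicaId)) {
                e.setValue(firstAliveReplica());
            }
        }
    }

    /** Called when the node recovers; it may serve clients again from now on. */
    void onNodeRecovered(String replicaId) {
        alive.put(replicaId, true);
        nodeLog.add("CONNECTED " + replicaId);
    }

    String replicaOf(String clientId) {
        return assignment.get(clientId);
    }

    List<String> nodeLog() {
        return nodeLog;
    }

    // Placeholder policy (an assumption): the first live replica in registration order.
    private String firstAliveReplica() {
        for (Map.Entry<String, Boolean> e : alive.entrySet()) {
            if (e.getValue()) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("no replica available");
    }
}
```

The master-failure case, in which replicas accept updates and later bring the master up to date, is not covered by this sketch.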
3.4 Request Routing
3.4.1 Description and Priority
A number of query routers are available in the network. Each query router directs client requests to the appropriate replica. An initial request routing mechanism has to be employed to choose a suitable query router. The client is routed after taking several parameters into account, including network estimation metrics, traffic, load in the network, and the CPU load of the destination node.
3.4.2 Stimulus/Response Sequences
When a client needs to be served by Neutrino, it takes the following route to reach its server. Using network estimation metrics, the RTT and the geographical distance between the client and the server are used to decide the route. The traffic on the network is measured and a path with less traffic is chosen. The bandwidth utilization of the network is also calculated, and the path with the least bandwidth utilization is preferred. Finally, the number of clients served by the candidate server is determined. If the replica or query router is handling a large number of client requests, or is busy for some other reason, another replica or query router has to be found. The node with the least-cost path and a low CPU load is finally selected, and the client is served by that replica.
3.4.3 Functional Requirements
This module also needs an intelligent algorithm to make proper routing decisions. The routing algorithm must be implemented at two levels: first the client is routed to a suitable query router, and then the query router routes the request to a suitable replica.
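One way to combine the routing parameters listed above is a single cost per candidate node, with the lowest-cost node selected. The sketch below is only an illustration under assumptions: the class names RouteSelector and Candidate are hypothetical, the metrics are assumed to be measured already, and the scaling factors that make RTT and client count comparable to the 0..1 utilization figures are arbitrary placeholders, not values from the project.

```java
import java.util.List;

/** Hypothetical least-cost selection over candidate nodes (illustrative only). */
class RouteSelector {
    /** Measurements for one candidate query router or replica. */
    static final class Candidate {
        final String id;
        final double rttMs;                 // round-trip time to the node
        final double bandwidthUtilization;  // 0.0 .. 1.0 on the path to the node
        final double cpuLoad;               // 0.0 .. 1.0 on the node
        final int activeClients;            // clients currently served by the node

        Candidate(String id, double rttMs, double bandwidthUtilization,
                  double cpuLoad, int activeClients) {
            this.id = id;
            this.rttMs = rttMs;
            this.bandwidthUtilization = bandwidthUtilization;
            this.cpuLoad = cpuLoad;
            this.activeClients = activeClients;
        }
    }

    // Assumed scaling so that every term is roughly in the range 0..1.
    static final double RTT_SCALE_MS = 100.0;
    static final double CLIENT_SCALE = 100.0;

    static double cost(Candidate c) {
        return c.rttMs / RTT_SCALE_MS
                + c.bandwidthUtilization
                + c.cpuLoad
                + c.activeClients / CLIENT_SCALE;
    }

    /** Returns the candidate with the lowest cost. */
    static Candidate select(List<Candidate> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates to route to");
        }
        Candidate best = candidates.get(0);
        for (Candidate c : candidates) {
            if (cost(c) < cost(best)) {
                best = c;
            }
        }
        return best;
    }
}
```

For the two-level routing required in 3.4.3, the same select call would be made twice: once by the client over the known query routers, and once by the chosen query router over its replicas.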
4. External Interface Requirements
4.1 User Interfaces
No rich UI is needed since this is purely middleware. To monitor the modules in the system, however, there is a small console. The console shows information such as the bandwidth utilization in the various networks, the CPU load of each node, the connected and disconnected nodes, the data present in the replica nodes, and the clients being served by each replica and query router in the system.
4.2 Hardware Interfaces
This system does not use any specific hardware interfaces.
4.3 Software Interfaces
Each module in the network, such as the query router, the client proxy, or the JDBC driver, is developed and deployed as a component. Components are connected through standard interfacing techniques wherever needed.
4.4 Communications Interfaces
This system uses Wi-Fi as the physical medium for communication and TCP/IP for data transfer. Control information is sent using UDP. Many communications are carried out using RPC and RMI technologies.
5. Other Nonfunctional Requirements
5.1 Performance Requirements
The performance of the system is directly proportional to the number of replicas in the system and also depends on the configuration of the nodes used in the network. The theoretical study made for this project shows that the system performs best under workloads with many reads and infrequent updates.
5.2 Safety Requirements
For the safety of the system, multiple replicas of the same data are mandatory. If any replica is away from the network, the system can still perform well because another replica of the data is available.
5.3 Security Requirements
For security, the client cannot access any module of the system other than the edge servers, which should be the only module known to it. The client should not be aware of the internal mechanism, because that knowledge could be used to breach the security of the system. With regard to internal security policies, the developer must be allowed to create UDP and TCP sockets. The clients must also be allowed to make RMI or RPC requests to any other node in the network.
5.4 Software Quality Attributes
For the customer, the software requires very little configuration. The client discovers the server by itself, and the proxy by itself keeps track of the connected and disconnected replicas. Request routing and consistency enforcement are done dynamically. Regarding security, the client cannot directly access the system: client requests are stopped at the edge server, and the queries and other requests are rewritten there and handled entirely within Neutrino. These features are intended to prevent security breaches inside the system. The system is easily portable. The system is purely component oriented, so any component can be revised in the future without disturbing the other components.