PERFORMANCE EVALUATION OF PEER-TO-PEER APPROACH TO RESOURCE DISCOVERY FOR LARGE SCALE GRID ENVIRONMENTS Sivadon Chaisiri1 and Putchong Uthayopas2 ABSTRACT: Grid environment usually consists of large number of shared resources distributed across multiple locations. Hence, the problem of resource discovery can be profoundly complex due to the size and complexity of grid system. Recently, peer-to-peer system has been widely used due to its massive resource sharing environment. The merging of the capability of both systems has a potential to create a very large scale sharing environment with rich functionality. In this work, the architecture for building grid networks based on peer-to-peer approach is proposed. Furthermore, some resource discovery algorithms based on the architecture are explored, and evaluated by building a simulation. The results of this work will lead to the development of the effectively large scale grid network in the future. KEYWORDS: peer-to-peer computing, grid computing, resource discovery algorithm 1. INTRODUCTION Grid environment consists of a large set of resources, which are distributed among geographical locations (Foster 1999) (Foster 2001). In large set of shared resources, a basic problem is locating resources when they are represented only by a set of desired attributes (i.e. a Unix computer with physical memory of 256 MB) rather than a globally unique identifier (such Internet address). Moreover, it is impossible to assign the unique name to resources that change their attributes dynamically (e.g. CPU load and available memory). The complexity and size of grid environments even make it more challenging to discover the preferred resources in an appropriate amount of time. Recently, peer-to-peer system is emerging as a new sharing mechanism that widely used around the world (Alfred 2003). Millions of resources are shared among massive number of computers that can freely join or leave the system. The dynamism and scale of peer-to-peer system seems to be what is needed by the grid system. Therefore, the convergence of both systems (peer-to-peer and grid) can potentially create a large distributed system that is scalable and efficient. Nevertheless, the problem of discovering resources under this system is still existed. In this work, a proposed architecture for that combines the benefit of grid and peer-to-peer system are presented. Furthermore, several resource discovery algorithms also are proposed for this architecture. Then, the proposed architecture is simulated to evaluate the effectiveness of these proposed algorithms. 2. PROPOSED ARCHITECTURE In this work, resources (e.g. computers, printers, sensors etc.) that connected together using the network are termed “peer”. There are 4 types of peer as follows. 1 2
Master Student ,High Performance Computing and Networking Center, Kasetsart University Assistant Professor, Department of Computer Engineering, Kasetsart University
• •
•
•
Edge Peer is a peer that shares resources to other peers. Super Peer is a peer that keeps a set of locations of edge peers and known super peers. Furthermore, it has a duty to propagate queries (for resources) received from any peers to its known peers. We call peers that known by other as “neighbored peers”. VO Peer is a peer that forms a virtual organization. Moreover, it defines the policy for managing peers inside the VO. To join a specific VO by any peers, they need to join via registering with the VO peer that manages the VO. Global Peer is a peer that provides the naming directory of VO peers that need to advertise their existence to other peers that need to join with.
Grid system proposes the term “virtual organization (VO)” for managing shared resources based on the defined policy. To merge the capability of peer-to-peer to the grid system for building a systematic sharing environment, we must provide the concept of virtual organization as well. However, the conventional peer-to-peer systems do not provide the facility for defining global policy. Although, some literature such as (Iamnitchi 2001) has also proposed the approach and structure for merging both grid and peer-to-peer systems, the building of VO has not fully addressed. Our work is different from since we propose the components for building the grid environment that also address the problem of building a virtual organization as well. In addition, we show that the propose algorithms can easily scale to as many as 100,000 peers while other works usually address the system of only up to 5,000 peers. In the next section, several algorithms for resources discovery will be proposed. 3. ALGORITHMS Most of the resource discovery algorithms (included our proposed algorithms) for peer-to-peer systems are based on the widely used flooding strategy. This strategy is to flood queries from one peer to its neighbored peers. Firstly, a user submits queries from his peer (called a requester) to other neighbored peers. When a neighbored peer receives the queries, it will process the queries over its own local shared resource. If any results are found, it will send a response message (included information for accessing to the resources) back to requester. The neighbored peer also forwards the queries to its own neighbored peers. To prevent infinite loop of flowing queries in the network, every message of query is handled by time to live (TTL) that specifies how many hops the message may take. When a peer receives a query, it decrements the TTL, and if the TTL is greater than 0, it still forwards the query to all its neighbored peers. Until the TTL is 0, the query is stopped forwarding. The advantage of this flooding strategy lies on the fact that it does not require any centralized servers. However, this strategy produces large amount of messages in order to propagate the queries to peers. This it will degrade the overall performance of network severely. In this work, 3 resource discovery algorithms are proposed as follows: •
Algorithm-1 (named “Query-Filter Algorithm) this algorithm is designed to improve the old flooding algorithm that we labeled as Algorithm-0 (named “Dumb Flooding Algorithm”). The query-filter algorithm improves the dumb flooding algorithm by reducing the bursting of produced messages. Each query attaches the information about routed peers (the peers that routed by queries). That is, when a query passes each peer, the query will attach information to inform that it has already passed this peer. Consequently, before propagating queries to others by a peer, it must check the routed peers from the information. If it found that the
query had to pass their neighbored peers, they will not propagate it to them; otherwise, they continuously flood it to them. Furthermore, every peer also filters seen queries by checking from the identifier number of every query. That is, each peer must store identifier numbers of received queries. If it found the seen queries, it will not propagate them to others. •
Algorithm-2 (named “Backing Links Algorithm”) this algorithm improves on the queryfilter algorithm. The algorithm is done when a preferred resource is found, the response message that included links (URLs for connecting peers) of all routed peers will be returned to each others (included the requester who send the queries) for connecting them as neighbored peers. We assumed that the requester should discover preferred resources faster if it knows a lot of neighbor peers.
•
Algorithm-3 (named “Backing Resource Links Algorithm”) this algorithm improves the backing links algorithm by inserting additional information to the response message. The additional information is about the method for accessing the preferred resource. We assumed that if peers know about resources of many peers, the performance of resource discovery should be speeded up.
4. SIMULATION AND EVALUATION We build the simulated environment for evaluate the proposed resource discovery algorithms. The simulation is implemented by HyperSim (Sugree 2003). We simulated 4 sharing environments composed with different numbers of peers: 100, 1,000, 10,000, and 100,000 peers. In each environment, every VO consists of one selected VO peer to manage a policy. Each peer shares only one unique resource. Every peer is assumed to be connected via 1 Mbps network. Table 1. Configuration of Simulated Environments
Num b er of p ro d uced m essag es
Environment No. 1 2 3 4
Number of Peers 100 1,000 10,000 100,000
Number of VO 1 2 4 8
3000000
Number of Super peers/VO 1 2 4 8
Number of Peers/VO 100 500 2,500 12,500
100
1000
10000
100000
Number of peers
Figure 1: Number of messages produced by the algorithms simulated on each environment
Average Response Time (ms.)
We have 2 metrics for testing the algorithms. The first10000 metric is the number of produced messages Algorithm-0 Algorithm-0 9000 2500000 originated from flooding queries to peers. For testing this8000 metric, we simulate situations that there is a Algorithm-1 Algorithm-1 Algorithm-2 Algorithm-2 7000 peer2000000 submits a query for finding out a preferred resource. We observe the number of produced Algorithm-3 Algorithm-3 6000 messages From the figure 1., we test two algorithms 1500000 within 10 seconds by setting TTL = 12 hops.5000 (dumb flooding and query-filter algorithms) and redo this testing for 100 times on each environment. 4000 1000000 3000 Then we calculate the average of the number of produced messages. We found that the number of 2000 500000 peers will dramatically affect to the message production, especially produced by the dumb flooding 1000 0 algorithm. But our three proposed algorithms reduce tons of0 produced messages. 100
1000
10000
100000
Number of peers
Figure 2: Average response time taken by the algorithms simulated on each environment
In each simulated environment, we test by submitting one query for every 5 seconds. We submit totally 20 queries and record the average response time calculated from the 20 queries. We redo this testing for 100 times for each environment. Then we recalculate the average response time of 100 values (recorded from each environment). The results of the simulation are as shown in Figure 1 and 2. In Figure 1, the number of messages generated from each is shown. It can be clearly seen that number of message for algorithm 1, 2, and 3 are much less than the basic dump flooding algorithm. But it is hard to distinguish between algorithm 1, 2 and 3 since the number of message is practically at the same level. In Figure 2, the average response time for querying for each algorithm are shown. A few facts can be observed. First, each improvement results in the faster query time. The query time of the backing resource links algorithm is shortest and much better than basic dump flooding algorithm. Moreover, as the number of peers are increased for 10 times, the average response time increases at a very slow rate. This means that algorithm-3 has a potential to scale to a very large system size. 5. CONCLUSIONS AND FUTURE WORK From the simulation, we found that our proposed resource discovery algorithms have a great potential to be used on a large scale resources sharing environments. The simulation shows that the proposed algorithms can reduce the number of messages produced effectively. This results in a much less average response time. In the future work, we need to implement a real system based on our proposed architecture. Many implementation issues such as buffering, added latency needed to address to create a fast and scalable infrastructure. 6. REFERENCES Alfred, W. L. (2003) “The Future of Peer-to-Peer Computing.” Communications of The ACM, September 2003/Vol.46 No. 9, pages 57 – 61, 2003 Foster, I., Kesselman, C., and Tuecke, S. (2001) “The anatomy of the Grid: Enabling scalable virtual organization.” The International Journal of Supercomputing Applications, 2001 Foster, I., Kesselman, C., and Tuecke, S. (1999) “The Grid: Blueprint for a New Computing Infrastructure.” Morgan Kaufmann, 1999. Iamnichi, A., and Foster, I. (2001) “On Fully Decentralized Resource Discovery in Grid environments.” The International Workshop on Grid Computing, 2001 Sugree, P., Putchong, U., and Voratas, K. (2003 )“Fast Simulation Model for Grid Scheduling using HyperSim”, In Proceedings of Winter Simulation Conference, New Orlean, 2003.