Rapid Automated Deployment

2007 IEEE Asia-Pacific Services Computing Conference

Rapid and Automated Deployment of Monitoring Services in Grid Environments*

Mei Yiduo, Dong Xiaoshe, Li Junyang, Xu Jing, Xue Zhenghua
School of Electronic and Information Engineering, Xi′an Jiaotong University, Xi′an, 710049, China
[email protected] [email protected]

Abstract

The monitoring service is a crucial component of the service-oriented grid infrastructure. This paper summarizes the objectives and requirements of a grid monitoring system. To cope with the dynamic and large-scale nature of the grid, we propose a scalable distributed monitoring system that supports easy and rapid deployment of monitoring services in grid environments. The BitTorrent protocol is adopted to facilitate the distribution and automated deployment of the monitoring system. The deployment of the monitoring service on a newly joined node and the update of deployed components can be initiated by the nodes within the system, which reduces the administrator′s participation. This lowers the maintenance cost and reduces the human errors that may occur during deployment. With the help of the peer-to-peer network, the proposed system can automatically adapt to failures in network connections or nodes. The service capacity of the proposed system is analyzed. Simulation results show that the proposed system supports efficient and rapid deployment to a large-scale grid and provides a robust platform for efficiently monitoring grid resources.

1. Introduction

Grid computing is quickly becoming the new means for creating distributed infrastructures and virtual organizations (VOs) for multi-institutional research and enterprise applications [1]. In a grid environment, the monitoring system is essential for keeping the complex distributed system efficient [2]. Computational and data grids require a substantial amount of monitoring data to be collected for various tasks such as fault detection, performance analysis, performance tuning, performance prediction, and scheduling [3]. Since software deployment management dominates system administration costs and configuration is a major source of errors in system deployment [4], research on automated deployment mechanisms for monitoring services in grid environments has been initiated [5].

Automating the deployment of a monitoring system improves both correctness and speed. Monitoring systems usually require the deployment of agents (sensors) on all sites and often on all end nodes. With the potential for thousands of resources and applications at geographically distant nodes in grid environments, it is crucial that the deployment and distribution mechanisms for the monitoring system scale well. Furthermore, an update of the monitoring system always involves changes to all the deployed nodes, which requires highly efficient automated approaches to deployment and update.

In this paper, we describe a grid Monitoring System capable of Rapid and Automated Deployment (MSRAD). The objective of the proposed system is to support efficient automated deployment to a number of nodes. Peer-to-peer protocols such as BitTorrent [12][13] can be employed to facilitate the distribution and automated deployment of the monitoring system, an approach inspired by SystemImager [15]. The deployment of the monitoring service on a newly joined node and the update of deployed components can be initiated by the nodes within the system, which reduces the administrator′s participation. This lowers the maintenance cost and reduces the human errors that may occur during deployment. With the help of the peer-to-peer network, MSRAD can automatically adapt to failures in network connections or nodes. The service capacity of the proposed system is analyzed. Simulation results in the framework of the Constellation Model [14] show that MSRAD supports efficient and rapid deployment to a large-scale grid and provides a robust platform for efficiently monitoring grid resources.

The rest of the paper is organized as follows. Section 2 summarizes previous work on monitoring systems in grid environments. Section 3 presents the objectives and requirements of the proposed monitoring system. Section 4 introduces the monitoring architecture of the proposed system and shows how the components are deployed to target nodes. The service capacity is analyzed in Section 5. Finally, Section 6 discusses the simulation results and Section 7 concludes the paper.

* This research is supported by 863 Project of China (Grant No. 2006AA01A109), National Natural Science Foundation of China (Grant No. 60773118) and Program for Changjiang Scholars and Innovative Research Team in University.

0-7695-3051-6/07 $25.00 © 2007 IEEE DOI 10.1109/APSCC.2007.65


2. Related Work

As the monitoring system is a key part of the grid infrastructure, many systems and tools have been developed to provide monitoring services. The Grid Monitoring Architecture (GMA) [3], developed by the Global Grid Forum Performance Working Group, describes the components of a grid monitoring architecture and their essential interactions. GMA provides a minimal specification that supports the required functionality and allows interoperability. The Relational Grid Monitoring Architecture (R-GMA) [6], developed within the European DataGrid Project as a grid information and monitoring system, is based on the GMA. Ganglia [7] is a scalable distributed monitoring system for high performance computing systems such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. ChinaGrid Super Vision (CGSV) [2] provides monitoring functions for ChinaGrid [8]. CGSV is based on the GMA and is designed to collect status information about each entity (such as resources, services, users, jobs, and networks) and to provide the corresponding data query and mining services. GlobalWatch [9] is a distributed platform that monitors various resources of grid platforms so as to improve the flexibility and usability of grid systems. Reference [5] proposes a script-based approach to automate the deployment of grid monitoring service components. That monitoring system is compatible with the GMA, is scalable, and integrates with existing grid platforms and monitoring systems.

These research efforts have focused on the scalability and low overhead of the monitoring system. Our work addresses these aspects as well, and is additionally concerned with the automated and rapid distribution and deployment of the monitoring services.

3. Objectives and Requirements of MSRAD

With the potential for thousands of resources at geographically distant sites and tens of thousands of simultaneous grid users, it is critical that data collection and distribution mechanisms scale [3]. Several design issues should be considered in the construction of a Grid Monitoring System (GMS), see Fig. 1. The pyramid is designed based on the analysis of related work and our goals.

Figure 1. Objectives and requirements of MSRAD

• Proper granularity of data processing. Data processing in a GMS mainly involves real-time monitoring and the processing of historical records or logs. Real-time monitoring is required to reflect the current status of jobs, resources and services, and supports real-time decision making such as error detection and correction and job scheduling. The objective is to collect data in time without causing too much overhead. Long-term records are necessary for performance improvement and accounting. The objective is to save sufficient data according to the system′s needs and the node′s storage capability.
• Low overhead. The objective is to minimize the load introduced by the monitoring system. The monitoring process should run with little interference with the node′s performance, and the monitoring system should transmit monitoring data with low latency [3].
• High adaptability. Heterogeneity is one of the characteristics of the grid. It is important to overcome this challenge in order to provide a unified logical view of resources and services, including their status (monitoring data). Moreover, service providers and virtual organizations in the grid are autonomous and have their own security policies and access control strategies. The monitoring system should not conflict with the node′s security policies.
• Scalable. As the size of the deployed grid increases from tens to thousands of nodes, the number of objects to monitor increases dramatically. Thus, it is necessary to design a monitoring system with high scalability.
• Flexible. Nodes participating in a VO have often deployed grid infrastructure already. A grid monitoring system must interoperate with existing grid middleware and should be easy to integrate into existing grid platforms.

As the time consumption and the probability of errors caused by configuration increase with the size and complexity of the grid, the grid requires an Automated Deployment Monitoring System (ADMS) to automate the deployment of grid monitoring service components on newly joined resources.

• Transparent deployment. The scale and complexity of today′s grid systems make them increasingly difficult and time-consuming to deploy. As grid nodes join and leave quickly, automated resource and service monitoring must be performed and launched at a frequency compatible with the dynamics of grid elements [10]. Furthermore, automated approaches to deployment prevent human errors and make the deployment process easier [4].
• Transparent update. The grid is an evolutionary system in which new types of resources and applications may join at any time, in which case an update of the monitoring system is necessary. An update might require changes to all the nodes that are potential customers of the new types of resources and applications. Automated approaches to update should also be employed to facilitate the massive update of monitoring systems.

Besides the above requirements, distribution and deployment of the monitoring system can benefit from the following characteristics of MSRAD.

• Rapid deployment. Since the deployment and update of the monitoring system involve all the nodes in the grid, it is important to deliver and deploy all the components in an efficient way. Our proposed system can provide rapid deployment of monitoring services to the nodes in a large-scale grid. Time consumption grows very slowly with the number of target nodes, which yields better scalability.
• Self-organizing. The nodes in our proposed system are able to self-organize into a peer-to-peer overlay to facilitate the distribution and deployment of the components. The deployment of the monitoring service on a newly joined node and the update of deployed components can be initiated by the nodes within the system, which reduces the administrator′s participation. This lowers the maintenance cost and reduces the human errors that may occur during deployment.
• Fault tolerant. During automated deployment or update, with the help of the peer-to-peer network, our proposed system can automatically adapt to failures in network connections or nodes, as well as to a transient population of nodes [11]. The deployment can carry on as long as there is one complete copy of the monitoring system components, which decreases the dependence on the central server.

4. MSRAD

4.1. Monitoring Architecture

Many sites or institutions participating in grids comprise multiple clusters or supercomputers, which work independently of each other. A hierarchical structure is employed to transfer the large amount of monitoring data from the computing nodes to the interaction interface hosted at the monitor center, see Fig. 2. There are three roles in the monitoring architecture: the computing node, the front-end node and the monitor center. Nodes in different layers are required to deploy different modules.

The computing nodes, which are located at the bottom layer, must have a deployment agent, an update agent, a controller and sensors installed. The deployment agent has to be installed in the beginning phase of the deployment. Technically, it can be implemented using a peer-to-peer client such as BitTorrent. After the transfer of the deployment agent is complete, the other modules can be treated as content delivered by the deployment agent. Sensors are responsible for collecting the status of the target resources. Monitoring data are processed into a standard format, such as an XML file. The update agent executes the update actions. The controller module listens to the network to receive instructions from the upper layer. Instructions are then dispatched to the corresponding modules.

Besides the deployment agent, update agent and controller, a data aggregator runs on the front-end node of a cluster or supercomputer. The data aggregator is in charge of aggregating the monitoring data collected from the individual nodes within the cluster or supercomputer.

The monitor center holds all the monitoring data, and users or other grid services can access the data through the interaction interface. The communication module is in charge of transferring instructions and monitoring data. All the monitoring data are gathered by the global aggregator and processed by the archiver. Historical records can be retrieved from the database.

Figure 2. Monitoring architecture of MSRAD (three layers: monitor center, front-end nodes, and computing nodes of clusters and supercomputers)
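As a minimal sketch of the data path in this architecture, the following Python fragment serializes a sensor reading to XML and lets a front-end data aggregator average the readings per metric. The element and attribute names (`reading`, `node`, `metric`), the function names, and the choice of averaging are assumptions made for illustration; the paper only states that monitoring data are processed into a standard format such as XML and aggregated on the front-end node.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def sensor_to_xml(node_id: str, metric: str, value: float) -> str:
    """Serialize one sensor reading as XML (hypothetical schema)."""
    reading = ET.Element("reading", attrib={"node": node_id, "metric": metric})
    reading.text = repr(value)
    return ET.tostring(reading, encoding="unicode")


def aggregate(xml_readings: list[str]) -> dict[str, float]:
    """Front-end data aggregator: average each metric over the reporting nodes."""
    values = defaultdict(list)
    for doc in xml_readings:
        reading = ET.fromstring(doc)
        values[reading.get("metric")].append(float(reading.text))
    return {metric: sum(vs) / len(vs) for metric, vs in values.items()}


# Three readings from two computing nodes of one cluster.
readings = [
    sensor_to_xml("node1", "cpu_load", 0.5),
    sensor_to_xml("node2", "cpu_load", 1.5),
    sensor_to_xml("node1", "mem_used_mb", 1024.0),
]
summary = aggregate(readings)
print(summary)
```

A global aggregator at the monitor center could apply the same function to the summaries reported by the front-end nodes, which is one way to keep the volume of data moving up the hierarchy small.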

4.2. Automated Deployment and Update

In MSRAD, the monitoring system images are hosted in a central repository on a server called the source node. The nodes on which the monitoring system is to be deployed are referred to as the target nodes. A registration center is needed to provide registration services for the source node and the target nodes. The administrator can customize the monitoring system′s configuration according to the monitoring needs, and be assured that the monitoring system, once deployed, will behave correctly on different nodes. The content can be distributed among the target nodes using the BitTorrent protocol.

The MSRAD architecture supports two interaction modes for transferring data between source nodes and target nodes: subscription and push, which correspond to the deployment and update processes respectively. Fig. 3 illustrates the two modes, where solid lines represent the interactions in the subscription mode and dashed lines represent those in the push mode.

Figure 3. Subscription mode and push mode (a source node, a registration center, and nodes 1 to 5; interactions are labeled ① to ⑦)

In the subscription mode, the interaction is initiated by the target nodes, as shown in Fig. 3. Suppose that node 3 is a newly joined node (target node). Interactions happen in the following sequence.

a. At first, the target node contacts the registration center to announce its arrival and register itself. After that, it sends a deployment request to the registration center to request the deployment of the monitoring service (1).
b. According to the role of the target node, the registration center answers the request with a source node that holds the required part of the monitoring system. Meanwhile, the registration center sends a notification to the source node (2).
c. Communication between the target node and the source node is established through handshaking. Then, the deployment agent module is transferred from the source node to the target node (3). An auxiliary protocol such as GridFTP is needed to distribute the deployment agent.
d. The registration center can act as a tracker, which helps target nodes that are downloading the same files to find each other in the BitTorrent network. The target node contacts the registration center periodically to report which pieces of the deployment files it holds. The registration center then returns a list of nodes that require the same deployment file but hold different file pieces (4).
e. The target node then establishes connections to other target nodes and finds out which pieces reside on each of them [12]. The other modules are distributed via the BitTorrent protocol (5).

Since the target nodes attempting to deploy the same part of the monitoring system can contact each other simultaneously and download different pieces of the deployment file from different nodes, the distribution and deployment process is accelerated. The nodes involved in the deployment form a peer-to-peer overlay, so the deployment of the monitoring system is voluntary and self-organized, without the administrator′s participation. Thanks to the reliability of the BitTorrent protocol, the deployment can continue even under temporary failure of some nodes. In the worst case, the deployment can carry on as long as there is a complete copy in the network, even if the source node is down.

In the push mode, interactions happen in the following sequence.

a. When a new update is available, the push mode is enabled by the source node, which sends a notification to the registration center. The registration center replies with a list containing the IDs of the target nodes that need to deploy the update component (6).
b. The target nodes can download the new update from the source node, or they can download the file blocks of the update from other target nodes with the help of the registration center and the update agent (7).

4.3. Interaction Interface

We provide a web-based interface to administrators and users, as shown in Fig. 4. Through the interface they can query the load on the target nodes and browse the status of the target resources. The web-based interface is easy to use and the barrier to first use is low.

Figure 4. Web-based interface
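The registration center′s part in the subscription and push modes of Section 4.2 can be sketched as follows. This is a toy in-memory model, not the MSRAD implementation: the class name, method names and data layout are hypothetical, no real BitTorrent or GridFTP library is used, and the peer selection rule (return every peer that holds a piece the caller lacks) is an assumed reading of step (4).

```python
from dataclasses import dataclass, field


@dataclass
class RegistrationCenter:
    """Toy registration center that also acts as a tracker (hypothetical API)."""
    source: str
    # node id -> set of piece indices the node has reported holding
    pieces: dict[str, set[int]] = field(default_factory=dict)

    def request_deployment(self, node: str) -> str:
        """Steps (1)-(2): register the target node and point it at the source node."""
        self.pieces.setdefault(node, set())
        return self.source

    def announce(self, node: str, held: set[int]) -> list[str]:
        """Step (4): record the node's pieces and return peers holding pieces it lacks."""
        self.pieces[node] = set(held)
        return sorted(
            peer for peer, theirs in self.pieces.items()
            if peer != node and theirs - held
        )

    def targets_for_update(self) -> list[str]:
        """Step (6): on a push notification, list the nodes that need the update."""
        return sorted(self.pieces)


center = RegistrationCenter(source="source")
for node in ("node3", "node4", "node5"):
    center.request_deployment(node)
center.announce("node4", {0, 1})
center.announce("node5", {2})
peers = center.announce("node3", {0})
print(peers)                        # peers that hold pieces node3 is missing
print(center.targets_for_update())  # nodes to notify in push mode
```

Keeping only piece indices at the registration center mirrors the tracker role described in step (d): the center never relays file data, it only tells target nodes where to look.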

5. Service Capacity

In this section, we analyze the service capacity of our proposed system. To simplify the demonstration, we consider a deployment process within a cluster whose nodes are homogeneous. Suppose that there are $y$ target nodes and only one source node. Since the BitTorrent protocol divides the file into equal-sized blocks (pieces) [13], we assume that the deployment file is divided into $n$ pieces. Within a cluster, the bandwidth between nodes is denoted by $u$, and the size of the file to deploy is $I$. Then, for each node,

$$I = \int_{t_0}^{t_1} c(t)\,dt,$$

where $t_0$ is the starting time, $t_1$ is the finish time, and $c(t)$ is the download rate.

The optimal model for the whole deployment is as follows. According to the BitTorrent protocol, the contents are divided into pieces (file blocks). As shown in Fig. 5, a block stands for a piece of data: a blank block means that the target node does not have the corresponding piece yet, and a grained block means that the target node has already received the corresponding piece. Our analysis rests on the following assumption. Since the distribution is carried out within clusters, whose nodes are typically connected by a switch, we assume that blocks are delivered synchronously, which means that in every round $x$ blocks ($1 \le x \le \lfloor y/2 \rfloor$), each of which is held by one node, are transferred to $x$ other nodes. In every round, a sending node is referred to as a provider and a receiving node is referred to as a client. In this case, the bandwidth $u$ between a provider and a client is fully utilized. This ideal transfer process is called one-to-one service, as shown in Fig. 5.

Figure 5. Examples of one-to-one service (source node and target nodes)

During the beginning phase of the deployment, the bandwidth of the source node is a bottleneck. To accelerate the deployment process, an optimal scheme is that the source node sends a block to a different target node in each round, and each target node that holds a block sends it to another node. Fig. 6 illustrates this process. For example, in round one the source node sends one block to the first target node; in round two the source node sends one block to the second target node while the first node shares its piece with the third node. Note that the numbers in the figure indicate the round number. Suppose that $p_r$ pieces are delivered in round $r$. Then, during the beginning phase of the deployment,

$$p_r = 1 + \sum_{i=2}^{r} 2^{r-i} = 2^{r-1},$$

where the 1 refers to the piece transferred by the source node and $2 \le i \le r$.

Figure 6. Ideal deployment (source node and target nodes 1 to y under one-to-one service; circled numbers indicate the round)

During the steady phase of the deployment, the ideal situation is that $\lfloor y/2 \rfloor + 1$ nodes (the 1 refers to the source node) upload pieces to the other $\lfloor y/2 \rfloor$ nodes in each round, which means that in every round at most $\lfloor y/2 \rfloor$ pieces are replicated. Note that, during the steady phase, the maximal number of pieces that can be transferred in each round remains at a fixed value ($\lfloor y/2 \rfloor$), in contrast to the exponential growth of the beginning phase. Suppose that the deployment process enters the steady phase at round $r_s$, and let $p_{r_s} = \lfloor y/2 \rfloor$, i.e.

$$1 + \sum_{i=2}^{r_s} 2^{r_s-i} = \left\lfloor \frac{y}{2} \right\rfloor .$$

Solving this equation gives

$$r_s = \log_2 y .$$

The number of pieces that have been deployed to the target nodes up to round $r_s$ can be calculated as follows. Recall that $p_r$ blocks are distributed in every round of the beginning phase and that this phase lasts for $r_s$ rounds, so in total

$$d = \sum_{r=1}^{r_s} 2^{r-1} = 2^{r_s} - 1$$

file blocks are distributed over the target nodes. Since $r_s = \log_2 y$, we get $d = y - 1$. Because $y \times n$ blocks have to be delivered in total during a deployment process, and $\lfloor y/2 \rfloor$ blocks can be transferred in each round of the steady phase, the remaining number of rounds is

$$r_t = \frac{y \times n - (y - 1)}{\lfloor y/2 \rfloor} = 2(n - 1) + \frac{2}{y},$$

and the total number of rounds is

$$r_e = r_s + r_t = 2n - 2 + \log_2 y + \frac{2}{y}. \tag{1}$$

The time consumption of each round is

$$t_r = \frac{I/n}{u} = \frac{I}{n \times u}.$$

Therefore, the total time consumption is

$$T = r_e \times t_r = \left(2n - 2 + \log_2 y + \frac{2}{y}\right) \times \frac{I}{n \times u}. \tag{2}$$

The approximate result is

$$T \approx 2n \times \frac{I}{n \times u} = \frac{2 \times I}{u}.$$

The formula shows an important feature of our system: it can be deployed rapidly, and the total time remains at a nearly constant level as the number of target nodes grows. In practice, we can revise Formula 2 to $T_t = T + T_a$, where $T_t$ represents the total time and $T_a$ represents the additional time caused by extra communication.

Reference [5] proposed a linear deployment method, in which the target nodes are deployed one by one. Its time consumption can be calculated as

$$T_s = y \times \frac{I}{u},$$

i.e. the time grows linearly with the number of target nodes, in contrast to that of our proposed system.

6. Simulation Results

To verify the efficiency of our proposed method, we adopted a simulation-based test. A simulator gives the experiments the flexibility to precisely control the parameters of the network and the target nodes. For our proposed system, the typical scenarios include computer labs or institutions, and the number of simultaneous target nodes for deployment could be of the order of several tens or even a few hundreds. The simulator we adopted is the one introduced in reference [13]. Our simulation is carried out in the framework of the Constellation Model [14].

Figure 7. Analysis results and simulation results (x-axis: Number of Target Nodes; y-axis: Time Consumption (seconds); series: Analysis Result, Simulation Result)

To simulate a real scenario, we set the join rate (request rate) to 2, 4, 6, 8 and 10 respectively, with a join time of 10 seconds. The network bandwidth of each node is set to 100Mb/s, and there is only one source node in the network. The results are shown in Fig. 7. The analysis results are calculated by Formula 2, and the simulation results are obtained by running the simulator with the corresponding input parameters. As observed from the figure, there are differences between the analysis results and the simulation results. This is because Formula 2 is derived under an ideal situation that does not consider the additional communication among nodes. Moreover, the simulation is carried out under the assumption that the target nodes send requests to the source node at a certain rate, as is usually the case in real-world situations. However, each curve remains at a nearly constant level and the trends of the two curves are the same, from which we can see that the number of target nodes has little impact on the deployment time. This feature guarantees that our proposed system can be deployed efficiently in a large-scale grid.

In the second test, we compare our method with the one presented in [5], for which the number of target nodes and the network bandwidth are the two factors that determine the time consumed by the deployment process. The bandwidth of the network varies from 100Mb/s to 1000Mb/s, and the number of target nodes is set to 20, 40, 60, 80 and 100 respectively. The results are shown in Fig. 8. The upper surface represents the results obtained by simulating the linear method used by [5], and the lower surface represents the results of our system. As illustrated by Fig. 8, our proposed method works more efficiently and can support rapid deployment to a large-scale grid.

Figure 8. Deployment time with respect to bandwidth and number of target nodes
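The analytical side of this comparison can be sketched in a few lines of Python. The functions below evaluate Formula 1 and Formula 2 from Section 5 and a linear baseline for the method of [5], taken here as y sequential transfers of I/u each (an assumption consistent with the statement that its time grows linearly with the number of target nodes). The file size and the number of pieces are hypothetical values chosen for illustration, since the paper does not state them, and `simulate_rounds` is only an idealized round counter, not the simulator of [13].

```python
import math


def rounds_total(y: int, n: int) -> float:
    """Formula 1: r_e = 2n - 2 + log2(y) + 2/y rounds for y targets and n pieces."""
    return 2 * n - 2 + math.log2(y) + 2 / y


def time_msrad(y: int, n: int, size: float, u: float) -> float:
    """Formula 2: T = r_e * I / (n * u), with size I and bandwidth u."""
    return rounds_total(y, n) * size / (n * u)


def time_linear(y: int, size: float, u: float) -> float:
    """Assumed linear baseline: the y target nodes are served one after another."""
    return y * size / u


def simulate_rounds(y: int, n: int) -> int:
    """Idealized model: blocks copied per round double (2^(r-1)) until they
    saturate at floor(y/2); count rounds until all y*n blocks are delivered."""
    delivered, r = 0, 0
    while delivered < y * n:
        r += 1
        delivered += min(2 ** (r - 1), y // 2)
    return r


FILE_SIZE_MBIT = 2000.0  # hypothetical file size I, in megabits
PIECES = 1000            # hypothetical number of pieces n

for u in (100.0, 1000.0):            # bandwidth in Mb/s, the range of the second test
    for y in (20, 40, 60, 80, 100):  # numbers of target nodes
        t_p2p = time_msrad(y, PIECES, FILE_SIZE_MBIT, u)
        t_lin = time_linear(y, FILE_SIZE_MBIT, u)
        print(f"u={u:6.0f} y={y:3d} T={t_p2p:8.2f}s Ts={t_lin:8.2f}s")
```

When y is a power of two, the round counter agrees with Formula 1 rounded up to a whole round, which is a quick sanity check on the derivation. For other values of y the two can differ slightly because Formula 1 treats the floor of y/2 as y/2.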

7. Conclusion

In this paper, we have summarized the objectives and requirements of a grid monitoring system. To cope with the dynamic and large-scale nature of the grid, a scalable distributed monitoring system is proposed that supports rapid and automated deployment. The monitoring architecture and the deployment process of the proposed system have been introduced. The BitTorrent protocol is adopted to facilitate easy and rapid deployment of monitoring services in grid environments. Nodes in the proposed system can self-organize into a peer-to-peer overlay to automate the deployment of the monitoring service. The system can also automatically adapt to failures in network connections or nodes. The service capacity has been analyzed, and the system has been compared with a linear deployment method. Simulation results in the framework of the Constellation Model support our analysis. From the results, we can see that the proposed system supports efficient and rapid deployment to a large-scale grid and provides a robust platform for efficiently monitoring grid resources.

References

[1] S. Hastings, S. Oster, S. Langella, et al, ″Introduce: An Open Source Toolkit for Rapid Development of Strongly Typed Grid Services″, Journal of Grid Computing, Springer Netherlands, available online, http://www.springerlink.com/content/u301u225wg5356w3
[2] W. Zheng, L. Liu, M. Hu, et al, ″CGSV: An Adaptable Stream-Integrated Grid Monitoring System″, NPC 2005, Springer-Verlag, Berlin Heidelberg, LNCS 3779, 2005, pp. 22-31.
[3] B. Tierney, R. Aydt, D. Gunter, et al, ″A Grid Monitoring Architecture″, Tech. Rep. GWD-Perf-16-3, Global Grid Forum (GGF), 2002, http://www-didc.lbl.gov/GGF-PERF/GMAWG/papers/GWD-GP-16-3.pdf
[4] V. Talwar, D. Milojicic, Q. Wu, et al, ″Approaches for Service Deployment″, IEEE Internet Computing, IEEE Press, March/April 2005, pp. 70-80.
[5] X. Dong, Y. Wang, Z. Qin, et al, ″Research on an Automatic Deployment Mechanism of Monitor Service in Grid Environment″, Proceedings of the Fifth International Conference on Grid and Cooperative Computing Workshops (GCCW′06), IEEE Press, 2006, pp. 63-70.
[6] A. Cooke, W. Nutt, J. Magowan, et al, ″Relational Grid Monitoring Architecture (R-GMA)″, http://www.rgma.org/pub/Cracow-2003-rgma.pdf
[7] M.L. Massie, B.N. Chun, D.E. Culler, ″The Ganglia Distributed Monitoring System: Design, Implementation, and Experience″, Parallel Computing, Vol 30, June 2004, pp. 817-840.
[8] H. Jin, ″ChinaGrid: Making Grid Computing a Reality″, ICADL 2004, Springer-Verlag, Berlin Heidelberg, LNCS 3334, 2004, pp. 13-24.
[9] S. Di, H. Jin, S. Li, et al, ″GlobalWatch: A Distributed Service Grid Monitoring Platform with High Flexibility and Usability″, Proceedings of the 2006 IEEE Asia-Pacific Conference on Services Computing (APSCC′06), IEEE Press, 2006, pp. 440-446.
[10] F. Bonnassieux, R. Harakaly, P. Primet, ″Automatic Services Discovery, Monitoring and Visualization of Grid Environments: The MapCenter Approach″, Across Grids 2003, Springer-Verlag, Berlin Heidelberg, LNCS 2970, 2004, pp. 222-229.
[11] S. Androutsellis-Theotokis, D. Spinellis, ″A Survey of Peer-to-Peer Content Distribution Technologies″, ACM Computing Surveys, ACM Press, 2004, 36(4), pp. 335-371.
[12] D. Qiu, R. Srikant, ″Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks″, SIGCOMM′04, ACM Press, 2004, Portland, Oregon, USA, pp. 367-377.
[13] A.R. Bharambe, C. Herley, V.N. Padmanabhan, ″Analyzing and Improving a BitTorrent Network′s Performance Mechanisms″, Proceedings of 25th IEEE International Conference on Computer Communications (INFOCOM 2006), IEEE Press, 2006, Barcelona, Spain, pp. 1-12.
[14] Y. Wang, X. Dong, X. He, et al, ″A Constellation Model for Grid Resource Management″, The Sixth International Workshop on Advanced Parallel Processing Technologies (APPT 2005), Springer-Verlag, Berlin Heidelberg, LNCS 3756, 2005, pp. 263-272.
[15] http://wiki.systemimager.org/index.php/Main_Page
