Managing Grid and Web Services and Their Exchanged Messages

OGF19 Workshop on Reliability and Robustness
Friday Center, Chapel Hill, NC, January 31, 2007

Authors: Harshawardhan Gadgil (his PhD topic), Geoffrey Fox, Shrideep Pallickara, Marlon Pierce
Community Grids Lab, Indiana University
Presented by Geoffrey Fox



Management Problem I

• Characteristics of today's (Grid) applications:
  – Increasing complexity
  – Components widely dispersed and disparate in nature and access
    • Span different administrative domains
    • Operate under differing network / security policies
    • Have limited access to resources due to the presence of firewalls, NATs, etc. (a major focus in the prototype)
  – Dynamic: components (nodes, network, processes) may fail

• Services must meet:
  – General QoS and life-cycle features
  – (User-defined) application-specific criteria

• Need to "manage" services to provide these capabilities:
  – Dynamic monitoring and recovery
  – Static configuration and composition of systems from subsystems

Management Problem II

• Management operations* include:
  – Configuration and life-cycle operations (CREATE, DELETE)
  – Handling RUNTIME events
  – Monitoring status and performance
  – Maintaining system state (according to user-defined criteria)

• Protocols like WS-Management / WS-DM define inter-service negotiation and how to transfer metadata

• We are designing/prototyping a system that will manage a general worldwide collection of services and their network links
  – Need to address fault tolerance, scalability, performance, interoperability, generality, and usability

• We are starting with our messaging infrastructure because
  – we need it to be robust in the Grids we are using it in (sensor and material science Grids)
  – we are using it in the management system itself
  – it has critical network requirements

* From WS-Distributed Management, http://devresource.hp.com/drc/slide_presentations/wsdm/index.jsp



Core Features of the Management Architecture

• Remote management
  – Allow management irrespective of the location of the resource (as long as that resource is reachable via some means)

• Traverse firewalls and NATs
  – Firewalls complicate management by disabling access to some transports and to internal resources
  – Utilize the tunneling capabilities and multi-protocol support of the messaging infrastructure

• Extensible
  – Management capabilities evolve with time. We use a service-oriented architecture to provide extensibility and interoperability

• Scalable
  – The management architecture should scale as the number of managees increases

• Fault-tolerant
  – Management itself must be fault-tolerant. Failure of transports OR management components should not cause the management architecture to fail.

Management Architecture Built in Terms of:

• Hierarchical Bootstrap System – made robust itself by replication
  – Managees in different domains can be managed with separate policies for each domain
  – Periodically spawns a System Health Check that ensures components are up and running

• Registry for metadata (a distributed database) – made robust by standard database techniques, and by our system itself for its service interfaces
  – Stores managee-specific information (user-defined configuration / policies, and the external state required to properly manage a managee)
  – Generates a unique ID per instance of a registered component

Architecture: Scalability via Hierarchical Distribution

[Figure: a tree of bootstrap nodes with a replicated ROOT and domain children, e.g. /ROOT/EUROPE/CARDIFF, /ROOT/US/CGL, /ROOT/US/FSU. Nodes are spawned if not present and are ensured to be up and running.]

• Passive bootstrap nodes (interior nodes) only ensure that all child bootstrap nodes are always up and running
• Active bootstrap nodes (always the leaf nodes in the hierarchy) are responsible for maintaining a working set of management components in their domain

Management Architecture Built in Terms of:

• Messaging Nodes – form a scalable messaging substrate
  – Provide message delivery between managers and managees
  – Provide transport-protocol-independent messaging between distributed entities
  – Can provide secure delivery of messages

• Managers – active stateless agents that manage managees (a sketch of such a manager loop follows below)
  – Both general and managee-specific management threads perform the actual management
  – Multi-threaded to improve scalability with many managees

• Managees – what you are managing (the managee / service to manage), which our system makes robust
  – There is NO assumption that the managed system uses Messaging Nodes
  – Each is wrapped by a Service Adapter which provides a Web Service interface
  – It is assumed that ONLY modest state needs to be stored/restored externally; the managee could front-end …
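
To make the manager role concrete, here is a minimal sketch of such a periodic check loop. The Registry and ManageeProxy interfaces are hypothetical stand-ins for illustration, not the actual system's API:

    // Hypothetical sketch of a stateless manager's periodic check loop.
    import java.util.List;

    interface Registry {
        List<ManageeProxy> availableManagees(String domain);
    }

    interface ManageeProxy {
        boolean ping();          // conceptually, a WS-Management status GET
        void restoreFromState(); // re-apply externally stored configuration
    }

    public class ManagerLoop implements Runnable {
        private final Registry registry;
        private final String domain;

        public ManagerLoop(Registry registry, String domain) {
            this.registry = registry;
            this.domain = domain;
        }

        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                // Stateless: the working set is re-read from the registry each pass.
                for (ManageeProxy managee : registry.availableManagees(domain)) {
                    if (!managee.ping()) {
                        managee.restoreFromState(); // recover per user-defined policy
                    }
                }
                try {
                    Thread.sleep(60000); // e.g. check every minute, depending on policy
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }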

Architecture: Conceptual Idea (Internals)

[Figure: managers exchange WS-Management messages, via a Messaging Node, with Service Adapters that each front a resource to manage (managee). Manager processes periodically check for available managees to manage, and read/write managee-specific external state from/to the Registry. A System Health Check ensures the managers are always up and running; the Bootstrap Service periodically spawns components. The user writes the system configuration to the Registry. All components connect to the Messaging Node for sending and receiving messages.]

Architecture: User Component

• "Managee characteristics" are determined by the user
• Events generated by the managees are handled by the manager
  – Event processing is determined via WS-Policy constructs
    • E.g. wait for the user's decision on how to handle specific conditions
    • E.g. "auto-instantiate" a failed service – but the service responsible for doing this must act consistently even when the failed service has not actually failed and is just unreachable
• Administrators can set up services (managees) by defining their characteristics and writing this information to the registry

Issues in the Distributed System: Consistency

• Examples of inconsistent behavior:
  – Two or more managers managing the same managee
  – Old messages / requests arriving after newer requests
  – Multiple copies of a managee existing at the same time, or orphan managees, leading to inconsistent system state

• Use a Registry-generated, monotonically increasing Unique Instance ID (IID) to distinguish between new and old instances of Managers, Managees, and Messages (a code sketch follows below):
  – Requests from manager thread A are considered obsolete IF IID(A) < IID(B)
  – The Service Adapter stores the last known MessageID (IID:seqNo), allowing it to differentiate between duplicate AND obsolete messages
  – Periodic renewal with the registry:
    IF IID(manageeInstance_1) < IID(manageeInstance_2)
    THEN manageeInstance_1 is deemed OBSOLETE
    SO EXECUTE the policy (e.g. instruct manageeInstance_1 to silently shut down)
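
A minimal sketch of the IID comparison described above, assuming a composite MessageID of (IID, sequence number); the class and method names are illustrative, not taken from the actual implementation:

    // Hypothetical sketch of IID-based duplicate/obsolete-message filtering.
    public final class MessageId implements Comparable<MessageId> {
        final long iid;    // registry-generated, monotonically increasing instance ID
        final long seqNo;  // per-instance message sequence number

        MessageId(long iid, long seqNo) { this.iid = iid; this.seqNo = seqNo; }

        public int compareTo(MessageId other) {
            int byInstance = Long.compare(iid, other.iid);
            return byInstance != 0 ? byInstance : Long.compare(seqNo, other.seqNo);
        }
    }

    class ServiceAdapterState {
        private MessageId lastSeen = new MessageId(0, 0);

        // Accept a request only if it is strictly newer than the last one seen.
        synchronized boolean accept(MessageId incoming) {
            int cmp = incoming.compareTo(lastSeen);
            if (cmp <= 0) return false;   // duplicate (==) or obsolete (<): drop it
            lastSeen = incoming;          // newer IID, or same IID with newer seqNo
            return true;
        }
    }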

Issues in the Distributed System: Security

• Provide secure communication between communicating parties (e.g. Manager <-> Managee):
  – Publish/Subscribe: provenance, lifetime, unique topics
  – Secure discovery of endpoints
  – Prevent unauthorized users from accessing the Managers or Managees
  – Prevent malicious users from modifying messages (thus message interactions remain secure even when passing through insecure intermediaries)

• Utilize NaradaBrokering's Topic Creation and Discovery* and Security Scheme#

* NB-Topic Creation and Discovery (Grid 2005): http://grids.ucs.indiana.edu/ptliupages/publications/NB-TopicDiscovery-IJHPC
# NB-Security (Grid 2006)

Implemented WS-Specifications

• WS-Management (June 2005) in part (WS-Transfer [Sep 2004], WS-Enumeration [Sep 2004], and WS-Eventing); could use WS-DM instead
• WS-Eventing (leveraged from the WS-Eventing capability implemented in OMII)
• WS-Addressing [Aug 2004] and SOAP v1.2 used (needed for WS-Management)
• Used XmlBeans 2.0.0 for manipulating XML in a custom container
• Currently implemented using JDK 1.4.2, but will switch to JDK 1.5
• Released on http://www.naradabrokering.org in February 2007

Performance Evaluation Results

• Extreme case with many catastrophic failures
• Response time increases with an increasing number of concurrent requests
• Response time is MANAGEE-DEPENDENT, and the times shown are typical
• A request MAY involve one or more Registry accesses, which increases overall response time
• Response time increases rapidly once the number of managees exceeds roughly 150–200

[Figure: response time vs. number of managees]

Performance Evaluation

How much infrastructure is required to manage N managees?

• N = number of managees to manage
• M = max. number of entities connected to a single messaging node
• D = max. number of managees managed by a single manager process
• R = min. number of registry service instances required to provide fault-tolerance

• Assume every leaf domain has 1 messaging node; hence there are N/M leaf domains. Further, the number of managers required per leaf domain is M/D.

• Total components in the lowest level
  = (R registries + 1 Bootstrap Service + 1 Messaging Node + M/D managers) * (N/M leaf domains)
  = (2 + R + M/D) * (N/M)

• Thus the percentage of additional infrastructure is
  [(2 + R)/M + 1/D] * 100%

Performance Evaluation

Research question: How much infrastructure is required to manage N managees?

• Additional infrastructure = [(2 + R)/M + 1/D] * 100%
• A few cases (a small calculation sketch follows below):
  – Typical values of D and M are 200 and 800; assuming R = 4, additional infrastructure = [(2 + 4)/800 + 1/200] * 100% ≈ 1.2%
  – Shared registry (one registry interface per domain), so R = 1: additional infrastructure = [(2 + 1)/800 + 1/200] * 100% ≈ 0.87%
  – If NO messaging node is used (assume D = 200): additional infrastructure = [(R registries + 1 bootstrap node + N/D managers)/N] * 100% = [(1 + R)/N + 1/D] * 100% ≈ 100/D % (for N >> R) ≈ 0.5%
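
A small sketch that evaluates the slide's formula for these cases; the class and method names are ours, purely for illustration:

    // Evaluates additional infrastructure = [(2 + R)/M + 1/D] * 100%.
    public class InfrastructureCost {
        static double additionalInfrastructurePercent(double m, double d, double r) {
            // (R registries + 1 bootstrap + 1 messaging node) per M entities,
            // plus 1 manager per D managees, expressed as a percentage:
            return ((2 + r) / m + 1.0 / d) * 100.0;
        }

        public static void main(String[] args) {
            System.out.println(additionalInfrastructurePercent(800, 200, 4)); // 1.25  (slide: ≈ 1.2%)
            System.out.println(additionalInfrastructurePercent(800, 200, 1)); // 0.875 (slide: ≈ 0.87%)
        }
    }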

Performance Evaluation

Research question: How much infrastructure is required to manage N managees? How does the cost vary with the maximum number of managees per manager (D)?

[Chart, for R = 1 and M = 800: percentage of additional infrastructure vs. max. managees per manager (D)]

D (max. managees per manager):    1      3     5     7    10    25   50   100  150  200
Additional infrastructure (%):  100.4  33.7  20.4  14.7  10.4  4.4  2.4  1.4  1.0  0.9

Performance Evaluation: XML Processing Overhead

• XML processing overhead is measured as the total marshalling and un-marshalling time required
• For Broker Management interactions, typical processing time (including validation against the schema) ≈ 5 ms
  – Broker Management operations are invoked only during initialization and recovery from failure
  – Reading the Broker State using a GET operation involves the 5 ms overhead and is invoked periodically (e.g. every 1 minute, depending on policy)
  – Further, for most operations dealing with changing broker state, the actual operation processing time >> 5 ms, and hence the 5 ms XML overhead is acceptable (a small timing sketch follows below)
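
As a rough illustration of how such marshalling/un-marshalling times can be measured with XmlBeans, here is a minimal sketch; the XML payload is hypothetical, and validate() is only meaningful against a compiled schema type:

    // Times a parse (un-marshal), validate, and re-serialize (marshal) round trip.
    import org.apache.xmlbeans.XmlException;
    import org.apache.xmlbeans.XmlObject;

    public class XmlOverheadTimer {
        public static void main(String[] args) throws XmlException {
            String payload = "<broker><port>5045</port></broker>"; // hypothetical config fragment

            long start = System.nanoTime();
            XmlObject doc = XmlObject.Factory.parse(payload); // un-marshal
            boolean valid = doc.validate();                   // schema validation, if a schema is compiled in
            String roundTrip = doc.xmlText();                 // marshal back to text
            long elapsedMs = (System.nanoTime() - start) / 1000000;

            System.out.println("valid=" + valid + ", bytes=" + roundTrip.length()
                    + ", elapsed=" + elapsedMs + " ms");
        }
    }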

Prototype: Managing Grid Messaging Middleware

• We illustrate the architecture by managing our distributed messaging middleware, NaradaBrokering
  – This example is motivated by the presence of a large number of dynamic peers (brokers) that need configuration and deployment in specific topologies
  – Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system, possibly using a different configuration and topology
  – Can use (dynamically) optimized protocols (UDP vs. TCP vs. Parallel TCP) and go through firewalls

• Broker Service Adapter
  – Note that NB illustrates an electronic entity that did not start off with an administrative Service interface
  – So we add a wrapper over the basic NB BrokerNode object that provides a WS-Management front-end
  – Allows CREATION, CONFIGURATION, and MODIFICATION of brokers and broker topologies (a sketch of such a wrapper follows below)
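
A minimal sketch of such a Service Adapter wrapper, assuming a hypothetical BrokerHandle interface standing in for the wrapped NB BrokerNode (the real NaradaBrokering API is not shown here):

    // Hypothetical Service Adapter giving a broker a WS-Management-style front-end.
    import java.util.Map;

    interface BrokerHandle {                 // stand-in for the wrapped NB BrokerNode
        void configure(Map<String, String> properties);
        void createLink(String targetBrokerAddress);
        void shutdown();
    }

    public class BrokerServiceAdapter {
        private final BrokerHandle broker;

        public BrokerServiceAdapter(BrokerHandle broker) { this.broker = broker; }

        // Conceptually bound to WS-Transfer Put: apply a new configuration.
        public void setConfiguration(Map<String, String> properties) {
            broker.configure(properties);
        }

        // Conceptually bound to WS-Transfer Create: add a link to another broker,
        // building up a topology (RING, TREE, ...) one link at a time.
        public void createLink(String targetBrokerAddress) {
            broker.createLink(targetBrokerAddress);
        }

        // Conceptually bound to WS-Transfer Delete: tear the broker down.
        public void deleteBroker() {
            broker.shutdown();
        }
    }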

Messaging (NaradaBrokering) Architecture

[Figure: diverse endpoints – a slow client behind a modem, compute servers (including one behind a firewall), desktop computers, users, PDAs, laptops, file servers, media devices and media servers, and audio/video conferencing clients – all connected through the messaging substrate]

Typical Use of Grid Messaging in NASA

[Figure (Community Grids Lab, Bloomington IN; CLADE 2006, June 19, 2006): a Sensor Grid implemented using NaradaBrokering, with NB brokers linking the sensors to a Datamining Grid and a GIS Grid]







NaradaBrokering Management Needs

• The NaradaBrokering distributed messaging system consists of peers (brokers) that collectively form a scalable messaging substrate. Optimizations and configurations include:
  – Where brokers should be placed and how they should be connected, e.g. RING, BUS, TREE, HYPERCUBE, etc.; each TOPOLOGY has a different balance of resource utilization, routing cost, and fault-tolerance characteristics
  – Static topologies, or topologies created using static rules, may be inefficient in some cases. E.g., in CAN and Chord a new incoming peer joins nodes in the network at random; network distances are not taken into account, and hence some lookup queries may span the entire diameter of the network
  – Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system, possibly using a different configuration and topology
  – Optimized protocols (UDP vs. TCP vs. Parallel TCP) can be used to go through firewalls, but there is no good way to make these choices dynamically

• These actions are collectively termed "Managing the Messaging"

Prototype: Costs (Individual Managees are NaradaBrokering Brokers)

Time (msec, average values):

Operation           Un-initialized (first time)   Initialized (later modifications)
Set Configuration   778 ± 5                        33 ± 3
Create Broker       610 ± 6                        57 ± 2
Create Link         160 ± 2                        27 ± 2
Delete Link         104 ± 2                        20 ± 1
Delete Broker       142 ± 1                        129 ± 2

Recovery: Typical Time

Topology   Number of managee-specific configuration entries
Ring       N nodes, N links (1 outgoing link per node); 2 managee objects per node
Cluster    N nodes, links per broker varying from 0–3; 1–4 managee objects per node

• Recovery Time = T(read state from registry) + T(bring managee up to speed)
                = T(read state) + T[SetConfig + Create Broker + CreateLink(s)]
  (assuming a 5 ms read time from the registry per managee object; a calculation sketch follows below)

• Typical: 10 + (778.0 + 610.1 + 160.5) ≈ 1548 msec
• Min: 5 + 778.0 + 610.1 ≈ 1393 msec
• Max: 20 + 778.0 + 610.1 + 160.5 + 2 × 26.67 ≈ 1622 msec
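
A small sketch of this recovery-time estimate using the averaged per-operation costs from the cost table above; it is simplified in that it charges the first-time (un-initialized) cost for every link, so it slightly overestimates the multi-link cases:

    // Recovery time ≈ T(read state) + T(SetConfig + CreateBroker + n * CreateLink).
    public class RecoveryTimeEstimate {
        static final double READ_STATE_PER_OBJECT = 5.0;   // registry read per managee object, ms
        static final double SET_CONFIG    = 778.0;
        static final double CREATE_BROKER = 610.1;
        static final double CREATE_LINK   = 160.5;

        static double recoveryMs(int manageeObjects, int links) {
            return manageeObjects * READ_STATE_PER_OBJECT
                    + SET_CONFIG + CREATE_BROKER + links * CREATE_LINK;
        }

        public static void main(String[] args) {
            // Ring: 2 managee objects per node, 1 outgoing link per node.
            System.out.println("ring node: " + recoveryMs(2, 1) + " ms"); // ≈ 1559 (slide quotes ≈ 1548)
            // Cheapest case: a single managee object, broker with no links.
            System.out.println("min:       " + recoveryMs(1, 0) + " ms"); // ≈ 1393
        }
    }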

Prototype: Observed Recovery Cost per Managee

Operation                       Average (msec)
*Spawn process                  2362 ± 18
Read state                         8 ± 1
Restore (1 broker + 1 link)     1421 ± 9
Restore (1 broker + 3 links)    1616 ± 82

• The time for Create Broker depends on the number and type of transports opened by the broker. E.g., an SSL transport requires negotiation of keys and hence takes more time than simply establishing a TCP connection.
• If brokers connect to other brokers, the destination broker MUST be ready to accept connections, else topology recovery takes more time.

Management Console: Creating Nodes and Setting Properties

Management Console: Creating Links

Management Console: Policies

Management Console: Creating Topologies

Conclusion

• We have presented a scalable, fault-tolerant management framework that:
  – Adds an acceptable cost in terms of the extra resources required (about 1%)
  – Provides a general framework for management of distributed entities
  – Is compatible with existing Web Service specifications

• We have applied our framework to manage managees that are loosely coupled and have modest external state (important to improve the scalability of the management process)

• An outside effort is developing a Grid Builder which combines BPEL and this management system to manage the initial specification …
