GRID COMPUTING A new trend of super computing
John Martin R
What is Grid Computing ? Grid computing is a form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks. Grid computing is intended to provide large-scale computational power on demand.
Computational Grid A Computational Grid is defined as "a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to computational capabilities". "Grid" is a term used for the unification of resources: these need not only be clusters, but can also be standalone machines scattered across the globe. A Grid can encompass desktop PCs, but more often than not its focus is on more powerful workstations, servers, and even mainframes and supercomputers working on problems involving huge datasets.
Objectives of Grid Computing To obtain computational power without the huge costs of supercomputing, so that CPU cycles which would otherwise be wasted are put to good use. To share not only CPU time but also other resources, such as data files.
What makes up a Grid ? Grid Computing can also be defined as the seamless provision of access to possibly remote, possibly heterogeneous, possibly untrusting, possibly dynamic computing resources.
Gc provides seamless access to: Possibly Remote Computing Resources: local resources, which are on the same LAN, and remote resources, which are geographically distant, can be accessed in exactly the same way on the Grid. Possibly Heterogeneous Computing Resources: some computers on the Grid run different operating systems on different types of machines; accessing them via the Grid should be possible without making any special allowances for this. Possibly Untrusting Computing Resources: the owner of a computing resource on the Grid might not know or trust other users, but should still be confident that they cannot access any non-shared data on their computer. The Grid should handle this security checking without any specific instruction from the user or from the sharer. Possibly Dynamic Computing Resources: one of the major selling points of Grid Computing is that it makes use of otherwise wasted CPU cycles. The problem with this is that the availability of computers to the Grid changes rapidly, as computers become busy and then idle with their owners' usage. The Grid system should hide this dynamism from users, so that they do not have to program explicitly to take account of it.
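As a toy illustration of the last point (every name here is invented for this sketch, not part of any real grid middleware), a scheduler can hide dynamic availability by silently re-queueing a task whenever the machine it was sent to turns out to be busy:

```python
# Hypothetical sketch: hiding dynamic resource availability from the user.
# Tasks sent to a busy worker are silently re-queued and retried elsewhere,
# so the submitting user never has to handle the failure explicitly.

def run_on_grid(tasks, workers, is_busy):
    """Run every task, transparently retrying when a worker is unavailable."""
    results = {}
    queue = list(tasks)
    i = 0  # round-robin pointer over the workers
    while queue:
        task = queue.pop(0)
        worker = workers[i % len(workers)]
        i += 1
        if is_busy(worker):        # the owner is using this machine right now
            queue.append(task)     # re-queue; the dynamism stays hidden
            continue
        results[task] = task * task  # stand-in for the real computation
    return results

# Example: worker "b" is permanently busy, yet every task still completes.
print(run_on_grid([1, 2, 3], ["a", "b"], lambda w: w == "b"))
```

The caller never mentions which machines exist or which are busy; that bookkeeping stays inside the scheduler, which is the point the slide makes.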
Dc vs Gc Distributed computing solves a large problem by giving small parts of the problem to many computers to solve, and then combining the solutions for the parts into a solution for the whole problem. Grid computing is a specialised form of distributed computing.
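The divide-and-combine pattern just described can be sketched in a few lines. Here a local process pool stands in for the many computers, and the problem (summing the squares of a list of numbers) is purely illustrative:

```python
# Toy sketch of the distributed-computing pattern: split a large problem
# into small parts, solve each part independently (a process pool stands in
# for remote machines), then combine the partial results.
from multiprocessing import Pool

def solve_part(chunk):
    # The work one machine would do on its part of the problem.
    return sum(x * x for x in chunk)

def solve(problem, n_parts=4):
    # 1. Split the problem into roughly equal parts.
    size = max(1, len(problem) // n_parts)
    chunks = [problem[i:i + size] for i in range(0, len(problem), size)]
    # 2. Hand each part to a different "computer".
    with Pool(n_parts) as pool:
        partials = pool.map(solve_part, chunks)
    # 3. Combine the partial solutions into the full solution.
    return sum(partials)

if __name__ == "__main__":
    print(solve(list(range(1000))))  # same answer as a single machine
```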
Why do we need …? Network bandwidth is increasing faster than processor speed, which means that the best way to exploit computing power is to network many computers together efficiently. Every traditional science (Physics, Chemistry, Mathematics, Biology, Astronomy, and many others) relies more and more on computers and computational power. Grid Computing is therefore seen as the computing technology enabling the advancement of all sciences.
How it works ? The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers in different parts of the world. The Grid search engine finds not only the data/resources the scientist needs, but also the data-processing techniques and the computing power to carry them out. It then distributes the computing task to wherever in the world there is spare capacity, and sends the result back to the scientist.
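A minimal sketch of that middleware role (the site names and the "free CPU slots" capacity model are invented for illustration): send each task to whichever machine currently reports the most spare capacity:

```python
# Hypothetical sketch of middleware placing tasks on spare capacity.
def schedule(tasks, capacity):
    """capacity maps machine name -> free CPU slots; returns task -> machine."""
    placement = {}
    for task in tasks:
        # Pick the machine with the most spare capacity right now.
        machine = max(capacity, key=capacity.get)
        if capacity[machine] == 0:
            raise RuntimeError("no spare capacity anywhere on the grid")
        capacity[machine] -= 1      # that slot is now in use
        placement[task] = machine   # the result would be sent back from here
    return placement

# Example with two made-up sites:
print(schedule(["t1", "t2", "t3"], {"siteA": 2, "siteB": 1}))
# → {'t1': 'siteA', 't2': 'siteA', 't3': 'siteB'}
```

Real middleware adds discovery, security and fault handling on top, but the core decision - match work to wherever capacity is free - is the one shown here.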
Grids – Where to ? An interesting prediction is that grid technology will be slowly absorbed into enterprise fabrics. Grid 1.0 – concerned with the virtualization, aggregation and sharing of compute resources. Grid 2.0 – focused on the virtualization, aggregation and sharing of all compute, storage, network and data resources.
Virtualization The key term is "virtualization" (encapsulation of diverse implementations behind a common interface), driven by the need of various enterprises to create a virtual resource market that allocates resources based on business demand. Virtualization introduces a layer of abstraction: instead of having to snoop out what resources are available and try to adapt a problem to use them, a user can describe a resource environment (a virtual workspace) and expect it to be deployed on the grid. The mapping between the physical resources and the virtual workspace is handled using virtual machines, virtual appliances, distributed storage facilities and network overlays ("virtual grids"). Virtualization covers both data (flat files, databases, etc.) and computing resources.
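The workspace idea can be made concrete with a small, hypothetical matcher (attribute names such as "os" and "cores" are invented for this sketch): the user states requirements, and the grid, not the user, finds a physical resource that satisfies them:

```python
# Hypothetical sketch: map a declared virtual workspace onto a physical
# resource. Numeric requirements are treated as minimums; everything else
# must match exactly.
def find_host(workspace, resources):
    """workspace: required attributes; resources: name -> offered attributes."""
    for name, offered in resources.items():
        ok = all(
            offered.get(key, 0) >= value
            if isinstance(value, (int, float))
            else offered.get(key) == value
            for key, value in workspace.items()
        )
        if ok:
            return name    # deploy the virtual workspace on this resource
    return None            # nothing suitable is available right now

# Example: the user describes needs once and never inspects machines.
print(find_host({"os": "linux", "cores": 8},
                {"deskPC": {"os": "windows", "cores": 16},
                 "cluster7": {"os": "linux", "cores": 64}}))
# → cluster7
```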
Virtualization
Grid as workflow virtualization - Grid computing services are used to execute and manage processes across multiple compute platforms.
Data Grid as data virtualization - the management of shared collections independently of the remote storage systems where the data is stored.
Name space virtualization - logical names for resources, users, files, and metadata that are independent of the name spaces used on the remote resource.
Trust virtualization - the ability to manage authentication and authorization independently of the remote resource.
Constraint virtualization - the ability to manage access controls independently of the remote resource.
Network virtualization - the ability to manage transport in the presence of network devices.
[Diagram: Grid 1.0 (compute-intensive cycle aggregation; consolidation of resources; distributed, parallel, stateless and transactional apps) evolving through virtualization into the emerging, service-oriented Grid 2.0 (virtualized compute, storage, network and data resources delivered as software services).]
The promise is that in Grid 2.0 the resources will be easier to define, test, install, transport and adjust on demand.
Tools and Standards 1. Globus: The Globus Toolkit, designed by the Globus Alliance, contains a set of software tools - services, APIs and protocols - to facilitate the construction of Grids. It is the most widely used toolkit for building Grids. It includes tools for, among other things, security, resource management and communication. The Globus Alliance also researches various issues related to Grid Computing, especially those relating to the infrastructure of Grids. 2. Condor and Condor-G: Condor is a software tool for distributing computationally intensive jobs over Grids. It works by using spare CPU cycles on other computers. From Condor, Condor-G has been created: an enhanced version of Condor that can itself be used to build Grids. It uses Globus tools to provide security, resource discovery, and resource access.
Commercial Grid Products http://www.parabon.com http://www.ud.com
Evolution: 1960-2010! [Timeline figure of Computing and Communication milestones, moving from centralised to decentralised control between 1960 and 2010. Computing: mainframes, minicomputers, PCs, workstations, Crays, the XEROX PARC worm, PDAs, MPPs, clusters, HTC, P2P, PC clusters, Grids, computing as a utility, e-Science, e-Business. Communication: Sputnik, ARPANET, TCP/IP, email, Ethernet, the Internet era, HTML, Mosaic, the WWW era, W3C, XML, Web Services, SocialNet.]
Supercomputing Power For Everyone ? In the past, supercomputing power has been available only to very few people - certain people in research institutions and some businesses. If the Grid is ever fully realised, though, supercomputing power will be available to anyone who wishes to access it. This means, among other things, that anyone can run huge password searches or try to crack public/private keys. With the creation of the Grid, these issues will have to be addressed, either by somehow restricting users from running such searches or by using even larger keys and passwords. Many other social issues will no doubt arise when everyone has access to supercomputing power, and they will have to be addressed as well.
What’s holding us ? Organizational politics acts very much like a barrier to implementing Grid computing. "Server-hugging" - organizations have a sense of ownership over the resources bought or allocated for their use. Perceived loss of control over, or access to, resources. Lack of data security among departments. Fear of external data leaks. Reduced priority of projects - sometimes users believe that they need dedicated IT resources to complete their work accurately and efficiently. Risks associated with enterprise-wide deployment - how do different geographies and cultures come together to agree on global priorities, configurations, standards, and policies ? In fact, how do you encourage people to let others run code on their machines ?
Gc as successful technology ! Once a technology matures, it does not necessarily take a long time for it to become widespread. The same idea could apply to Grid Computing - if some of the fundamental issues holding it back are addressed, then computing power could truly become as widespread and easily accessible as electricity is now.
Thank you !
Clarifications ?