HIGH PERFORMANCE COMPUTING

Who needs HPC? What defines High Performance Computing?

“We use HPC to solve problems that could not be solved in a reasonable amount of time using conventional mainframe machines.”

HISTORY In 1994, Thomas Sterling and Don Becker, working at the Center of Excellence in Space Data and Information Sciences (CESDIS) under the sponsorship of the Earth and Space Sciences (ESS) project, built a cluster computer called "Beowulf", which consisted of 16 DX4 processors connected by channel-bonded 10 Mbps Ethernet. The idea was to use commodity off-the-shelf components to build a cluster system that addressed a particular computational requirement of the ESS community. The Beowulf project was an instant success, demonstrating that the commodity cluster was an attractive alternative for HPC. Researchers within the HPC community now refer to such systems as HPC clusters.

Classification of computers: mainframe computers, minicomputers, microcomputers and supercomputers. It was only at the beginning of the 2000s that the supercomputing arena moved into the gigaflops region; earlier it was in megaflops. This raised computing speed to a reasonable level.

Clustering of Computers for Collective Computing: Trends

(Timeline figure: 1960, 1990, 1995+, 2000, ?)

WHAT IS A CLUSTER? A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected standalone/complete computers cooperatively working together as a single, integrated computing resource.

TWO ERAS OF COMPUTING

WHY CLUSTERS? As data volumes grew immensely, the demand for fast and efficient processing gave rise to cluster technology. Clusters can be used for nuclear research, genetic engineering, and oil and gas exploration, as well as for web hosting, e-commerce, etc. They offer scalability, availability, performance, and reliable, high-performance, massive storage and database support. They survive hardware failures such as a blown CPU, bad memory, or the loss of an entire server, and continue providing resources during planned outages that would otherwise cause downtime for users. Resources can be moved, or failed over, to one server while another is brought down for a configuration change or maintenance. In SMP machines, more than 128 CPUs can't be used, so performance is limited.

Computing Elements

(Figure: the computing-element layers of a multi-processor computing system, from applications through programming paradigms, the threads interface, the operating system and microkernel, down to the hardware; processes and threads are mapped onto the processors P1 ... Pn.)
Cluster Computer Architecture

Cluster components
Nodes: computing nodes and a master node.
The CPUs most frequently used in the nodes are from Intel and AMD: Xeon and Itanium processors from Intel, and Opteron processors from AMD.

In a cluster, each node can be a single- or multiprocessor computer, such as a PC, workstation or SMP server, equipped with its own memory, I/O devices and operating system. The nodes are interconnected by a LAN using one of the following technologies: Ethernet, Fast Ethernet, Gigabit Ethernet, Myrinet, or an InfiniBand communication fabric.

STORAGE There are three types of storage techniques:

DAS (Direct Attached Storage): the elementary form, in which the hard disk is attached directly to the node.

SAN (Storage Area Network).

NAS (Network Attached Storage).

Networking in Clusters
Gigabit Ethernet: a transmission technology based on the Ethernet frame format and protocol used in local area networks; it provides a data rate of 1 billion bits per second (1 gigabit). Gigabit Ethernet is defined in the IEEE 802.3 standards. After the 10 and 100 Mbps cards, a newer standard, 10 Gb Ethernet, is also becoming available.

InfiniBand: an architecture and specification for data flow between processors and I/O devices that promises high data bandwidth and almost unlimited expandability in tomorrow's computer systems. It is expected to gradually replace the existing Peripheral Component Interconnect (PCI) shared-bus approach used in most of today's PCs and servers. Offering up to 2.5 Gbit/s per link and supporting up to 64,000 addressable devices, it provides increased reliability, better sharing between clustered processors and built-in security.

Myrinet: a cost-effective, high-performance packet communication and switching technology that is widely used to interconnect clusters of workstations, PCs, servers, blade servers, or single-board computers. Myrinet characteristics: flow control, error control and heartbeat continuity monitoring on every link; low-latency, cut-through switches with monitoring for high-availability applications.

TYPES OF CLUSTERS
FAILOVER CLUSTER: if the active node goes down, the stand-by node takes over, allowing a mission-critical system to continue functioning.
LOAD-BALANCING CLUSTER: all workloads come through one or more load-balancing front ends, which then distribute them to a collection of back-end servers (one simple distribution policy is sketched below).
HIGH PERFORMANCE CLUSTER: provides increased performance by splitting a computational task across many different nodes in the cluster; most commonly used in scientific computing.
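As referenced above, here is a minimal sketch, in C, of one policy a load-balancing front end could use: plain round-robin selection. The server names and request count are invented for the example; real front ends also consider server load, health checks and session affinity.

```c
/* Minimal round-robin dispatch sketch; the server pool is hypothetical. */
#include <stdio.h>

#define NUM_SERVERS 3

static const char *servers[NUM_SERVERS] = { "node1", "node2", "node3" };

/* Return the back-end server chosen for the next incoming request. */
static const char *next_server(void)
{
    static int turn = 0;
    const char *chosen = servers[turn];
    turn = (turn + 1) % NUM_SERVERS;   /* cycle through the pool */
    return chosen;
}

int main(void)
{
    for (int request = 0; request < 7; request++)
        printf("request %d -> %s\n", request, next_server());
    return 0;
}
```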

What Is Cluster Middleware? An interface between user applications and the cluster hardware and OS platform. Middleware packages support each other at the management, programming and implementation levels. Middleware layers:
SSI (Single System Image) layer
Availability layer: enables cluster services such as checkpointing, automatic failover, recovery from failure, and fault-tolerant operation among all cluster nodes.

Cluster Computing - Commercial Software
LoadLeveler - IBM Corp., USA
LSF (Load Sharing Facility) - Platform Computing, Canada
NQE (Network Queuing Environment) - Craysoft Corp., USA
Open Frame - Centre for Development of Advanced Computing, India
RWPC (Real World Computing Partnership), Japan
UnixWare (SCO - Santa Cruz Operation), USA
Solaris-MC (Sun Microsystems), USA
Cluster Tools (a number of free HPC cluster tools from Sun)

Computational Power Improvement

(Figure: computational power improvement (C.P.I.) plotted against the number of processors; a multiprocessor keeps improving as processors are added, while a uniprocessor does not.)

Characteristics of MPP, SMP and cluster systems:

Characteristic     | MPP                      | SMP                     | Cluster
No. of nodes       | 100-1000                 | 10-100                  | Unbounded
Job scheduling     | Single run queue at host | Single run queue mostly | Multiple queues, but coordinated
SSI support        | Partially                | Always                  | Desired
Address space      | Multiple                 | Single                  | Multiple or single
Internode security | Unnecessary              | Unnecessary             | Required if exposed
Ownership          | One organization         | One organization        | One or more organizations
Network protocol   | Non-standard             | Non-standard            | Standard and lightweight

Example Clusters: Berkeley NOW
- 100 Sun UltraSparcs
- 200 disks
- Myrinet, 160 MB/s
- Fast communication: AM, MPI, ...
- Ether/ATM switched external net
- Global OS
- Self configuration

HA Cluster: Server Cluster with "Heartbeat" Connection

Clusters of Clusters (HyperClusters)

(Figure: three clusters interconnected over a LAN/WAN; each cluster has its own scheduler and master daemon, execution daemons on its nodes, and clients that submit jobs through a graphical control interface.)

MPI (Message Passing Interface) A standard, portable message-passing library definition, developed in 1993 by a group of parallel computer vendors, software writers and application scientists. The target platform is a distributed-memory system, and all inter-task communication is by message passing. It is portable (once coded, a program can run on virtually all HPC platforms, including clusters), available for both Fortran and C programs, and available on a wide variety of parallel machines. All parallelism is explicit: the programmer is responsible for identifying the parallelism in the program and implementing it with the MPI constructs.
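A minimal sketch of this explicit message passing, written in C against the standard MPI calls, is shown below; the rank, tag and message contents are chosen only for illustration. Each worker process sends its rank to process 0, which prints what it receives. Such a program is typically compiled with mpicc and launched with mpirun across the cluster nodes.

```c
/* Minimal MPI sketch: workers send their rank to rank 0, which prints them. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */

    if (rank != 0) {
        /* every worker explicitly sends a message to the master (rank 0) */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        int value, i;
        for (i = 1; i < size; i++) {
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("master received rank %d\n", value);
        }
    }
    MPI_Finalize();                          /* shut down the MPI runtime */
    return 0;
}
```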

PVM (PARALLEL VIRTUAL MACHINE) Originally developed in 1989 by Oak Ridge National Laboratory as a research tool to explore heterogeneous network computing; now available as a public-domain software package. It enables a collection of different computer systems to be viewed as a single parallel machine, and all communication is accomplished by message passing. PVM enables users to exploit their existing computer hardware to solve much larger problems at minimal additional cost. The software is very portable; the source, available free through netlib, has been compiled on everything from laptops to Crays.
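A comparable master/worker sketch using the PVM 3 C API is shown below; it assumes a running PVM daemon (pvmd), and the executable name "pvm_demo", the worker count and the message tag are assumptions made for the example. The task started with no PVM parent acts as master, spawns worker copies and collects one integer from each; each worker packs its task id and sends it to its parent.

```c
/* Minimal PVM sketch, assuming pvmd is running and this program is installed
 * as "pvm_demo" where pvmd can find it (e.g. $HOME/pvm3/bin/$PVM_ARCH). */
#include <pvm3.h>
#include <stdio.h>

#define NWORKERS 4
#define MSGTAG   1

int main(void)
{
    int mytid  = pvm_mytid();          /* enroll this process in the PVM */
    int parent = pvm_parent();         /* task id of the spawning task, if any */

    if (parent == PvmNoParent) {
        /* master: spawn worker copies of this same program */
        int tids[NWORKERS], value, i;
        pvm_spawn("pvm_demo", NULL, PvmTaskDefault, "", NWORKERS, tids);
        for (i = 0; i < NWORKERS; i++) {
            pvm_recv(-1, MSGTAG);      /* receive from any task with this tag */
            pvm_upkint(&value, 1, 1);  /* unpack one integer from the buffer */
            printf("master received task id %d\n", value);
        }
    } else {
        /* worker: pack this task's id and send it back to the parent */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&mytid, 1, 1);
        pvm_send(parent, MSGTAG);
    }
    pvm_exit();                        /* leave the virtual machine */
    return 0;
}
```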

Cluster Computing - Research Projects
Beowulf (CalTech and NASA) - USA
CCS (Computing Centre Software) - Paderborn, Germany
DQS (Distributed Queuing System) - Florida State University, USA
HPVM (High Performance Virtual Machine) - UIUC & now UCSB, USA
MOSIX - Hebrew University of Jerusalem, Israel
MPI (MPI Forum; MPICH is one of the popular implementations)
NOW (Network of Workstations) - Berkeley, USA
NIMROD - Monash University, Australia
NetSolve - University of Tennessee, USA
PVM - Oak Ridge National Lab. / UTK / Emory, USA

PARAM PADMA PARAM Padma is C-DAC's next-generation high-performance scalable computing cluster, currently with a peak computing power of one teraflop. The hardware environment is powered by compute nodes based on state-of-the-art POWER4 RISC processor technology. These nodes are connected through a primary high-performance system area network, PARAMNet-II, designed and developed by C-DAC, with Gigabit Ethernet as a backup network.

ONGC CLUSTERS ONGC has implemented two Linux cluster machines. One is a 272-node dual-core computing system, with each node equivalent to two CPUs; its master node comprises 12 dual-CPU nodes with 32 terabytes of SAN storage. The second system has 48 compute nodes (96 CPUs), with a master node of 4 nodes and 20 terabytes of SAN storage.

APPLICATIONS OF CLUSTER TECHNOLOGY
Numerous scientific & engineering applications.
Parametric simulations.
Business applications:
- E-commerce applications (Amazon.com, eBay.com, ...)
- Database applications (Oracle on a cluster)
- Decision support systems
Internet applications:
- Web serving / surfing
- Info-wares (yahoo.com, AOL.com)
- ASPs (application service providers)
- Email, eChat, ePhone, eBanking, etc.
- Computing portals
Mission-critical applications:
- Command and control systems, banks, nuclear reactor control, and handling life-threatening situations.

WHAT IS GRID COMPUTING? It is a type of parallel and distributed system that enables the sharing, selection and aggregation of geographically distributed autonomous resources, dynamically at runtime, depending on their availability, capability, performance and cost.

It follows a Service-Oriented Architecture (SOA) and provides hardware and software services for secure and uniform access to heterogeneous resources, and enables the formation and management of virtual organizations.


HISTORY OF GRID COMPUTING The term grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid, in Ian Foster and Carl Kesselman's seminal work, "The Grid: Blueprint for a New Computing Infrastructure". They led the effort to create the Globus Toolkit, which remains the de facto standard for grid computing.

FATHER OF THE GRID “IAN FOSTER” With Steven Tuecke and Carl Kesselman he began the Globus Project, a software system for international scientific collaboration. They envisioned Globus software that would link sites into a “virtual organization,” with standardized methods to authenticate identities, authorize specific activities, and control data movement.

Towards Grid Computing….

GRID COMPUTING IS NEEDED FOR...
Harnessing idle CPU cycles from desktop computers in the network (e.g. SETI@home, Alchemi).
Sharing data resources across multiple organizations (e.g. SRB).
Replication and collaborative analysis of distributed datasets (e.g. Globus, Gridbus).
Supporting market-based allocation and management of grid resources, turning grid services into utilities (e.g. Gridbus).

TYPES OF GRID
Computational grids: focus primarily on computationally intensive operations.
Data grids: controlled sharing and management of large amounts of distributed data.
Equipment grids: have a primary piece of equipment, where the surrounding grid is used to control the equipment remotely and to analyze the data it produces.

GRID MIDDLEWARE Software tools and services that provide the capability to link computing capability and data sources, in order to support distributed analysis and collaboration, are collectively known as grid middleware. Examples:
1. Advanced Resource Connector
2. Globus Toolkit
3. PVM (Parallel Virtual Machine)
4. MPI (Message Passing Interface)
5. UNICORE
6. IceGrid

GRID ARCHITECTURE Consists of four layers:
1. Fabric
2. Core middleware
3. User-level middleware
4. Applications and portals

GRID COMPONENTS

(Layered figure, top to bottom:)
- Grid applications and portals: scientific and engineering applications, web-enabled applications, problem-solving environments, collaboration tools.
- User-level middleware (development environments and tools): languages, libraries, debuggers, monitoring, resource brokers, web tools.
- Core middleware (distributed resource coupling services): communication, single sign-on and security, information, process management, data access, QoS.
- Local resource managers: operating systems, queuing systems, libraries and application kernels, TCP/IP and UDP.
- Grid fabric (networked resources across organisations): computers, clusters, storage systems, data sources, scientific instruments.
GRID FABRIC This layer consists of distributed resources such as computers, networks, storage devices and scientific instruments. The computational resources represent multiple architectures such as clusters, supercomputers, servers and ordinary PCs which run a variety of operating systems.

CORE GRID MIDDLEWARE Offers services such as remote process management, co-allocation of resources, storage access, information registration and discovery, security, and aspects of quality of service such as resource reservation and trading. These services abstract away the complexity and heterogeneity of the fabric level by providing a consistent method for accessing distributed resources.

USER-LEVEL GRID MIDDLEWARE Utilizes the interfaces provided by the low-level middleware to provide higher-level abstractions and services. These include application development environments, programming tools and resource brokers for managing resources and scheduling application tasks for execution on global resources.

GRID APPLICATIONS AND PORTALS These are typically developed using grid-enabled programming environments and interfaces, and the brokering and scheduling services provided by user-level middleware. Grid portals offer web-enabled application services, where users can submit jobs to remote resources and collect their results through the web.

OPERATIONAL FLOW
1. The users compose their application as a distributed application using visual application development tools.
2. The users specify their analysis and quality-of-service requirements and submit them to the grid resource broker.
3. The grid resource broker performs resource discovery and determines resource characteristics using grid information services.
4. The broker identifies resource service prices by querying the grid market directory.
5. The broker identifies the list of data resources or replicas and selects the optimal ones.
6. The broker also identifies the list of computational resources that provide the required application services.
7. The broker ensures that the user has the necessary credit or authorized share to utilize the resources.
8. The broker scheduler maps and deploys data analysis jobs on resources that meet the user's quality-of-service requirements.
9. The broker agent on a resource executes the job and returns the results.
10. The broker collates the results and passes them to the user.
11. The metering system charges the user by passing the resource usage information to the accounting system.

Cycle Stealing Typically, there are three types of owners, who use their workstations mostly for: 1. Sending and receiving email and preparing documents. 2. Software development - edit, compile, debug and test cycle. 3. Running compute-intensive applications.

Cycle Stealing

Cluster computing aims to steal spare cycles from (1) and (2) to provide resources for (3). However, this requires overcoming the ownership hurdle: people are very protective of their workstations, and it usually takes an organisational mandate that computers be used in this way. Stealing cycles outside standard work hours (e.g. overnight) is easy; stealing idle cycles during work hours without impacting interactive use (of both CPU and memory) is much harder.

Cycle Stealing Usually a workstation will be owned by an individual, group, department, or organisation, and is dedicated to the exclusive use of its owner. This brings problems when attempting to form a cluster of workstations for running distributed applications.

INTERNATIONAL GRID PROJECTS
GARUDA (India)
D-Grid (Germany)
Malaysia national grid computing project
Singapore national grid computing project
Thailand national grid computing project
CERN Data Grid (Europe)
PUBLIC FORUMS
- Computing Portals
- Grid Forum
- European Grid Forum
- IEEE TFCC
- GRID'2000

CHALLENGES IN GRID Heterogeneity, which results from the vast range of technologies, both software and hardware, encompassed by the grid. Handling of grid resources, which are spread across political and geographical boundaries.

GRID EVOLUTION FROM RESEARCH TO INDUSTRY

• In April 2004, IT vendors EMC, HP, Intel, Network Appliance, Oracle and Sun took a step towards developing enterprise grid solutions by establishing the EGA (Enterprise Grid Alliance).

• The Globus Consortium was founded in January 2005 by HP, IBM, Intel, Sun and Univa to support business acceptance and implementation of the Globus Toolkit.

FUTURE TRENDS IN GRID COMPUTING By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, the grid promises to make it possible for scientific collaborations to share resources, and for geographically distributed groups to work together in ways that were previously impossible.

By:

SHREYA THAKUR
Laxmi Devi Institute of Engineering & Technology
4th Year, ECE
