Introduction To Parallel Processing

CS-421 Parallel Processing Handout_1

BE(CIS) Batch 2004-05

1. Definition

Parallel processing is the use of a collection of integrated and tightly coupled processing elements or processors that cooperate and communicate on a single task to speed up its solution.

2. Motivation

• Higher Speed, or Solving Problems Faster
  o This is important when applications have hard or soft deadlines (so-called real-time systems). For example, we have at most a few hours to produce a 24-hour weather forecast or a timely tornado warning.
• Higher Throughput, or Solving More Instances of Given Problems
  o E.g. transaction processing for banks and airlines.
• Higher Computational Power, or Solving Larger Problems
  o E.g. generating more detailed, more accurate, and longer simulations, such as a 5-day weather forecast.

3. Large-Scale Problems

These are problems that require enormous computational power (usually expressed in operations per second (OPS), MFLOPS, etc.). They are thus regarded as High Performance Computing (HPC) applications.

• Examples
  o Weather Forecasting
    - Assume that the target area is roughly 3,000 miles by 3,000 miles. Over this area we lay a rectangular grid and predict weather conditions only at the grid intersection points.
    - Since we only forecast the weather at the intersection points, we should not use too coarse a grid, as our predictions will be poor for locations far from these points. Conversely, we should not use too fine a grid, as that will create too much work.
    - Let's use a grid spacing of 0.25 mile. If we assume that the atmosphere extends to 20 miles, then we end up with a three-dimensional grid of size 12,000 x 12,000 x 80, a total of 1.15 x 10^10 grid intersection points.
    - At each point, assume that we have initial values for the following six pieces of meteorological data, obtained from a weather satellite:


      • The x, y, and z components of the wind speed
      • Temperature
      • Humidity
      • Barometric pressure
    - At the beginning of the simulation we have 6 x (1.15 x 10^10), or 6.9 x 10^10, pieces of data.
    - To carry out the simulation, we divide time into discrete units of size ∆T and simulate the behavior of the system only at the discrete times (T + k * ∆T) for k = 0, 1, 2, 3, ....
    - Assume it is now time T. We move the simulation ahead to time T + ∆T by evaluating formulas that determine a new value for each of the six weather variables at time (T + ∆T) at each grid intersection point (x, y, z), based on the current value of the variable at that point and at its six immediate neighbors in the grid.

    - Suppose it takes 15 floating-point operations to update each weather variable. With six weather variables, we therefore need 15 x 6 = 90 floating-point operations per grid point (x, y, z) to move from time T to time T + ∆T.
    - This gives a total of 1.15 x 10^10 intersection points x 90 floating-point operations per point per time step = 1.04 x 10^12 floating-point operations per time step.
    - The final issue we must address is the size of ∆T. As with the choice of grid spacing, this value is a compromise between two unacceptable extremes. Too large a time step (many hours) produces a very poor approximation, as we lose all information about the many weather changes that may occur in the middle of such a long interval. Too small a time step (a few seconds) produces accurate results but generates too much work, possibly preventing us from getting answers in a reasonable amount of time.

    - Assume that we choose a time step of 15 minutes. For a 24-hour forecast we then perform the updating procedure described above 96 times, the number of 15-minute intervals in 24 hours. The total amount of computation is therefore 1.04 x 10^12 floating-point operations per time step x 96 time steps, or about 10^14 floating-point operations.
    - If we are using a workstation with a computational power of 100 MFLOPS (10^8 floating-point operations per second), it will take about 10^14 / 10^8 = 10^6 seconds, roughly 12 days, to complete the task. Spending 12 days to determine tomorrow's weather is not a very useful idea! To complete the task in a reasonable amount of time, we need far more computational power. The arithmetic is checked in the sketch below.
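The arithmetic above can be verified with a few lines of code. The following minimal Python sketch recomputes the grid size, the per-step cost, and the total running time from the assumptions stated in the example (grid spacing, FLOPs per variable, time step, and a 100 MFLOPS workstation); the variable names are illustrative only.

    # Back-of-the-envelope cost of the 24-hour forecast example.
    AREA_MILES = 3000            # target area is 3,000 x 3,000 miles
    ATMOSPHERE_MILES = 20        # assumed height of the atmosphere
    SPACING_MILES = 0.25         # grid spacing
    VARIABLES = 6                # wind (x, y, z), temperature, humidity, pressure
    FLOPS_PER_VARIABLE = 15      # floating-point ops to update one variable
    STEP_MINUTES = 15            # size of the time step delta-T
    FORECAST_HOURS = 24
    WORKSTATION_FLOPS = 1e8      # 100 MFLOPS workstation

    nx = ny = int(AREA_MILES / SPACING_MILES)        # 12,000
    nz = int(ATMOSPHERE_MILES / SPACING_MILES)       # 80
    points = nx * ny * nz                            # ~1.15 x 10^10

    flops_per_step = points * VARIABLES * FLOPS_PER_VARIABLE   # ~1.04 x 10^12
    steps = FORECAST_HOURS * 60 // STEP_MINUTES                # 96
    total_flops = flops_per_step * steps                       # ~10^14

    seconds = total_flops / WORKSTATION_FLOPS
    print(f"{points:.3g} points, {flops_per_step:.3g} FLOPs per step")
    print(f"{total_flops:.3g} FLOPs total -> about {seconds / 86400:.0f} days")

Running it reproduces the handout's figures: roughly 10^14 floating-point operations, or about 12 days on a 100 MFLOPS machine.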

  o Computational Science
    In the past there were only two ways to do scientific investigation: theory and experimentation.
    a) You could theoretically demonstrate the truth of a new idea by designing a formal mathematical proof that begins from valid axioms and assumptions. Each step in the proof is justified by some universally accepted law, leading to the proof of correctness of your new theorem. This technique is widely used in disciplines such as mathematics, logic, and theoretical physics.
    b) A second approach was to go into the laboratory or the field and do experimentation. You design an experiment whose outcome can prove or disprove the validity of your hypothesis. You observe the experiment, collect data, analyze the data, and use that analysis to prove or disprove your new idea.
    c) Now, however, there is a third way to do scientific investigation: you can study an idea computationally. You build a computational model of the system you wish to study, run the program, and observe the output of the model. If your model is an accurate representation of the real system, then the output of the program will accurately duplicate the behavior of the real system, and conclusions drawn from observations of the model will be perfectly valid. This approach has a number of important advantages over both theoretical and experimental work.
       i. It can be much safer to study a computational model than the real system. Think about testing a new design for a nuclear reactor.
       ii. It can also be cheaper, since a model can be modified hundreds or thousands of times at little cost, usually just by changing a few numbers and rerunning the program. Changes to an actual object, such as the molecular structure of a new drug, can be enormously expensive.


       iii. A computer model can change the time scale to make the system more convenient to observe. This is particularly important for systems that change too slowly (galactic models of the formation of the universe) or too quickly (elementary particle decay) to study in real life.
       iv. Finally, we can model objects that do not yet exist in order to predict their future behavior. For example, we could "walk through" a building that has not yet been built, or study the effect of increased concentrations of CO2 on the climate of the 21st and 22nd centuries.
    Because of these advantages, researchers are using computational modeling in areas as diverse as economics, physics, architecture, chemistry, and medicine, as well as building programs to simulate the behavior of everything from automobiles, airplanes, and rockets to planets, oceans, cities, and the human body. Computational modeling is becoming an important and widely used scientific paradigm. However, there is no "free lunch": a serious issue must be addressed when using simulation models, namely the amount of computation required to execute the modeling program. It can be truly enormous, well beyond the capabilities of all but the largest parallel computers.

• Grand Challenge Problems
  A fundamental problem in science or engineering, with broad economic and scientific impact, whose solution would be advanced by new developments in high-performance computing and communications technology. The U.S. government formed the High Performance Computing and Communications (HPCC) Initiative in 1991 to identify and plan solutions for such problems. Examples include:
  o studying the formation and evolution of galaxies
  o designing new drugs and studying their physiological properties
  o studying atmospheric pollution and its effect on temperature
  o predicting long-term global climate and ocean temperature changes
  o designing new manufacturing materials with specific and well-defined properties

4. Distributed Processing

This is the use of a collection of independent workstations acting as a single logical system, cooperating and communicating on a single task to speed up its solution.


A distributed system can be implemented as either a Network of Workstations (NOW) or a Cluster of Workstations (COW). The idea is to use the idle time of the workstations. It has been successfully demonstrated that distributed processing can yield a far superior performance-to-cost ratio (PCR) than supercomputers. For example, researchers at Emory and Purdue universities achieved about 3 GIPS per million US dollars, a PCR more than 150 times that of a CRAY Y-MP, winning the 1992 Gordon Bell Prize. A rough comparison is sketched below.
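As a minimal illustration of this comparison, the following Python sketch takes the 3 GIPS per million dollars figure from the text; the CRAY Y-MP speed and price are illustrative assumptions chosen only to be consistent with the "150 times" claim, not actual specifications.

    # Performance-to-cost ratio (PCR) comparison, in GIPS per MUSD.
    cluster_pcr = 3.0                # GIPS per MUSD, from the text

    cray_speed_gips = 0.5            # assumed CRAY Y-MP throughput (illustrative)
    cray_cost_musd = 25.0            # assumed CRAY Y-MP price in MUSD (illustrative)
    cray_pcr = cray_speed_gips / cray_cost_musd    # 0.02 GIPS per MUSD

    print(f"cluster PCR:   {cluster_pcr:.2f} GIPS/MUSD")
    print(f"CRAY Y-MP PCR: {cray_pcr:.2f} GIPS/MUSD")
    print(f"advantage:     {cluster_pcr / cray_pcr:.0f}x")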

The following important technological breakthroughs have fueled development in the field of distributed processing:
• Faster microprocessors
• Faster network connections
• New operating system designs

5. Limitations to Uniprocessor Improvement

It is becoming progressively more difficult to speed up a uniprocessor, due to the following fundamental constraints:

a. Speed of Light
   For each instruction executed, the processor must fetch the instruction from memory and move it into the IR register inside the processor. The time to complete this operation is bounded by the propagation speed of electromagnetic signals: 3 x 10^8 meters per second. (Strictly, that is the speed of light in a vacuum; the speed of signals through silicon is lower.) If the distance between the processor and the memory unit is 30 cm, it takes about one billionth of a second for an instruction to travel from memory to the processor. Since we must fetch each instruction before we can execute it, this delay places an upper bound on processor speed of about 1 billion instructions per second, not even counting the time needed to actually decode and execute the instructions. The only thing we can do about this delay is reduce the distance between memory and processor. That works for a while, but eventually we run into manufacturing problems associated with fabricating very tiny devices and placing them accurately into very small spaces. The bound is worked through in the sketch below.
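A minimal Python sketch of this back-of-the-envelope bound, using the 30 cm processor-to-memory distance from the text:

    # Upper bound on instruction rate imposed by signal propagation delay.
    SPEED_OF_LIGHT = 3e8    # meters per second, in a vacuum
    DISTANCE = 0.30         # processor-to-memory distance, in meters

    fetch_delay = DISTANCE / SPEED_OF_LIGHT    # 1e-9 s: 1 ns per fetch
    max_instr_per_sec = 1.0 / fetch_delay      # 1e9 instructions per second

    print(f"fetch delay: {fetch_delay * 1e9:.1f} ns")
    print(f"upper bound: {max_instr_per_sec:.1e} instructions per second")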

b. Limits on Miniaturization
   After all, a transistor needs some space on the chip and cannot shrink away to nothing.

c. Power Dissipation Due to Increased Clock Rates
   • Increasing the clock rate results in a staggering power density on the chip.
   • Clock rates are now stagnating to counter the increased power consumption.
   • Stagnating clock rates are being compensated for by placing multiple processor cores on the same chip, i.e. multi-core architectures.
   • Consequently, the uniprocessor is disappearing even from the desktop, making it imperative for programmers to learn parallel programming techniques in order to exploit the hardware parallelism available in state-of-the-art machines. A minimal example of such a technique follows.
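As a minimal, hypothetical sketch of such a technique (not from the handout), the following Python example splits a simple computation across four worker processes using a process pool; the per-chunk work is a stand-in for real computation such as updating grid points.

    # Minimal sketch: exploiting multiple cores with a process pool.
    from multiprocessing import Pool

    def simulate_chunk(chunk):
        # Stand-in for real per-chunk work (e.g. updating grid points).
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        n = len(data) // 4                                 # four chunks...
        chunks = [data[i:i + n] for i in range(0, len(data), n)]
        with Pool(processes=4) as pool:                    # ...one per core
            partials = pool.map(simulate_chunk, chunks)    # runs in parallel
        print(sum(partials))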

d. Von Neumann Bottleneck
   • The speed disparity between processor and memory is growing with the passage of time, causing severe performance bottlenecks.
   • A parallel system (e.g. a COW) overcomes this shortcoming by providing more aggregate memory and cache capacity, as well as the higher memory bandwidth required by HPC applications.
   • Some of the fastest-growing applications of parallel computing exploit not raw computational speed, but rather the ability to move data to memory and disk faster.

6. Economic Impact

Businesses are investing more and more money in parallel and distributed solutions, as they have realized the competitive potential of these technologies.

******

