Subdividing Computation

ph 21.5 — The domain-decomposition paradigm: the Game of Life

Introduction

The biggest challenge in writing efficient parallel programs is subdividing the computation among the processors. A careful analysis of the problem at hand is needed to identify which of its features lend themselves to being attacked in parallel, and which instead need to be tackled serially (i.e., on a single processor).

Computational physics is often concerned with modeling extended physical systems. These systems might extend in actual physical space, or in some abstract parameter space; either way, their state is represented by the value of one or more variables at several different points in physical or abstract space. In the computer implementation of such models, the sets of variables that describe the system are represented by one or more arrays, whose indices correspond to a structure of discretized locations in space. For instance, to study the behavior of electromagnetic waves in vacuum, we would discretize space by creating a three-dimensional lattice of points, and then use three-dimensional arrays to store the values of the electric and magnetic fields at the lattice points.

If every state variable interacts with every other state variable, it is generally hard to subdivide the computing tasks among a set of parallel processors. Fortunately, the physics of many extended physical systems is local: to evolve the state of the system at a point P from time t to time t + dt, we need to know only the state of the system at time t in the immediate surroundings of P.

A natural way to parallelize extended systems with local physics is the domain-decomposition paradigm: we subdivide the physical system into several subsystems (one for each parallel processor), following the geometry of the physical system. Each processor then models the physics of one subsystem, storing only the physical quantities associated with the corresponding spatial subdomain, and exchanging information with the processors handling neighboring subdomains only when it becomes necessary. This divide-and-conquer approach is vital on Beowulf clusters, where it is harder to share information between processors than on shared-memory supercomputers.

The Game of Life

We are going to practice domain-decomposition techniques on a very famous computational game, made famous by Martin Gardner in a series of columns in Scientific American: the Game of Life, invented in 1970 by the mathematician John Conway. Life is perhaps the simplest example of a cellular automaton: an n-dimensional array of cells evolving through discrete timesteps. At any given time, each cell is in one of a finite number of states, and the array evolves according to simple rules whereby the state of a cell at the next timestep depends only on the neighboring cells at the present timestep.

The Game of Life is played on a two-dimensional array; each cell has only two states, alive and dead. The rules are simple:

Birth. A dead cell with exactly three live neighbors becomes alive at the next timestep.

Survival. A live cell with two or three live neighbors stays alive at the next timestep.

Death. A live cell with more than three live neighbors (overcrowding) or fewer than two live neighbors (solitude) becomes dead at the next timestep.

The neighbors of a given cell are the eight cells immediately to its left, right, top, and bottom, and in the four diagonal directions. For simplicity, we will assume that the array has periodic boundaries: the right neighbor of the last cell on the right is the first cell on the left, and so on.

The Game of Life can be considered an idealized model of a bacterial colony in a Petri dish. The purpose of the game, if there is one, is to determine whether a given initial configuration of cells will die out, live indefinitely, or converge to a periodic state in which the system cycles through a finite number of configurations. The rules were chosen carefully to make the behavior of the system as interesting as possible, so in practice it is very hard to anticipate what will happen.

In this assignment, we are going to write a parallel program that implements the Game of Life in a domain-decomposition paradigm, by subdividing the Life array among the parallel processors of our Beowulf cluster, octo.
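Stated as code, the three rules amount to a small function like the following sketch (in C++; the name NextState and the 0/1 encoding of dead/alive are just for illustration):

    // Apply the Game of Life rules to one cell.
    // alive: current state (1 = alive, 0 = dead); liveNeighbors: number of live
    // cells among the eight surrounding cells. Returns the state at the next timestep.
    int NextState(const int alive, const int liveNeighbors) {
      if (alive == 1) {
        // Survival: two or three live neighbors; otherwise death by
        // solitude (< 2) or overcrowding (> 3).
        return (liveNeighbors == 2 || liveNeighbors == 3) ? 1 : 0;
      } else {
        // Birth: exactly three live neighbors.
        return (liveNeighbors == 3) ? 1 : 0;
      }
    }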


Assignment

1. After reading the implementation notes below, write a serial (that is, one-processor, without MPI) version of the Game of Life model. The program should take the dimension of the Life array as a command-line parameter. Decide what data structures you need, and then create separate functions to initialize the population, evolve the Game through one timestep, and visualize the evolving population of cells (see the implementation notes below). You are encouraged to try an object-oriented approach, which should make it easier to re-use code that you have already tested when you get to the parallel version below.

2. Check the correct behavior of your Game of Life by trying out the toad, glider, and pulsar patterns from http://www.math.com/students/wonders/life/life.html. Experiment with other configurations.

3. Now parallelize the program. Divide the Life array into as many rectangular sections as you have processors. Since you are going to run the parallelized program using different numbers of processors, the subdivision should be done dynamically (at run time, according to the dimension of the grid input by the user and to the number of processors returned by MPI_Comm_size or MPI::COMM_WORLD.Get_size()). Each processor should run the same modified version of the serial program, and you will need to write special code to deal with the boundaries between regions evolved by different processors. One of the processors should collect the portions of the array from the other processors and visualize the result.

4. Try out the parallel program on a reasonably large array, and time it using MPI_Wtime() or MPI::Wtime(): you want to see how the execution time for (say) 10,000 Game of Life steps scales as you increase the number of processors (a minimal timing sketch follows this list). For this part of the assignment you should probably turn off visualization, since it will skew the timings.
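As a rough illustration of the timing in step 4, here is a minimal sketch using the MPI C interface (the C++ bindings mentioned above work analogously); the number of steps and the commented-out call to a hypothetical TakeLifeStep are placeholders:

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);

      int rank, nprocs;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      // ... set up the local portion of the Life array here ...

      const int nsteps = 10000;
      const double tstart = MPI_Wtime();
      for (int step = 0; step < nsteps; ++step) {
        // TakeLifeStep(...);   // evolve the local portion by one timestep
      }
      const double elapsed = MPI_Wtime() - tstart;

      if (rank == 0) {
        std::printf("%d processors: %d steps in %g seconds\n", nprocs, nsteps, elapsed);
      }

      MPI_Finalize();
      return 0;
    }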

Implementation Notes

Visualization

For visualization, I suggest that you use the g2_image function in the g2 library, which you are already familiar with. You can even use the g2 PNG output features to save interesting Game of Life configurations that you encounter.

Serial program

In writing the serial program, strive to write code that can be re-used by the parallel program. First let's describe the general strategy before getting into coding specifics: somewhere you will need a matrix of integers (let's call it life) to represent the current state of the Life array (say that 0 represents dead and 1 alive). You will need another matrix of integers (let's call it templife) in which you build the configuration of the array at the next timestep, by running through all the cells and counting their neighbors. Then you copy templife into life and repeat. These arrays should be periodic in both the horizontal and vertical directions: that is, the element life(xsize, j) should actually refer to life(0, j). You will need to program this periodicity yourself.

In C++, a good way to proceed is in an object-oriented manner. You can define a class (perhaps called LifeBlock) to represent the Life array. This class takes in its constructor the size of the Life array (say xsize and ysize), and it internally stores a one-dimensional array of ints (why one-dimensional? because it is then easier to pass directly to the g2 visualization routines). You can then overload operator() (with two arguments) to return a reference to the value of a cell, with operator() taking the periodicity into account. The advantages of this approach are that periodicity can be coded and tested in one place, and that the rest of the code will not care whether the array is stored as a one-dimensional or two-dimensional array. (That is, inside operator() you will need to access the one-dimensional array with expressions like life[xsize*j+i], and you will need to test whether i ≥ xsize, j < 0, and so on to deal with periodicity, but almost everywhere else in the code you can simply write life(i,j), and your operator() will take care of the indexing and the periodicity.)
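A minimal sketch of such a class might look like the following (the names LifeBlock, xsize, and ysize follow the text above; the details are only one possible arrangement):

    #include <vector>

    // A possible LifeBlock: one-dimensional storage with periodic indexing
    // hidden inside operator().
    class LifeBlock {
     public:
      LifeBlock(const int xsize, const int ysize)
          : mXsize(xsize), mYsize(ysize), mData(xsize * ysize, 0) {}

      // Read/write access to cell (i, j); out-of-range indices wrap around,
      // so the periodicity is coded and tested in this one place.
      int& operator()(int i, int j) { return mData[Index(i, j)]; }
      int operator()(int i, int j) const { return mData[Index(i, j)]; }

      int Xsize() const { return mXsize; }
      int Ysize() const { return mYsize; }

     private:
      int Index(int i, int j) const {
        i = ((i % mXsize) + mXsize) % mXsize;   // periodic in x
        j = ((j % mYsize) + mYsize) % mYsize;   // periodic in y
        return mXsize * j + i;
      }

      int mXsize, mYsize;
      std::vector<int> mData;   // stored row by row, as expected by the g2 routines
    };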


[Figure 1: rows 0 through 19 of the Life array, divided among Processor 0, Processor 1, and Processor 2.]
Figure 1: Ghost zones for a case with 3 processors. Each processor holds 8 rows of the Life array, but two of these rows are all ghost zones. On processor 1, row 6 consists of ghost zones that are filled by copying information from processor 0; notice that on processor zero, row 6 consists of all non-ghost cells. Likewise, on processor 1, row 13 is made up of ghost zones that are filled by communicating with processor 2. Rows 0 and 19 are ghost rows also, because of periodicity, and are similarly filled from neighboring processors.

Another useful member function for a LifeBlock class would be void Visualize(const int Gid), which visualizes the array using g2; this is the only function that needs to use the one-dimensional array directly. Finally, you need a function void TakeLifeStep(const LifeBlock& old, LifeBlock& next) that takes a step according to the Game of Life algorithm (note that new cannot be used as a parameter name in C++, since it is a reserved keyword). This last function does not need to be a member function of LifeBlock.

Parallel program

First the general strategy (see figure 1): divide the rectangular array into strips that span the full horizontal extent of the matrix and have equal extents in the vertical direction. Each processor takes one strip, so it stores only a matrix of size xsize × (ysize/n + 2), where n is the number of processors. Why the +2, which gives two extra rows per processor? The cells in the extra rows, called ghost cells or ghost zones, store information obtained from rows that sit just outside the domain of each processor. They are needed because the cells on the boundary of each processor's strip must know whether all their nearest neighbors are dead or alive. Notice from the figure that the arrays for the processors overlap in such a way that each ghost zone on a given processor corresponds to a non-ghost zone on its neighbor.

The usefulness of ghost zones can be seen from the simplicity of the parallel algorithm for taking a Life step, which we state here:

1. Take a Life step on each processor as you did in the serial algorithm, completely ignoring the other processors.

2. Fill all the ghost zones by copying the appropriate data from neighboring processors.

And that's it! You should convince yourself that with two ghost rows per processor (one on the top and one on the bottom, as shown in figure 1), this algorithm produces exactly the same result as you would get with a larger array on a single processor (for example, the setup in figure 1 should give the same result with this parallel algorithm as you would get with the serial code and a single array containing rows 1 through 18).

The exchange of information in step 2 of the algorithm should be done using MPI messaging. So each processor needs to know where it sits in the Life array with respect to the other processors, and which processors are its top and bottom neighbors; a minimal sketch of this bookkeeping is given below.
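One way to set up this bookkeeping (a sketch only, assuming ysize is divisible by the number of processors and using the MPI C interface; the function and variable names are hypothetical) is for each processor to work out its own strip and its neighbors like this:

    #include <mpi.h>

    // Determine the size of this processor's strip of the Life array and the
    // ranks of its neighbors. ysize is the vertical extent of the full array
    // (assumed divisible by the number of processors).
    void SetupStrip(const int ysize, int& localRows,
                    int& topNeighbor, int& bottomNeighbor) {
      int rank, nprocs;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      localRows = ysize / nprocs + 2;                 // interior rows plus two ghost rows
      topNeighbor = (rank - 1 + nprocs) % nprocs;     // periodic wraparound at the edges
      bottomNeighbor = (rank + 1) % nprocs;
    }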


When you implement the messaging, you should be careful to avoid deadlock: for instance, two processors might try to send each other messages and execute MPI::Send at the same time. But MPI::Send may not return until a corresponding MPI::Recv is posted, which in this example can never happen before the MPI::Send completes. It is better to use the nonblocking calls MPI::Isend and MPI::Irecv to post the messages, and to use MPI::Request::Waitall() to wait until all messages are sent and received. The calling sequences of these functions in C++ are

    MPI::Request MPI::COMM_WORLD.Isend(const void* data, int count,
                                       const MPI::Datatype& datatype, int dest, int tag);
    MPI::Request MPI::COMM_WORLD.Irecv(void* data, int count,
                                       const MPI::Datatype& datatype, int source, int tag);
    void MPI::Request::Waitall(int count, MPI::Request* array_of_requests);
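For instance, a ghost-row exchange along these lines might look like the following sketch, written with the MPI C interface for brevity (the C++ bindings shown above are used the same way); the row layout and neighbor ranks follow the earlier bookkeeping sketch and are assumptions of this example:

    #include <mpi.h>

    // Exchange ghost rows with the top and bottom neighbors using nonblocking calls.
    // data points to the local strip, stored row by row: localRows rows of xsize ints,
    // where row 0 and row localRows-1 are the ghost rows.
    void FillGhostZones(int* data, const int xsize, const int localRows,
                        const int topNeighbor, const int bottomNeighbor) {
      MPI_Request requests[4];

      int* topGhost = data;                               // row 0 (ghost)
      int* topInterior = data + xsize;                    // row 1
      int* bottomInterior = data + (localRows - 2) * xsize;
      int* bottomGhost = data + (localRows - 1) * xsize;  // row localRows-1 (ghost)

      // Post receives for our ghost rows and sends of our boundary interior rows.
      MPI_Irecv(topGhost, xsize, MPI_INT, topNeighbor, 0, MPI_COMM_WORLD, &requests[0]);
      MPI_Irecv(bottomGhost, xsize, MPI_INT, bottomNeighbor, 1, MPI_COMM_WORLD, &requests[1]);
      MPI_Isend(topInterior, xsize, MPI_INT, topNeighbor, 1, MPI_COMM_WORLD, &requests[2]);
      MPI_Isend(bottomInterior, xsize, MPI_INT, bottomNeighbor, 0, MPI_COMM_WORLD, &requests[3]);

      // Wait until all four messages have completed before using the ghost rows.
      MPI_Waitall(4, requests, MPI_STATUSES_IGNORE);
    }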

Notice that if you chose an object-oriented approach for the serial algorithm, then to implement the parallel algorithm you can re-use your LifeBlock class mostly (or even completely) unchanged! For instance, one idea is to create a new class ParallelLifeBlock that internally holds a LifeBlock on each processor and is aware of the parallelism, and to write a new void TakeLifeStep(const ParallelLifeBlock& old, ParallelLifeBlock& next) that fills the ghost zones and calls the serial TakeLifeStep function. You will also need a routine to fill the ParallelLifeBlock with initial data (including communication between processors), and in this routine you must remember to synchronize the ghost zones (i.e., do step 2 of the algorithm) before you can start taking steps.

For visualization, ParallelLifeBlock will need to communicate the state of the array from all processors to processor zero, which will assemble the pieces and call g2_image. This can be done, for example, by having ParallelLifeBlock put the assembled data into a larger LifeBlock (one that holds all the data for the full Life array) on processor zero and then call the Visualize function of this larger LifeBlock. (Implementing complicated things in terms of simpler things that already work is usually a good idea.) Note that the visualization step is going to be very expensive (perhaps almost to the point of offsetting the advantage of having parallelized the code), because it needs to communicate all data from all processors, so you may want to do it only every few timesteps.
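One simple way to collect the data on processor zero is sketched here with MPI_Gather, under the same assumptions as before (row-by-row storage, equal-sized strips, ghost rows excluded); fullData is a hypothetical buffer holding the complete array on rank 0:

    #include <mpi.h>

    // Gather every processor's interior rows (ghost rows excluded) onto rank 0.
    // localData points to the local strip; the interior starts at row 1 and holds
    // interiorRows rows of xsize ints. fullData must hold xsize*ysize ints on rank 0.
    void GatherForVisualization(const int* localData, const int xsize,
                                const int interiorRows, int* fullData) {
      MPI_Gather(localData + xsize,             // skip the top ghost row
                 xsize * interiorRows, MPI_INT,
                 fullData,                       // receive buffer (significant on rank 0 only)
                 xsize * interiorRows, MPI_INT,
                 0, MPI_COMM_WORLD);
    }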

