



Types of Parallelization

Parallelization comes in a variety of "grains." By "grain," we mean the scale of the problem at which parallelism is applied. One extreme is illustrated by spectral-line processing: individual channels are completely independent, so each can be sent to a separate processor for computation. Because there is no communication between processors, communication overhead is negligible and the speed-up is nearly linear in the number of processors. This type of parallelization is easy for the programmer to implement, because the Glish interface of AIPS++ can control the spawning of processes in parallel. No low-level optimization is required. Problems in this category are called "embarrassingly parallel" (EP).
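As a rough illustration, a Glish script of the following form could farm independent channels out to compute clients on separate hosts. This is a sketch only: the client name linesolver, the host names, and the process event are hypothetical, not existing AIPS++ components.

    # Sketch: 'linesolver', the hosts and the 'process' event are hypothetical.
    hosts := "node1 node2 node3 node4"      # Glish string vector, one host each
    nchan := 64                             # number of spectral channels
    workers := [=]                          # record holding the client agents
    for (i in 1:len(hosts)) {
        workers[i] := client('linesolver', host=hosts[i])
    }
    for (ch in 1:nchan) {
        i := 1 + (ch - 1) % len(hosts)      # hand out channels round-robin
        workers[i]->process(ch)             # each channel proceeds independently
    }

Each client receives its channel as an event and computes without communicating with the others, which is what makes the speed-up nearly linear.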

However, many problems are not independent at a large scale and may have complex dependencies that prevent simple Glish-level parallelism. For many such problems, it is possible to identify low-level routines, such as FFTs, that carry out the majority of the computation. In these cases, the computationally intensive parts of the code can be designed to call external Fortran subroutines or libraries, which can be written to take advantage of parallel architectures at a low level. This type of parallelization is called "fine-grain parallelization."

A third category is particularly difficult: the large-scale problem is not separable into independent processes, and the computational time is not concentrated in parallelizable low-level functions.

Embarrassingly Parallel Problems

For these problems, existing C++ code will be used whenever possible. A high-level Glish wrapper will manage the execution of the code on subsections of the data. This may include estimating the execution time and the overhead of distributing the data and recombining the results. Ideally, this estimate should also account for the size of the data relative to the local memory available on each processor. When the estimates suggest negligible speed-up, the user should be queried before execution. The long-term goal should include execution on heterogeneous machines, but this can be added after parallelization on a single machine is complete. Because the system calls that execute a process on a specific processor may be machine dependent, some of the Glish internals may need small changes for machine-specific commands. Also, the commands sent from Glish should turn off fine-grain parallelization (by setting the number of threads to zero), so that fine-grain and embarrassingly parallel executions do not compete with each other. For EP problems, a large speed-up is achieved at relatively low programmer cost.
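The decision logic itself can be small. As a sketch, assuming the wrapper can measure or model the serial compute time and the per-process distribution overhead (the names and the formula below are illustrative, not part of AIPS++):

    # Sketch: decide whether distributing a job is worthwhile.
    # t_comp: estimated serial compute time; t_dist: per-process
    # distribution/recombination overhead; nproc: processors available.
    worth_distributing := function(t_comp, t_dist, nproc) {
        # parallel time = compute split nproc ways + distribution overhead
        return (t_comp / nproc + t_dist * nproc) < t_comp
    }

If such a test fails, the wrapper would query the user rather than silently launch a distributed run that is no faster than the serial one.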

Fine-Grain Parallel Problems

The High Performance Fortran (HPF) standard is now fairly concrete. The standard is an extension of Fortran 90 and includes a preprocessor that rewrites the input code to execute with Single-Instruction-Multiple-Data (SIMD) parallelism: the same instruction is sent to all processors, and the compiler divides the data across them. This effectively parallelizes Fortran DO loops. Any Fortran 90 code can be compiled by an HPF compiler and, conversely, HPF code should compile under a Fortran 90 compiler. The HPF compiler attempts to determine which loops are independent and to distribute data in an optimal way. HPF includes compiler directives that can force the compiler to treat loops as independent and can force a particular distribution of data across processors, mitigating slow transfers from global to local memory. Also included are pre-tuned routines for some common applications, such as FFTs. Several vendors currently offer HPF compilers. The Portland Group offers compilers for a variety of platforms, from Silicon Graphics machines to Sun SPARCstations, and many hardware vendors, such as DEC and IBM, are developing their own.
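As a concrete illustration of the directives (a minimal sketch: the array size and the computation are invented for the example; the directives shown are standard HPF, not anything specific to AIPS++):

    ! Sketch: standard HPF directives on an invented computation.
    PROGRAM fine_grain_demo
    IMPLICIT NONE
    INTEGER, PARAMETER :: N = 1024
    REAL :: grid(N, N)
    INTEGER :: i, j
    !HPF$ DISTRIBUTE grid(BLOCK, BLOCK)   ! spread the array across processors
    !HPF$ INDEPENDENT
    DO j = 1, N                           ! iterations asserted independent
       DO i = 1, N
          grid(i, j) = REAL(i + j)        ! each element computed locally
       END DO
    END DO
    PRINT *, 'total =', SUM(grid)         ! intrinsics reduce in parallel
    END PROGRAM fine_grain_demo

Because a Fortran 90 compiler treats the !HPF$ lines as ordinary comments, the same source also compiles and runs serially.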

Although fine-grain parallelism requires a substantial initial investment of programmer time, two aspects mitigate this cost. First, some optimized Fortran routines already exist in the HPF libraries, so for those, parallelism is achieved with almost no additional programming. Second, the development of parallel libraries is a one-time project. Once the libraries exist, new programs can take advantage of the parallelism by being written in such a way as to use the parallelized library functions whenever possible.

Combination Problems

There is a class of problems that combine aspects of the fine-grain and EP categories. These problems have so much data dependency that increasing the number of processors yields little speed-up. If possible, these algorithms should be rewritten to emphasize fine-grain or embarrassing parallelism. One purpose of this document is to encourage programmers to keep parallelism in mind from the beginning, so that extensive rewriting for parallelism is unnecessary. In cases where such rewriting is not possible because of the organization of the algorithm, other solutions could be explored, such as moving the work to a powerful single-processor vector machine (e.g., a Cray Y-MP).

