Version 1.9 Build 1556
Next: AIPS++ Requirements and a Tentative Time Table
Up: NOTE 203 - PARALLELIZATION OF AIPS++
Previous: Types of Parallelization
The specific user specifications have been categorized into (1)
embarrassingly parallel (EP), (2) fine-grain, and (3) combination
problems. Within the EP and fine-grain cases, the problems are
subdivided into further categories. Within each category, the problems
are presented in the order that we propose to address them, generally
from easiest to implement to most difficult. Starting with
straightforward problems gives us the opportunity to gain expertise
that will be required for more complex problems. Some descriptions end
in a star (*); these are of particular interest because the
functionality does not exist in any general data reduction package.
Work on these problems should be given a higher priority so that the
parallel effort of AIPS++ coincides with the general theme of AIPS++,
in which new functionality is given precedence over repetition of
existing functionality.
The embarrassingly parallel problems are divided into (1) general, (2)
derived, and (3) CDCI cases. General cases involve problems that
recur in astronomical data reduction, such as calibration. Derived
cases are specific cases that are made up of one or more components of
general problems. Also identified is a third category of
Calculate-Distribute-Collect-Iterate (CDCI) problems. CDCI problems
are a specific form of the derived case where the problem can be
represented as a collection of general EP problems with a significant
fraction of time spent in the collection and comparison of the
distributed results in order to direct additional iterations. CDCI
problems are among the most computationally intensive.
General embarrassingly parallel problems will require the distribution
of existing C++ code across multiple processors. Once these routines
have been constructed, they can be used in other applications (see §
3.1.2).
- Spectral line image construction and deconvolution.
Spectral line processing, including imaging and deconvolution can be
carried out easily in an embarrassingly parallel way. This is a
common case and is easily implemented. The details of this
implementation of an EP case will be important as a template for more
complicated problems. This should be the first implementation.
1. Spectral line cube formation. This is the simplest case,
where independent spectral-line channels are sent to separate
processors for imaging.
2. Spectral line deconvolution. Independent spectral-line
channels are sent to separate processors for deconvolution. If both
imaging and deconvolution are requested by the user, the two functions
should be pipelined together and sent to individual processors in one
step.
- Linear mosaic algorithm with linear deconvolution (MOSLIN
in SDE) together with linear combination of pre-deconvolved images,
weighting determined by primary beam. Separate fields are
independent and can be sent to different processors.
- Antenna-based determination of calibration and
self-calibration. This problem can be separated into independent
time slices and sent to individual processors.
- Antenna and baseline-based fringe fitting for a range of
spectral channels and fringe rates (normally only for VLBI data).
This is a very computationally intensive problem that can be separated
in time for parallel processing.
- Image construction from calibrated total power data
(frequency-switched, beam-switched, multi-beam, focal plane array)
sequences from single antennas and phased arrays, with and without
spectrometers. This can be divided into separate time ranges and
sent to separate processors.
- Calibration for non-isoplanicity using special extensions
of self-calibration. This is the general case, which includes
wide-field imaging with clusters of fields. Fields could be
constructed by shifting the phases of the visibility data and then
sent to individual processors.
- Parameter-driven automated flagging for large data sets.
This could be done by slicing in time. However, flagging operations
are usually not computationally intensive, so the benefit is not
expected to be great. Low priority.
Once embarrassingly parallel general tasks, such as calibration, are
parallelized, programs to address many problems (``derived
applications'') can be parallelized by calling the appropriate
subtasks. The order of these tasks is not as important as the general
case, because these will be addressed after the general cases that
they call or emulate have been written.
- Imaging of spectral line data sets with continuum
subtraction based upon continuum data or continuum models. This is
made up of continuum subtraction and imaging components, which will
have been coded in the general case.
- Self-calibration and editing of all pointings in one
processing step. This is a composite of calibration and imaging,
which would have been previously parallelized.
- 3-D mosaicing allowing for sky curvature (non-coplanar
baselines). This is made up of separation of data in fields and
implementing a mosaic deconvolution for each field. These operations
are composites of the general operations.
- Simultaneous, multiple field imaging with ungridded data
subtraction using MX-like algorithms. This is a straightforward
task that has been previously implemented in a parallel way with PVM
in SDE (Dragon). It should be straightforward to implement this
code, since it has been parallelized at this granularity before.
- For polarization calibration, all calibration sources are
resolved and the polarized intensity distribution may not resemble the
total intensity distribution; therefore, one must iteratively determine
both source polarization structure and instrumental polarization.
This is a special application of calibration, which would have been
parallelized previously.
- Imaging using multiple-frequency data sets and a
user-defined model for spectral combination ``rules.'' This
separation of data into multiple frequencies is analogous to spectral
line imaging, where the frequencies are independent or their
dependencies (i.e., spectral indices) are known.
- Imaging fields larger than the isoplanatic region. This
includes the specific problem of 3-D imaging of data affected by sky
curvature. The general non-isoplanatic problem is very difficult and
computationally intensive.
- In VLBI imaging, the fields of view that are not radially
smeared by finite bandwidths are relatively small, so one needs
``fringe-rate'' imaging and multi-pointing processing for widely
spaced sources in the field.
Many computationally intensive problems may be posed in a manner in
which the data are separated into independent pieces and sent to
individual processors. The results are brought together, the
instructions for further processing are determined, and new
instructions (e.g., a revised model) are sent with the data to the
processors again. The success of using embarrassingly parallel
techniques on these CDCI problems is limited by the ratio of time
spent on individual pieces (parallel operations) versus the I/O and
calculations needed for the next iteration (serial operations). These
problems are inherently computationally expensive; however, the
ultimate speedup on parallel machines may vary depending on data size
and algorithm. Before extensive work is devoted to parallelizing
these algorithms, the estimated degree of speedup for representative
data sets should be investigated.
- Determination and correction for pointing errors and errors
in beam shape, using mosaic self-calibration techniques. These
problems can be separated into short time slices in which the effects
of telescope pointing are determined on a baseline basis. After an
initial iteration, the data are collected and imaged, and a new model
is created for the next iteration.
- Non-linear (MEM-based) mosaic algorithms (VTESS in AIPS,
MOSAIC in SDE). These basically involve a large number of
independent deconvolutions, then a combination to create a new model,
which in turn is used for the next iteration of independent
deconvolutions.
Problems that can be addressed by fine-grain parallelism are divided
into (1) general, (2) derived, and (3) specific cases. The general
cases are ones that use a large number of low-level parallelizable
functions, such as FFT's. Once these problems are addressed, a
library of parallel Fortran subroutines will exist that can be called
in future programs. Derived problems are ones that may use a number
of parallelized functions created for the general problems. Specific
problems are ones that can be parallelized at a low level, such as
de-dispersing pulsar data, which is very computationally intensive.
However, the solutions to these problems are not generally applicable
to other cases.
General fine-grain problems will require the construction of optimized
Fortran subroutines. Once these routines have been written, they can
be used in other ``derived'' applications (see §3.2.2).
Once general tasks, such as FFT's, gridding etc., are parallelized,
programs to address many problems (``derived applications'') can be
parallelized by calling the appropriate parallelized Fortran
subroutines.
- Imaging after subtraction of sources. This is basically
gridding and regridding. Assuming that these libraries have been
parallelized previously for imaging, optimization here will not
require any additional coding.
- Image deconvolution from dirty image and point-spread
function. This includes the CLEAN algorithm (Högbom,
Clark-Högbom, Cotton-Schwab, and Smooth-stabilized) and MEM (maximum
entropy and maximum emptiness) deconvolution. The extensive use of
FFT's (which could be parallelized) in deconvolution will offer some
performance enhancement. For multiple-field or spectral-line image
reconstruction, using an embarrassingly parallel construction would
provide greater performance increase.
- De-dispersing of spectral, long time series data for
pulsars with analysis and fitting in the intensity-frequency-time
domain. It may be possible to use an existing parallelized program
from a pulsar group and put it into AIPS++. The time for this would
be short, and additional functionality would be added to AIPS++. This
is a very computationally intensive process. Optimization and
installation on a fast parallel computer would encourage pulsar
scientists to use AIPS++ for that single functionality.
- Briggs Non-Negative Least Squares (NNLS) algorithm. This
algorithm solves the linear A*X = B problem with non-linear
constraints. The Briggs NNLS algorithm is one solution; however,
other solutions may have been developed for non-astronomical
problems. Because this class of algorithms is very compute-intensive,
if other groups have implemented them on parallel machines we could
use them without time-consuming code development.
Combination problems should be avoided by rewriting algorithms.
However, in cases where an alternate formulation of an algorithm is
not possible, attention should be given to the estimated improvements
from using both fine-grain and embarrassing parallelism.
- Cross-calibration (enforced consistency) between data taken
with different instruments (flux-scale, pointing). This includes a
great deal of dependency, but computations are not likely to spend
large amounts of time in low-level parallelized routines.
- Pointing self-calibration to determine corrections to
single-dish and visibility data. This includes much dependency and a
large fraction of time spent in comparisons etc.
- 3-D self-calibration. Spectral line channels are related
to each other by the velocity structure of the observed source in the
same way spatial dimensions are related. This additional information
could be used in theory to self-calibrate a spectral line data set.
This would allow for self-calibration of a cube where the
signal-to-noise ratio in a single channel is insufficient for
convergence.
Please send questions or comments about AIPS++ to aips2-request@nrao.edu.
Copyright © 1995-2000 Associated Universities Inc.,
Washington, D.C.
2006-10-15