Version 1.9 Build 1556
Next: AIPS++ Requirements and a Tentative Time Table
Up: NOTE 203 - PARALLELIZATION OF AIPS++
Previous: Types of Parallelization
The specific user specifications have been categorized into (1)
embarrassingly parallel (EP), (2) fine-grain, and (3) combination
problems. Within the EP and fine-grain cases, the problems are
subdivided into further categories. Within each category, the problems
are presented in the order that we propose to address them, generally
from easiest to implement to most difficult. Starting with
straightforward problems gives us the opportunity to gain expertise
that will be required for more complex problems. Some descriptions end
in a star (*); these are of particular interest because the
functionality does not exist in any general data reduction package.
Work on these problems should be given a higher priority so that the
parallel effort of AIPS++ coincides with the general theme of AIPS++,
in which new functionality is given precedence over repetition of
existing functionality.
The embarrassingly parallel problems are divided into (1) general, (2)
derived, and (3) CDCI cases. General cases involve problems that
recur in astronomical data reduction, such as calibration. Derived
cases are specific cases that are made up of one or more components of
general problems. Also identified is a third category of
Calculate-Distribute-Collect-Iterate (CDCI) problems. CDCI problems
are a specific form of the derived case where the problem can be
represented as a collection of general EP problems with a significant
fraction of time spent in the collection and comparison of the
distributed results in order to direct additional iterations. CDCI
problems are among the most computationally intensive.
General embarrassingly parallel problems will require the distribution
of existing C++ code across multiple processors. Once these routines
have been constructed, they can be used in other applications (see §
3.1.2).
- Spectral line image construction and deconvolution.
Spectral line processing, including imaging and deconvolution can be
carried out easily in an embarrassingly parallel way. This is a
common case and is easily implemented. The details of this
implementation of an EP case will be important as a template for more
complicated problems. This should be the first implementation.
1. Spectral line cube formation. This is the simplest case,
where independent spectral-line channels are sent to separate
processors for imaging.
2. Spectral line deconvolution. Independent spectral-line
channels are sent to separate processors for deconvolution. If both
imaging and deconvolution are requested by the user, the two functions
should be pipelined together and sent to individual processors in one
step.
- Linear mosaic algorithm with linear deconvolution (MOSLIN
in SDE) together with linear combination of pre-deconvolved images,
weighting determined by primary beam. Separate fields are
independent and can be sent to different processors.
- Antenna-based determination of calibration and
self-calibration. This problem can be separated into independent
time slices and sent to individual processors.
- Antenna and baseline-based fringe fitting for a range of
spectral channels and fringe rates (normally only for VLBI data).
This is a very computationally intensive problem that can be separated
in time for parallel processing.
- Image construction from calibrated total power data
(frequency-switched, beam-switched, multi-beam, focal plane array)
sequences from single antennas and phased arrays, with and without
spectrometers. This can be divided into separate time ranges and
sent to separate processors.
- Calibration for non-isoplanicity using special extensions
of self-calibration. This is the general case, which includes
wide-field imaging with clusters of fields. Fields could be
constructed by shifting the phases of the visibility data and then
sent to individual processors.
- Parameter-driven automated flagging for large data sets.
This could be done by slicing in time. However, flagging operations
are usually not computationally intensive, so the benefit is not
expected to be great. Low priority.
Once embarrassingly parallel general tasks, such as calibration, are
parallelized, programs to address many problems (``derived
applications'') can be parallelized by calling the appropriate
subtasks. The order of these tasks is not as important as the general
case, because these will be addressed after the general cases that
they call or emulate have been written.
- Imaging of spectral line data sets with continuum
subtraction based upon continuum data or continuum models. This is
made up of continuum subtraction and imaging components, which will
have been coded in the general case.
- Self-calibration and editing of all pointings in one
processing step. This is a composite of calibration and imaging,
which would have been previously parallelized.
- 3-D mosaicing allowing for sky curvature (non-coplanar
baselines). This is made up of separation of data in fields and
implementing a mosaic deconvolution for each field. These operations
are composites of the general operations.
- Simultaneous, multiple field imaging with ungridded data
subtraction using MX-like algorithms. This is a straightforward
task that has been previously implemented in a parallel way with PVM
in SDE (Dragon). It should be straightforward to implement this
code, since it has been parallelized at this granularity before.
- For polarization calibration, all calibration sources are
resolved and the polarized intensity distribution may not resemble the
total intensity distribution; therefore, one must iteratively determine
both source polarization structure and instrumental polarization.
This is a special application of calibration, which would have been
parallelized previously.
- Imaging using multiple-frequency data sets and a
user-defined model for spectral combination ``rules.'' This
separation of data into multiple frequencies is analogous to spectral
line imaging, where the frequencies are independent or their
dependencies (i.e., spectral indices) are known.
- Imaging fields larger than the isoplanatic region. This
includes the specific problem of 3-D imaging of data affected by sky
curvature. The general non-isoplanatic problem is very difficult and
computationally intensive.
- In VLBI imaging, the fields of view that are not radially
smeared by finite bandwidths are relatively small, so one needs
``fringe-rate'' imaging and multi-pointing processing for widely
spaced sources in the field.
Many computationally intensive problems may be posed in a manner in
which the data are separated into independent pieces and sent to
individual processors. The results are brought together, the
instructions for further processing are determined, and new
instructions (e.g., a revised model) are sent with the data to the
processors again. The success of using embarrassingly parallel
techniques on these CDCI problems is limited by the ratio of time
spent on individual pieces (parallel operations) versus the I/O and
calculations needed for the next iteration (serial operations). These
problems are inherently computationally expensive; however, the
ultimate speedup on parallel machines may vary depending on data size
and algorithm. Before extensive work is devoted to parallelizing
these algorithms, the estimated degree of speedup for representative
data sets should be investigated.
- Determination and correction for pointing errors and errors
in beam shape, using mosaic self-calibration techniques. These
problems can be separated into short time slices in which the effects
of telescope pointing are determined on a baseline basis. After an
initial iteration, the data are collected and imaged, and a new model
is created for the next iteration.
- Non-linear (MEM-based) mosaic algorithms (VTESS in AIPS,
MOSAIC in SDE). These basically involve a large number of
independent deconvolutions, then a combination to create a new model,
which in turn is used for the next iteration of independent
deconvolutions.
Problems that can be addressed by fine-grain parallelism are divided
into (1) general, (2) derived, and (3) specific cases. The general
cases are ones that use a large number of low-level parallelizable
functions, such as FFT's. Once these problems are addressed, a
library of parallel Fortran subroutines will exist that can be called
in future programs. Derived problems are ones that may use a number
of parallelized functions created for the general problems. Specific
problems are ones that can be parallelized at a low level, such as
de-dispersing pulsar data, which is very computationally intensive.
However, the solutions to these problems are not generally applicable
to other cases.
General fine-grain problems will require the construction of optimized
Fortran subroutines. Once these routines have been written, they can
be used in other ``derived'' applications (see §3.2.2).
Once general tasks, such as FFT's, gridding etc., are parallelized,
programs to address many problems (``derived applications'') can be
parallelized by calling the appropriate parallelized Fortran
subroutines.
- Imaging after subtraction of sources. This is basically
gridding and regridding. Assuming that these libraries have been
parallelized previously for imaging, optimization here will not
require any additional coding.
- Image deconvolution from dirty image and point-spread
function. This includes the CLEAN algorithm (Högbom,
Clark-Högbom, Cotton-Schwab, and Smooth-stabilized) and MEM (maximum
entropy and maximum emptiness) deconvolution. The extensive use of
FFT's (which could be parallelized) in deconvolution will offer some
performance enhancement. For multiple-field or spectral-line image
reconstruction, using an embarrassingly parallel construction would
provide greater performance increase.
- De-dispersing of spectral, long time series data for
pulsars with analysis and fitting in the intensity-frequency-time
domain. It may be possible to use an existing parallelized program
from a pulsar group and put it into AIPS++. The time for this would
be short, and additional functionality would be added to AIPS++. This
is a very computationally intensive process. Optimization and
installation on a fast parallel computer would encourage pulsar
scientists to use AIPS++ for that single functionality.
- Briggs Non-Negative Least Squares (NNLS) algorithm. This
algorithm solves the linear A*X = B problem with non-linear
constraints. The Briggs NNLS algorithm is one solution; however,
other solutions may have been developed for non-astronomical
problems. Because this class of algorithms is very compute-intensive,
if other groups have implemented them on parallel machines we could
use them without time-consuming code development.
Combination problems should be avoided by rewriting algorithms.
However, in cases where an alternate formulation of an algorithm is
not possible, attention should be given to the estimated improvements
from using both fine-grain and embarrassing parallelism.
- Cross-calibration (enforced consistency) between data taken
with different instruments (flux-scale, pointing). This includes a
great deal of dependency, but computations are not likely to spend
large amounts of time in low-level parallelized routines.
- Pointing self-calibration to determine corrections to
single-dish and visibility data. This includes much dependency and a
large fraction of time spent in comparisons etc.
- 3-D self-calibration. Spectral line channels are related
to each other by the velocity structure of the observed source in the
same way spatial dimensions are related. This additional information
could be used in theory to self-calibrate a spectral line data set.
This would allow for self-calibration of a cube where the
signal-to-noise ratio in a single channel is insufficient for
convergence.
Please send questions or comments about AIPS++ to aips2-request@nrao.edu.
Copyright © 1995-2000 Associated Universities Inc.,
Washington, D.C.
2006-10-15