Description

Many parameters are determined from the specified region of an image. The region can be specified by a set of rectangular pixel coordinates, the channel ranges and the Stokes or a region file.

For directed output, run as:

myoutput=imstat()

General procedure:

#Specify inputs, then
myoutput=imstat()
#or specify inputs directly in calling sequence to task
myoutput=imstat(imagename='image.im', etc)
myoutput['KEYS'] #will contain the result associated with any of the keys given below

KEYS DESCRIPTION
blc absolute PIXEL coordinate of the bottom left corner of the bounding box surrounding the selected region
blcf Same as blc, but uses WORLD coordinates instead of pixels
trc the absolute PIXEL coordinate of the top right corner of the bounding box surrounding the selected region
trcf Same as trc, but uses WORLD coordinates instead of pixels
flux the flux or flux density. See below for details.
npts the number of unmasked points used
max the maximum pixel value
min minimum pixel value
maxpos absolute PIXEL coordinate of maximum pixel value
maxposf Same as maxpos, but uses WORLD coordinates instead of pixels
minpos absolute pixel coordinate of minimum pixel value
minposf Same as minpos, but uses WORLD coordinates instead of pixels
sum the sum of the pixel values: $\sum I_i$
sumsq the sum of the squares of the pixel values: $\sum I_i^2$
mean the mean of pixel values: $\bar{I} = \sum I_i / n$
sigma the standard deviation about the mean: $\sigma^2 = (\sum I_i - \bar{I})^2 / (n-1)$
rms the root mean square: $\sqrt {\sum I_i^2 / n}$
median the median pixel value
medabsdevmed the median of the absolute deviations from the median
quartile the inner-quartile range. Find the points which are 25% largest and 75% largest (the median is 50% largest).
q1 the first quartile
q3 the third quartile

 CURSOR AXES

The axes parameter allows one to set the cursor axes over which statistics are computed. For example, consider a 3-dimensional image for which axes=[0,2]. The statistics would be computed for each XZ (axes 0 and 2) plane in the image.  One could then examine those statistics as a function of the Y (axis 1) axis.

Each statistic is stored in an array in its own field in the returned dictionary. The dimensionality of these arrays is equal to the number of axes over which the statistics were not evaluated (called the display axes). For example, if the input image has four axes, and axes=[0], the output statistic arrays will have three dimensions. If axes=[0, 1], the output statistic arrays will have two dimensions.

The shape of the output arrays when axes has a positive number of elements is based on the region selection. If there is no region selection, the shape of the statistic arrays is just the shape of the image along the display (non-cursor) axes. For example, if the input image has dimensions of 300x400x4x80 (RA x Dec x Stokes x Freq) and axes=[0, 1], in the absence of a region selection, the shape of the output statistic arrays will be
4x80. If there is a region selection, the shape of the output statistic arrays will be determined by the number of planes along the display axes chosen in the region selection. For example, continuing with our example, if axes=[0,1], chans="5~15;30~70", and stokes="IV", the output statistic arrays will have shapes of 2x52. Only the selected planes will be displayed in the logger output if verbose=True.

In the case where the image has a pixel mask, and/or the mask parameter is specified, and because of this specification a plane is entirely masked, this element is included in the statistic arrays (usually with a value of 0). It is not included in the logger output if verbose=True. One can exclude such elements from computations on the output arrays by using the numpy.extract() method. For example, to compute the minimum rms value, not including any fully masked planes, one could use

stats = imstat(...)
rmsmin = numpy.min(numpy.extract(stats['npts']>0, stats['rms']))

Thus in the computation of rmsmin, only the rms elements are considered which have associated values of npts that are not zero.

ALGORITHMS

Several types of statistical algorithms are supported:

    • CLASSIC: This is the familiar algorithm, in which all unmasked pixels are used. One may choose one of two methods, which vary only by performance, for computing classic statistics via the clmethod parameter. The "tiled" method is the old method and is fastest in cases where there are a large number of individual sets of statistics to be computed and a small number of data points per set. This can occur when one sets the axes parameter, which causes several individual sets of statistics to be computed. The "framework" method uses the new statistics framework to compute statistics. This method is fastest in the regime where one has a small number of individual sets of statistics to calculate, and each set has a large number of points. For example, this method is fastest when computing statistics over an entire image in one go (no axes specified). A third option, "auto", chooses which method to use by predicting which be faster based on the number of pixels in the image and the choice of the axes parameter.
    • FIT-HALF: This algorithm calculates statistics on a dataset created from real and virtual pixel values. The real values are determined by the input parameters center and lside. The parameter center tells the algorithm where the center value of the combined real+virtual dataset should be. Options are the mean or the median of the input image's pixel values, or at zero. The lside parameter tells the algorithm on which side of center the real pixel values are located. True indicates that the real pixel values to be used are ≤ center. False indicates the real pixel values to be used are ≥ center. The virtual part of the dataset is then created by reflecting all the real values through the center value, to create a perfectly symmetric dataset composed of a real and a virtual component. Statistics are then calculated on this resultant dataset. These two parameters are ignored if algorithm is not "FIT-HALF". Because the maximum value is virtual if lside is True and the minimum value is virtual if lside is False, the value of the maximum position (if lside=True) or minimum position (if lside=False) is not reported in the returned record.
    • HINGES-FENCES: This algorithm calculates statistics by including data in a range between $Q1 - f*D$ and $Q3 + f*D$, inclusive, where Q1 is the first quartile of the distribution of unmasked data, subject to any specified pixel ranges, Q3 is the third quartile, $D = Q3 - Q1$ (the inner quartile range), and f is the user-specified fence factor. Negative values of f indicate that the full distribution is to be used (i.e., the classic algorithm is used). Sufficiently large values of f will also be equivalent to using the "CLASSIC" algorithm. For f = 0, only data in the inner quartile range is used for computing statistics. The value of fence is silently ignored if algorithm is not "HINGES-FENCES".
    • CHAUVENET: The idea behind this algorithm is to eliminate outliers based on a maximum z-score parameter value. A z-score is the number of standard deviations a point is from the mean of a distribution. This method thus is meant to be used for (nearly) normal distributions. In general, this is an iterative process, with successive iterations discarding additional outliers as the remaining points become closer to forming a normal distribution. Iterating stops when no additional points lie beyond the specified z-score value, or, if z-score is negative, when "CHAUVENET"'s criterion is met (see below). The parameter maxiter can be set to a non-negative value to prematurely abort this iterative process. When verbose=T, the "N-iter" column in the table that is logged represents the number of iterations that were executed.

"CHAUVENET"'s criterion allows the target z-score to decrease as the number of points in the distribution decreases on subsequent iterations. Essentially, the criterion is that the probability of having one point in a normal distribution at a maximum z-score of zmax must be at least 0.5. zmax is therefore a function of (only) the number of points in the distribution and is given by

npts = 0.5/erfc(zmax/$\sqrt{2}$)

where erfc() is the complementary error function. As iterating proceeds, the number of remaining points decreases as outliers are discarded, and so zmax likewise decreases. Convergence occurs when all remaining points fall within a z-score of zmax. Below is an illustrative table of zmax values and their corresponding npts values. For example, it is likely that there will be a 5-sigma "noise bump" in a perfectly noisy image with one million independent elements.

  • zmax npts
    1.0 1
    1.5 3
    2.0 10
    2.5 40
    3.0 185
    3.5 1,074
    4.0 7,893
    4.5 73,579
    5.0 872,138
    5.5 13,165,126
    6.0 253,398,672
    6.5 6,225,098,696
    7.0 195,341,107,722

     

NOTES ON FLUX DENSITIES AND FLUXES

Explanation of terminology:
The terms "intensity" or "brightness" refer to quantities with a unit such as Jy/beam or Kelvin (K).
The term "flux density" refers to quantities with a unit such as Janskys (Jy). This is dimensionally equivalent to W/m**2/Hz.
The term "flux" refers to a flux density integrated over the spectral or velocity axis, such as Jy*km/s or Jy*Hz. These are dimensionally equivalent to W/m**2.

Fluxes and flux densities are not computed if any of the following conditions is met:

  1. The image does not have a direction coordinate
  2. The image does not have a intensity-like brightness unit. Examples of such units are Jy/beam (in which case the image must also have a beam) and Kelvin (K)
  3. There are no direction axes in the cursor axes that are used
  4. If the (specified region of the) image has a non-degenerate spectral axis, and the image has a tabular spectral axis (axis with varying increments) [a]
  5. Any axis that is not a direction nor a spectral axis that is included in the cursor axes is not degenerate within in specified region

In cases where none of the above conditions is met, the flux density(ies) (intensities integrated over direction planes) will be computed if any of the following conditions is met:

  1. The image has no spectral coordinate
  2. The cursor axes do not include the spectral axis
  3. The spectral axis in the chosen region is degenerate

In the case where there is a non-degenerate spectral axis that is included in the cursor axes, the flux (flux density integrated over spectral planes) will be computed. In this case, the spectral portion of the flux unit will be the velocity unit of the spectral coordinate if it has one (e.g., if the brightness unit is Jy/beam and the velocity unit is km/s, the flux will have units of Jy km/s). If not, the spectral portion of the flux unit will be the frequency unit of the spectral axis (e.g., if the brightness unit is K and the frequency unit is Hz, the resulting flux unit will be K arcsec2 Hz).

In both cases of flux density or flux being computed, the resulting numerical value is assigned to the "flux" key in the output dictionary.

If the image has units of Jy/beam, the flux density is just the mean intensity multiplied by the number of beam areas included in the region. The beam area is defined as the volume of the elliptical Gaussian defined by the synthesized beam, divided by the maximum of that function, which is equivalent to

pi/(4*ln(2)) * major * minor

where ln() is the natural logarithm and major and minor are the major and minor FWHM axes of the beam, respectively.

Footnote Number a
Footnote Text [May be removed in the future.]