# Calculation of Weights for Data with Varying Integration Time

**George Moellenbrock**

**Original 18 Dec 2018, latest edited version 04 Nov 2019**

When the nominal weights are expected to be uniform (because integration time, channel bandwidth, effective Tsys, collecting area, etc. are all uniform, or the visibilities are normalized), extracting weight information from the apparent statistics of otherwise stable visibility measurements is a simple matter of calculating the apparent simple variance in the visibility real and imaginary parts over a sufficient local sample of values. The real and imaginary part variances should be approximately equal, and the inverse of their mean is the correct weight to assign to each of the visibility values within the sample. Here, "stable visibility" means no *systematic* variation of amplitude or phase within the local sample. Noise-dominated visibilities are ideal; otherwise, well-calibrated data with no true visibility variation are desirable. These conditions are also needed for the more general case described below.
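This uniform-weight case can be sketched numerically as follows, assuming numpy and simulated Gaussian noise (the sample size and noise level here are arbitrary illustrative choices, not from this memo):

```python
import numpy as np

rng = np.random.default_rng(7)

# Uniform-weight case: stable, noise-dominated visibilities, equal exposures.
sigma = 0.2  # true per-part noise (Jy); an arbitrary illustrative value
V = rng.normal(0.0, sigma, 4000) + 1j * rng.normal(0.0, sigma, 4000)

# The apparent simple variances of the real and imaginary parts
# should come out approximately equal.
var_re = V.real.var()
var_im = V.imag.var()

# The inverse of their mean is the weight to assign to each visibility
# in the sample (the N/(N-1) correction discussed later is ignored here).
weight = 1.0 / (0.5 * (var_re + var_im))
```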

When the integration time (aka "exposure") varies within the local sample (such as can be generated by averaging of the data post-correlation, where the number of samples per averaging bin may vary, especially at the end of scans), we expect the correct variance for each visibility to be inversely proportional to the net integration time, and this complicates the calculation. It is necessary to determine a weighted variance per unit inverse integration time, wherein the sample weights for the variance calculation are the per-visibility integration times, e$_i$. If the only reason the underlying variance differs among samples is the variable integration time, then a uniform normalized variance estimate of the whole sample may be obtained by scaling the residual data per sample by the square root of their (known) integration times. Here, residual data means any underlying visibility signal---presumably the average of the visibility samples, using nominal (proportional to integration time, etc.) weights---has been subtracted. The simple variance of this rescaled sample is, in effect, the variance per unit inverse integration time.
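The sqrt(e$_i$) rescaling can be checked numerically; in this sketch (exposure values and noise level are arbitrary assumptions), scaling zero-mean residuals by the square root of their exposures yields a sample whose variance no longer depends on integration time:

```python
import numpy as np

rng = np.random.default_rng(1)

# Exposures varying within the sample; per-sample noise variance ~ 1/e_i.
e = np.tile([0.5, 1.0, 2.0, 4.0], 2000)  # seconds
sigma0 = 0.3  # noise per unit inverse integration time (arbitrary)
dV = rng.normal(0.0, sigma0 / np.sqrt(e))  # residual (zero-mean) data

# Scaling each residual by sqrt(e_i) should homogenize the variance:
scaled = np.sqrt(e) * dV
var_per_exposure = {t: scaled[e == t].var() for t in np.unique(e)}
```

After rescaling, each exposure group's variance estimates the same quantity, sigma0$^2$, the variance per unit inverse integration time.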

For visibilities V$_i$, with integration times e$_i$:

<var$_{norm}$> = Sum (e$_i$ (V$_i$ - <V>)$^2$) / N [1]

where <V> = Sum (w$_i$ V$_i$) / Sum (w$_i$) [1a]

and w$_i$ are the nominal data weights, presumably proportional to integration time and other relevant factors. In practice, we could probably just use w$_i$ = e$_i$ in equation [1a], since all of the other relevant factors within w$_i$ are assumed constant within the sample. Note that the units of <var$_{norm}$> are squared visibility amplitude (Jy$^2$, presumably) times seconds. Note also that <var$_{norm}$> is essentially the simple variance of the ensemble [sqrt(e$_i$)·dV$_i$] (where dV$_i$ is (V$_i$ - <V>)), i.e., of the residual visibilities scaled so that their noise is independent of integration time.

The normalized weight per unit integration time is thus the inverse of <var$_{norm}$>:

W$_{norm}$ = 1/<var$_{norm}$> [2]

and per-datum revised weights may be calculated as:

W$_i$ = W$_{norm}$ * e$_i$ [3]
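Equations [1]--[3] can be sketched end to end as follows, assuming numpy, w$_i$ = e$_i$, and simulated noise-dominated visibilities (all sample values here are illustrative assumptions, not from this memo):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical local sample: exposures vary (e.g., end-of-scan averaging bins).
e = np.tile([0.5, 1.0, 2.0, 4.0], 128)  # seconds; N = 512
N = e.size

# Simulated noise-dominated visibilities: per-part variance scales as 1/e_i.
sigma0_sq = 0.01  # true variance per unit inverse integration time (Jy^2 s)
sig = np.sqrt(sigma0_sq / e)
V = rng.normal(0.0, sig) + 1j * rng.normal(0.0, sig)

# Eqn [1a]: weighted mean, taking w_i = e_i (other factors assumed constant).
Vmean = np.sum(e * V) / np.sum(e)

# Eqn [1], with real and imaginary parts averaged: |dV|^2 sums both parts,
# so divide by 2 to get the mean per-part normalized variance.
dV = V - Vmean
var_norm = 0.5 * np.sum(e * np.abs(dV) ** 2) / N

# Eqns [2] and [3]: normalized weight and per-datum revised weights.
W_norm = 1.0 / var_norm
W = W_norm * e
```

The recovered var_norm estimates sigma0_sq, and the revised weights scale with exposure, as expected.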

Another way of arriving at this result is to calculate a weighted variance:

<var> = Sum (e$_i$ (V$_i$ - <V>)$^2$) / Sum(e$_i$) [4]

which corresponds to the (simple) mean exposure time, given by:

<e> = Sum(e$_i$) / N [5]

The product of these yields <var$_{norm}$>, as above in [1]:

<var$_{norm}$> = <var><e> [6]

and W$_{norm}$ may be formed and applied as in [2] and [3] above.
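The equivalence of the two routes is a simple algebraic identity, illustrated here with arbitrary sample values (a sketch, not a definitive implementation):

```python
import numpy as np

# Illustrative residuals dV_i (mean already subtracted) and exposures e_i.
e = np.array([1.0, 2.0, 4.0, 0.5, 1.0])
dV = np.array([0.12, -0.05, 0.03, -0.20, 0.08])
N = e.size

# Route 1, eqn [1]: normalized variance directly.
var_norm_direct = np.sum(e * dV**2) / N

# Route 2: weighted variance [4] times mean exposure [5], per eqn [6].
var_w = np.sum(e * dV**2) / np.sum(e)  # [4]
e_mean = np.sum(e) / N                 # [5]
var_norm_product = var_w * e_mean      # [6]
```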

This calculation should be done for both real and imaginary parts of the visibility sample and averaged, or for both parts jointly, and [3] used to form the revised weights.

NB: In calculating sample variance, it is generally customary to acknowledge the loss of one degree of freedom due to use of the mean visibility, <V>, in the calculation. Essentially, <V> will have a finite error that tends to bias the resulting variance downward. For simple variance calculations, a factor N/(N-1) is applied to the variance calculation to unbias it, and this factor can be significant for modest N. Since a non-trivially weighted mean is used in the above (otherwise simple, non-weighted) variance calculation (eqn [1]), it may be appropriate to consider a more carefully weighted calculation for the N/(N-1) factor. The required factor is:

D = 1 - ( Sum(w$_i^2$) / (Sum(w$_i$))$^2$ ) [7]

where w$_i$ are the a priori nominal weights used in [1a] above. For uniform weights this factor reduces to (N-1)/N, and in general it should be *divided* into the <var$_{norm}$> result.
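A small numerical check of the D factor, using arbitrary illustrative weights:

```python
import numpy as np

N = 8

# Uniform weights: D reduces exactly to (N-1)/N.
w_uniform = np.ones(N)
D_uniform = 1.0 - np.sum(w_uniform**2) / np.sum(w_uniform) ** 2

# Non-uniform weights (e.g., w_i = e_i): D falls somewhat below (N-1)/N,
# reflecting the extra degree of freedom absorbed by the weighted mean.
w = np.array([1.0, 1.0, 2.0, 2.0, 4.0, 0.5, 1.0, 2.0])
D = 1.0 - np.sum(w**2) / np.sum(w) ** 2

# The debiased estimate would then be var_norm / D.
```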

However, since the nominal error in the variance (and thus the weights) will be <10% for N>10 (an accuracy we are unlikely to achieve in general anyway), and will be uniform over many sample groups in the overall statwt execution, we assume that it is adequate to use the simpler N/(N-1) factor, or to omit it entirely.