Wim Brouw
22 January 1999
A postscript version of this note is available (124kB).
Trying to find a more stable non-linear least squares (LSQ) method than the one currently used in Newstar, I found a few other areas in the existing routines that could be added to or improved. In the end I produced a set of routines that handle both linear and non-linear cases, with or without SVD, and with constraint equations.
This document describes the usage and background of the different routines.
Least squares solutions in Aips++ have the following general background:
The following terminology is used throughout:
The condition equations can, with the above definitions, be written in one of the following ways:
$\mathbf{a}_i \cdot \mathbf{x} = l_i \qquad i = 0,\ldots,N-1$    (1)
or as:
$\sum_{k=0}^{n-1} a_{ik}\,x_k = l_i \qquad i = 0,\ldots,N-1$    (2)
or as:
$\mathbf{C} \cdot \mathbf{x} = \mathbf{l}$    (3)
The normal matrix is defined as:
$\mathbf{A} = \mathbf{C}^T \cdot \mathbf{Q}^{-1} \cdot \mathbf{C}$    (4)
and the normal equations as:

$\mathbf{A} \cdot \mathbf{x} = \mathbf{L} \;, \qquad \mathbf{L} = \mathbf{C}^T \cdot \mathbf{Q}^{-1} \cdot \mathbf{l}$    (5)
The merit function we want to minimise is:

$\chi^2 = \sum_{i=0}^{N-1} \left( \frac{l_i - \sum_k a_{ik}\,x_k}{\sigma_i} \right)^2$    (6)

For the minimum of $\chi^2$ the following holds:
$\frac{\partial\chi^2}{\partial x_k} = 0 \qquad k = 0,\ldots,n-1$    (7)
which leads to the following set of normal equations:
$\sum_{i=0}^{N-1} \frac{a_{ik}}{\sigma_i^2} \left( l_i - \sum_{j=0}^{n-1} a_{ij}\,x_j \right) = 0 \qquad k = 0,\ldots,n-1$    (8)
or in matrix form, using the Gauss bracket notation $[pq] \equiv \sum_{i=0}^{N-1} p_i q_i / \sigma_i^2$:

$[a_k a_j]\,x_j = [a_k l] \qquad k = 0,\ldots,n-1$    (9)

or:

$\mathbf{A} \cdot \mathbf{x} = \mathbf{L}$    (10)

with the formal solution:

$\mathbf{x} = \mathbf{A}^{-1} \cdot \mathbf{L}$    (11)
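As a hedged illustration of (3)-(5) and (11), the following numpy sketch builds and solves the normal equations for a small linear problem. It is not the aips++/Newstar code; the variable names and the test data are invented for the example.

```python
# Illustrative sketch (not the aips++/Newstar routines): build and solve the
# normal equations (4)-(5), (11) with numpy for a small linear problem.
import numpy as np

rng = np.random.default_rng(0)

N, n = 20, 3                      # N condition equations, n unknowns
C = rng.normal(size=(N, n))       # condition matrix C (eq. 3)
x_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1 * np.ones(N)          # assumed per-observation standard deviations
l = C @ x_true + rng.normal(scale=sigma)

Qinv = np.diag(1.0 / sigma**2)    # inverse covariance of the observations
A = C.T @ Qinv @ C                # normal matrix (eq. 4)
L = C.T @ Qinv @ l                # right-hand side of the normal equations (eq. 5)

x = np.linalg.solve(A, L)         # solution (eq. 11); avoids forming A**-1 explicitly
print(x)
```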
After solution for the unknowns x, $\chi^2$ as defined in (6) can be directly calculated if we rewrite it as:

$\chi^2 = [ll] - 2\,x_k [a_k l] + x_j x_k [a_j a_k]$    (12)
Noting that $y_i = a_{ik}\,x_k$ and using equation (9) to note that $[a_k l] = [a_k a_j]\,x_j$, we can rewrite (12) as:

$\chi^2 = [ll] - x_k [a_k l]$    (13)

The $\chi^2$ could be used to assess the goodness of fit if the actual $\sigma_i$ were known and the errors are normally distributed. In general the actual values of $\sigma_i$ are not known, and often the distribution is not normal (e.g. if we solve for logarithmic values). An estimate of the standard deviation can be made by:

$\hat{\sigma}^2 = \frac{\chi^2}{N - n}$    (14)

$\phantom{\hat{\sigma}^2} = \frac{1}{N-n}\sum_{i=0}^{N-1} \frac{(l_i - y_i)^2}{\sigma_i^2}$    (15)

$\phantom{\hat{\sigma}^2} = \frac{[ll] - x_k [a_k l]}{N - n}$    (16)
The uncertainty in the solution $x_i$ can be expressed as:

$\sigma^2(x_i) = \sum_{j=0}^{N-1} \sigma_j^2 \left( \frac{\partial x_i}{\partial l_j} \right)^2$    (17)

From (11) and (9):

$x_i = A^{-1}_{ik}\,[a_k l]$    (18)

so that:

$\frac{\partial x_i}{\partial l_j} = A^{-1}_{ik}\,\frac{a_{jk}}{\sigma_j^2}$    (19)

Substituting (19) into (17) gives:

$\sigma^2(x_i) = A^{-1}_{ik}\,A^{-1}_{im}\,[a_k a_m]$    (20)

and, since $[a_k a_m] = A_{km}$:

$\sigma^2(x_i) = A^{-1}_{ii}$    (21)

More generally, the covariance between the unknowns $x_i$ and $x_j$ is given by:

$\sigma(x_i, x_j) = A^{-1}_{ij}$    (22)
If the uncertainties in the unknowns are wanted, the inverse matrix A-1 is calculated by back-substitution of the identity matrix, and the uncertainties follow from (22).
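Continuing in the same hedged spirit (again an illustrative numpy sketch with invented data, not the aips++ routines), the goodness of fit (13)-(14) and the uncertainties (21)-(22) could be obtained as follows:

```python
# Illustrative numpy sketch: chi-squared, estimated sigma and covariance (14), (21)-(22).
import numpy as np

rng = np.random.default_rng(1)
N, n = 20, 3
C = rng.normal(size=(N, n))
sigma = 0.1 * np.ones(N)
l = C @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=sigma)

Qinv = np.diag(1.0 / sigma**2)
A, L = C.T @ Qinv @ C, C.T @ Qinv @ l
x = np.linalg.solve(A, L)

residual = l - C @ x
chi2 = residual @ Qinv @ residual      # chi-squared of the fit, cf. (13)
sigma_hat = np.sqrt(chi2 / (N - n))    # estimated standard deviation of unit weight (14)
cov = np.linalg.inv(A)                 # covariance matrix of the unknowns (22)
errors = np.sqrt(np.diag(cov))         # 1-sigma uncertainties of the x_i (21)
print(chi2, sigma_hat, errors)
```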
The merit function we want to minimise in this case is:

$\chi^2 = \sum_{i=0}^{N-1} \frac{\left| l_i - \sum_k a_{ik}\,x_k \right|^2}{\sigma_i^2}$    (23)
Differentiating leads to the following set of normal equations:

$\sum_{i=0}^{N-1} \frac{a^*_{ik}}{\sigma_i^2} \left( l_i - \sum_{j=0}^{n-1} a_{ij}\,x_j \right) = 0 \qquad k = 0,\ldots,n-1$    (24)
The normal matrix is Hermitian. It can be solved by Cholesky methods. However, internally the matrix is converted to a real form. Although this has, in general, a small memory penalty, it has no influence on the CPU time, and it makes it possible to use the same routines for complex and real solutions. The conversion to real form is done by splitting each element $A_{ij}$ of the normal matrix into $\Re A_{ij} + i\,\Im A_{ij}$ and replacing it by the $2\times2$ real block:

$A_{ij} \rightarrow \begin{pmatrix} \Re A_{ij} & -\Im A_{ij} \\ \Im A_{ij} & \Re A_{ij} \end{pmatrix}$    (25)

and by making similar replacements for the vector elements $x_i$ and $L_i$:

$x_i \rightarrow \begin{pmatrix} \Re x_i \\ \Im x_i \end{pmatrix}$    (26)

$L_i \rightarrow \begin{pmatrix} \Re L_i \\ \Im L_i \end{pmatrix}$    (27)
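The conversion (25)-(27) can be illustrated with a small generic numpy sketch; the function name and the test matrices are invented for the example, and this is not the actual aips++ code:

```python
# Illustrative sketch of the complex-to-real expansion (25)-(27) with numpy.
import numpy as np

def complex_to_real(A, L):
    """Expand the complex system A.x = L into an equivalent real system of
    twice the size, replacing every element by the 2x2 block of (25)."""
    n = A.shape[0]
    Ar = np.zeros((2 * n, 2 * n))
    Lr = np.zeros(2 * n)
    Ar[0::2, 0::2] = A.real      # top-left of each block:     Re A_ij
    Ar[0::2, 1::2] = -A.imag     # top-right of each block:   -Im A_ij
    Ar[1::2, 0::2] = A.imag      # bottom-left of each block:  Im A_ij
    Ar[1::2, 1::2] = A.real      # bottom-right of each block: Re A_ij
    Lr[0::2], Lr[1::2] = L.real, L.imag
    return Ar, Lr

# Small demonstration: solve the real system and reassemble the complex solution.
A = np.array([[2.0, 1 + 1j], [1 - 1j, 3.0]])      # Hermitian normal matrix
L = np.array([1 + 2j, -1j])
Ar, Lr = complex_to_real(A, L)
xr = np.linalg.solve(Ar, Lr)
x = xr[0::2] + 1j * xr[1::2]
print(np.allclose(A @ x, L))                      # True: same solution as the complex system
```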
Another reason for solving real rather than complex equations is given in the next section.
In cases where both the unknowns x and their complex conjugates $x^*$ appear in the condition equations, differentiating the merit function (23) will not produce a symmetric or Hermitian normal matrix, since there exists no linear relation between $x_i$ and $x_i^*$. We could, of course, solve for 2n complex equations with added constraints that the sum of even and odd unknowns must be real, and their difference imaginary, but this will lead to 4n complex equations.
If, however, we consider each complex unknown as two real unknowns (i.e. $\Re x_i$ and $\Im x_i$), then differentiating (23) produces the following symmetric set of 2n real equations:
A number of special cases can be distinguished.
In the `normal' complex case (previous section), the coefficients of $\Re x_i$ and $\Im x_i$ differ only by a factor $i$ ($a_{2i+1} = i\,a_{2i}$), and (28) reduces to (25).
If all $a_{2i}$ and $a_{2i+1}$ are real, even if specified, e.g., as complex numbers, (28) degenerates into:
Note that in this case there is a true separation of the real and imaginary parts of the different unknowns, and two separate real solutions, each with n unknowns, will produce exactly the same result.
The full and special cases are all catered for in the aips++ routines.
Using arguments comparable to those in (2.1.1), we can obtain the equivalent of (13) for the complex case as:
with the a's and x's as defined in (29).
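As a toy illustration of why the real/imaginary split helps when both x and $x^*$ occur (again generic numpy, not the aips++ routines; all names and data are invented), consider condition equations of the form $a\,x + b\,x^* = l$:

```python
# Illustrative sketch: condition equations that contain both x and its conjugate x*
# cannot be solved as a single complex unknown; treat Re x and Im x as two real
# unknowns instead (the idea behind (28)).  Purely a toy example, not aips++ code.
import numpy as np

rng = np.random.default_rng(2)
N = 10
a = rng.normal(size=N) + 1j * rng.normal(size=N)
b = rng.normal(size=N) + 1j * rng.normal(size=N)
x_true = 0.7 - 0.3j
l = a * x_true + b * np.conj(x_true)          # complex condition equations a.x + b.x* = l

# With x = u + i v:  a.x + b.x* = (a + b) u + i (a - b) v.
# Split every complex condition equation into its real and imaginary part,
# giving 2N real equations in the two real unknowns (u, v).
C = np.zeros((2 * N, 2))
C[0::2, 0], C[0::2, 1] = (a + b).real, (1j * (a - b)).real
C[1::2, 0], C[1::2, 1] = (a + b).imag, (1j * (a - b)).imag
rhs = np.empty(2 * N)
rhs[0::2], rhs[1::2] = l.real, l.imag

(u, v), *_ = np.linalg.lstsq(C, rhs, rcond=None)
print(u + 1j * v)                              # recovers x_true
```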
If there are not enough independent condition equations, the normal matrix A cannot be inverted, and a call to WNMLTN will fail with a .false. return value.
The equations could still be solved if some additional `constraint' equations were introduced. In the more complex cases the precise, let alone the best, form of these additional equations is difficult to determine (e.g. the redundancy situation in Westerbork).
A method known as `Singular value decomposition' (SVD) can be used to obtain the minimal set of orthogonal equations that have to be added to solve the LSQ problem. Several implementations exist in the literature.
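The effect such added orthogonal equations are meant to achieve can be illustrated with numpy's SVD-based solver, which returns the minimum-norm solution of a rank-deficient problem (an illustrative sketch only, not the aips++ implementation; the test data are invented):

```python
# Illustrative sketch: minimum-norm solution of a rank-deficient least squares
# problem via SVD (numpy's lstsq), the effect the added orthogonal constraint
# equations are meant to achieve.  Not the aips++ implementation.
import numpy as np

rng = np.random.default_rng(3)
N, n = 15, 4
C = rng.normal(size=(N, n))
C[:, 3] = C[:, 0] + C[:, 1]            # make the condition matrix rank deficient
l = C @ np.array([1.0, 2.0, -1.0, 0.0])

# lstsq uses an SVD internally and returns the minimum-norm solution;
# singular values below rcond * s_max are treated as zero.
x, residuals, rank, s = np.linalg.lstsq(C, l, rcond=None)
print(rank, s)                          # rank 3; the smallest singular value ~ 0
print(x)                                # minimum-norm solution of the degenerate system
```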
In general we can distinguish three types of constraint equations:
The general constraint situation arises from the use of Lagrange multipliers. Assume that in addition to the condition equations, with measured values, we have a set of p rigorous constraint equations:

$g_k(x_0,\ldots,x_{n-1}) = 0 \qquad k = 0,\ldots,p-1$    (31)

We must therefore make $\chi^2$ minimal, subject to the set of (31), or:

$\sum_{i=0}^{n-1} \frac{\partial\chi^2}{\partial x_i}\,dx_i = 0$    (32)

$\sum_{i=0}^{n-1} \frac{\partial g_k}{\partial x_i}\,dx_i = 0 \qquad k = 0,\ldots,p-1$    (33)
Note that I have chosen to have constraint equations that are linear in the unknowns, with a zero value. In cases where this is not adequate (e.g. x + y + z = 360) a simple linear transformation will suffice to make it, e.g., x' + y' + z' = 0.
Combining (32) and (33) with Lagrange multipliers $\lambda_k$ gives:

$\frac{\partial\chi^2}{\partial x_i} + \sum_{k=0}^{p-1} \lambda_k\,\frac{\partial g_k}{\partial x_i} = 0 \qquad i = 0,\ldots,n-1$    (34)

Defining the coefficients in the second term of (34) as $B_{ik} = \partial g_k / \partial x_i$, we can write our expanded set of normal equations as:

$\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^T & 0 \end{pmatrix} \cdot \begin{pmatrix} \mathbf{x} \\ \boldsymbol{\lambda} \end{pmatrix} = \begin{pmatrix} \mathbf{L} \\ 0 \end{pmatrix}$    (35)
The routines to handle this situation use a method based on ``Pseudo-inverse berekening en Cholesky factorisatie'' (pseudo-inverse computation and Cholesky factorisation), Hans van der Marel, 27 March 1990, Faculty of Geodesy, Delft University. I will briefly describe Hans' paper, together with the additions I have made.
In the case we are considering, the normal matrix A does not have full column rank. Therefore, if the rank defect is p, there exists an n x p matrix G, a basis for the null space of A, with A.G = 0. If we assume that B contains just sufficient constraint equations to resolve the rank defect, the inverse of the matrix in (35) is:
$\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^T & 0 \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^- & \mathbf{G}(\mathbf{B}^T\mathbf{G})^{-1} \\ (\mathbf{G}^T\mathbf{B})^{-1}\mathbf{G}^T & 0 \end{pmatrix}$    (36)

with $\mathbf{A}^-$ the generalised inverse of $\mathbf{A}$ for which $\mathbf{B}^T \cdot \mathbf{A}^- = 0$.
We can now proceed in the following way:
If $n_1 = n - n_2$, we can say that after $n_1$ Cholesky steps we have:
$\mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{U}_{11}^T \\ \mathbf{U}_{12}^T \end{pmatrix} \cdot \begin{pmatrix} \mathbf{U}_{11} & \mathbf{U}_{12} \end{pmatrix}$    (37)
The collinearity angle $\psi$ can be determined for the current column $i$ by:

$\sin^2\psi = u_{ii}^2 / a_{ii}$
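A minimal sketch of how such a collinearity test could be embedded in a Cholesky factorisation is given below (illustrative numpy only; the function name, the angle limit and the toy matrix are assumptions, and this is not the aips++ routine):

```python
# Illustrative sketch: row-wise Cholesky factorisation of a (possibly singular)
# normal matrix that flags rank deficiency with the collinearity test
# sin^2(psi) = u_ii^2 / a_ii.  Not the aips++ implementation.
import numpy as np

def cholesky_with_collinearity(A, sin_psi_min=1e-6):
    """Return the upper triangular factor U and the list of dependent columns."""
    n = A.shape[0]
    U = np.zeros_like(A, dtype=float)
    dependent = []
    for i in range(n):
        # u_ii^2 = a_ii minus what has already been eliminated from this column
        uii2 = A[i, i] - U[:i, i] @ U[:i, i]
        if uii2 / A[i, i] < sin_psi_min**2:        # collinearity angle too small
            dependent.append(i)                    # column (nearly) dependent: skip it
            continue
        U[i, i] = np.sqrt(uii2)
        U[i, i + 1:] = (A[i, i + 1:] - U[:i, i] @ U[:i, i + 1:]) / U[i, i]
    return U, dependent

# Toy normal matrix with one dependent column:
C = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]])
A = C.T @ C
U, dep = cholesky_with_collinearity(A)
print(dep)          # [2]: the third column is a combination of the first two
```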
$\cdots = \mathbf{E} \cdot \mathbf{F}$    (38)
Errors are determined in the same way as described in (2.1.1), where it should be noted that (13) can be used either summing over the $n_1$ variables and using the first guess of $\mathbf{x}_1$, or summing over all n and using the final x. Internally the first option is used. The uncertainties in x are determined by calculating the covariance matrix. Using arithmetic similar to that in (2.1.1), it can be shown that:
$\mathbf{A}^{-1} = \mathbf{E} \cdot \mathbf{a}_1^{-1} \cdot \mathbf{E}^T$    (39)
In the case of known constraints, hence when B in (35) is known, this equation can be solved directly. Cholesky decomposition does not work in this case, and a transparent Crout LU decomposition is done to determine the (symmetric) covariance (i.e. inverse) matrix of the left-hand side of (35).
Errors are determined as explained in (2.1.1). Since all constraint equations are equal to 0, the sum in (13) is taken over n, not over n + p, where p is the number of constraint equations.
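A hedged numpy sketch of the known-constraint case, building and solving the bordered system (35) directly (the constraint chosen and all names are invented for the example; this is not the aips++ code):

```python
# Illustrative sketch of the known-constraint case (35): append the constraint
# columns B to the normal matrix and solve the bordered system for x and the
# Lagrange multipliers.  Generic numpy code, not the aips++ routines.
import numpy as np

rng = np.random.default_rng(4)
N, n = 12, 3
C = rng.normal(size=(N, n))
C[:, 2] = C[:, 0] - C[:, 1]                  # rank-deficient condition equations
l = C @ np.array([2.0, 1.0, 1.0])

A = C.T @ C                                  # singular normal matrix
L = C.T @ l
B = np.array([[1.0], [1.0], [1.0]])          # one constraint: x0 + x1 + x2 = 0

M = np.block([[A, B], [B.T, np.zeros((1, 1))]])   # bordered matrix of (35)
rhs = np.concatenate([L, [0.0]])
sol = np.linalg.solve(M, rhs)
x, lam = sol[:n], sol[n:]                    # unknowns and Lagrange multiplier
print(x, B.T @ x)                            # constraint satisfied: sum(x) ~ 0
```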
In the case of non-linear condition equations, no direct solution for the minimum of the merit function (e.g. (6)) exists. A simple approach is to take the condition equations, make a first-order Taylor expansion around an estimated $\mathbf{x}_0$, solve for $\delta\mathbf{x}$, and iterate until a solution is found.
However, better and more stable methods (e.g. steepest descent) exist for some circumstances. A method that seems to be quite stable, and that uses a combination of steepest descent and Taylor expansion, is the Levenberg-Marquardt method (see e.g. ``Numerical Recipes'', W.H. Press et al., Cambridge University Press).
If we have an estimate for x, we can find a better one by:

$\mathbf{x}_{new} = \mathbf{x}_{old} + \mathbf{H}^{-1} \cdot \left[ -\nabla\chi^2(\mathbf{x}_{old}) \right]$    (40)

where H is the Hessian matrix of $\chi^2$. If our approximation is not good enough, we could use steepest descent by calculating:

$\mathbf{x}_{new} = \mathbf{x}_{old} - c \cdot \nabla\chi^2(\mathbf{x}_{old})$    (41)

with c a small enough constant. The Hessian matrix H has as elements:

$H_{kl} = \frac{\partial^2\chi^2}{\partial x_k\,\partial x_l} = 2\sum_{i=0}^{N-1} \frac{1}{\sigma_i^2} \left[ \frac{\partial y_i}{\partial x_k}\,\frac{\partial y_i}{\partial x_l} - (l_i - y_i)\,\frac{\partial^2 y_i}{\partial x_k\,\partial x_l} \right]$    (42)

By dropping the second term in (42), multiplying each $H_{kk}$ term by $(1 + \lambda)$, and defining b as the (sign-reversed) first derivative of $\chi^2$, we can combine the equations (40), (41) into:

$\mathbf{H} \cdot \delta\mathbf{x} = \mathbf{b}$    (43)
Choosing $\lambda$, and deciding where to stop iterating, is the crux of the matter. A start value of $\lambda = 0.001$ is used in the following routines. The method looks at the $\chi^2$ for $\mathbf{x} + \delta\mathbf{x}$. If it has increased over the value of $\chi^2$ for x, x is left unchanged and $\lambda$ is multiplied by 10. If there is a decrease, the new value of x is accepted and $\lambda$ is divided by 10. Iteration can be stopped if the fractional decrease in $\chi^2$ is small (say < 0.001); never when $\chi^2$ has increased.
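The scheme above can be sketched as follows for a toy model y = p0 exp(p1 t) (an illustrative numpy example following (40)-(43); it is not the WNMLSN implementation, and the model, data and tolerances are assumptions):

```python
# Illustrative Levenberg-Marquardt sketch following (40)-(43) and the lambda
# strategy above, for a toy model y = p0 * exp(p1 * t).  Purely didactic,
# not the WNMLSN/aips++ implementation.
import numpy as np

def model(p, t):
    return p[0] * np.exp(p[1] * t)

def jacobian(p, t):
    # First derivatives dy/dp used for the approximate Hessian of (42)
    return np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 30)
sigma = 0.05 * np.ones_like(t)
l = model([2.0, -1.5], t) + rng.normal(scale=sigma)

p = np.array([1.0, -0.5])                      # first guess
lam = 0.001                                    # starting lambda
chi2 = np.sum(((l - model(p, t)) / sigma) ** 2)

for _ in range(50):
    J = jacobian(p, t)
    H = J.T @ (J / sigma[:, None] ** 2)        # (42) with the second term dropped, factor 2 absorbed
    b = J.T @ ((l - model(p, t)) / sigma ** 2) # sign-reversed first derivative of chi2 (same factor)
    H_marq = H + lam * np.diag(np.diag(H))     # multiply the diagonal by (1 + lambda)
    dp = np.linalg.solve(H_marq, b)            # equation (43)
    chi2_new = np.sum(((l - model(p + dp, t)) / sigma) ** 2)
    if chi2_new >= chi2:
        lam *= 10.0                            # step rejected: increase lambda
        continue
    if (chi2 - chi2_new) / chi2 < 0.001:       # converged: fractional decrease small
        p, chi2 = p + dp, chi2_new
        break
    p, chi2, lam = p + dp, chi2_new, lam / 10.0
print(p, chi2)
```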
Errors are calculated as described in (2.1.1). The errors returned by WNMLSN are, of course, only approximations, since the original equations are non-linear, but they give a good impression of the residuals. The covariance matrix should be calculated by doing a final linear run on the residuals and solving the equations.
An LSQ-object takes a total space of:
9                        +    (administration)
(n + p)(n + p + 1)/2     +    (normal array)
4m                       +    (error calculations)
m(m + p)                 +    (known sides)
(n + p + 1)/2            +    (pivot area)
(n + p)                       (solution)

8-byte words    (44)
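As a small worked example of (44) (reading m as the number of simultaneously solved solution vectors, which is an assumption here, and using integer division for the pivot area):

```python
# Hedged worked example of the storage formula (44); the meaning of m and the
# integer division for the pivot area are assumptions.
def lsq_words(n, p, m):
    """Total size of an LSQ-object in 8-byte words, following (44)."""
    return (9                                  # administration
            + (n + p) * (n + p + 1) // 2       # normal array
            + 4 * m                            # error calculations
            + m * (m + p)                      # known sides
            + (n + p + 1) // 2                 # pivot area
            + (n + p))                         # solution

print(lsq_words(n=10, p=2, m=1))   # 112 eight-byte words (896 bytes) for this example
```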
The following calls are available:
wnb, October 15, 2006