| Version 1.9 Build 1556
|
|
NOTE 255 -
AIPS++ Table System Revisited
Ger van Diepen, ASTRON Dwingeloo
2003 February 3
A (gzipped) postscript version of this note is available (40kB).
The AIPS++ Table System is based on the table model outlined
by Allen Farris in the beginning of 1992. At the start of the AIPS++
project it was decided to develop the Table System as no other package
seemed to support the flexibility wanted.
See the old AIPS++
data base document
for a description of the very first version of the Table System.
Since then the system has been changed and enhanced considerably.
The description of the
Tables
module gives a good overview of the current capabilities.
The Table System is used throughout the AIPS++ project to offer
uniform access to all data. The table browser can be used to view
all kind of tables.
Particular examples of AIPS++ data stored
in tables are the MeasurementSets and the Images.
In the remainder of this document the most important properties of the
Table System are given. Thereafter the strong and weak points are
discussed.
As said above the Table System is the main AIPS++ storage mechanism.
The MeasurementSet
uses tables to
its full extent. It consists of a main table containing the main data.
Several subtables contains additional information like the description
of antennas, feeds, sources, etc..
Usually the TiledShapeStMan tiled storage manager is used to hold the
data in the main table. This storage manager is capable of handling
arrays with varying sizes. A tile is 3-dimensional with axes
correlation, frequency, and rest (meaning time, baseline, fieldid,
etc.). In principle this tiling offers equally fast access when
accessing data in frequency direction or in time or baseline
direction. However, because the rest axis contains both time and
baseline, the access can sometimes be slowish.
An image is stored as a table where the image array is stored using
the tiled storage manager. An image can have zero or more masks. Each
mask is a boolean array stored in a subtable. Furthermore an image has
a log table (for history) which is also stored as a subtable. Other
image information (like coordinates) is stored as keywords.
Images are using the so-called lattices which are memory-based or
disk-based arrays. Lattices can easily be traversed in any order.
The Lattice Expression Language
makes it possible to define expressions from lattices.
Lattices are commonly used in AIPS++. For instance, the autoflagger
uses lattices to assemble statistical information for automatically
flagging MeasurementSet data.
All log information of AIPS++ sessions are stored in the log table.
In general it works fine, but if the log rate is too high the system
cannot cope with it. The main reason is that the log table can be
filled from several processes, so it has to extensively use table
locking for synchronization. It would be better if a single logging
client was running, so the log table had only one process writing into
it.
Other places where tables are used are:
- Calibration tables
- Observation catalog
- Sky catalog (used by viewer)
- Tables used by Measures system (IERS tables, etc.)
- Westerbork TMS system
All standard data types are supported (char, short, integer, float,
double, string).
Also single and double precision complex data types are supported.
A big advantage of the Table System is that it can handle scalars as well
as arrays of all thosetypes.
Another benefit is that Measures can be stored in tables.
The Table System has some specific data (storage) managers:
- The Incremental Storage Manager can save quite some storage by only
writing data when they change. In the main table of the MeasurementSet
it is quite heavily used.
- The Tiled Storage Manager is used to store data in a tiled way.
It makes it possible to access to image along all axes in an equally
fast way (depending on the tile shape). Comparison with a package
like Miriad showed that you pay a little penalty for access in the
X-direction, but it is much, much faster in the Z-direction.
- The ComplexCompress data manager makes it possible to compress single
precision floating point data to short integers to save a factor 2
in storage. It is completely transparant to the application and user,
so one still sees the floating point data.
The ability of safe concurrent access is very nice. Despite its
shortcomings, autolocking makes life very easy, since it frees the
user from all locking issues.
TaQL is a very versatile query language with full support of arrays.
Users highly appreciate it because it offers easy selection of subsets
of a table. Since these subsets are also tables (referencing the
original data), one can easily change the data in a subset of a table.
TaQL can also be used to sort a table or to sort it uniquely.
The support of arrays makes TaQL in a way superior to SQL. Some
selections can be expressed very elegantly in TaQL, while being
tedious in SQL. E.g. selecting baselines 0-1, 1-2, 2-3, 3-5 can be
done in TaQL as
where any(ANTENNA1=[0,1,2,3] && ANTENNA2=[1,2,3,5])
TaQL can operate on data arrays in a table, while SQL can only
deal with scalar data.
Users greatly appreciate the ease of access via glish. Using
the glish functions and tools like TaQL and the table browser it is
very easy to examine and inspect the data and change them when needed.
Many users have written scripts to process the data in glish using
TaQL and the table access functions in glish.
A widget (taqlwidget) exists to form TaQL
commands in a query-by-example style.
Retrieving data from a table is slower than retrieving it from a flat
file. It depends on the type and the shape of the data and on the
storage manager used. Tests show that retrieving an array from a Table
using a Tiled Storage Manager can be about 50% slower than reading it
from a raw file when accessing the data sequentially.
This degradation in performance has to be weighed against the benefits
supported by the Table System as described in the previous section.
Note that measuring IO performance is not as easy as it looks because
the UNIX file system keeps small files in memory (small can be as
large as 100 Mbytes). One should always do an fsync to be sure the
data are flushed to disk.
The main problem seems to be accessing the data in the MeasurentSet
in the calibrater or imager. Apart from the 50% degradation discussed
above, there can be other reasons for it. One is that the data
is usually accessed in a different order than it was stored.
Maybe this could be solved partially by storing the data per spectral
window. Another thing that might be useful is to use another storage
manager than the tiled one.
It is important to note that the TMS system for Westerbork started
with use of Sybase for its data bases. They decided, however, to
switch to the AIPS++ Table System because it performed much better
than Sybase for their particular application.
No comparisons have been made between the Table System and a
data base system like MySQL or PostgreSQL. Doing comparisons may be hard
because performance will usually be highly application dependent.
The storage managers only use simple synchronous I/O, though they use a
cache to buffer often used data.
Non-blocking I/O nor file mapping is used. Especially non-blocking I/O
might improve performance, but it requires extra buffer space.
It might also be needed to provide more functionality for giving hints
about access patterns.
In the early days the Table System was not very robust, but it has
improved over the years.
In occasional circumstances the data in a table can be corrupted.
This could be the case when the system crashes while a critical data
portion is written into the table. However, in practice it
hardly ever happens because especially in the now commonly used
StandardStMan storage manager quite some attention has been paid to
robustness.
The Table System is not a commonly used piece of software as, for
example, MySQL is. It means that only a Glish binding is available.
However, it would be not too much work to make, say, a Python binding.
Table locking can be problematic if not done properly.
Locking can only be done on the entire table, so for full multi user
access a lock should be held as short as possible. On the other hand
releasing a lock means that the buffers needs to be flushed to disk,
so the extra IO involved could mean that one wants to hold the lock
as long as possible.
The AutoLocking mode was invented to solve this problem. However,
it has the drawback that it may take a few seconds before the system
detects that the AutoLock should be released. This is especially the
case in a glish client which might be idle for some time. It is now
possible to define the AutoLock inspect time in aipsrc, so one can set
it as short as needed.
Although TaQL supersedes SQL with its array capabilities, it lacks
several nice SQL features. The most imporatant are:
- joins
- GROUP BY and HAVING
- calculated columns in SELECT
The current Table System shows some shortcomings, especially in the
area of parallel data IO. This area will get more and more important.
Current instruments can already produce lots of data, but future
instruments like ALMA and LOFAR will produce even much more.
Because the I/O in the Table System is done in a separate layer, it
should in principle be possible to enhance it with other forms of
I/O like parallel and networked I/O. However, other high
performance data I/O libraries
exist and it should be studied if it is better to use such a library.
HDF5
is such a
library. It has an C++ and Java interface and a viewer written in
Java. It supports array tiling (chunking in their terminology).
Before deciding on how to proceed with a data system the requirements
should be clear.
It is clear that (near) future systems have to process observations
possibly consisting of several terabytes of data. Full support of
parallel processing and I/O are needed, but it is the question how.
The optimal way of parallel processing is that the processing is done
on the host where the data are. Thus a client should only ask for the
result and not for the data themselves. Of course, this is not always
possible. It should also be possible for a client to get file data
from another node, thus parallel and networked I/O is needed.
Parallel I/O will be very important when data are stored in a SAN.
An example might be an image of 4000*4000*4000 float pixels.
Such an image might be tiled and spread over several nodes.
Many operations (like finding minimum/maximum, adding images) can be
done locally on those nodes. So the data system and processing system
have to collaborate for optimal performance.
Another example is the selfcal of a very large MeasurementSet. The
predict can be done on the node where the data resides and only the
residuals and derivatives are sent to the solver. Sometimes, it might
even be possible that a solve is done locally.
Data bases are not useful for such data. They lack the ability to deal
with large arrays of data or to access chunks of them.
Several other tables (log tables, observation catalogs) can be handled
with standard
data base technology. Public domain packages like MySQL or PostgreSQL
should be evaluated. It might be needed to add special indices
to deal with queries like a conus in the sky, although it is expected
that such functionality already exists.
In principle it is possible to make a storage manager in the Table
System that uses a data base system. It is the question whether that
is worthwhile. The advantage would be that TaQL could still be
used to have support for data arrays in queries.
Some institutes (e.g. Westerbork) have archived MeasurementSets.
They are archived in ms2archive format. When support of the
Table System is stopped, a tool should be provided to retrieve
these archived MeasurementSets in the new data format.
Please send questions or comments about AIPS++ to aips2-request@nrao.edu.
Copyright © 1995-2000 Associated Universities Inc.,
Washington, D.C.
Return to AIPS++ Home Page
2006-10-15