Measurement and Telescope Data Sets and Application Objects =========================================================== for Single-Dish Work and Interferometry ======================================= (AIPS++ Implementation Note #152) Authors/Participants -------------------- Dave Shone, Brian Glendenning, Bob Hjellming, Gareth Hunt, Bob Payne, Russell Redman, Ger van Diepen, Allen Farris, Bob Garwood & Ron Maddalena 1 Introduction and Background ------------------------------- A series of meetings held in Charlottesville and Green Bank in November 1992 considered the definition of the contents of a Measurement Set and associated Telescope Model Data for radio single dish and interferometer data. This note is based on the conclusion of those meetings, but also contains a small amount of new material. Apart from recording this work, it is hoped that this note will be useful in stimulating feedback from those who create the "data sources", and as a guide to further protyping and implementation. We have concentrated refining and extending the data modelling aspects of the principal objects involved in calibration, with particular interest in incorporating single dish analysis. The objects of interest are the Telescope Model and Measurement Set (formerly known as YegSet), both of which are described in the Project Book. We regard these as logically appearing as pure tables, and our objective has been to determine what columns are required for single dish and interferometric radio astronomy. Note that these are logical columns or coordinate-keys ("CoKeys"), and this does not necessarily reflect the way in which the data are actually stored, although it is likely that the actual implementation will be based on the extended FITS tables described in the Project Book. The data models described are subject to further revision based on feeback from the detailed analysis of the requirements of particular intruments. Further revision of this material will appear in the project book, and/or cast in stone as a memo. In order to set the direction for implementation and prototyping, some applications objects suitable for prototyping are proposed. 1.1 Minimalist Revisionism -------------------------- Previous attempts to define the distinction between measurement data and Telescope model data have encountered the problem that some coordinate-keys in a Measurement Set are derived from data which clearly belong in the Telescope Model. An example is the baseline vector (u,v,w) in interferometry; this is dependent on the positions of antennas, and these may be adjusted, requiring a recalibration of (u,v,w). We have decided to adopt a "minimalist" approach to coordinate keys in the Measurement data set; only coordinates which are required to uniquely identify a measurement will be stored in the Measurement Set. Other relations may be produced by effectively joining the Measurement and Telescope Model data tables. The distinction we propose leads to a clearer division of data between Telescope Model and Measurement Set, but many operations will be complicated by having to select data from two data tables. However, this is not likely to be a serious problem, since operations which often require such selections will usually be hidden in an application object. Interferometry again provides an example, where (u,v,w) will be implemented as an application object which performs the appropriate data selection, and calculates "on-the-fly", if necessary (in fact this might be a service of an interferometer instrument component of the Telescope Model). Alternatively, (u,v,w) columns might be be created temporarily (this is a good example of the the utility of being able to store columns separately). In the interferometer case, applications will typically work with visibilities or other aggregates such as integrations or baselines; these will almost certainly have services to provide (u,v,w). In addition, generic table operations will always be possible on the Mesasurement and Telescope Model data, so a "joined" data set may be formed if necessary. 1.2 Implications for Tables --------------------------- If we follow our original definition of a Measurement, in which it is atomic in the sense that it represents the smallest element of measurement which has a physical meaning, then the Measurement is an element of a spectrum rather than a complete spectrum. This appears somewhat alien (not to mention potentially inefficient, on the face of it), but is important in order to maintain a consistent mechanism for the selection and aggregation of data; a generalised spectrum might not alway consist of intensities regularly sampled in frequency, and thus it must be possible to treat frequency in the same way as any other coordinate. In practice, a spectral Measurement Set will probably treat frequency as an implicit coordinate (a regular axis), and thus avoid the inefficiency associated with the general case, whilst retaining the same database interface. In a similar fashion, some data which are traditionally considered as "header data" (e.g., the observer identity) are logically expanded in a Measurement Set, so that such data appear to be present in each Measurement. In practice, these will be implemented as virtual columns, thereby economising on storage requirements, whilst allowing the flexibility of storing these as real columns when necessary. Other traditional header data will appear in the Telescope Model, rather than being included with the measurement data in the Measurement Set. 2.0 Data Model Definitions -------------------------- 2.1 Measurement and CoKeys for Interferometer Data -------------------------------------------------- Complex Visibility Measurement - the amplitude/phase or Real/Imaginary values for a single point in the observation. Data Quality Measures - Essentially weights and flags, although application-specific measures are also possible. Time Polarisation - An enumerated type describing the polarisation of a measurement, e.g., one of I, Q, U, V or RR, LL, RL, LR. Frequency channel - Used together with IF to index information frequency information in the Telescope data IF - Used as index to receptor set in Telescope data, and together with Frequency channel, to index frequency information in the Telescope data. (Antenna1,Antenna2) - Uniquely determines antennae and associated correlator channel in the Telescope data. UserTags - Probably similar in function to Data Quality Measures; maybe we should lump these together, although the formal error in a measurement should probably be distinguished. 2.2 Measurement and CoKeys for Single Dish Data ----------------------------------------------- Intensity Measurement Data Quality Measures - Essentially weights and flags, although application-specific measures are also possible. Time Polarisation - An enumerated type describing the polarisation of a measurement, e.g., one of I, Q, U, V or RR, LL, RL, LR. Frequency channel - Used together with IF to index information frequency information in the Telescope data IF - Used as index to receptor set in Telescope data, and together with Frequency channel, to index frequency information in the Telescope data. IntegrationPhase - An enumerated (and usually implicit) coordinate which describes the integration/calibration phase with values "on", "off", "dark" or " ". UserTags - Probably similar in function to Data Quality Measures; maybe we should lump these together, although the formal error in a measurement should probably be distinguished. Initially, it was unclear whether the Integration Phase should be implemented as a single coordinate with enumerated values (as described above) or as a different measurement values within a single Measurement. If we take the view that any one of these values is meaningless without the others, then the latter approach is required to make an appropriate atomic Measurement. However, in some circumstances it may be necessary to change groupings of Integration Phase (i.e. the astronomer may wish to use a different "OFF" phase for one or more "ON" phases), which supports the former approach. This case is analogous to calibration in interferometry, where different Integration Phases correspond to different sources, target and calibrator. For these reasons, we favour the approach where calibrator phase is a coordinate, particulary since it follows our general philosophy of maintaining a common means of selecting subsets of data. Additional calibrator phases could be introduced simply as new Integration Phase values, eliminating the need to change the Measurement definition, as would be necessary otherwise. 2.3 Radio Telescope Model Data ------------------------------ We envisage that radio telescopes will usually be sufficiently similar that a minimal core can be specified. Of course, this does not define the internal workings of a given Radio Telescope Model, but rather a minimal set of services which must be provided. A specific Telescope Model may also provide additional information, perhaps for a given site or application. The Radio Telescope Model is likely to breakdown into a number of components as described in the project book, although this is not always necessary or desirable. The important point is that the interface presented to the outside world has the common core. 2.3.1 Receptor -------------- Gain - complex for interferometer Tsys Tcal Residual Delay Residual Rate 2.3.2 Instrument ---------------- In the case of an interferometer, this will contain baseline/correlator information (e.g., gains which are factorised to individual receptors. Correlator gain (Ant1, Ant2, IF, Time) Delay Rate Frequency Channel Width Frequency Channel Separation Bandwidth 2.3.3 Telescope Element ----------------------- Location - relative to platform Projected Plane Coordinate Sky Position - antenna pointing position Mount Type Axis Offsets Gain(SkyPos) - generalised form of Gain versus Elevation Primary Beam Gain - generally two-dimensional SubArray identifier 2.3.4 Platform -------------- Reference System - coordinates reference system descriptor/parameters Time System - time system descriptor/paramerers Pole Location Earth rotation rate Orbital Parameters - for orbiting platforms 2.3.5 Environment ----------------- Zenith opacity Weather - this is likely to be observatory specific. 2.4 Implementation Issues ------------------------- 2.4.1 Regular and irregular data arrays as implicit and explicit coordinates ---------------------------------------------------------------------------- We have already suggested that in the usual case of a regular spectrum, the frequency coordinate will be implemented as an implicit coordinate. In many cases, other coordinates may also be implemented in this way. Typically, calibration switching takes place in a cyclic fashion, and thus the integration phase may also be implemented as an implicit coordinate. In the case of the UniPOPS single dish data format, the implicit data coordinates and the dimensions of the appropriate data array will be determined by the observing mode parameters; see the definition in the UniPOPS reference manual, in particular the observing parameters in class 3 and the phase block in class 11. (This is specific to Green Bank/Tucson; we should eventually generalise by reconciling this with Rick Fisher's note of 11 Nov 1992 and specifications from other instruments, e.g., the JCMT). 2.4.2 Time, Frequency, Location and Position classes and reference systems -------------------------------------------------------------------------- Time, Frequency, Location and Position quantities should all be defined as classes, in order that the appropriate reference system and it's parameters may be encapsulated, and transformations from one system to another may be performed transparently. In the current data model, the Telescope Model Platform component provides a description of the reference systems used. This may be deemed unnecessary if such things are incorporated in the aforementioned quantity classes. However, it may prove desirable to allow these quantities to be calibrated or recalibrated, e.g., UT1-UTC might be adjusted, and therefore it would be useful for the parameters of a particular reference system for a particular observation to be accessible in one place, and the Telescope Model (probably the Platform component) seems the most sensible place. Of course, these parameters would be accessible through the coordinate classes, but in this case, these would refer back to an associated Telescope Model. 3.0 Some Application Objects for Prototyping -------------------------------------------- Most applications will not operate directly on a Measurement Set; rather, they will use application-oriented objects which manage the selection and aggregation of data from a Measurement Set. These objects embody data structures which are generally more "astrophysically meaningful" than implementation structures such as tables and arrays. In additions to the data structures, application objects have methods which operate on these data structure, with semantics appropriate to these types. Prototyping is necessary to test several aspects of this system design: * The basic data system architecture and interfaces - essentially a kind of object-oriented database sitting on top of something which more closely resembles a relational database. * The low-level data model - the contents of the Measurement and Telescope Data tables and the division of data among these. * Use by applications - definition of an appropriate set of application objects. * Efficiency. Probably the best way to test all of these is by designing, implementing and using some application objects to use the prototype low-level interface. We envisage several such objects which are suitable for early prototyping, and we describe them here in the order they should be attempted. 3.1 Spectrum - The principal application object for Spectral-line work ---------------------------------------------------------------------- This was selected as a simple class for initial prototyping, and at the time of writing, work is underway to code this. 3.1.1 Construction ------------------ Two constructor methods are proposed for the prototype: * A method to construct a spectrum from an underlying table of data. This will select data according to criteria specified to the constructor, and perform gridding or regridding where necessary. * A method to construct a spectrum from a vector of intensity values, together with channel description information. This should eventually be extended to write data to a table with an implicit frequency Coordinate/Key, in order to provide a higher-level means of filling a spectrum measurement set in the case of regularly sampled data, or for a related copy constructor. 3.1.2 Attributes ---------------- * A service which returns a reference to a vector of intensities. * A service which returns a vector of frequencies to support the general case of irregulary sample data. * A service which returns a vector of velocities * A spectrum may have an optional Integration Phase value, applicable to the entire spectrum - a spectrum in which the different elements have different Integration Phase values does not appear to be meaningful. However, if this is not present or is set to "none", then the spectrum may have been formed by averaging or some other mathematical combination of different Integration Phases, e.g., a calibrated spectrum could be formed by calculating (on-off/off-dark) in the constructor or "on-the-fly" when an element is accessed. In addition, a service should also be available which provides a table with containing any of the attributes - essentially the subset of the original Measurement Set from which the Spectrum is formed. 3.1.3 Methods ------------- In addition to the usual mathematical methods associated with arrays: * Addition - e.g., to support the case of addition of irregulary sampled measurements. 3.1.4 Implementation Notes -------------------------- It is not immediately obvious whether a spectrum should be implemented by derivation from a vector class, or as an object which has vector attributes. The former provides for a slightly neater way of using spectra in some cases (e.g., a spectrum, rather than one of its attributes could always be used where a vector might be use). However, a spectrum has so many additional properties (including a number of attributes which are vectors) that the second approach seems preferable. It is possible that we might ultimately find it useful to derive spectrum (and other application objects) from a purely mathematical series class. 3.2 The Visibility and other Interferometric Application Objects ---------------------------------------------------------------- Although we did not discuss these in any detail in the CV/GB meeting, some suggestions/guidelines are proposed here. The minimalist revision of the low-level data model has placed additional responsibilities on some application objects. Our clearest example of this is in interferometry, where (u,v,w) no longer have an obvious place, since they are derived from the Telescope Model for each measurement. The Visibility will have to be able to provide a service to return a (u,v,w) attribute, and in the absence of any other object willing to take responsibility, this will have to be calculated by the visibility (this is still somewhat of a moot point; it might be sensibly be provided by the Telescope model). For the time being, we should assume the following: * The visibility will always have a service which provides the (u,v,w) attribute; * In the absence of a "(u,v,w) column" in the Telescope Model (which might be provided by some component thereof), the visibility will calculate (u,v,w) using the appropriate information from the Measurement and Telescope Data (this may or may not be "cached") according to an agreed "default prescription" * Some Telescope Model builders may prefer to provide (u,v,w) calculated according to their own ideas and beliefs. 3.2.1 Construction ------------------ Construction from an existing database will be done by selecting an appropriate time, Antenna pair etc., This class (and the aggregates described later also offer a higher level filler interface than direct filling of the low-level table interface. 3.2.2 Other Visibility Attributes --------------------------------- In addition to (u,v,w), the visibility class will have service to provide: * a complete polarised (I,Q,U,V,RR,LL,RL,LR) spectral visibility (we could provide calibrated and uncalibrated interfaces) * antenna pair; * IF. 3.3 Other Interferometric Application Objects --------------------------------------------- In order to "hide" the detail of data selection, we should probably provide some objects which are aggregates of Visibilities, in particular Integration (a selection of all visibilities in some narrow time range), Baseline data (all visibilities for a particular baseline) and Spoke (all visibilities in some narrow sector of the u,v plane). The implementation of these is an issue for further discussion and prototyping, but the following starting point is proposed: * Each of these should have a service to return a Vector of Visibilities. * Attributes of individual visibilities might also be usefully provided in Vector form, e.g., vectors of Amplitude, Phase, u, v, w.