Version 1.9 Build 1367
From a mathematical point of view, there are six basic data types that are potential elements of all data objects: numeric, complex, logical, string, NA, and NULL.
The first four types of data values (numeric, complex, logical, and string) represent four "modes". Additional modes can be defined for higher-level constructs. The fuzzy numbers being investigated for error propagation in AIPS++ will use constructs based upon vectors of numeric or complex type, with special rules for their arithmetic, but they will not be discussed further in this document.
The use of NA and NULL at the basic level of data values seems to allow added flexibility in mathematical algorithms, with some methods using, or allowing, these data values, and some methods requiring that NA or NULL not be included in the data objects.
Now we can define a data object as an atomic or non-atomic collection of the data values described above. In atomic data objects all data values are of the same mode (numeric, complex, logical, or string), whereas non-atomic data objects are mixtures of atomic data objects with different modes.
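The atomic/non-atomic distinction can be sketched as a simple mode check. This is a minimal illustration, not AIPS++ code: the function names `mode_of` and `is_atomic` are hypothetical, and using `None` to stand in for NA is an assumption made only for this example.

```python
# Hypothetical helpers; the mode names follow the four modes listed in the text.
def mode_of(value):
    """Return the mode name of a single data value."""
    if isinstance(value, bool):            # test bool first: bool is a subclass of int
        return "logical"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, complex):
        return "complex"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"no mode defined for {value!r}")


def is_atomic(values):
    """True when all non-missing values share one mode (None stands in for NA)."""
    modes = {mode_of(v) for v in values if v is not None}
    return len(modes) <= 1


print(is_atomic([1, 2.5, 3]))        # True
print(is_atomic([1.0, "NCYG92"]))    # False
```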
The basic data objects are defined by their attributes, with length, mode, dim (dimension), dimname, and class being most fundamental. The following table summarizes the basic mathematical data objects, their attributes, and their role.
Table 1 - Data Objects

| Class | Atomic | Attributes | Role |
|---|---|---|---|
| Mvector | T | Length, Mode, Dim, Dimname | Most basic data object |
| Mmatrix | T | Length, Mode, Dims, Dimnames | Rows/columns of vectors |
| Marray | T | Length, Mode, Dims, Dimnames | N-dimensional array |
| Mlist | F | Length, Mode, Names | Ordered collection of data objects |
| Mtable | F | Length, Mode, Names, Row.Names | Generalized table with columns of numeric, logical or character data values |
| Factor | F | Length, Mode, Names, Levels | Qualitative identification and labeling of data |
| Grid | F | Length, Mode, Dims, Dimnames, Coords | N-dimensional array with even axis intervals |
The first three data objects in Table 1 are augmentations of the classes already developed for AIPS++, with the specific addition of Dimname attributes for each dimension. These are named mvector, mmatrix, and marray to distinguish them from the AIPS++ classes that have already been implemented. Dimnames allow simple assignment expressions between these data objects and mtables, and allow all data objects to have vectorized selection/logic based upon keywords.
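Keyword-based selection through dimnames can be sketched as follows. The class `MVector` and its `select` method are hypothetical stand-ins written for this illustration; they are not the AIPS++ classes, and the sample keywords are arbitrary.

```python
class MVector:
    """Hypothetical stand-in for an mvector: values plus optional dimnames."""

    def __init__(self, values, dimnames=None):
        self.values = list(values)
        self.dimnames = list(dimnames) if dimnames is not None else None
        if self.dimnames is not None and len(self.dimnames) != len(self.values):
            raise ValueError("dimnames must have one entry per element")

    def __len__(self):
        return len(self.values)

    def __getitem__(self, key):
        # A string key is looked up in dimnames; anything else is a position.
        if isinstance(key, str):
            if self.dimnames is None or key not in self.dimnames:
                raise KeyError(key)
            return self.values[self.dimnames.index(key)]
        return self.values[key]

    def select(self, keys):
        """Vectorized selection by keyword, returning a new MVector."""
        return MVector([self[k] for k in keys], dimnames=keys)


header = MVector([4.0, 1950.0, 2.0e-4], dimnames=["NAXIS", "EPOCH", "SCALE"])
print(header["EPOCH"])                           # 1950.0
print(header.select(["SCALE", "NAXIS"]).values)  # [0.0002, 4.0]
```

Storing the names beside the values, rather than in a separate lookup structure, keeps positional and keyword access on the same object.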
The mtable data object is a specific view, or form, of an AIPS++ table that allows one to easily compose and decompose it from/to other data objects using methods related to mvector, mmatrix, marray, and other mtable data objects. As with an AIPS++ table, an mtable can contain columns of any of the other atomic data objects.
Many of these data objects will be small enough to fit into memory, but in many cases of interest that will not be true. The use of buffered I/O is a prime example of an implementation detail that should be hidden as part of the general database manager for objects of all kinds.
The mlist data objects are associations of other data object components that are formed by an mlist(o1, o2, ..., oN) method, and which have a syntax allowing mathematical operations on component and sub-component data objects. The mlist data object allows simple association of related data objects resulting from methods or more complicated multi-object algorithms, without requiring construction of new kinds of data objects, since they are just different mlists of standard data objects. For example, one can map an AIPS FITS image into an image object with the method mlist(labels=labelvector, values=valuevector, axes=axesmatrix, pixels=imagearray), where labels is a vector of string-like header information, values is a vector of the global numeric information for the image, axes is a matrix of numbers describing the values (and state) of the image coordinates, and pixels is the array of numbers that contains the image values. Because of the dimnames attributes of vectors and arrays, the keyword=value syntax of FITS images maps directly into vector, matrix, and array data objects. The data object components of observations, measurement sets, telescope models, etc., can be formed, referred to, and operated on with a combination of the syntax of mlist and mtable methods. The following is an example of a listing of the contents of an mlist image data object derived from AIPS/FITS:
    labels:
                NAME
    OBJECT      "NCYG92"
    TELESCOP    "VLA"
    INSTRUME    "VLA"
    OBSERVER    "R.M.HJELLMING"
    UNITS       "JY/BEAM"

    values:
                VALUE
    NAXIS       4.00E+00
    EPOCH       1.95E+03
    SCALE       2.00E-04
    OFFSET      0.00E+00
    BLANK       0.00E+00

    axes:
                RA.SIN.DEG      DEC.SIN.DEG     FREQ.HZ
    DIM         1.750000E+02    1.750000E+02    1.00000E+00
    CRBLC       1.600000E+02    1.600000E+02    1.00000E+00
    CRVAL       3.072800E+02    5.246208E+01    2.24851E+10
    CRINC       -6.944444E-07   6.944444E-07    -5.00000E+07
    CRREF       -2.830000E+00   0.000000E+00    0.00000E+00

    pixels:
    [matrix of numbers]
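A rough sketch of how such an image mlist might be composed and addressed by component and sub-component is given here. Python dictionaries stand in for the named data objects; the helper `component` and the tiny `pixels` placeholder are assumptions for illustration only, and just a few of the header entries from the listing are copied in.

```python
def mlist(**components):
    """Ordered association of named components (hypothetical Python stand-in)."""
    return dict(components)


def component(obj, *path):
    """Follow a multi-level 'subscript' path into nested components."""
    for name in path:
        obj = obj[name]
    return obj


axis_names = ["RA.SIN.DEG", "DEC.SIN.DEG", "FREQ.HZ"]
axes = {
    "DIM": dict(zip(axis_names, [175.0, 175.0, 1.0])),
    "CRVAL": dict(zip(axis_names, [307.28, 52.46208, 2.24851e10])),
    "CRINC": dict(zip(axis_names, [-6.944444e-07, 6.944444e-07, -5.0e7])),
}

image = mlist(
    labels={"OBJECT": "NCYG92", "TELESCOP": "VLA", "UNITS": "JY/BEAM"},
    values={"NAXIS": 4.0, "EPOCH": 1950.0, "SCALE": 2.0e-4},
    axes=axes,
    pixels=[[0.0, 0.0], [0.0, 0.0]],  # placeholder only, not real image data
)

print(component(image, "labels", "OBJECT"))         # NCYG92
print(component(image, "axes", "CRVAL", "FREQ.HZ"))  # 22485100000.0
```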
Flexible mlist construction may be more of a UI-related operation because of the difficulties of implementation in C++.
Factor data objects allow useful identification of qualitative descriptions of data that can be utilized by logical operations in array-oriented or vectorizable algorithms. Each is essentially a vector of integers identifying levels, with an associated vector of names for each level. Constructs like this can be used for many things, e.g. data quality identification, weights, source/field identification, and so on. Factors are concrete classes that are a bridge between numeric arrays and keyword identification used in vectorized operations. An example of a factor object is the following, where the data in the object is a vector of integers identifying different "levels" in the object, and levels is a vector of strings indicating that each level identifies a source name:

    values: 3,3,3,3,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1
    levels: "2005+403","NCYG92","3C286"

This vector matches, say, a row of source data, so one can operate on the source data using selection methods attached to the factor data object.
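Selection driven by a factor can be sketched like this. The class `Factor`, its `mask` method, and the `select` helper are hypothetical; the integer codes are a shortened version of the example vector, and the flux numbers are invented purely to have something to select.

```python
class Factor:
    """Hypothetical factor: 1-based integer codes plus a vector of level names."""

    def __init__(self, codes, levels):
        self.codes = list(codes)
        self.levels = list(levels)
        if any(c < 1 or c > len(self.levels) for c in self.codes):
            raise ValueError("code outside the range of defined levels")

    def labels(self):
        """Expand the integer codes into their level names."""
        return [self.levels[c - 1] for c in self.codes]

    def mask(self, level):
        """Logical vector that is True where the element belongs to `level`."""
        code = self.levels.index(level) + 1
        return [c == code for c in self.codes]


def select(data, mask):
    """Keep the elements of `data` whose mask entry is True."""
    return [d for d, keep in zip(data, mask) if keep]


source = Factor([3, 3, 1, 1, 2, 2, 2, 1], ["2005+403", "NCYG92", "3C286"])
flux = [7.1, 7.2, 2.4, 2.5, 0.3, 0.4, 0.5, 2.6]   # made-up values, one per code

print(select(flux, source.mask("NCYG92")))   # [0.3, 0.4, 0.5]
```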
The grid data object is one of the most interesting, and a major component of many applications, while retaining entirely mathematical characteristics and methods. In the astronomical context it has the mathematical essence of a time series, a spectrum, a gridded u-v array, and a regular image. In AIPS++ the work on GridTool and FFTtool has developed some of the methods for grid data objects. As a higher level component of the data objects in Table 1, grid is an mvector, mmatrix, or marray, with the added attribute of coord = (begin, end, interval, units), and methods for extracting selected coordinate information and scaling coordinate values. Methods of the grid data objects will use linear equations involving the elements of the coord attribute and the range of indices for each axis.
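The linear relation between an axis index and its coordinate can be sketched as follows. The class name `Coord`, the 0-based indexing, and the convention that `end` is the coordinate of the last element are all assumptions of this example; the text only fixes the four elements (begin, end, interval, units).

```python
from dataclasses import dataclass


@dataclass
class Coord:
    """Hypothetical coord attribute for one axis of a grid."""
    begin: float
    end: float
    interval: float
    units: str

    def value(self, index):
        """Coordinate of a 0-based index: begin + index * interval."""
        return self.begin + index * self.interval

    def index(self, value):
        """Nearest 0-based index for a coordinate value."""
        return round((value - self.begin) / self.interval)

    def length(self):
        """Number of elements on the axis, assuming `end` is included."""
        return round((self.end - self.begin) / self.interval) + 1


time_axis = Coord(begin=0.0, end=10.0, interval=2.5, units="s")
print(time_axis.length())     # 5
print(time_axis.value(2))     # 5.0
print(time_axis.index(7.5))   # 3
```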
It is possible that units for pixels should be additional grid attributes, since they affect the scales of stored numeric values, but it is clear that other astronomical content, like representation, reference frame, measure, etc., belongs to higher level classes. The dimnames attribute of grid objects allows keyword identification of each dimension. This may be sufficient for including units since one can use dimnames with labels like "u.nanosecond". Later we will discuss a more extensive list of methods for grid data objects.
Further isolation of a possible grid class should be examined as the design and implementation of the framework of classes for images evolves, ensuring that it has methods that are generically mathematical and independent of the image-handling problem.
Reinforcing the view that a mathematically defined grid data object is a powerful construct for algorithms, it is possible to use FFT and related methods in S/S+ to produce, and operate on, their matrix and time series objects to represent, and make transformations between, gridded u-v data and images. The emphasis on optimizing for mathematical operations wherever possible leads to methods like ifelse(logicalexpr, expr1, expr2), where a logical operation on an atomic data object results in a choice between two expressions, expr1 and expr2, for the element-by-element operation involving that object, depending upon the result of the logical operation for each element. This type of mathematical operation (and the analogous if, switch, all, and any methods, cf. Table 2) is part of the reason for both factor data objects and the association of dimnames with vectors, matrices, and arrays, since names as keywords can then be used in logical operations. The current work on masked arrays is related to the development of these sorts of vectorized logic operations.
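An element-by-element ifelse can be sketched in a few lines. This is only an illustration on plain Python lists: the helper name `_element` is hypothetical, both alternatives are evaluated in full before the choice is made, and scalars are reused for every element.

```python
def _element(x, i):
    """Element i of a vector, or the value itself when x is a scalar."""
    return x[i] if isinstance(x, (list, tuple)) else x


def ifelse(test, expr1, expr2):
    """For each element, take expr1 where test is True and expr2 otherwise."""
    return [
        _element(expr1, i) if test[i] else _element(expr2, i)
        for i in range(len(test))
    ]


flux = [0.5, -0.1, 2.0, -3.0]
positive = [f > 0 for f in flux]        # logical vector from a vectorized test

print(ifelse(positive, flux, 0.0))      # [0.5, 0.0, 2.0, 0.0]
print(any(positive), all(positive))     # True False
```

Python's built-in `any` and `all` play the role of the any and all methods here.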
Formation of more complicated data objects from simpler ones, and vice versa, should be possible with simple syntax that hides the vector, matrix, etc., nature of the objects. Vector data objects can be formed by a sequence method, a repetition method, or a combine(o1, ..., oN) method. Vectors should be formable into matrices with methods like rbind(o1, ..., oN) (for rows) and cbind(o1, ..., oN) (for columns). All data objects should be formable into mlist data objects by the mlist(o1, ..., oN) method, and atomic data objects formed into mtable objects with an mtable(o1, ..., oN) method. When implemented in C++ the first argument for each method will be the number of elements in each object list. The reverse extraction of simpler data objects from more complicated ones is a question of extraction based upon some multi-level "subscript" notation.
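The composition methods named above can be sketched on plain lists. The signatures here are assumptions: Python's variable-length argument lists replace the leading element count that the text expects in a C++ implementation, and `sequence` is given only a step parameter.

```python
import math


def combine(*objects):
    """Flatten scalars and vectors into a single vector."""
    out = []
    for obj in objects:
        if isinstance(obj, (list, tuple)):
            out.extend(obj)
        else:
            out.append(obj)
    return out


def rep(values, times):
    """Form a vector by replicating `values` `times` times."""
    return list(values) * times


def sequence(start, stop, step=1):
    """Form a vector from start to stop (inclusive) in increments of step."""
    if step == 0:
        raise ValueError("step must be non-zero")
    count = math.floor((stop - start) / step + 1e-9) + 1   # tolerate float round-off
    return [start + i * step for i in range(max(count, 0))]


def rbind(*vectors):
    """Form a matrix (list of rows) with each vector becoming a row."""
    rows = [list(v) for v in vectors]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all vectors must have the same length")
    return rows


def cbind(*vectors):
    """Form a matrix (list of rows) with each vector becoming a column."""
    return [list(row) for row in zip(*rbind(*vectors))]


print(combine(1, [2, 3], 4))          # [1, 2, 3, 4]
print(sequence(0, 1, 0.25))           # [0.0, 0.25, 0.5, 0.75, 1.0]
print(rbind([1, 2, 3], [4, 5, 6]))    # [[1, 2, 3], [4, 5, 6]]
print(cbind([1, 2, 3], [4, 5, 6]))    # [[1, 4], [2, 5], [3, 6]]
```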
Testing and coercion are useful concepts for mathematical handling of different, but relatable, data objects. Each data object can have an is.objecttype(object) method that tests for what is needed in some mathematical expression or algorithm, returning TRUE or FALSE, and an as.objecttype(object) method that returns a different type of data object that can be formed from the input object. The testing method is useful in algorithms. The coercion method aids extraction of one type of data object (e.g. mmatrix) from others (e.g. mtable).
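A testing and coercion pair might look like the sketch below. The representations are assumptions chosen for brevity: a matrix is a list of equal-length rows and a table is a dictionary of equal-length named columns. The function names mirror the is.objecttype and as.objecttype pattern but are otherwise hypothetical.

```python
def is_mmatrix(obj):
    """Test: a non-empty list of equal-length rows."""
    return (
        isinstance(obj, list)
        and len(obj) > 0
        and all(isinstance(row, list) for row in obj)
        and len({len(row) for row in obj}) == 1
    )


def is_mtable(obj):
    """Test: a non-empty dict of equal-length named columns."""
    return (
        isinstance(obj, dict)
        and len(obj) > 0
        and all(isinstance(col, list) for col in obj.values())
        and len({len(col) for col in obj.values()}) == 1
    )


def as_mmatrix(obj):
    """Coerce a matrix or table into a matrix, one row per table row."""
    if is_mmatrix(obj):
        return obj
    if is_mtable(obj):
        return [list(row) for row in zip(*obj.values())]
    raise TypeError("object cannot be coerced to a matrix")


table = {"u": [1.0, 2.0], "v": [3.0, 4.0]}
print(is_mmatrix(table))     # False
print(as_mmatrix(table))     # [[1.0, 3.0], [2.0, 4.0]]
```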
The need for using arbitrary formulas or equations in the fitting or modeling of data is obvious, and a construct that could be useful for supplying formulas to methods, producing strings in the form of formulas in labels, and doing some symbolic algebra and symbolic evaluation, is the formula class. As a data object it takes as input a string of characters identifying arithmetic operations, variable names, and parameters to be determined. It has basic symbolic algebra methods like substitute, parse, expression, derivative, evaluate, etc., that allow mathematical expression, decomposition, manipulation, and use in evaluation of symbolic expressions to return values for the quantities modeled by the formula(s).
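The parse and evaluate parts of such a formula object can be sketched with Python's standard `ast` module. This is an illustrative sketch only: the class `Formula` and its methods `names` and `evaluate` are hypothetical, the formula uses Python operator syntax (`**` for powers), and substitute and derivative are not attempted.

```python
import ast
import operator

# Arithmetic operations the sketch accepts; anything else is rejected.
_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


class Formula:
    """Hypothetical formula object built from a string of arithmetic."""

    def __init__(self, text):
        self.text = text
        self.tree = ast.parse(text, mode="eval").body   # parse once, reuse

    def names(self):
        """Variable and parameter names appearing in the formula."""
        return sorted({n.id for n in ast.walk(self.tree) if isinstance(n, ast.Name)})

    def evaluate(self, **bindings):
        """Evaluate the formula with a value bound to every name."""
        return self._eval(self.tree, bindings)

    def _eval(self, node, env):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](
                self._eval(node.left, env), self._eval(node.right, env))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](self._eval(node.operand, env))
        raise ValueError(f"unsupported element in formula: {type(node).__name__}")


model = Formula("a * x**2 + b")
print(model.names())                          # ['a', 'b', 'x']
print(model.evaluate(a=2.0, x=3.0, b=1.0))    # 19.0
```

Walking a whitelist of node types, instead of calling `eval` on the string, keeps the formula restricted to arithmetic.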
All the ordinary operator-like operations involving vectors and matrices are assumed to be present, with * used for element-by-element multiplication as done with the AIPS++ Array class. Using a crossprod method for M x V and M x M, with M and V (or M) as arguments, is reasonable. However, in a mathematical system there are distinctions based upon whether the vector or matrix is a transpose or not; these can be checked at run time based upon a transpose or non-transpose identification, or left as a potential programmer error. Probably it is best to expect the application programmer to write tran(V) or tran(M) when mathematically required.
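The distinction between element-by-element multiplication and a crossprod-style product, with the transpose written explicitly by the programmer, can be sketched as below. The function `times` is a hypothetical name for the element-wise `*`. The text does not say whether crossprod transposes its first argument (as the S function of that name does), so this sketch assumes a plain matrix product.

```python
def tran(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def times(a, b):
    """Element-by-element product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def crossprod(m, x):
    """Product of matrix m with a vector (flat list) or a matrix (list of rows)."""
    if any(len(row) != len(x) for row in m):
        raise ValueError("inner dimensions do not match")
    if x and isinstance(x[0], list):
        cols = tran(x)
        return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in m]
    return [sum(a * b for a, b in zip(row, x)) for row in m]


M = [[1, 2], [3, 4]]
V = [5, 6]
print(times(V, V))               # [25, 36]
print(crossprod(M, V))           # [17, 39]
print(crossprod(tran(M), V))     # [23, 34]
```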
Table 2 lists methods for various atomic data objects, with V, M, A, and G indicating whether they apply to mvector, mmatrix, marray, and/or grid data objects, and N, C, and/or L indicating whether they are applicable to numeric, character, and/or logical data values.
Table 2 - Methods of Atomic, Numeric Data Objects

| Method | Objects | Values | Description |
|---|---|---|---|
| combine | V | NC | combine mlist of numbers into a vector |
| rep | V | NC | form vector replicating mlist of numbers |
| sequence | V | N | form vector from vstart to vstop using optional step or length parameters |
| tran | VMAG | N | transpose |
| diag | V | N | form diagonal matrix with input vector on diagonal |
| rbind | V | NCL | form matrix from mlist of vector objects with each vector becoming a row |
| cbind | V | NCL | form matrix from list of vector objects with each vector becoming a column |
| sort | V | NC | sort vector on elements |
| reverse | V | NC | reverse elements of vector (often after sorting) |
| order | V | NC | return integer vector containing the permutation that will sort the input into ascending order |
| rank | V | N | returns a vector with ranks of the input vector |
| diff | VMAG | N | returns an object of the input's class with the differences between adjacent elements of the input data object |
| unique | V | NC | returns an object like the input but with repeated values deleted |
| duplicated | V | NC | returns a vector of logical values for an input object indicating whether elements are duplicated or not |
| sum | VMAG | N | returns the sum of all elements of input object |
| prod | V | N | returns the product of all elements of input object |
| max | VMAG | N | returns largest value in input object |
| min | VMAG | N | returns smallest value in input object |
| range | VMAG | N | returns vector of smallest and largest values |
| all | VMAG | L | returns TRUE if all elements of input logical expression(s) are TRUE, returns FALSE otherwise |
| any | VMAG | L | returns TRUE if any elements of input logical expression are TRUE, returns FALSE otherwise |
| if | VMAG | L | evaluate an expression for each element if a logical expression for each element is true |
| ifelse | VMAG | L | depending upon logical expression evaluation for each element, performs one of two operations on or with each element |
| switch | VMAG | N | depending upon the integer returned by an expression, one of a series of expressions is used to return a value for each element of the input data object |
| apply | VMAG | N | apply a function defined by a formula object to all elements of the input data object |
| outer | VMAG | N | apply a function defined by a formula object to two input data objects with the same shape |
| mean | VMAG | N | returns mean of all elements of data object, optional trim parameter specifying range of values to be averaged |
| median | VMAG | N | returns median of all elements of data object, optional trim parameter specifying range of values to be considered |
| quantile | VMAG | N | returns vector of desired probability levels for a data object, as determined by optional input vector of desired probabilities |
| var | VM | N | returns variance of data object (for optionally specified range of values); if a matrix, columns represent variables and rows represent measurements |
| cor | M | N | return correlation matrix for optional range of values |
| cov | M | N | return covariance matrix for optional range of values |
| round | VMAG | N | return for each element the integer above or below value + 0.5 |
| signif | VMAG | N | return for each element a value with rounding in the specified significant figure |
| cumsum | VMAG | N | returns an object for which each element is the sum of all elements to that point |
| cumprod | VMAG | N | returns an object for which each element is the product of all elements to that point |
| distrib | VMAG | N | returns for each element a value of a named probability distribution over an optional range of values |
| fft | VMAG | N | transform a real or complex data object by a direct or inverse FFT |
| autocorr | VMAG | N | return autocorrelation function of data object |
| lag | VMAG | N | return same object with data lagged by specified intervals in one or more dimensions (mainly for case of a time-like dimension) |
| convolve | VMAG | N | convolve a function specified by a formula object with a specified span producing a smoothed version of the original data object |
| aggregate | VMAG | N | convolve, average, or smooth one or more dimensions to a data object with a reduced number of data points spanning the same range |
| subset | VMAG | N | return a subsection of a data object based upon a range specification for each dimension |
| coord | G | N | return coordinate values for specified elements |
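A handful of the vector methods in Table 2 can be sketched on plain Python lists to make their intended results concrete. The details are assumptions of this sketch rather than part of the specification: `order` returns 0-based positions, ties in `rank` are broken by position, and `duplicated` marks only the second and later occurrences.

```python
def order(v):
    """Permutation (0-based positions) that sorts v into ascending order."""
    return sorted(range(len(v)), key=lambda i: v[i])


def rank(v):
    """1-based rank of each element; ties are broken by position."""
    ranks = [0] * len(v)
    for r, i in enumerate(order(v)):
        ranks[i] = r + 1
    return ranks


def diff(v):
    """Differences between adjacent elements."""
    return [b - a for a, b in zip(v, v[1:])]


def cumsum(v):
    """Running sum: element i is the sum of all elements up to i."""
    out, total = [], 0
    for x in v:
        total += x
        out.append(total)
    return out


def duplicated(v):
    """Logical vector: True where the value has already appeared."""
    seen, out = set(), []
    for x in v:
        out.append(x in seen)
        seen.add(x)
    return out


def unique(v):
    """The input with repeated values deleted, first occurrences kept."""
    return [x for x, dup in zip(v, duplicated(v)) if not dup]


v = [3.0, 1.0, 2.0, 1.0]
print(order(v))        # [1, 3, 2, 0]
print(rank(v))         # [4, 1, 3, 2]
print(diff(v))         # [-2.0, 1.0, -1.0]
print(cumsum(v))       # [3.0, 4.0, 6.0, 7.0]
print(duplicated(v))   # [False, False, False, True]
print(unique(v))       # [3.0, 1.0, 2.0]
```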
Many of these methods show the considerable emphasis placed on vectorizable operations, so that one can express mathematics with operations that are carried out as efficiently as possible by internal mechanisms hidden from the programmer.
The basic constructor methods (mvector, mmatrix, grid, etc.) have obvious use and syntax, and some of the operations in Table 2 reflect other ways of constructing these objects. The is.class and as.class methods for testing and coercing classes are important for specific use and decomposition of mathematical objects. Methods like assigndim and assigndimnames are needed as part of the composition of data objects.