| Version 1.9 Build 1556
|
|
Next: Some further remarks
Up: NOTE 199 - Table Query Language
Previous: Selection from a table
Subsections
Expressions
An arbitrary expression can be used in the WHERE clause, as a sort
key in the ORDERBY clause, in an element in the set in the GIVING
clause, or as a value in the INSERT and UPDATE command.
Note that the expression result must be
a boolean scalar if used in the WHERE clause. If used in ORDERBY
the result can also be a numeric or string scalar. If used in
GIVING it can also be a DateTime scalar. If used as a new value in
the UPDATE or INSERT command it can also be an array.
The expression in the clause can be as complex as one likes
using the standard
arithmetic, comparison, and logical operators.
Parentheses can be used to group subexpressions.
The operands in an expression can be
table columns,
table keywords,
constants,
functions,
sets and intervals, and
subqueries.
The index operator can be used to take a
single element or a subsection from an array expression.
E.g.
column1 > 10
column1 + arraycolumn[index] >= min (column2, column3)
column1 IN [expr1 =:< expr2]
The last example shows a set with a continuous interval.
Data Types
Internally TaQL uses the following data types:
- Bool
- Double
- which includes integers and times/positions
- Complex
- which includes single and double precision complex
- String
- Operator + can be used (concatenation).
- Regex
- which is formed by the functions regex,
pattern, and sqlpattern or by a special constant
(see discussion below).
- DateTime
- which represents a date/time. There are several functions
acting on a date/time. Also operator + and - can be used.
Scalars and arbitrarily shaped arrays of these data types can be used.
However, arrays of Regex are not possible.
If an operand or function argument with a non-matching data type
is used, TaQL can do the following automatic conversions:
- from Double to Complex
- from String to DateTime
Externally TaQL supports any possible data type of a table column or
keyword. In some commands (column list and
CREATE TABLE) it is possible to specify
the data type of a column. There are several synonyms.
The possible types and values are:
B BOOL BOOLEAN
U1 UCHAR BYTE
I2 SHORT SMALLINT
U2 UI2 USHORT USMALLINT
I4 INT INTEGER
U4 UI4 UINT UINTEGER
R4 FLT FLOAT
R8 DBL DOUBLE
C4 FC FCOMPLEX COMPLEX
C8 DC DCOMPLEX
S STRING
Constants
Scalar constants of the various data types can be formed as follows
(which is very similar to Glish):
Constant N-dimensional arrays cannot be formed directly.
It is possible though to form a constant bounded discrete
set. If needed such a set is automatically
transformed to a 1-dimensional array. E.g.
[1:10]
is equal to a 1D array of 10 elements with values 1..10.
Furthermore it is possible to use the array function to
transform an array to another shape. This function is very similar to
the same function in glish. E.g.
array([1:10],10,4)
creates an array with shape [10,4]. Each row contains the values
1..10.
Please note that in TaQL arrays are stored in Fortran order, thus
the first index varies most rapidly (which is opposite to C).
Similar to glish the value array is wrapped if the created
array has more elements. Unlike glish the shape does not need to be
defined by scalars, but it can also be defined by another set. Thus:
array(F,shape(DATA))
is a boolean array filled with F with the same shape as the DATA array.
Operators
The following operators can be used (with their normal meaning and
precedence) for scalars:
- The precedence order is:
^
unary +, -, !
*, /, %
+, -
==, !=, >, >=, <, <=, IN, INCONE, BETWEEN, EXISTS, LIKE,
~
, !~
&&
||
All these operators can be used in the same way for arrays
(also a mix of scalar and array).
Functions
Some functions have 2 names. One name is the AIPS++/Glish name, while the
other is the name as used in SQL.
In the following tables the function names are shown in uppercase,
while the result and argument types are shown in lowercase.
Note, however, that function names are case-insensitive.
Sets, and in particular
subqueries, can result in a 1-dim array.
This means that the functions accepting an array argument can also
be used on a set or the result of a subquery. E.g.
WHERE datecol IN date([11-Nov-97,16-Dec-97,14-Jan-98])
These functions can be used on a scalar or an array argument.
-
- double STRLENGTH(string), double LEN(string)
Returns the number of characters in a string
(trailing whitespace is significant).
-
- string UPCASE(string), string UPPER(string)
Convert to uppercase.
-
- string DOWNCASE(string), string LOWER(string)
Convert to lowercase.
-
- string TRIM(string)
Removes trailing whitespace.
The syntax for forming a regex and pattern are explained in class
Regex.
These functions can only be used on a scalar argument.
-
- regex REGEX(string)
Handle the given string as a regular expression.
-
- regex PATTERN(string)
Handle the given string as a UNIX filename-like pattern and
convert it to a regular expression.
-
- regex SQLPATTERN(string)
Handle the given string as an SQL-style pattern and
convert it to a regular expression.
An SQL-style pattern has two special character:
% for zero or more arbitrary characters
_
for one arbitrary character
A regex formed this way can only be used in a comparison == or !=. E.g.
upcase(object) == pattern('3C*')
to find all 3C objects in a catalogue.
See also the discussion on constants for
other ways of forming and testing regular expressions.
These functions can be used on a scalar or an array argument.
-
- DateTime DATETIME(string), DateTime CTOD(string)
Parse the string and convert it to a date/time. The syntax of
date/time is explained in class
MVTime.
-
- DateTime MJDTODATE(double)
The double value, which has to be a MJD (ModifiedJulianDate), is
converted to a DateTime.
-
- DateTime DATE(DateTime)
Get the date (i.e., remove the time part). This function is needed in
DATE(column) == 12Feb1997
if the column contains date/times with times>0.
-
- double MJD(DateTime)
Get the DateTime as a MJD (ModifiedJulianDate).
-
- double YEAR(DateTime)
Get the year (which includes the century).
-
- double MONTH(DateTime)
Get the month number (1-12).
-
- double DAY(DateTime)
Get the day number (1-31).
-
- double WEEKDAY(DateTime), double DOW(DateTime)
Get the weekday number (1=Monday, ..., 7=Sunday).
-
- string CMONTH(DateTime)
Get the name of the month (Jan ... Dec).
-
- string CWEEKDAY(DateTime), string CDOW(DateTime)
Get the name of the weekday (Mon .. Sun).
-
- double WEEK(DateTime)
Get the week number in the year (1 ... 53).
Note that week 1 is the week containing Jan 4th.
-
- double TIME(DateTime)
Get the time part of the day. It is converted to radians to
be compatible with the internal representation of times/positions.
In that way the function can easily be used as in:
TIME(date) > 12h
All functions can be used without an argument in which case the current
date/time is used. E.g. DATE() results in the current date.
It is possible to give a string argument instead of a date. In this
case the string is parsed and converted to a date (i.e., in fact the
function DATETIME is used implicitly).
Two functions make it possible to compare 2 double or complex
values within a tolerance.
They can be used on scalar and array arguments (and a mix of them).
The tolerance must be a scalar though.
-
- bool NEAR(numeric val1, numeric val2, double tol)
Tests in a relative way if a value is near another. Relative
means that the
magnitude of the numbers is taken into account.
It returns
abs(val2 - val1)/max(abs(val1),abs(val2)) < tol.
If tol<=0, it returns val1==val2.
If either val is 0.0, it takes
care of area around the minimum number that can be represented.
The default tolerance is 1.0e-13.
-
- bool NEARABS(numeric val1, numeric val2, double tol)
Tests in an absolute way if a value is near another. Absolute
means that the
magnitude of the numbers is not taken into account.
It returns abs(val2 - val1) < tol.
The default tolerance is 1.0e-13.
-
- bool ISNAN(numeric val)
Tests if a numeric value is a NaN (not-a-number).
Several functions can operate on double or complex arguments.
The data types of such functions is given as 'numeric'.
They can be used on scalar and array arguments (and a mix of them).
-
- double PI()
Returns the value of pi.
-
- double E()
Returns the value of e (is equal to EXP(1)).
-
- numeric SIN(numeric)
-
- numeric SINH(numeric)
-
- double ASIN(double)
-
- numeric COS(numeric)
-
- numeric COSH(numeric)
-
- double ACOS(double)
-
- double TAN(double)
-
- double TANH(double)
-
- double ATAN(double)
-
- double ATAN2(double y, double x)
Returns ATAN(y/x) in correct quadrant.
-
- numeric EXP(numeric)
-
- numeric LOG(numeric)
Natural logarithm.
-
- numeric LOG10(numeric)
-
- numeric POW(numeric, numeric)
The same as operator ^
.
-
- numeric SQUARE(numeric), numeric SQR(numeric)
The same as ^
2, but much faster.
-
- numeric SQRT(numeric)
-
- complex COMPLEX(double, double)
-
- numeric CONJ(numeric)
-
- double REAL(numeric)
Real part of a complex number.
-
- double IMAG(numeric)
Imaginary part of a complex number.
-
- double NORM(numeric)
-
- double ABS(numeric), double AMPLITUDE(numeric)
-
- double ARG(numeric), double PHASE(numeric)
-
- numeric MIN(numeric, numeric)
-
- numeric MAX(numeric, numeric)
-
- double SIGN(double)
Returns -1 for a negative value, 0 for zero, 1 for a positive value.
-
- double ROUND(double)
Rounds the absolute value of the number.
E.g. ROUND(-1.6) = -2.
-
- double FLOOR(double)
Works towards negative infinity.
E.g. FLOOR(-1.2) = -2
-
- double CEIL(double)
Works towards positive infinity.
-
- double FMOD(double, double)
The same as operator %.
Note that the trigonometric functions need their arguments in radians.
The following functions reduce an array to a scalar.
They are meant for an array, but can also be used for a scalar.
-
- bool ANY(bool)
Is any element true?
-
- bool ALL(bool)
Are all elements true?
-
- double NTRUE(bool)
Number of true elements.
-
- double NFALSE(bool)
Number of false elements.
-
- numeric SUM(numeric)
Return sum of all elements.
-
- numeric SUMSQUARE(numeric), numeric SUMSQR(numeric)
Return sum of all squared elements.
-
- numeric PRODUCT(numeric)
Return product of all array elements.
-
- double MIN(double)
Return minimum of all array elements.
-
- double MAX(double)
Return maximum of all array elements.
-
- double MEAN(double), double AVG(double)
Return mean of all array elements.
-
- double VARIANCE(double)
Return variance (the sum of
(a(i) - mean(a))**2/(nelements(a) - 1).
-
- double STDDEV(double)
Return standard deviation (the square root of the variance).
-
- double AVDEV(double)
Return average deviation. (the sum of
abs(a(i) - mean(a))/nelements(a).
-
- double MEDIAN(double)
Return median (the middle element).
If the array has an even number of elements, the mean of
the two middle elements is returned.
-
- double FRACTILE(double, doublescalar fraction)
Return the value of the element at the given fraction.
Fraction 0.5 is the same as the median.
These functions reduce an array to a smaller array by collapsing
the given axes using the given function. The axes are the last argument(s).
They can be given in two ways:
- As a single set argument; for example, maxs(ARRAY,[1:2])
- As individual scalar arguments; for example, maxs(ARRAY,1,2)
For example, using
MINS(array,1,2) for a 3-dim array results in a 1-dim array
where each value is the minimum of each plane in the cube.
Axes numbers are 1-relative, thus 1 is the first axis.
Axes numbers exceeding the dimensionality of the array are ignored.
Thus, for example, maxs(ARRAY,[2:10]) works for arrays
of virtually any dimensionality and results in a 1-dim array.
The function names are the 'plural' forms of the
functions in the previous table.
They can only be used for arrays, thus not for scalars.
-
- bool ANYS(bool)
Is any element true?
-
- bool ALLS(bool)
Are all elements true?
-
- double NTRUES(bool)
Number of true elements.
-
- double NFALSES(bool)
Number of false elements.
-
- numeric SUMS(numeric)
Return sum of elements.
-
- numeric SUMSQUARES(numeric), numeric SUMSQRS(numeric)
Return sum of squared elements.
-
- numeric PRODUCTS(numeric)
Return product of elements.
-
- double MINS(double)
Return minimum of elements.
-
- double MAXS(double)
Return maximum of elements.
-
- double MEANS(double), double AVG(double)
Return mean of elements.
-
- double VARIANCES(double)
Return variance (the sum of
(a(i) - mean(a))**2/(nelements(a) - 1).
-
- double STDDEVS(double)
Return standard deviation (the square root of the variance).
-
- double AVDEVS(double)
Return average deviation. (the sum of
abs(a(i) - mean(a))/nelements(a).
-
- double MEDIANS(double)
Return median (the middle element).
If the array has an even number of elements, the mean of
the two middle elements is returned.
-
- double FRACTILES(double, doublescalar fraction)
Return the value of the element at the given fraction.
Fraction 0.5 is the same as the median.
-
- bool ISDEFINED(anytype)
Return false if the value in the current row is undefined. Is
useful to test if a cell in a column with variable shaped arrays
contains an array. It can also be used to test if a field in a
record is defined.
-
- double NELEMENTS(anytype), double COUNT(anytype)
Return number of elements in an array (1 for a scalar).
-
- double NDIM(anytype)
Return dimensionality of an array (0 for a scalar).
-
- doublearray SHAPE(anytype)
Return shape of an array (returns an empty array for a scalar).
-
- double ROWNUMBER()
Return the row number being tested (first row is row number 1).
In combination with function RAND it can, for instance,
be used to select arbitrary rows from a table.
-
- double ROWID()
Return the row number in the original table. This is especially
useful for returning the result of a selection of a subtable
of an AIPS++ measurement set
(see also sections Subqueries
and Examples).
-
- double RAND()
Return (per table row) a uniformly distributed random number
between 0 and 1 using a Multiplicative Linear Congruential Generator.
The seeds for the generator are deduced from the current date and
time, so the results are different from run to run.
The function can, for instance, be used to select a random
subset from a table.
-
- anytype IIF(cond,arg1,arg2)
This is a special funtion which operates like the ternary ?:
operator in C++.
If all arguments are scalars, the result is a scalar, otherwise
an array. In the latter case possible scalar arguments are
virtually expanded to arrays.
IIF evaluates the condition for each element. If True, it takes
the corresponding element of arg1, otherwise of arg2.
-
- anytypearray ARRAY(anytype,shape)
This function creates an array of the given type and shape.
The shape is given in the last argument(s).
It can be given in two ways:
- As a single set argument; for example, array(0,[3,4])
- As individual scalar arguments; for example, array(0,3,4)
The first argument gives the values the array is filled with.
It can be a scalar or an array of any shape. To initialize the
created array, the value array is flattened to a 1D array.
Its successive values are stored in the created array. If the
new array has more values than the value array, the value array is
reset to its beginning and the process continues.
Cone search functions
Cone search functions make it possible to test if a source is
within a given distance of a given sky position. The expression
cos(0d1m) < sin(52deg) * sin(DEC) +
cos(52deg) * cos(DEC) * cos(3h30m - RA)
could be used to test if sources with their sky positions defined
in columns RA and DEC are within 1 arcmin of the
given sky position.
The cone search functions implement this expression making life much
easier for the user. Because they
can also operate on arrays of positions, searching in
multiple cones can be done simultaneously. That makes it
possible to find matching source positions in two catalogues as shown
in an example at the end of this section.
The arguments in all functions are described below. All of them have
to be given in radians. However, usually one does not need to bother
because TaQL makes it possible
to specify positions in many formats which are automatically converted
to radians.
-
- SOURCES
is a set or array giving the positions of one or more
sources (e.g. in equatorial coordinates)
to be tested. Normally these are columns in a table.
If argument name SOURCE is mentioned, only a single source
can be used.
For example:
[RA,DEC] for scalar columns RA and DEC.
SKYPOS for a column SKYPOS containing 2-element
vectors with RA and DEC.
-
- CONES
is a set or array giving the center positions and radii of
one or more cones (e.g. as RA,DEC,radius).
Usually this will be given by the user as constants.
For example:
[12:13:54, -5.3.34, 0d1m] for a single cone.
[12:13:54, -5.3.34, 0d1m, 1:2:3, 4.5.6, 0d1m] for two cones.
-
- CONEPOS
is a set or array giving the positions of one or more
cone centers (e.g. as RA,DEC).
-
- RADII
is a scalar, set or array giving one or more radii.
Each radius is applied to all positions in CONEPOS.
Specifying a cone as CONEPOS,RADIUS is easier than specifying
it as CONES if the same radius has to be used for
multiple cones.
For example:
[12:13:54, -5.3.34, 1:2:3, 4.5.6], 0d1m is the same
as the second CONES example above.
The following cone search functions are available.
-
- bool ANYCONE(SOURCE,CONES)
Return T if the source is contained in one of the cones.
Operator INCONE is a synonym. So
ANYCONE(SOURCE,CONES) is the same as
SOURCE INCONE CONES.
-
- bool ANYCONE(SOURCE,CONEPOS,RADII)
It does the same as above.
-
- double FINDCONE(SOURCES,CONES)
Return the 1-based index of the first cone containing the source.
If a single source is given, the result is a scalar.
If multiple sources are given, the result is an array with the
same shape as the source array.
-
- double FINDCONE(SOURCES,CONEPOS,RADII)
It does the same as above. Note that in this case each radius is
applied to each cone, so the index is a combination of the two
(with the radius as the most rapidly varying axis).
-
- bool CONES(SOURCES,CONES)
Return a 2-dim bool array. The length of the first axis is the
number of cones. The length of the second axis is number of
sources. An element (i,j) in the array is T
if source j is contained in cone i.
-
- bool CONES(SOURCES,CONEPOS,RADII)
It does the same as above. However, the result is a 3-dim array
with the radii as the first axis, cones as second and sources
as third.
Please note that ANYCONE(SOURCE,CONES) does the same as
any(CONES(SOURCE,CONES)), but faster.
Function CONES makes it possible to do catalogue matching.
For example, to find sources matching other sources in the same
catalogue (within a radius of 10 arcseconds):
CALC FROM table.cat CALC CONES([RA,DEC],
[SELECT FROM table.cat GIVING [RA,DEC]], 0d0m10)
Note that in this example the SELECT clause returns an array with positions
which are used as the cone centers. So each source in the catalogue is
tested against every source. It makes it an N-square operation, thus
potentially very expensive.
The result is a 4-dim boolean array with shape
[1,nrow,1,nrow] which can be processed in Glish. Please note
that the CONES function results for
each row in a array with shape [1,nrow,1].
The query could be done with multiple radii, for example also
with 1 arcsecond and 1 arcminute.
CALC FROM table.cat CALC CONES([RA,DEC],
[SELECT FROM table.cat GIVING [RA,DEC]], [0d0m1, 0d0m10, 0d1m])
resulting in an array with shape [3,nrow,1,nrow].
It would give a better indication how close sources are.
Table Columns
A column can contain a scalar or an array value.
Note that only columns in the primary table can be handled directly.
A column in another table can be used via a subquery. E.g.
SELECT FROM tab WHERE col >
mean([SELECT othercol FROM othertab])
An expression has to contain at least one column, since columns
are the only variable part in it. That is, a row can only be selected
or sorted by means of the column values in each row.
The name of a column can contain alphanumeric characters and underscores.
It should start with an alphabetic character or underscore.
A column name is case-sensitive.
It is possible to use other characters in the name by
escaping them with a backslash. E.g. DATE\
-OBS.
In the same way a numeric character can be used as the first
character of the column name. E.g. \
1stDay.
Because several words are used in the language, they cannot
be used directly as column names. The reserved words are:
ALL AND AS ASC BETWEEN CREATETABLE DELETE DESC DISTINCT
EXISTS EXCEPT FROM GIVING GROUPBY HAVING
IN INCONE INSERT INTERSECT INTO JOIN LIKE LIMIT
MINUS NODUPLICATES NOT OFFSET ON OR ORDERBY
SAVETO SELECT SET UNION UNIQUE UPDATE VALUES WHERE
They can, however, be used as a column name by escaping
them with a backslash. E.g. \
IN.
Note that in C++ and Glish a backslash itself has to be escaped
by another backslash. E.g. in Glish:
tab.query('DATE\\
-OBS>10MAR1996').
If a column contains a record, one has to specify a field in it
using the dot operator; e.g. col.fld means use field
fld in the column. It is fully recursive, so
col.fld.subfld can be used if field fld is a record
in itself.
Alas records in columns are not really supported yet. One can specify
fields, but thereafter an error message will be given.
Table Keywords
It is possible to use table or column keywords, which can have
a scalar or an array value. A table keyword has to be specified
as ::key. In an expression the :: part can be omitted
if there is no column with such a name.
A column keyword has to be specified as column::key.
Note that the :: syntax is chosen, because it is similar
to the scope operator in C++.
As explained in the FROM clause of the syntax
section, keywords from the primary table and from secondary tables
can be used. If used from a secondary table, it has to be qualified
with the (shorthand) name of the table. E.g.
sh.key or sh.::key
takes table keyword key from the table with the shorthand name
sh.
If a keyword value is a record, it is possible to use
a field in it using the dot operator. E.g. ::key.fld
to use field fld. It is fully recursive, so if the
field is a record in itself, a subfield can be used like
col::key.fld.subfld
A keyword can be used in any expression. It is evaluated immediately
and transformed to a constant value.
Array Index Operator
It is possible to take a subsection or a
single element from an array column, keyword or expression
using the index operator
[index1,index2,...]. The rules for this
are similar to those used in Glish.
Taking a single element can be done as:
array[1]
array[1, some_expression]
Taking a subsection can be done as:
array[start1:end1:incr1, start2:end2:incr2, ...]
If a start value is left out it defaults to the beginning of
that axis. An end value defaults to the end of the axis and
an increment defaults to one. If an entire axis is left out,
it defaults to the entire axis.
E.g. an array with shape [10,15,20] can be subsectioned as:
[,,3] resulting in an array of shape [10,15,1]
[2:4, ::3, 2:15:2] resulting in an array of shape[3,5,7]
The examples show that an index can be a simple constant (as it will
usually be). It can also be an expression which can be as complex
as one likes. The expression has to result in a positive double value
which will be truncated to an integer length.
Note that as in Glish, array indices start at 1.
For fixed shaped arrays checking if array bounds are exceeded
is done at parse time.
For variable shaped arrays
it can only be done per row. If array bounds are exceeded,
an exception is thrown. In the future a special undefined value
will be assigned if bounds of variable shaped arrays are exceeded
to prevent the selection process from aborting due to the exception.
Note that the index operator will be applied directly
to a column. This results in reading only the required part of the
array from the table column on disk.
It is, however, also possible to apply it to a
subexpression (enclosed in parentheses) resulting in an array.
E.g.
arraycolumn[2,3,4] + 1
(arraycolumn + 1)[2,3,4]
can both be used and have the same result. However, the first
form is much faster, because only a single element is read
(resulting in a scalar) and 1 is added to it.
The second form results in reading the entire array.
1 is added to all elements and only then the requested element is taken.
From this example it should be clear that indexing an array
expression has to be done with care.
Sets and intervals
As in SQL the operator IN can be used to do a selection
based on an array or a set. E.g.
SELECT FROM table WHERE column IN expr1
In this example expr1 is the array result of an expression. The result
of operator IN is true if the column value matches one of the
values in the array. It is also possible to use a scalar as the
righthand of operator IN.
So if expr1 is a scalar, operator IN
gives the same result as operator ==.
It is also possible to define a set explicitly. E.g.
SELECT FROM table WHERE column IN [expr1, expr2, expr3]
This example shows that (in its simplest form) a set
consists of one or more values (which
can be arbitrary expressions) separated by commas and enclosed in
square brackets. The elements in a set have to be scalars and their
data types have to be the same. The square brackets can be left out if
the set consists of only one element. For SQL compliance
parentheses can be used instead of square brackets.
An element in a set can, however, be more complicated and can
define multiple values or an interval. The possible forms of
a set element are:
- 1.
- A single value as shown in the example above.
- 2.
- start:end:incr. This is similar to the
way an array index is specified. Incr defaults to 1.
End defaults to an open end (i.e., no upper bound) and results
in an unbounded set. Start and end can be a double or a datetime.
Incr has to be a double. Some examples:
1:10 means 1,2,...,10
1:10:2 means 1,3,5,7,9
1::2 means all odd numbers
1: means all positive integer numbers
date('18Aug97')::2 means every other day from 18Aug97 on
These examples show constants only, but start, end,
and incr can be any expression.
Note that :: used here can conflict with the :: in the
keywords. E.g. a::b is scanned as
a keyword specification. If the intention is start::incr
it should be specified as a: :b. In practice this conflict
will hardly ever occur.
- 3.
- Continuous intervals can be specified for double, string, and datetime.
The specification of an interval resembles the mathematical notation
1<x<5, where x is replaced by :. An open interval side
is indicated by <, while a closed interval side is indicated
by =.
Another way to specify intervals is using curly and/or angle brackets.
A curly bracket is a closed side, the angle bracket is an open side.
The following examples show how bounded and half-bounded,
(half-)open and closed intervals can be specified.
1=:=5 {1,5} means 1<=x<=5 bounded closed
1<:<5 <1,5> means 1<x<5 bounded open
1=:<5 {1,5> means 1<=x<5 bounded right-open
1<:=5 <1,5} means 1<x<=5 bounded left-open
1=: {1,} {1,> means 1<=x left-bounded closed
1<: <1,} <1,> means 1<x left-bounded open
:=5 {,5} <,5} means x<=5 right-bounded closed
:<5 {,5> <,5> means x<5 right-bounded open
It is very important to note that the 2nd form results in
discrete values, while the 3rd form results in a continuous interval.
Each element in a set can have its own form, i.e., one element can
be a single value while another can be an interval.
If a set consists of single or bounded discrete
start:end:incr values only, the set will be expanded to an
array.
This makes it possible for array operators and functions
(like mean) to be applied to such sets. E.g.
WHERE column > mean([10,30:100:5])
Another form of constructing a set is using a
subquery as shown below.
Subqueries
As in SQL it is possible to create a set from a subquery. A
subquery has the same syntax as a main query, but has to be
enclosed in square brackets or parentheses. Basically it looks like:
SELECT FROM maintable WHERE time IN
[SELECT time FROM othertable WHERE windspeed < 5]
The subquery on othertable results in a constant set
containing the times
for which the windspeed matches. Subsequently the main query
is executed and selects all rows from the main table with times in
that set.
Note that like other bounded sets this set is transformed to a
constant array, so it is possible to apply functions to it (e.g. min, mean).
SELECT FROM maintable WHERE time IN
[SELECT time FROM othertable WHERE windspeed <
mean([SELECT windspeed FROM othertable])]
This contains another subquery to get all windspeeds and
to take the mean of them. So the first subquery selects all times
where the windspeed is less than the average windspeed.
A subquery result should contain only one column, otherwise
an exception is thrown.
It may happen that a subquery has to be executed twice because
2 columns from the other table are needed. E.g.
SELECT FROM maintable WHERE any(time >=
[SELECT starttime FROM othertable WHERE windspeed < 5]
&& time <=
[SELECT endtime FROM othertable WHERE windspeed < 5])
In this case the other table contains the time range for each windspeed.
For big tables it is expensive to execute the subquery twice.
A better solution
is to store the result of the subquery in a temporary table and reuse it.
SELECT FROM othertable WHERE windspeed < 5 GIVING tmptab
SELECT FROM maintable WHERE any(time >=
[SELECT starttime FROM tmptab]
&& time <=
[SELECT endtime FROM tmptab])
However, this has the disadvantage that the table tmptab
still exists after the query and has to be deleted explictly by the
user. Below a better solution for this problem is shown.
TaQL has a few extensions to support tables better,
in particular the AIPS++ measurement sets.
- 1.
- The temporary problem above can be circumvented by using the
ability to use a SELECT expression in the FROM
clause. E.g.
SELECT FROM maintable,
[SELECT FROM othertable WHERE windspeed < 5] tmptab
WHERE any(time >= [SELECT starttime FROM tmptab]
&& time <= [SELECT endtime FROM tmptab])
However, below a even nicer solution is given.
- 2.
- The time range problem above can be solved elegantly by using
a set as the result of the subquery. Instead of a table name,
it is possible to give an expression in the GIVING clause (as mentioned
in the syntax section). E.g.
select from MY.MS where TIME in
[select FROM OTHERTABLE where WINDSPEED < 5
giving [TIME-INTERVAL/2 =:= TIME+INTERVAL/2]]
The set expression in the GIVING clause is filled with the
results from the subquery and used in the main query. So if
the subquery results in 5 rows, the resulting set contains 5
intervals. Thereafter the resulting intervals are sorted and combined
where possible. In this way the minimum number of intervals have to be
examined by the main query.
- 3.
- In AIPS++ the other table will often be the name of a subtable,
which is stored in a table or column keyword of the main table.
The standard keyword syntax can be used
to indicate that the other table is the table in the given keyword.
Note that for a table keyword the :: part has to be given,
otherwise the name is treated as an ordinary table name. E.g.
select from MY.MS where TIME in
[select TIME from ::WEATHER where WINDSPEED < 5]
In this example the other table is a subtable of table my.ms.
Its name is stored in keyword WEATHER of my.ms.
- 4.
- Often the result of a query on a subtable of a measurement set is
used to select columns from the main table. However, several
subtables do not have an explicit key, but use the row number as
an implicit key. The function ROWID() can be used to
return the row number as the subtable query result. E.g.
select from MY.MS where DATA_DESC_ID in
[select from ::DATA_DESCRIPTION where
SPECTRAL_WINDOW_ID in [0,2,4] giving [ROWID()]]
Note that the function ROWNUMBER cannot be used here,
because it will give the row number in the selection and not
(as ROWID does) the row number in the original table.
Furthermore, ROWID gives a 0-relative row number which is
needed to be able to use it as a selection criterium on the 0-relative
values in the measurement set.
Next: Some further remarks
Up: NOTE 199 - Table Query Language
Previous: Selection from a table
  Contents
Please send questions or comments about AIPS++ to aips2-request@nrao.edu.
Copyright © 1995-2000 Associated Universities Inc.,
Washington, D.C.
Return to AIPS++ Home Page
2006-10-15