Version 1.9 Build 1556

News	FAQ
Search	Home

Next: Expressions Up: NOTE 199 - Table Query Language Previous: TaQL Commands

Subsections

Selection from a table

The SELECT is the main TaQL command. It can be used to select a subset of rows and/or columns from a table or to generate new columns based on expressions.

As explained above, the result of a selection is usually a reference table. This table can be used as any other table, thus it is possible to do another selection on it or to update it (which updates the underlying original table). It is, however, not possible to insert rows in a reference table or to delete rows from it.

If the select column list contains expressions, it is not possible to generate a reference table. Instead a normal plain table is generated (which can take some time if it contains large data arrays). It should be clear that updating such a table does not update the original table.

The various parts of the SELECT command are explained in the following sections.
Although the clauses column_list, WHERE, ORDERBY, LIMIT, and OFFSET are optional, at least one of them has to be used. Otherwise no operation is performed on the primary table (which makes no sense). Note that the GIVING clause with a value set is seen as an operation as well.

SELECT column_list

Columns to be selected can be given as a comma-separated list with names of columns that have to be selected from the primary table in the table_list (see below). If * or no column_list is given, all columns will be selected. It results in a so-called reference table. Optionally a selected column can be given another name in the reference table using AS name (where AS is optional). For example:

  select TIME,ANTENNA1,ANTENNA2,DATA from 3C343.MS
  select TIME,ANTENNA1,ANTENNA2,MODEL_DATA AS DATA from 3C343.MS

It is possible to change the data type of a column by specifying a data type (see below) after the new column name. Giving a data type (even if the same as the existing one) counts as an expression, thus results in the generation of a plain table. For example:

  select MODEL_DATA AS DATA FCOMPLEX from 3C343.MS

It is possible to use expressions in the column list to create new columns based on the contents of other columns. When doing this, the resulting table is a normal table (because a reference table cannot contain expressions). The new column can be given a name by giving AS name after the expression (where AS is optional). If no name is given, a unique name like Col_1 is constructed. After the name a data type string can be given for the new column. If no data type is given, the expression data type is used.

  select max(ANTENNA1,ANTENNA2) AS ANTENNA INT from 3C343
  select means(DATA,1) from 3C343

Note that for subqueries the GIVING clause offers a better (faster) way of specifying a result expression. It also makes it possible to use intervals.

If a column_list is given and if all columns are scalars, the column_list can be preceeded by the word DISTINCT. It means that the result is made unique by removing the rows with duplicate values in the columns of the column_list. Instead of DISTINCT the synonym NODUPLICATES or UNIQUE can also be used. To find duplicate values, some temporary sorting is done, but the original order of the remaining rows is not changed.
Note that support of this keyword is mainly done for SQL compliance. The same (and more) can be achieved with the DISTINCT keyword in the ORDERBY clause with the difference that ORDERBY DISTINCT will change the order.
For full SQL compliance it is also possible to give the keyword ALL which is the opposite of DISTINCT, thus all values are returned. This is the default. Because there is an ambiguity between the keyword ALL and function ALL, the first element of the column list cannot be an expression if the keyword ALL is used.

INTO table [AS type]

This indicates that the ultimate result of the SELECT command should be written to a table (with the given name). This table can be a reference table, a plain table, or a memory table.

The type argument is optional and can be one of several values:

MEMORY: to store the result in a memory table.
PLAIN: to store the result in a plain table. Its endian format is determined by the aipsrc definition.
PLAIN_BIG: to store the result in a plain table in big-endian format.
PLAIN_LITTLE: to store the result in a plain table in little-endian format.
PLAIN_LOCAL: to store the result in a plain table in native endian format.

If type is not given, the result will be written in a reference table. However, if expressions are given in the column list of a projection, the result is written in a plain table.
The standard TaQL way to define the output table is the GIVING clause. INTO is available for SQL compliance.

FROM table_list

The FROM part defines which tables are used in the selection. It is a comma-separated list of table names which can contain path specification and environment variables or the UNIX ~ notation. The first table in the list is the primary table and is used for all columns in the other clauses. Usually only one table is used in which case the list consists of only one table name. E.g.

SELECT col1,col2 FROM mytable
    WHERE col1>col2
    ORDERBY col1

In this example columns col1 and col2 are taken from mytable.

However, it is possible to specify more tables in the table_list. In the WHERE clause these secondary tables can be used to take keywords from. E.g.

SELECT FROM mytable,othertable
    WHERE col1>othertable.key

As shown in the example above a qualifying name (othertable.) can be used in the WHERE clause to specify from which table a keyword has to be taken. If no qualifying name is given, the keyword (or column) is taken from the primary table (i.e., the first table in the table_list). This means that qualifying names are only needed in special cases. The qualifying name can not contain special characters like a slash. Therefore a table_name needs an explicit shorthand alias if it contains special characters.

The full table_list syntax is:
table_name1 [shorthand1], table_name2 [shorthand2], etc.
The shorthand defaults to the table_name. A shorthand is needed if the table name contains non-alphanumeric characters. In the following example shorthand my is not really needed. Shorthand other is needed though.

SELECT FROM mytable my, ~user/othertable other
    WHERE my.col1>other.key

Similar to SQL and OQL the shorthand can also be given using AS or IN. E.g.

SELECT FROM mytable AS my, other IN ~user/othertable

Note that if using IN, the shorthand has to preceed the table name. It can be seen as an iterator variable.

There are three special ways to specify a table:

1.

A table name can be taken from a keyword in a previously specified table. This can be useful in a subquery. The syntax for this is the same as that for specifying keywords in an expression. E.g.

SELECT FROM mytable tab
    WHERE col1 IN [SELECT subcol FROM tab.col2::key]

In this example key is a table keyword of column col2 in table mytable (note that tab is the shorthand for mytable and could be left out).
It can also be used for another table in the main query. E.g.

SELECT FROM mytable, ::key subtab
    WHERE col1 > subtab.key1

In this example the keyword key1 is taken from the subtable given by the table keyword key in the main table.
If a keyword is used as the table name, the keyword is searched in one of the tables previously given. The search starts at the current query level and proceeds outwards (i.e., up to the main query level). If a shorthand is given, only tables with that shorthand are taken into account. If no shorthand is given, only primary tables are taken into account.

2.

Like in OQL it is possible to use a nested query command in the FROM clause. This is a normal query command enclosed in square brackets or parentheses. It results in a temporary table which can thereafter be used as a table in the rest of the query command. A shorthand has to be defined for it in order to be able to refer to that table.
Use of a query in the FROM command can be useful to avoid multiple equal subqueries. E.g.

  select from MS,
   [select from MS where sumsqr(UVW[1:2]) < 625]
    as TIMESEL
   where TIME in [select distinct TIME from TIMESEL]
    &&  any([ANTENNA1,ANTENNA2] in
      [select from TIMESEL giving
[iif(UVW[3] < 0, ANTENNA1, ANTENNA2)]])

is a command to find shadowed antennas for the VLA. Without the query in the FROM command the subqueries in the remainder of the command would have been more complex. Furthermore, it would have been necessary to execute that select twice.
The command is quite complex and cannot be fully understood before reading the rest of this note. Note, however, that the command uses the shorthand TIMESEL to be able to use the temporary table in the subqueries.

3.

Normally only persistent tables (i.e. tables on disk) can be used. However, it is also possible to use transient tables in a TaQL command given in Glish or C++. This is done by passing one or more table objects to the function executing the TaQL command. In the TaQL command a $-sign followed by a sequence number has to be given to indicate the correct object containing the transient table. E.g. if two table objects are passed $1 indicates the first table, while $2 indicates the second one.

WHERE expression

It defines the selection expression which must have a boolean scalar result. A row in the primary table is selected if the expression is true for the values in that row. The syntax of the expression is explained in a later section.

ORDERBY sort_list

It defines the order in which the result of the selection has to be sorted. The sort_list is a comma separated list of expressions.
The sort_list can be preceeded by the word ASC or DESC indicating if the given expressions are by default sorted in ascending or descending order (default is ASC). Each expression in the sort_list can optionally be followed by ASC or DESC to override the default order for that particular sort key.
To be compliant with SQL whitespace can be used between the words ORDER and BY.

The word ORDERBY can optionally be followed by DISTINCT which means that only the first row of multiple rows with equal sort keys is kept in the result. To be compliant with SQL dialects the word UNIQUE or NODUPLICATES can be used instead of DISTINCT.

An expression can be a scalar column or a single element from an array column. In these cases some optimization is performed by reading the entire column directly.
It can also be an arbitrarily complex expression with exactly the same syntax rules as the expressions in the WHERE clause. The resulting data type of the expression must be a standard scalar one, thus it cannot be a Regex or DateTime (see below for a discussion of the available data types). E.g.

ORDERBY col1, col2, col3
ORDERBY DESC col1, col2 ASC, col3
ORDERBY NODUPLICATES uvw[1] DESC
ORDERBY square(uvw[1]) + square(uvw[2])
ORDERBY datetime(col)      incorrect data type
ORDERBY mjd(datetime(col)) is correct

LIMIT expression

It indicates that up to N rows have to be selected, where N is the result of the expression.
LIMIT is applied after ORDERBY, so it is particularly useful in combination with ORDERBY to select, for example, the highest 10 values.

OFFSET expression

It indicates that the first N rows of the selection have to be skipped. As LIMIT it is particularly useful in combination with ORDERBY.

GIVING table|set

It indicates that the ultimate result of the SELECT command should be written to a table (with the given name). One can also give an option using AS option.
Another (more SQL compliant) way to define the output table is the INTO clause. See INTO for a more detailed description including the possible options.

It is also possible to specify a set in the GIVING clause instead of a table name. This is very useful if the result of a subquery is used in the main query. Such a set can contain multiple elements Each element can be a single value, range and/or interval as long as all elements have the same data type. The parts of each element have to be expressions resulting in a scalar.

In the main query and in a query in the FROM command the GIVING clause can only result in a table and not in a set.
To be compliant with SQL dialects, the word SAVETO can be used instead of GIVING. Whitespace can be given between SAVE and TO.

Next: Expressions Up: NOTE 199 - Table Query Language Previous: TaQL Commands Contents