casa  5.7.0-16
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Public Types | Public Member Functions | Private Member Functions | Private Attributes | List of all members
casacore::Sort Class Reference

Sort on one or more keys, ascending and/or descending. More...

#include <Sort.h>

Public Types

enum  Option {
  DefaultSort,
  HeapSort,
  InsSort,
  QuickSort,
  ParSort,
  NoDuplicates
}
 Enumerate the sort options: More...
 
enum  Order {
  Ascending,
  Descending
}
 Enumerate the sort order: More...
 

Public Member Functions

 Sort ()
 The default constructor can be used when the data is only passed in via function sortKey. More...
 
 Sort (const void *data, uInt elementSize)
 Construct a Sort object for the given data array with elements of elementSize bytes. More...
 
 Sort (const Sort &)
 Copy constructor (copy semantics). More...
 
 ~Sort ()
 
Sortoperator= (const Sort &)
 Assignment (copy semantics). More...
 
void sortKey (const void *data, DataType, uInt increment=0, Order=Ascending)
 Define a sort key (the most significant key should be defined first). More...
 
void sortKey (const void *data, const CountedPtr< BaseCompare > &, uInt increment, Order=Ascending)
 
void sortKey (uInt offset, DataType, Order=Ascending)
 
void sortKey (uInt offset, const CountedPtr< BaseCompare > &, Order=Ascending)
 
uInt sort (Vector< uInt > &indexVector, uInt nrrec, int options=DefaultSort, Bool tryGenSort=True) const
 Sort the data array of nrrec records. More...
 
uInt unique (Vector< uInt > &uniqueVector, uInt nrrec) const
 Get all unique records in a sorted array. More...
 
uInt unique (Vector< uInt > &uniqueVector, const Vector< uInt > &indexVector) const
 

Private Member Functions

void copy (const Sort &that)
 Copy that Sort object to this. More...
 
void addKey (const void *data, DataType, uInt nr, int options)
 Add a sort key giving a data type or the sort key. More...
 
void addKey (SortKey *)
 
uInt insSort (uInt nr, uInt *indices) const
 Do an insertion sort, optionally skipping duplicates. More...
 
uInt insSortNoDup (uInt nr, uInt *indices) const
 
uInt parSort (int nthr, uInt nrrec, uInt *inx) const
 Do a merge sort, if possible in parallel using OpenMP. More...
 
void merge (uInt *inx, uInt *tmp, uInt size, uInt *index, uInt nparts) const
 
uInt quickSort (uInt nr, uInt *indices) const
 Do a quicksort, optionally skipping duplicates (qkSort is the actual quicksort function). More...
 
uInt quickSortNoDup (uInt nr, uInt *indices) const
 
void qkSort (Int nr, uInt *indices) const
 
uInt heapSort (uInt nr, uInt *indices) const
 Do a heapsort, optionally skipping duplicates. More...
 
uInt heapSortNoDup (uInt nr, uInt *indices) const
 
void siftDown (Int low, Int up, uInt *indices) const
 Siftdown algorithm for heapsort. More...
 
int compare (uInt index1, uInt index2) const
 Compare the keys of 2 records. More...
 
void swap (Int index1, Int index2, uInt *indices) const
 Swap 2 indices. More...
 

Private Attributes

PtrBlock< SortKey * > keys_p
 
uInt nrkey_p
 
const void * data_p
 
uInt size_p
 
int order_p
 

Detailed Description

Sort on one or more keys, ascending and/or descending.

Intended use:

Public interface

Review Status

Reviewed By:
Friso Olnon
Date Reviewed:
1995/03/01
Test programs:
tSort, tSort_1

Synopsis

Sort lets you sort data on one or more keys in a mix of Sort::ascending and Sort::descending order. Duplicates can be skipped by giving the option Sort::NoDuplicates. Only in this case the number of output elements can be different from the number of input elements.
The unique function offers another way of getting unique values.

Class Sort does not sort the data themselves, but returns an index to them. This gives more flexibility and allows the sort to be stable; but it is slower.
Very fast sorting of the data themselves can be done with the functions in class GenSort. If sorting on a single key with a standard data type is done, Sort will use GenSortIndirect to speed up the sort.
Four sort algorithms are provided:

Sort::ParSort
The parallel merge sort is the fastest if it can use multiple threads. For a single thread it has O(n*log(n)) behaviour, but is slower than quicksort. A drawback is that it needs an extra index array to do the merge.
Sort::InsSort
Insertion sort has O(n*n) behaviour, thus is very slow for large arrays. However, it is the fastest method for small arrays (< 50 elements) and for arrays already (almost) in the right order.
Sort::QuickSort
Care has been taken to solve the well-known quicksort problems like "array already in order" or "many equal elements". The behaviour is O(n*log(n)) in all the cases tested, even in degenerated cases where the SUN Solaris qsort algorithm is O(n*n).
Sort::HeapSort
Heapsort has O(n*log(n)) behaviour. Its speed is lower than that of QuickSort, so QuickSort is the default algorithm.

The default is to use QuickSort for small arrays or if only a single thread can be used. Otherwise ParSort is the default.

All sort algorithms are stable, which means that the original order is kept when keys are equal.

The sort is a four step process:

  1. Construct the Sort object.
  2. Define the sort keys. The function sortKey must be called for each sort key (the most significant one first). The comparison object can be passed in directly, or a basic data type can be given. In the latter case the appropriate ObjCompare comparison object will be created.
  3. Sort the data. The function sort returns an index array, which is allocated when needed.
  4. Destruct the Sort object (usually done automatically) and delete the index array.

The data can be in a single array of structs, in separate arrays, or in a mix of those. Of course, all arrays must have the same length. The data can be passed to the Sort constructor and/or to the sortKey function. If passed to the Sort constructor, the offset of the data item in the data array must be given to sortKey.

Example

In the first example we sort the data contained in two "parallel" arrays, idata and ddata, both of length nrdata.

sort.sortKey (idata, TpInt); // define 1st sort key
sort.sortKey (ddata, TpDouble,0,Sort::Descending); // define 2nd sort key
Vector<uInt> inx;
sort.sort (inx, nrdata);
for (uInt i=0; i<nrdata; i++) { // show sorted data
cout << idata[inx[i]] << " " << ddata[inx[i]] << endl;
}

Now nr contains the nr of records (=nrdata) and inx an array of (sorted) indices.

In the second example we sort the data stored in an array of structs on the double (ascending) and the string (descending). We can pass the data to the Sort constructor, and the offsets of the struct elements to the sortKey function.

struct Ts {
String as;
double ad;
}
Vector<uInt> inx;
Sort sort (tsarr, sizeof(Ts));
sort.sortKey ((char*)&tsarr[0].ad - (char*)tsarr, TpDouble);
sort.sortKey ((char*)&tsarr[0].as - (char*)tsarr, TpString,
sort.sort (inx, nrts);

Note that the first argument in function sortKey gives the offset of the variable in the struct.

Alternatively, and probably slightly easier, we could pass the data to the sortKey function and use an increment:

struct Ts {
String as;
double ad;
}
Vector<uInt> inx;
sort.sortKey (&tsarr[0].ad, TpDouble, sizeof(Ts));
sort.sortKey (&tsarr[0].as, TpString, sizeof(Ts), Sort::Descending);
sort.sort (inx, nrts);

Finally, we could provide a comparison object for the struct.

struct Ts {
String as;
double ad;
}
class MyCompare: public BaseCompare {
virtual int comp (const void* val1, const void* val2) const
{
const Ts& t1 = *(Ts*)val1;
const Ts& t2 = *(Ts*)val2;
if (t1.ad < t2.ad) return -1;
if (t1.ad > t2.ad) return 1;
if (t1.as < t2.as) return 1; // string must be descending
if (t1.as > t2.as) return -1;
return 0;
}
};
Vector<uInt> inx;
sort.sortKey (tsarr, compareTs, sizeof(Ts));
sort.sort (inx, nrts);

Definition at line 248 of file Sort.h.

Member Enumeration Documentation

Enumerate the sort options:

Enumerator
DefaultSort 
HeapSort 
InsSort 
QuickSort 
ParSort 
NoDuplicates 

Definition at line 252 of file Sort.h.

Enumerate the sort order:

Enumerator
Ascending 
Descending 

Definition at line 260 of file Sort.h.

Constructor & Destructor Documentation

casacore::Sort::Sort ( )

The default constructor can be used when the data is only passed in via function sortKey.

casacore::Sort::Sort ( const void *  data,
uInt  elementSize 
)

Construct a Sort object for the given data array with elements of elementSize bytes.

This data array will be used when an offset is given to the sortKey functions. You can still pass additional data arrays to the sortKey functions.

casacore::Sort::Sort ( const Sort )

Copy constructor (copy semantics).

casacore::Sort::~Sort ( )

Member Function Documentation

void casacore::Sort::addKey ( const void *  data,
DataType  ,
uInt  nr,
int  options 
)
private

Add a sort key giving a data type or the sort key.

void casacore::Sort::addKey ( SortKey )
private
int casacore::Sort::compare ( uInt  index1,
uInt  index2 
) const
private

Compare the keys of 2 records.

void casacore::Sort::copy ( const Sort that)
private

Copy that Sort object to this.

uInt casacore::Sort::heapSort ( uInt  nr,
uInt indices 
) const
private

Do a heapsort, optionally skipping duplicates.

uInt casacore::Sort::heapSortNoDup ( uInt  nr,
uInt indices 
) const
private
uInt casacore::Sort::insSort ( uInt  nr,
uInt indices 
) const
private

Do an insertion sort, optionally skipping duplicates.

uInt casacore::Sort::insSortNoDup ( uInt  nr,
uInt indices 
) const
private
void casacore::Sort::merge ( uInt inx,
uInt tmp,
uInt  size,
uInt index,
uInt  nparts 
) const
private
Sort& casacore::Sort::operator= ( const Sort )

Assignment (copy semantics).

uInt casacore::Sort::parSort ( int  nthr,
uInt  nrrec,
uInt inx 
) const
private

Do a merge sort, if possible in parallel using OpenMP.

Note that the env.var. OMP_NUM_TRHEADS sets the maximum nr of threads to use. It defaults to the number of cores.

void casacore::Sort::qkSort ( Int  nr,
uInt indices 
) const
private
uInt casacore::Sort::quickSort ( uInt  nr,
uInt indices 
) const
private

Do a quicksort, optionally skipping duplicates (qkSort is the actual quicksort function).

uInt casacore::Sort::quickSortNoDup ( uInt  nr,
uInt indices 
) const
private
void casacore::Sort::siftDown ( Int  low,
Int  up,
uInt indices 
) const
private

Siftdown algorithm for heapsort.

uInt casacore::Sort::sort ( Vector< uInt > &  indexVector,
uInt  nrrec,
int  options = DefaultSort,
Bool  tryGenSort = True 
) const

Sort the data array of nrrec records.

The result is an array of indices giving the requested order. It returns the number of resulting records. The indices array is resized to that number.
By default it'll try if the faster GenSortIndirect can be used if a sort on a single key is used.

Referenced by casa::ScantableFieldIterator::ScantableFieldIterator(), casa::ScantableFrequenciesIterator::ScantableFrequenciesIterator(), and casa::ScantableSourceIterator::ScantableSourceIterator().

void casacore::Sort::sortKey ( const void *  data,
DataType  ,
uInt  increment = 0,
Order  = Ascending 
)

Define a sort key (the most significant key should be defined first).

The key contains:

  • A pointer to the start of the data array. — When structs are sorted on an element in the struct, the pointer must point to that element in the first struct.
  • A pointer to the comparison object to be used. — The comparison object can be specified in two ways:
    • by giving a basic data type, in which case the appropriate comparison object will be created automatically, or
    • by a CountedPtr of a comparison object. You may want to use the templated comparison classes ObjCompare (), but you are free to use any other class derived from BaseCompare that implements the comp function.
  • The increment from one data element to the next. — When structs are sorted on an element in the struct, the increment should be the size of the struct. If the comparison object is automatically created using the data type specified, the default increment is the size of the data type.
  • The sort order. — Ascending (default) or Descending;

When the data array has been passed to the Sort constructor, the data pointer and the increment arguments can be replaced by a single argument: the offset of the key in each element of the array.

Referenced by casa::ScantableFieldIterator::ScantableFieldIterator(), casa::ScantableFrequenciesIterator::ScantableFrequenciesIterator(), and casa::ScantableSourceIterator::ScantableSourceIterator().

void casacore::Sort::sortKey ( const void *  data,
const CountedPtr< BaseCompare > &  ,
uInt  increment,
Order  = Ascending 
)
void casacore::Sort::sortKey ( uInt  offset,
DataType  ,
Order  = Ascending 
)
void casacore::Sort::sortKey ( uInt  offset,
const CountedPtr< BaseCompare > &  ,
Order  = Ascending 
)
void casacore::Sort::swap ( Int  index1,
Int  index2,
uInt indices 
) const
inlineprivate

Swap 2 indices.

Definition at line 407 of file Sort.h.

uInt casacore::Sort::unique ( Vector< uInt > &  uniqueVector,
uInt  nrrec 
) const

Get all unique records in a sorted array.

The array order is given in the indexVector (as possibly returned by the sort function). The default indexVector is 0..nrrec-1. The index of each first unique record is returned in the uniqueVector. They are indices in the supplied indexVector, so data[indexVector(uniqueVector(i))] is giving the i-th unique record. Note that the records indexed by indexVector(uniqueVector(i)) till indexVector(uniqueVector(i+1)) are all the same.
It returns the number of unique records. The unique array is resized to that number.

uInt casacore::Sort::unique ( Vector< uInt > &  uniqueVector,
const Vector< uInt > &  indexVector 
) const

Member Data Documentation

const void* casacore::Sort::data_p
private

Definition at line 400 of file Sort.h.

PtrBlock<SortKey*> casacore::Sort::keys_p
private

Definition at line 398 of file Sort.h.

uInt casacore::Sort::nrkey_p
private

Definition at line 399 of file Sort.h.

int casacore::Sort::order_p
private

Definition at line 402 of file Sort.h.

uInt casacore::Sort::size_p
private

Definition at line 401 of file Sort.h.


The documentation for this class was generated from the following file: