IncrementalStMan.h

Classes

IncrementalStMan -- The Incremental Storage Manager (full description)

class IncrementalStMan : public ISMBase

Interface

Public Members
explicit IncrementalStMan (uInt bucketSize = 0, Bool checkBucketSize = True, uInt cacheSize = 1)
explicit IncrementalStMan (const String& dataManagerName, uInt bucketSize = 0, Bool checkBucketSize = True, uInt cacheSize = 1)
~IncrementalStMan()
Private Members
IncrementalStMan (const IncrementalStMan& that)
IncrementalStMan& operator= (const IncrementalStMan& that)

Description

Etymology

IncrementalStMan is the data manager storing values in an incremental way (similar to an incremental backup). A value is only stored when it differs from the previous value.

Synopsis

IncrementalStMan stores the data in a way that a value is only stored when it is different from the value in the previous row. This storage manager is very well suited for columns with slowly changing values, because the resulting file can be much smaller. It is not suited at all for columns with continuously changing data.

In general it can be advantageous to use this storage manager when a value changes no more often than once every 4 rows (although the break-even point depends on the length of the data values themselves). The following simple example shows the approximate savings that can be achieved when storing a column of double values that change every CH rows.

   #rows    CH    normal length (bytes)    ISM length (bytes)    compress ratio
  500000     5          4000000                 1606000                2.5
  500000    50          4000000                  164000               24.5
  500000   500          4000000                   32800              122

There is a special test program, nISMBucket, in the Tables module that performs a simple, but usually adequate, simulation of the amount of storage needed for a given scenario.
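
The effect of CH on the number of stored values can be explored with a small stand-alone sketch. It does not use the Tables module: `countStoredValues`, `makeSteppedColumn`, and `estimateBytes` are hypothetical helpers, and the bytes-per-entry figure passed to `estimateBytes` is an assumption (value plus index overhead), not a number taken from the storage manager. Bucket overhead is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many values an incremental scheme would store for one column:
// the first row is always stored, later rows only when they differ from
// the value in the previous row.
std::size_t countStoredValues(const std::vector<double>& column) {
    if (column.empty()) return 0;
    std::size_t stored = 1;
    for (std::size_t row = 1; row < column.size(); ++row) {
        if (column[row] != column[row - 1]) ++stored;
    }
    return stored;
}

// Build a column of nrow rows whose value changes every ch rows.
std::vector<double> makeSteppedColumn(std::size_t nrow, std::size_t ch) {
    assert(ch > 0);
    std::vector<double> column(nrow);
    for (std::size_t row = 0; row < nrow; ++row) {
        column[row] = static_cast<double>(row / ch);
    }
    return column;
}

// Rough size estimate. bytesPerEntry is an assumed figure
// (value plus index overhead); it is not taken from the ISM itself.
std::size_t estimateBytes(std::size_t storedValues, std::size_t bytesPerEntry) {
    return storedValues * bytesPerEntry;
}
```

For instance, `countStoredValues(makeSteppedColumn(1000, 5))` gives 200 stored values for 1000 rows.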

IncrementalStMan stores the values (and associated indices) in fixed-length buckets. A BucketCache object is used to read/write the buckets. The default cache size is 1 bucket (which is fine for sequential access), but for random access it can make sense to increase the size of the cache. This can be done using the class ROIncrementalStManAccessor.
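
Why a cache of 1 bucket suits sequential access but not random access can be illustrated with a stand-alone simulation. This is only a sketch: `countBucketReads` is a hypothetical helper, and the least-recently-used replacement policy is an assumption, since the text above does not say which policy BucketCache uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Count how many bucket reads (cache misses) a sequence of bucket accesses
// causes when the cache can hold cacheSize buckets.
// Assumption: least-recently-used replacement.
std::size_t countBucketReads(const std::vector<std::size_t>& accesses,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::list<std::size_t> cache;   // front = most recently used bucket
    std::size_t reads = 0;
    for (std::size_t bucket : accesses) {
        auto it = std::find(cache.begin(), cache.end(), bucket);
        if (it != cache.end()) {
            cache.erase(it);        // hit: moved to the front below
        } else {
            ++reads;                // miss: bucket has to be read from file
            if (cache.size() == cacheSize) cache.pop_back();
        }
        cache.push_front(bucket);
    }
    return reads;
}
```

With a cache size of 1, walking through buckets 0, 0, 0, 1, 1, 2, 2 reads each bucket once, whereas alternating between two buckets causes a read on every access. A cache size of 2 removes those repeated reads.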

The IncrementalStMan can hold values of any standard data type (thus from Bool to String). It can handle scalars, direct arrays, and indirect arrays. It can support an arbitrary number of columns, and the values in each column can change at their own rate.

A bucket contains the values of several consecutive rows. At the beginning of a bucket the values of the starting row of all columns for this storage manager are repeated. In this way the value of a cell can always be found in the bucket and no references to previous buckets are needed.

A bucket should be big enough to hold all starting values and a reasonable number of other values. As a rule of thumb it should be big enough to hold at least 100 values of each column. In general the default bucket size will do. Only in special cases (e.g. when storing large variable length strings) should the bucket size be set explicitly. Giving a zero bucket size means that a suitable default bucket size will be calculated.

When a table is filled sequentially, each bucket can be filled as much as possible. When writing in a random way, buckets can contain some unused space, because a bucket in the middle of the file has to be split when a new value has to be put in it.
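
The lookup described above can be sketched with a simplified in-memory model. The `Bucket` struct, `findBucket`, and `getValue` are hypothetical names for illustration only; the model covers a single column of double values and says nothing about the actual on-disk bucket layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one bucket for a single column.
struct Bucket {
    std::size_t startRow;            // first row covered by this bucket
    std::vector<std::size_t> rows;   // rows where a new value starts; rows[0] == startRow
    std::vector<double> values;      // values[i] holds from rows[i] until the next entry
};

// Find the bucket covering a row: the last bucket with startRow <= row.
const Bucket& findBucket(const std::vector<Bucket>& buckets, std::size_t row) {
    assert(!buckets.empty() && buckets.front().startRow <= row);
    auto it = std::upper_bound(
        buckets.begin(), buckets.end(), row,
        [](std::size_t r, const Bucket& b) { return r < b.startRow; });
    return *(it - 1);
}

// Get the value of a row. Because the starting value is repeated at the
// beginning of each bucket, only the covering bucket has to be inspected.
double getValue(const std::vector<Bucket>& buckets, std::size_t row) {
    const Bucket& b = findBucket(buckets, row);
    auto it = std::upper_bound(b.rows.begin(), b.rows.end(), row);
    assert(it != b.rows.begin());
    return b.values[static_cast<std::size_t>(it - b.rows.begin()) - 1];
}
```

In this model a value that started in an earlier bucket is still found directly, because the second bucket repeats it as its starting value.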

Each column in the IncrementalStMan has the following properties to achieve the "store-different-values-only" behaviour.

This class contains many public functions which are only used by other ISM classes. The only useful function for the user is the constructor.

Motivation

IncrementalStMan can save a lot of storage space. Unlike the old StManMirAIO, it stores the values directly in the file, which reduces memory usage.

Example

This example shows how to create a table and how to attach the storage manager to some columns.
   SetupNewTable newtab("name.data", tableDesc, Table::New);
   IncrementalStMan stman;                  // define storage manager
   newtab.bindColumn ("column1", stman);    // bind column to st.man.
   newtab.bindColumn ("column2", stman);    // bind column to st.man.
   Table tab(newtab);                       // actually create table

Member Description

explicit IncrementalStMan (uInt bucketSize = 0, Bool checkBucketSize = True, uInt cacheSize = 1)
explicit IncrementalStMan (const String& dataManagerName, uInt bucketSize = 0, Bool checkBucketSize = True, uInt cacheSize = 1)

Create an incremental storage manager with the given name. If no name is given, the name is set to an empty string. The name can be used to construct a ROIncrementalStManAccessor object (e.g. to set the cache size).
The bucket size has to be given in bytes and the cache size in buckets. A bucket size of 0 means that the storage manager will choose the bucket size such that a bucket can contain about 100 rows, with a minimum size of 32768 bytes. However, if that results in a very large bucket size (more than 327680 bytes), the storage manager will reduce it. Note that this calculation assumes 32 bytes for each variable length string, so the heuristic may fail when a column contains large strings. When checkBucketSize is set and the bucket size is greater than 0, the storage manager throws an exception if the bucket is too small to hold the values of at least 2 rows. For this check it assumes a length of 0 for variable length strings.
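
The rules above can be summarized in a stand-alone sketch. This is not the storage manager's code: `chooseBucketSize` and its parameters are hypothetical, per-value index overhead is ignored, and clamping to 327680 bytes is an assumption, because the text only says that an overly large bucket is made smaller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>

const std::size_t kMinBucketSize = 32768;    // minimum default size (from the text)
const std::size_t kMaxBucketSize = 327680;   // "very large" threshold (from the text)

// fixedBytesPerRow: total size of the fixed-length values in one row.
// nStringColumns:   number of columns holding variable length strings.
std::size_t chooseBucketSize(std::size_t requested, bool checkBucketSize,
                             std::size_t fixedBytesPerRow,
                             std::size_t nStringColumns) {
    if (requested == 0) {
        // Default: room for about 100 rows, counting 32 bytes per string.
        std::size_t rowBytes = fixedBytesPerRow + 32 * nStringColumns;
        std::size_t size = std::max(kMinBucketSize, 100 * rowBytes);
        size = std::min(size, kMaxBucketSize);   // assumption: simple clamp
        assert(size >= kMinBucketSize && size <= kMaxBucketSize);
        return size;
    }
    // Explicit size: must hold at least 2 rows, counting strings as 0 bytes.
    if (checkBucketSize && requested < 2 * fixedBytesPerRow) {
        throw std::invalid_argument("bucket size too small to hold 2 rows");
    }
    return requested;
}
```

Under these assumptions a row of 1000 fixed bytes gives a default of 100000 bytes, while small rows fall back to the 32768-byte minimum.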

~IncrementalStMan()

IncrementalStMan (const IncrementalStMan& that)

Copy constructor cannot be used.

IncrementalStMan& operator= (const IncrementalStMan& that)

Assignment cannot be used.