casa  $Rev:20696$
casa::ISMBucket Class Reference

A bucket in the Incremental Storage Manager.

#include <ISMBucket.h>


Public Member Functions

 ISMBucket (ISMBase *parent, const char *bucketStorage)
 Create a bucket with the given parent.
 ~ISMBucket ()
uInt getInterval (uInt colnr, uInt rownr, uInt bucketNrrow, uInt &start, uInt &end, uInt &offset) const
 Get the row-interval for given column and row.
Bool canAddData (uInt leng) const
 Is the bucket large enough to add a value?
void addData (uInt colnr, uInt rownr, uInt index, const char *data, uInt leng)
 Add the data to the data part.
Bool canReplaceData (uInt newLeng, uInt oldLeng) const
 Is the bucket large enough to replace a value?
void replaceData (uInt &offset, const char *data, uInt newLeng, uInt fixedLength)
 Replace a data item.
const char * get (uInt offset) const
 Get a pointer to the data for the given offset.
uInt getLength (uInt fixedLength, const char *data) const
 Get the length of the data value.
uInt & getOffset (uInt colnr, uInt rownr)
 Get access to the offset of the data for given column and row.
Block< uInt > & rowIndex (uInt colnr)
 Get access to the index information for the given column.
Block< uInt > & offIndex (uInt colnr)
 Return the offsets of the values stored in the data part.
uInt & indexUsed (uInt colnr)
 Return the number of values stored.
uInt split (ISMBucket *&left, ISMBucket *&right, Block< Bool > &duplicated, uInt bucketStartRow, uInt bucketNrrow, uInt colnr, uInt rownr, uInt lengToAdd)
 Split the bucket in the middle.
Bool simpleSplit (ISMBucket *left, ISMBucket *right, Block< Bool > &duplicated, uInt &splitRownr, uInt rownr)
 Determine whether a simple split is possible.
uInt getSplit (uInt totLeng, const Block< uInt > &rowLeng, const Block< uInt > &cumLeng)
 Return the index where the bucket should be split to get two parts with almost identical length.
void shiftLeft (uInt index, uInt nr, Block< uInt > &rowIndex, Block< uInt > &offIndex, uInt &nused, uInt leng)
 Remove nr items from data and index part by shifting to the left.
void copy (const ISMBucket &that)
 Copy the contents of that bucket to this bucket.
void show (ostream &os) const
 Show the layout of the bucket.

Static Public Member Functions

static char * readCallBack (void *owner, const char *bucketStorage)
 Callback function when BucketCache reads a bucket.
static void writeCallBack (void *owner, char *bucketStorage, const char *bucket)
 Callback function when BucketCache writes a bucket.
static char * initCallBack (void *owner)
 Callback function when BucketCache adds a new bucket to the data file.
static void deleteCallBack (void *, char *bucket)
 Callback function when BucketCache removes a bucket from the cache.

Private Member Functions

 ISMBucket (const ISMBucket &)
 Forbid copy constructor.
ISMBucket & operator= (const ISMBucket &)
 Forbid assignment.
void removeData (uInt offset, uInt leng)
 Remove a data item with the given length.
uInt insertData (const char *data, uInt leng)
 Insert a data value by appending it to the end.
uInt copyData (ISMBucket &other, uInt colnr, uInt toRownr, uInt fromIndex, uInt toIndex) const
 Copy a data item from this bucket to the other bucket.
void read (const char *bucketStorage)
 Read the data from the storage into this bucket.
void write (char *bucketStorage) const
 Write the bucket into the storage.

Private Attributes

ISMBase * stmanPtr_p
 Pointer to the parent storage manager.
uInt uIntSize_p
 The size (in bytes) of an uInt (used in index, etc.).
uInt dataLeng_p
 The size (in bytes) of the data.
uInt indexLeng_p
 The size (in bytes) of the index.
PtrBlock< Block< uInt > * > rowIndex_p
 The row index per column; each index contains the row number of each value stored in the bucket (for that column).
PtrBlock< Block< uInt > * > offIndex_p
 The offset index per column; each index contains the offset (in bytes) of each value stored in the bucket (for that column).
Block< uInt > indexUsed_p
 Nr of used elements in each index; i.e. the number of stored values per column.
char * data_p
 The data space (in external (e.g. canonical) format).

Detailed Description

A bucket in the Incremental Storage Manager.

Intended use:

Internal

Review Status

Reviewed By:
UNKNOWN
Date Reviewed:
before 2004/08/25

Prerequisite

Etymology

ISMBucket represents a bucket in the Incremental Storage Manager.

Synopsis

The Incremental Storage Manager uses a BucketCache object to read/write/cache the buckets containing the data. An ISMBucket object is the internal representation of the contents of a bucket. ISMBucket contains static callback functions which are called by BucketCache when reading/writing a bucket. These callback functions map raw bucket data to an ISMBucket object and vice versa.

A bucket contains the values of several rows of all columns bound to this Incremental Storage Manager. A bucket is split into a data part and an index part. Each part has an arbitrary length but together they do not exceed the fixed bucket length.

The beginning of the data part contains the values of all columns bound. The remainder of the data part contains the values of the rows/columns with a changed value.
The index part contains an index per column. Each index contains the row number and an offset for a row with a stored value. The row numbers are relative to the beginning of the bucket, so the bucket has no knowledge about the absolute row numbers. In this way deletion of rows is much simpler.

The contents of a bucket look like:

       -------------------------------------------------------------------
       | index offset   | data part     | index part              | free |
       -------------------------------------------------------------------
        0                4               4+length(data part)
       <--------------------------bucketsize----------------------------->

The data part contains all data values belonging to the bucket. The index part contains for each column the following data:

       -----------------------------------------------------------------------
       | #values stored | row numbers of values | offset in data part of     |
       | for column i   | stored for column i   | values stored for column i |
       -----------------------------------------------------------------------
        0                4                       4+4*nrval

Note that the row numbers in the bucket start at 0 and are thus relative to the beginning of the bucket. The main index kept in ISMIndex knows the starting row of each bucket. In this way bucket splitting and especially row removal are much easier.
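The two layout diagrams above can be illustrated with a minimal sketch that walks the index part of a raw bucket. It is not the casacore implementation: it assumes local (native) format with 4-byte unsigned integers, assumes the value at byte 0 is the position of the index part counted from the start of the bucket, and assumes the per-column indexes follow one another. The names `ColumnIndex`, `readU32` and `parseBucketIndex` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory view of one column's index (not the casacore types).
struct ColumnIndex {
    std::vector<std::uint32_t> rows;     // row numbers relative to bucket start
    std::vector<std::uint32_t> offsets;  // offsets of the values in the data part
};

// Read one 4-byte unsigned integer stored in local (native) format.
inline std::uint32_t readU32(const char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Parse the index part of a raw bucket laid out as in the diagrams.
// Assumption: bytes 0..3 hold the byte position of the index part.
std::vector<ColumnIndex> parseBucketIndex(const char* bucket, std::size_t ncol) {
    assert(bucket != nullptr);
    const char* p = bucket + readU32(bucket);
    std::vector<ColumnIndex> index(ncol);
    for (auto& col : index) {
        // Per column: #values, then the row numbers, then the offsets.
        const std::uint32_t nrval = readU32(p);
        p += 4;
        col.rows.resize(nrval);
        col.offsets.resize(nrval);
        for (auto& r : col.rows)    { r = readU32(p); p += 4; }
        for (auto& o : col.offsets) { o = readU32(p); p += 4; }
    }
    return index;
}
```

A bucket stored in canonical format would need a byte-order conversion in `readU32`; the sketch leaves that out.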

The bucket can be stored in canonical or local (i.e. native) data format. When a bucket is read into memory, its data are read, converted, and stored in the ISMBucket object. When flushed, the contents are written. ISMBucket takes care that the values stored in its object do not exceed the size of the bucket. When the bucket is full, the user can call a function to split it into a left and right bucket. When the new value has to be written at the end, the split merely consists of creating a new bucket. In any case, care is taken that a row is not split. Thus a row is always entirely contained in one bucket.

Class ISMColumn does the actual writing of data in a bucket and uses the relevant ISMBucket functions.

Motivation

ISMBucket encapsulates the data of a bucket.

Definition at line 132 of file ISMBucket.h.


Constructor & Destructor Documentation

casa::ISMBucket::ISMBucket ( ISMBase *  parent,
const char *  bucketStorage 
)

Create a bucket with the given parent.

When bucketStorage is non-zero, the object is reconstructed from it. The bucket keeps the pointer to its parent (but does not own it).

casa::ISMBucket::ISMBucket ( const ISMBucket & ) [private]

Forbid copy constructor.


Member Function Documentation

void casa::ISMBucket::addData ( uInt  colnr,
uInt  rownr,
uInt  index,
const char *  data,
uInt  leng 
)

Add the data to the data part.

It updates the bucket index at the given index. An exception is thrown if the bucket is too small.

Bool casa::ISMBucket::canAddData ( uInt  leng) const

Is the bucket large enough to add a value?
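As a rough illustration of such a capacity check, the sketch below adds up the parts shown in the bucket layout. It is an assumption-laden model, not the casacore code: it assumes 4-byte integers, counts the 4-byte index offset at the start of the bucket separately from the data and index lengths, and assumes every added value needs one new row number and one new offset in the index. `BucketUsage` and `canAdd` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of a bucket's space usage, all sizes in bytes.
struct BucketUsage {
    std::uint32_t bucketSize;  // fixed bucket length
    std::uint32_t dataLeng;    // current length of the data part
    std::uint32_t indexLeng;   // current length of the index part
};

// Can a value of leng bytes be added without exceeding the bucket size?
bool canAdd(const BucketUsage& b, std::uint32_t leng) {
    const std::uint64_t header = 4;  // assumed 4-byte index offset at byte 0
    const std::uint64_t used = header + b.dataLeng + b.indexLeng;
    assert(used <= b.bucketSize);    // the bucket must be consistent to begin with
    // New value plus one row number and one offset in the index (2 * 4 bytes).
    return used + leng + 2 * 4 <= b.bucketSize;
}
```

When the check fails, the caller would split the bucket before adding the value.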

Bool casa::ISMBucket::canReplaceData ( uInt  newLeng,
uInt  oldLeng 
) const

Is the bucket large enough to replace a value?

void casa::ISMBucket::copy ( const ISMBucket &  that)

Copy the contents of that bucket to this bucket.

This is used after a split operation.

uInt casa::ISMBucket::copyData ( ISMBucket &  other,
uInt  colnr,
uInt  toRownr,
uInt  fromIndex,
uInt  toIndex 
) const [private]

Copy a data item from this bucket to the other bucket.

static void casa::ISMBucket::deleteCallBack ( void *  ,
char *  bucket 
) [static]

Callback function when BucketCache removes a bucket from the cache.

This function deletes the ISMBucket bucket object.

const char * casa::ISMBucket::get ( uInt  offset) const [inline]

Get a pointer to the data for the given offset.

Definition at line 310 of file ISMBucket.h.

References data_p.

uInt casa::ISMBucket::getInterval ( uInt  colnr,
uInt  rownr,
uInt  bucketNrrow,
uInt &  start,
uInt &  end,
uInt &  offset 
) const

Get the row-interval for given column and row.

It sets the start and end of the interval to which the row belongs and the offset of its current value. It returns the index where the row number can be put in the bucket index.
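The lookup can be pictured with the following minimal sketch, which works on plain vectors instead of the bucket's own index blocks. It assumes that the row index of a column is sorted, that its first entry is row 0, and that the returned end row is inclusive; none of this is confirmed by the text above. `Interval` and `findInterval` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type gathering the output arguments of the lookup.
struct Interval {
    std::uint32_t start;   // first row of the interval
    std::uint32_t end;     // last row of the interval (inclusive, by assumption)
    std::uint32_t offset;  // offset of the value that applies to the row
    std::size_t index;     // position where rownr sits or would be inserted
};

Interval findInterval(const std::vector<std::uint32_t>& rowIndex,
                      const std::vector<std::uint32_t>& offIndex,
                      std::uint32_t rownr, std::uint32_t bucketNrrow) {
    assert(!rowIndex.empty() && rowIndex.front() == 0);
    assert(rowIndex.size() == offIndex.size() && rownr < bucketNrrow);
    // Binary search for the first stored row number >= rownr.
    const auto it = std::lower_bound(rowIndex.begin(), rowIndex.end(), rownr);
    const std::size_t index = static_cast<std::size_t>(it - rowIndex.begin());
    const bool found = (it != rowIndex.end() && *it == rownr);
    // The value in effect starts at rownr itself or at the entry just before.
    const std::size_t cur = found ? index : index - 1;
    const std::size_t next = cur + 1;
    const std::uint32_t endExcl =
        (next == rowIndex.size()) ? bucketNrrow : rowIndex[next];
    return Interval{rowIndex[cur], endExcl - 1, offIndex[cur], index};
}
```

With stored rows 0, 3 and 7 in a bucket of 10 rows, row 5 falls in the interval 3..6 and would be inserted at index 2.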

uInt casa::ISMBucket::getLength ( uInt  fixedLength,
const char *  data 
) const

Get the length of the data value.

It is fixedLength when non-zero, otherwise read it from the data value.
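A minimal sketch of this rule is given below. How a variable-length value records its own length is not stated here, so the sketch assumes a leading 4-byte unsigned integer in local format; `valueLength` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Length of a data value: the fixed length when non-zero, otherwise the
// length stored with the value (assumed: a leading 4-byte unsigned integer).
std::uint32_t valueLength(std::uint32_t fixedLength, const char* data) {
    if (fixedLength != 0) {
        return fixedLength;
    }
    assert(data != nullptr);
    std::uint32_t leng;
    std::memcpy(&leng, data, sizeof leng);
    return leng;
}
```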

uInt& casa::ISMBucket::getOffset ( uInt  colnr,
uInt  rownr 
)

Get access to the offset of the data for given column and row.

It allows the offset to be changed (used for example by replaceData).

uInt casa::ISMBucket::getSplit ( uInt  totLeng,
const Block< uInt > &  rowLeng,
const Block< uInt > &  cumLeng 
)

Return the index where the bucket should be split to get two parts with almost identical length.
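One way to find such a split point is sketched below; it is not necessarily the method used by getSplit. The sketch assumes that cumLeng[i] holds the total length in bytes of rows 0 up to and including i, ignores rowLeng, and picks the boundary that minimizes the difference between the two parts. `balancedSplit` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: cumLeng[i] is assumed to be the total length in bytes
// of rows 0..i, so cumLeng.back() equals totLeng.
// Returns i such that rows [0, i) go left and rows [i, n) go right.
std::size_t balancedSplit(std::uint32_t totLeng,
                          const std::vector<std::uint32_t>& cumLeng) {
    assert(cumLeng.size() >= 2 && cumLeng.back() == totLeng);
    std::size_t best = 1;
    std::int64_t bestDiff = -1;  // -1 means no candidate has been seen yet
    for (std::size_t i = 1; i < cumLeng.size(); ++i) {
        const std::int64_t left = cumLeng[i - 1];
        const std::int64_t right = static_cast<std::int64_t>(totLeng) - left;
        const std::int64_t diff = left > right ? left - right : right - left;
        if (bestDiff < 0 || diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
    }
    return best;
}
```

Because the boundary always lies between two rows, no row is divided over the two parts.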

uInt & casa::ISMBucket::indexUsed ( uInt  colnr) [inline]

Return the number of values stored.

Definition at line 322 of file ISMBucket.h.

References indexUsed_p.

static char* casa::ISMBucket::initCallBack ( void *  owner) [static]

Callback function when BucketCache adds a new bucket to the data file.

This function creates an empty ISMBucket object. It returns the pointer to the ISMBucket object, which becomes part of the cache. The object gets deleted by the deleteCallBack function.

uInt casa::ISMBucket::insertData ( const char *  data,
uInt  leng 
) [private]

Insert a data value by appending it to the end.

It returns the offset of the data value.

Block< uInt > & casa::ISMBucket::offIndex ( uInt  colnr) [inline]

Return the offsets of the values stored in the data part.

Definition at line 318 of file ISMBucket.h.

References offIndex_p.

ISMBucket& casa::ISMBucket::operator= ( const ISMBucket & ) [private]

Forbid assignment.

void casa::ISMBucket::read ( const char *  bucketStorage) [private]

Read the data from the storage into this bucket.

static char* casa::ISMBucket::readCallBack ( void *  owner,
const char *  bucketStorage 
) [static]

Callback function when BucketCache reads a bucket.

It creates an ISMBucket object and converts the raw bucketStorage to that object. It returns the pointer to the ISMBucket object, which becomes part of the cache. The object gets deleted by the deleteCallBack function.

void casa::ISMBucket::removeData ( uInt  offset,
uInt  leng 
) [private]

Remove a data item with the given length.

If the length is zero, its variable length is read first.

void casa::ISMBucket::replaceData ( uInt &  offset,
const char *  data,
uInt  newLeng,
uInt  fixedLength 
)

Replace a data item.

When its length is variable (indicated by fixedLength=0), the old value will be removed and the new one appended at the end. An exception is thrown if the bucket is too small.

Block< uInt > & casa::ISMBucket::rowIndex ( uInt  colnr) [inline]

Get access to the index information for the given column.

This is used by ISMColumn when putting the data.

Return the row numbers with a stored value.

Definition at line 314 of file ISMBucket.h.

References rowIndex_p.

void casa::ISMBucket::shiftLeft ( uInt  index,
uInt  nr,
Block< uInt > &  rowIndex,
Block< uInt > &  offIndex,
uInt &  nused,
uInt  leng 
)

Remove nr items from data and index part by shifting to the left.

The rowIndex, offIndex, and nused get updated. The caller is responsible for removing data when needed (e.g. ISMIndColumn removes the indirect arrays from its file).
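The index part of this operation can be sketched as follows, using plain vectors in place of the Block objects. The sketch only shifts the row numbers and offsets and updates the count; removing the corresponding bytes from the data part is left out. `shiftIndexLeft` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Remove nr index entries starting at position index by moving the entries
// behind them to the left. Only the first nused entries are in use.
void shiftIndexLeft(std::size_t index, std::size_t nr,
                    std::vector<std::uint32_t>& rowIndex,
                    std::vector<std::uint32_t>& offIndex,
                    std::size_t& nused) {
    assert(index + nr <= nused);
    assert(nused <= rowIndex.size() && nused <= offIndex.size());
    if (nr == 0) {
        return;  // nothing to remove
    }
    const auto from = static_cast<std::ptrdiff_t>(index + nr);
    const auto last = static_cast<std::ptrdiff_t>(nused);
    const auto to = static_cast<std::ptrdiff_t>(index);
    // The destination starts before the source, so std::copy is safe here.
    std::copy(rowIndex.begin() + from, rowIndex.begin() + last,
              rowIndex.begin() + to);
    std::copy(offIndex.begin() + from, offIndex.begin() + last,
              offIndex.begin() + to);
    nused -= nr;
}
```

The vectors keep their size; entries at positions nused and beyond are simply no longer in use.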

void casa::ISMBucket::show ( ostream &  os) const

Show the layout of the bucket.

Bool casa::ISMBucket::simpleSplit ( ISMBucket *  left,
ISMBucket *  right,
Block< Bool > &  duplicated,
uInt &  splitRownr,
uInt  rownr 
)

Determine whether a simple split is possible.

If so, do it. This is possible if the new row is at the end of the last bucket, which will often be the case.
A simple split means adding a new bucket for the new row. If the old bucket already contains values for that row, those values are moved to the new bucket.
This function is only called by split, which created the left and right bucket.

uInt casa::ISMBucket::split ( ISMBucket *&  left,
ISMBucket *&  right,
Block< Bool > &  duplicated,
uInt  bucketStartRow,
uInt  bucketNrrow,
uInt  colnr,
uInt  rownr,
uInt  lengToAdd 
)

Split the bucket in the middle.

It returns the row number where the bucket was split and the new left and right bucket. The caller is responsible for deleting the newly created buckets. When possible a simple split is done.
The starting values in the right bucket may be copies of the values in the left bucket. The duplicated Block contains a switch per column indicating if the value is copied.

void casa::ISMBucket::write ( char *  bucketStorage) const [private]

Write the bucket into the storage.

static void casa::ISMBucket::writeCallBack ( void *  owner,
char *  bucketStorage,
const char *  bucket 
) [static]

Callback function when BucketCache writes a bucket.

It converts the ISMBucket bucket object to the raw bucketStorage.


Member Data Documentation

char* casa::ISMBucket::data_p [private]

The data space (in external (e.g. canonical) format).

Definition at line 306 of file ISMBucket.h.

Referenced by get().

uInt casa::ISMBucket::dataLeng_p [private]

The size (in bytes) of the data.

Definition at line 293 of file ISMBucket.h.

uInt casa::ISMBucket::indexLeng_p [private]

The size (in bytes) of the index.

Definition at line 295 of file ISMBucket.h.

Block<uInt> casa::ISMBucket::indexUsed_p [private]

Nr of used elements in each index; i.e. the number of stored values per column.

Definition at line 304 of file ISMBucket.h.

Referenced by indexUsed().

PtrBlock<Block<uInt>*> casa::ISMBucket::offIndex_p [private]

The offset index per column; each index contains the offset (in bytes) of each value stored in the bucket (for that column).

Definition at line 301 of file ISMBucket.h.

Referenced by offIndex().

PtrBlock<Block<uInt>*> casa::ISMBucket::rowIndex_p [private]

The row index per column; each index contains the row number of each value stored in the bucket (for that column).

Definition at line 298 of file ISMBucket.h.

Referenced by rowIndex().

ISMBase* casa::ISMBucket::stmanPtr_p [private]

Pointer to the parent storage manager.

Definition at line 289 of file ISMBucket.h.

uInt casa::ISMBucket::uIntSize_p [private]

The size (in bytes) of an uInt (used in index, etc.).

Definition at line 291 of file ISMBucket.h.


The documentation for this class was generated from the following file: