casa
$Rev:20696$
A bucket in the Incremental Storage Manager.
#include <ISMBucket.h>
Public Member Functions | |
ISMBucket (ISMBase *parent, const char *bucketStorage) | |
Create a bucket with the given parent. | |
~ISMBucket () | |
uInt | getInterval (uInt colnr, uInt rownr, uInt bucketNrrow, uInt &start, uInt &end, uInt &offset) const |
Get the row-interval for given column and row. | |
Bool | canAddData (uInt leng) const |
Is the bucket large enough to add a value? | |
void | addData (uInt colnr, uInt rownr, uInt index, const char *data, uInt leng) |
Add the data to the data part. | |
Bool | canReplaceData (uInt newLeng, uInt oldLeng) const |
Is the bucket large enough to replace a value? | |
void | replaceData (uInt &offset, const char *data, uInt newLeng, uInt fixedLength) |
Replace a data item. | |
const char * | get (uInt offset) const |
Get a pointer to the data for the given offset. | |
uInt | getLength (uInt fixedLength, const char *data) const |
Get the length of the data value. | |
uInt & | getOffset (uInt colnr, uInt rownr) |
Get access to the offset of the data for given column and row. | |
Block< uInt > & | rowIndex (uInt colnr) |
Get access to the index information for the given column. | |
Block< uInt > & | offIndex (uInt colnr) |
Return the offsets of the values stored in the data part. | |
uInt & | indexUsed (uInt colnr) |
Return the number of values stored. | |
uInt | split (ISMBucket *&left, ISMBucket *&right, Block< Bool > &duplicated, uInt bucketStartRow, uInt bucketNrrow, uInt colnr, uInt rownr, uInt lengToAdd) |
Split the bucket in the middle. | |
Bool | simpleSplit (ISMBucket *left, ISMBucket *right, Block< Bool > &duplicated, uInt &splitRownr, uInt rownr) |
Determine whether a simple split is possible. | |
uInt | getSplit (uInt totLeng, const Block< uInt > &rowLeng, const Block< uInt > &cumLeng) |
Return the index where the bucket should be split to get two parts with almost identical length. | |
void | shiftLeft (uInt index, uInt nr, Block< uInt > &rowIndex, Block< uInt > &offIndex, uInt &nused, uInt leng) |
Remove nr items from data and index part by shifting to the left. | |
void | copy (const ISMBucket &that) |
Copy the contents of that bucket to this bucket. | |
void | show (ostream &os) const |
Show the layout of the bucket. | |
Static Public Member Functions | |
static char * | readCallBack (void *owner, const char *bucketStorage) |
Callback function when BucketCache reads a bucket. | |
static void | writeCallBack (void *owner, char *bucketStorage, const char *bucket) |
Callback function when BucketCache writes a bucket. | |
static char * | initCallBack (void *owner) |
Callback function when BucketCache adds a new bucket to the data file. | |
static void | deleteCallBack (void *, char *bucket) |
Callback function when BucketCache removes a bucket from the cache. | |
Private Member Functions | |
ISMBucket (const ISMBucket &) | |
Forbid copy constructor. | |
ISMBucket & | operator= (const ISMBucket &) |
Forbid assignment. | |
void | removeData (uInt offset, uInt leng) |
Remove a data item with the given length. | |
uInt | insertData (const char *data, uInt leng) |
Insert a data value by appending it to the end. | |
uInt | copyData (ISMBucket &other, uInt colnr, uInt toRownr, uInt fromIndex, uInt toIndex) const |
Copy a data item from this bucket to the other bucket. | |
void | read (const char *bucketStorage) |
Read the data from the storage into this bucket. | |
void | write (char *bucketStorage) const |
Write the bucket into the storage. | |
Private Attributes | |
ISMBase * | stmanPtr_p |
Pointer to the parent storage manager. | |
uInt | uIntSize_p |
The size (in bytes) of an uInt (used in index, etc.). | |
uInt | dataLeng_p |
The size (in bytes) of the data. | |
uInt | indexLeng_p |
The size (in bytes) of the index. | |
PtrBlock< Block< uInt > * > | rowIndex_p |
The row index per column; each index contains the row number of each value stored in the bucket (for that column). | |
PtrBlock< Block< uInt > * > | offIndex_p |
The offset index per column; each index contains the offset (in bytes) of each value stored in the bucket (for that column). | |
Block< uInt > | indexUsed_p |
The number of used elements in each index, i.e. the number of stored values per column. | |
char * | data_p |
The data space, in external (e.g. canonical) format. | |
A bucket in the Incremental Storage Manager.
Internal
ISMBucket represents a bucket in the Incremental Storage Manager.
The Incremental Storage Manager uses a BucketCache object to read, write, and cache the buckets containing the data. An ISMBucket object is the internal representation of the contents of a bucket. ISMBucket contains static callback functions which are called by BucketCache when reading or writing a bucket. These callback functions map the raw bucket data to an ISMBucket object and vice versa.
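The callback mechanism can be illustrated with a minimal sketch. `ToyBucket` and `roundTrip` are hypothetical names invented for this illustration; only the four callback signatures are taken from the member list on this page, and the "decoding" (treating the storage as a C string) is a stand-in for the real conversion.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for an in-memory bucket object (not ISMBucket itself).
struct ToyBucket {
    std::string payload;  // decoded contents of the bucket

    // Called when the cache reads a bucket: build an object from raw storage.
    static char* readCallBack(void* /*owner*/, const char* bucketStorage) {
        ToyBucket* b = new ToyBucket;
        b->payload = bucketStorage;  // toy decoding: storage holds a C string
        return reinterpret_cast<char*>(b);
    }
    // Called when the cache writes a bucket: encode the object into raw storage.
    static void writeCallBack(void* /*owner*/, char* bucketStorage, const char* bucket) {
        const ToyBucket* b = reinterpret_cast<const ToyBucket*>(bucket);
        std::memcpy(bucketStorage, b->payload.c_str(), b->payload.size() + 1);
    }
    // Called when the cache adds a new bucket: create an empty object.
    static char* initCallBack(void* /*owner*/) {
        return reinterpret_cast<char*>(new ToyBucket);
    }
    // Called when the cache drops a bucket: destroy the object.
    static void deleteCallBack(void* /*owner*/, char* bucket) {
        delete reinterpret_cast<ToyBucket*>(bucket);
    }
};

// Hypothetical helper: read raw storage into an object and write it back out.
inline std::string roundTrip(const char* raw) {
    char out[64] = {0};
    assert(std::strlen(raw) < sizeof(out));
    char* obj = ToyBucket::readCallBack(nullptr, raw);
    ToyBucket::writeCallBack(nullptr, out, obj);
    ToyBucket::deleteCallBack(nullptr, obj);
    return out;
}
```

The cache only ever sees opaque `char*` pointers, so it can manage bucket objects without knowing their type; the object created by the read or init callback is released by the delete callback.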
A bucket contains the values of several rows of all columns bound to this Incremental Storage Manager. A bucket is split into a data part and an index part. Each part has an arbitrary length but together they do not exceed the fixed bucket length.
The beginning of the data part contains the values of all columns bound. The remainder of the data part contains the values of the rows/columns with a changed value.
The index part contains an index per column. Each index contains the row number and an offset for a row with a stored value. The row numbers are relative to the beginning of the bucket, so the bucket has no knowledge about the absolute row numbers. In this way deletion of rows is much simpler.
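Why relative row numbers simplify row deletion can be sketched as follows. The names `BucketRow`, `locate`, and `removeRow` are hypothetical, and the vector of start rows merely models the idea of a main index that records the first absolute row of each bucket; it is not the actual ISMIndex interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BucketRow {
    std::size_t bucket;    // which bucket holds the row
    std::uint32_t relRow;  // row number relative to the start of that bucket
};

// startRows[i] is the absolute number of the first row in bucket i
// (sorted ascending, startRows[0] == 0).
inline BucketRow locate(const std::vector<std::uint32_t>& startRows,
                        std::uint32_t absRow) {
    assert(!startRows.empty() && startRows.front() == 0);
    // First bucket that starts after absRow; the row lives in the one before it.
    auto next = std::upper_bound(startRows.begin(), startRows.end(), absRow);
    std::size_t b = static_cast<std::size_t>(next - startRows.begin()) - 1;
    return BucketRow{b, absRow - startRows[b]};
}

// Model of the main-index side of deleting one row from bucket b:
// only the start rows of the later buckets move down by one.
inline void removeRow(std::vector<std::uint32_t>& startRows, std::size_t b) {
    for (std::size_t i = b + 1; i < startRows.size(); ++i) {
        --startRows[i];
    }
}
```

In this model the contents of the later buckets are untouched by a deletion, because they never stored absolute row numbers. The bucket that actually loses the row would still have to update its own relative index, which the sketch leaves out.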
The contents of a bucket look like:
-------------------------------------------------------------------
| index offset | data part     | index part          | free       |
-------------------------------------------------------------------
 0              4               4+length(data part)
<--------------------------bucketsize----------------------------->
The data part contains all data values belonging to the bucket. The index part contains for each column the following data:
-----------------------------------------------------------------------
| #values stored | row numbers of values | offset in data part of     |
| for column i   | stored for column i   | values stored for column i |
-----------------------------------------------------------------------
 0                4                       4+4*nrval
Note that the row numbers in the bucket start at 0 and are thus relative to the beginning of the bucket. The main index kept in ISMIndex knows the starting row of each bucket. In this way bucket splitting and especially row removal are much easier.
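The layout described above can be walked with a short sketch. It assumes that every integer is a 4-byte unsigned value in native byte order and that the leading "index offset" field holds the byte position of the index part; the real storage may be in canonical format, and the names `ColumnIndex`, `readU32`, and `parseIndex` are hypothetical. No bounds checking is done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical decoded form of one column's index.
struct ColumnIndex {
    std::vector<std::uint32_t> rows;     // relative row numbers with a stored value
    std::vector<std::uint32_t> offsets;  // offsets of those values in the data part
};

// Read one 4-byte unsigned integer in native byte order (an assumption).
inline std::uint32_t readU32(const char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Decode the index part of a raw bucket holding ncol columns.
inline std::vector<ColumnIndex> parseIndex(const char* bucket, std::size_t ncol) {
    // Bytes 0..3 hold the offset of the index part (assumed meaning).
    const char* p = bucket + readU32(bucket);
    std::vector<ColumnIndex> out(ncol);
    for (ColumnIndex& col : out) {
        // Per column: #values, then all row numbers, then all offsets.
        std::uint32_t n = readU32(p);
        p += 4;
        col.rows.resize(n);
        col.offsets.resize(n);
        for (std::uint32_t& r : col.rows)    { r = readU32(p); p += 4; }
        for (std::uint32_t& o : col.offsets) { o = readU32(p); p += 4; }
    }
    return out;
}
```

For one column with `nrval` stored values, the row numbers start at byte 4 of that column's index and the offsets at byte 4+4*nrval, matching the second diagram.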
The bucket can be stored in canonical or local (i.e. native) data format. When a bucket is read into memory, its data are read, converted, and stored in the ISMBucket object. When the bucket is flushed, the contents are written back. ISMBucket takes care that the values stored in its object do not exceed the size of the bucket. When the bucket is full, the user can call a function to split it into a left and a right bucket. When the new value has to be written at the end, the split merely consists of creating a new bucket. In any case, care is taken that a row is not split, so a row is always entirely contained in one bucket.
Class ISMColumn does the actual writing of data in a bucket and uses the relevant ISMBucket functions.
ISMBucket encapsulates the data of a bucket.
Definition at line 132 of file ISMBucket.h.
casa::ISMBucket::ISMBucket | ( | ISMBase * | parent, |
const char * | bucketStorage | ||
) |
Create a bucket with the given parent.
When bucketStorage is non-zero, the object is reconstructed from it. The bucket keeps the pointer to its parent (but does not own it).
casa::ISMBucket::ISMBucket | ( | const ISMBucket & | ) | [private] |
Forbid copy constructor.
void casa::ISMBucket::addData | ( | uInt | colnr, |
uInt | rownr, | ||
uInt | index, | ||
const char * | data, | ||
uInt | leng | ||
) |
Add the data to the data part.
It updates the bucket index at the given index. An exception is thrown if the bucket is too small.
Bool casa::ISMBucket::canAddData | ( | uInt | leng | ) | const |
Is the bucket large enough to add a value?
Bool casa::ISMBucket::canReplaceData | ( | uInt | newLeng, |
uInt | oldLeng | ||
) | const |
Is the bucket large enough to replace a value?
void casa::ISMBucket::copy | ( | const ISMBucket & | that | ) |
Copy the contents of that bucket to this bucket.
This is used after a split operation.
uInt casa::ISMBucket::copyData | ( | ISMBucket & | other, |
uInt | colnr, | ||
uInt | toRownr, | ||
uInt | fromIndex, | ||
uInt | toIndex | ||
) | const [private] |
Copy a data item from this bucket to the other bucket.
static void casa::ISMBucket::deleteCallBack | ( | void * | , |
char * | bucket | ||
) | [static] |
Callback function when BucketCache removes a bucket from the cache.
This function deletes the ISMBucket bucket object.
const char * casa::ISMBucket::get | ( | uInt | offset | ) | const [inline] |
Get a pointer to the data for the given offset.
Definition at line 310 of file ISMBucket.h.
References data_p.
uInt casa::ISMBucket::getInterval | ( | uInt | colnr, |
uInt | rownr, | ||
uInt | bucketNrrow, | ||
uInt & | start, | ||
uInt & | end, | ||
uInt & | offset | ||
) | const |
Get the row-interval for given column and row.
It sets the start and end of the interval to which the row belongs and the offset of its current value. It returns the index where the row number can be put in the bucket index.
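The interval lookup can be sketched with a binary search over one column's row index. This is an illustration under assumptions, not the library code: `Interval` and `findInterval` are hypothetical names, the index is modelled with `std::vector`, the first index entry is assumed to be row 0, and the returned insert position is taken to be the lower bound of the row number.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Interval {
    std::uint32_t start;    // first row of the interval sharing one value
    std::uint32_t end;      // last row of that interval (inclusive)
    std::uint32_t offset;   // offset of the value in the data part
    std::size_t insertPos;  // where rownr would go in the row index
};

inline Interval findInterval(const std::vector<std::uint32_t>& rowIndex,
                             const std::vector<std::uint32_t>& offIndex,
                             std::uint32_t rownr, std::uint32_t bucketNrrow) {
    assert(!rowIndex.empty() && rowIndex.front() == 0);
    assert(rowIndex.size() == offIndex.size());
    assert(rownr < bucketNrrow);
    // First index entry whose row number is beyond rownr.
    auto next = std::upper_bound(rowIndex.begin(), rowIndex.end(), rownr);
    std::size_t cur = static_cast<std::size_t>(next - rowIndex.begin()) - 1;
    Interval r;
    r.start = rowIndex[cur];
    // The interval runs up to the next stored value, or to the end of the bucket.
    r.end = (next == rowIndex.end() ? bucketNrrow : *next) - 1;
    r.offset = offIndex[cur];
    r.insertPos = static_cast<std::size_t>(
        std::lower_bound(rowIndex.begin(), rowIndex.end(), rownr) - rowIndex.begin());
    return r;
}
```

With stored values at rows 0, 3, and 7 in a 10-row bucket, row 5 falls in the interval 3 to 6, because the value written at row 3 stays valid until a new value appears at row 7.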
uInt casa::ISMBucket::getLength | ( | uInt | fixedLength, |
const char * | data | ||
) | const |
Get the length of the data value.
It is fixedLength when that is non-zero; otherwise the length is read from the data value.
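The rule can be sketched as below. The sketch assumes a variable-length value begins with a 4-byte length field in native byte order, which is an assumption about the encoding made for illustration; `valueLength` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Length of a value: fixedLength if non-zero, else the length prefix in the data.
inline std::uint32_t valueLength(std::uint32_t fixedLength, const char* data) {
    if (fixedLength != 0) {
        return fixedLength;  // fixed-length column: the data is not inspected
    }
    assert(data != nullptr);
    std::uint32_t leng;
    std::memcpy(&leng, data, sizeof leng);  // assumed 4-byte native-order prefix
    return leng;
}
```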
uInt& casa::ISMBucket::getOffset | ( | uInt | colnr, |
uInt | rownr | ||
) |
Get access to the offset of the data for given column and row.
The returned reference allows the offset to be changed (used for example by replaceData).
uInt casa::ISMBucket::getSplit | ( | uInt | totLeng, |
const Block< uInt > & | rowLeng, | ||
const Block< uInt > & | cumLeng | ||
) |
Return the index where the bucket should be split to get two parts with almost identical length.
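One way to pick such a split point is sketched below. It is not necessarily the strategy used by the library: `balancedSplit` is a hypothetical function that takes only the cumulative lengths (the per-row lengths are omitted) and scans for the index that minimizes the difference between the two parts, keeping both parts non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// cumLeng[i] is the total length of rows 0..i; totLeng equals cumLeng.back().
// Returns i such that rows [0, i) go left and rows [i, n) go right.
inline std::size_t balancedSplit(std::uint32_t totLeng,
                                 const std::vector<std::uint32_t>& cumLeng) {
    assert(cumLeng.size() >= 2 && cumLeng.back() == totLeng);
    std::size_t best = 1;
    std::uint32_t bestDiff = totLeng;
    for (std::size_t i = 1; i < cumLeng.size(); ++i) {
        std::uint32_t left = cumLeng[i - 1];  // bytes in rows before i
        std::uint32_t right = totLeng - left; // bytes in rows from i onward
        std::uint32_t diff = left > right ? left - right : right - left;
        if (diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
    }
    return best;
}
```

Splitting on a row boundary keeps each row entirely in one bucket, which is why the result is a row index rather than a byte position.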
uInt & casa::ISMBucket::indexUsed | ( | uInt | colnr | ) | [inline] |
Return the number of values stored.
Definition at line 322 of file ISMBucket.h.
References indexUsed_p.
static char* casa::ISMBucket::initCallBack | ( | void * | owner | ) | [static] |
Callback function when BucketCache adds a new bucket to the data file.
This function creates an empty ISMBucket object. It returns the pointer to the ISMBucket object, which becomes part of the cache. The object is deleted by the deleteCallBack function.
uInt casa::ISMBucket::insertData | ( | const char * | data, |
uInt | leng | ||
) | [private] |
Insert a data value by appending it to the end.
It returns the offset of the data value.
Block< uInt > & casa::ISMBucket::offIndex | ( | uInt | colnr | ) | [inline] |
Return the offsets of the values stored in the data part.
Definition at line 318 of file ISMBucket.h.
References offIndex_p.
void casa::ISMBucket::read | ( | const char * | bucketStorage | ) | [private] |
Read the data from the storage into this bucket.
static char* casa::ISMBucket::readCallBack | ( | void * | owner, |
const char * | bucketStorage | ||
) | [static] |
Callback function when BucketCache reads a bucket.
It creates an ISMBucket object and converts the raw bucketStorage to that object. It returns the pointer to the ISMBucket object, which becomes part of the cache. The object is deleted by the deleteCallBack function.
void casa::ISMBucket::removeData | ( | uInt | offset, |
uInt | leng | ||
) | [private] |
Remove a data item with the given length.
If the length is zero, its variable length is read first.
void casa::ISMBucket::replaceData | ( | uInt & | offset, |
const char * | data, | ||
uInt | newLeng, | ||
uInt | fixedLength | ||
) |
Replace a data item.
When its length is variable (indicated by fixedLength=0), the old value will be removed and the new one appended at the end. An exception is thrown if the bucket is too small.
Block< uInt > & casa::ISMBucket::rowIndex | ( | uInt | colnr | ) | [inline] |
Get access to the index information for the given column.
This is used by ISMColumn when putting the data.
Return the row numbers with a stored value.
Definition at line 314 of file ISMBucket.h.
References rowIndex_p.
void casa::ISMBucket::shiftLeft | ( | uInt | index, |
uInt | nr, | ||
Block< uInt > & | rowIndex, | ||
Block< uInt > & | offIndex, | ||
uInt & | nused, | ||
uInt | leng | ||
) |
Remove nr items from the data and index part by shifting to the left.
The rowIndex, offIndex, and nused arguments get updated. The caller is responsible for removing data when needed (e.g. ISMIndColumn removes the indirect arrays from its file).
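The index side of this operation can be sketched as follows. The sketch covers only the row and offset indices, modelled with `std::vector`; removing the corresponding bytes from the data part is left out, and `shiftIndexLeft` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Remove nr index entries starting at position index by moving the tail left.
inline void shiftIndexLeft(std::size_t index, std::size_t nr,
                           std::vector<std::uint32_t>& rowIndex,
                           std::vector<std::uint32_t>& offIndex,
                           std::size_t& nused) {
    assert(index + nr <= nused);
    assert(nused <= rowIndex.size() && nused <= offIndex.size());
    for (std::size_t i = index + nr; i < nused; ++i) {
        rowIndex[i - nr] = rowIndex[i];
        offIndex[i - nr] = offIndex[i];
    }
    nused -= nr;  // entries beyond nused are now unused slots
}
```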
void casa::ISMBucket::show | ( | ostream & | os | ) | const |
Show the layout of the bucket.
Bool casa::ISMBucket::simpleSplit | ( | ISMBucket * | left, |
ISMBucket * | right, | ||
Block< Bool > & | duplicated, | ||
uInt & | splitRownr, | ||
uInt | rownr | ||
) |
Determine whether a simple split is possible.
If so, do it. This is possible if the new row is at the end of the last bucket, which will often be the case.
A simple split means adding a new bucket for the new row. If the old bucket already contains values for that row, those values are moved to the new bucket.
This function is only called by split, which has created the left and right buckets.
uInt casa::ISMBucket::split | ( | ISMBucket *& | left, |
ISMBucket *& | right, | ||
Block< Bool > & | duplicated, | ||
uInt | bucketStartRow, | ||
uInt | bucketNrrow, | ||
uInt | colnr, | ||
uInt | rownr, | ||
uInt | lengToAdd | ||
) |
Split the bucket in the middle.
It returns the row number where the bucket was split, together with the new left and right buckets. The caller is responsible for deleting the newly created buckets. When possible, a simple split is done.
The starting values in the right bucket may be copies of the values in the left bucket. The duplicated Block contains a switch per column indicating if the value is copied.
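Why a starting value may need to be duplicated can be seen in a sketch for a single column. The names `ColValues` and `splitColumn` are hypothetical, and plain integers stand in for the stored data items; the real split works on offsets into the data part and handles all columns at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one column: a value is stored only at rows where it changes.
struct ColValues {
    std::vector<std::uint32_t> rows;  // relative row numbers with a stored value
    std::vector<int> values;          // the value that starts at each of those rows
};

// Split at splitRow; returns true if the right part needed a copied start value.
inline bool splitColumn(const ColValues& whole, std::uint32_t splitRow,
                        ColValues& left, ColValues& right) {
    assert(!whole.rows.empty() && whole.rows.front() == 0 && splitRow > 0);
    assert(whole.rows.size() == whole.values.size());
    left = ColValues{};
    right = ColValues{};
    for (std::size_t i = 0; i < whole.rows.size(); ++i) {
        if (whole.rows[i] < splitRow) {
            left.rows.push_back(whole.rows[i]);
            left.values.push_back(whole.values[i]);
        } else {
            // Row numbers in the right part are relative to its own start.
            right.rows.push_back(whole.rows[i] - splitRow);
            right.values.push_back(whole.values[i]);
        }
    }
    // If no value starts exactly at splitRow, the value still in effect
    // (the last one in the left part) must be copied to row 0 of the right part.
    bool duplicated = right.rows.empty() || right.rows.front() != 0;
    if (duplicated) {
        right.rows.insert(right.rows.begin(), 0);
        right.values.insert(right.values.begin(), left.values.back());
    }
    return duplicated;
}
```

The returned flag plays the role of one element of the duplicated Block: it records, for this column, whether the first value of the right part is a copy rather than a value that was originally stored at the split row.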
void casa::ISMBucket::write | ( | char * | bucketStorage | ) | const [private] |
Write the bucket into the storage.
static void casa::ISMBucket::writeCallBack | ( | void * | owner, |
char * | bucketStorage, | ||
const char * | bucket | ||
) | [static] |
Callback function when BucketCache writes a bucket.
It converts the ISMBucket bucket object to the raw bucketStorage.
char* casa::ISMBucket::data_p [private] |
The data space, in external (e.g. canonical) format.
Definition at line 306 of file ISMBucket.h.
Referenced by get().
uInt casa::ISMBucket::dataLeng_p [private] |
The size (in bytes) of the data.
Definition at line 293 of file ISMBucket.h.
uInt casa::ISMBucket::indexLeng_p [private] |
The size (in bytes) of the index.
Definition at line 295 of file ISMBucket.h.
Block<uInt> casa::ISMBucket::indexUsed_p [private] |
The number of used elements in each index, i.e. the number of stored values per column.
Definition at line 304 of file ISMBucket.h.
Referenced by indexUsed().
PtrBlock<Block<uInt>*> casa::ISMBucket::offIndex_p [private] |
The offset index per column; each index contains the offset (in bytes) of each value stored in the bucket (for that column).
Definition at line 301 of file ISMBucket.h.
Referenced by offIndex().
PtrBlock<Block<uInt>*> casa::ISMBucket::rowIndex_p [private] |
The row index per column; each index contains the row number of each value stored in the bucket (for that column).
Definition at line 298 of file ISMBucket.h.
Referenced by rowIndex().
ISMBase* casa::ISMBucket::stmanPtr_p [private] |
Pointer to the parent storage manager.
Definition at line 289 of file ISMBucket.h.
uInt casa::ISMBucket::uIntSize_p [private] |
The size (in bytes) of an uInt (used in index, etc.).
Definition at line 291 of file ISMBucket.h.