A bucket contains the values of several rows of all columns bound to this Incremental Storage Manager. A bucket is split into a data part and an index part. Each part has an arbitrary length but together they do not exceed the fixed bucket length.
The data part begins with a value for each bound column.
The remainder of the data part contains values only for
the rows and columns where the value changes.
The index part contains an index per column. Each index contains the
row number and an offset for each row with a stored value. The row numbers
are relative to the beginning of the bucket, so the bucket has
no knowledge about the absolute row numbers. In this way deletion of
rows is much simpler.
The contents of a bucket look like:
-------------------------------------------------------------------
| index offset |   data part   |   index part   |      free       |
-------------------------------------------------------------------
0              4               4+length(data part)
<---------------------------bucketsize---------------------------->

The data part contains all data values belonging to the bucket. The index part contains for each column the following data:

-----------------------------------------------------------------------
| #values stored | row numbers of values | offset in data part of     |
| for column i   | stored for column i   | values stored for column i |
-----------------------------------------------------------------------
0                4                       4+4*nrval

Note that the row numbers in the bucket start at 0, thus are relative to the beginning of the bucket. The main index kept in ISMIndex knows the starting row of each bucket. In this way bucket splitting and especially row removal is much easier.

The bucket can be stored in canonical or local (i.e. native) data format. When a bucket is read into memory, its data are read, converted, and stored in the ISMBucket object. When flushed, the contents are written. ISMBucket takes care that the values stored in its object do not exceed the size of the bucket. When full, the user can call a function to split it into a left and right bucket. When the new value has to be written at the end, the split merely consists of creating a new bucket. In any case, care is taken that a row is not split. Thus a row is always entirely contained in one bucket.
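To make the layout arithmetic concrete, the following is a minimal sketch of how the used size of a bucket could be computed from the two diagrams. It is not the ISMBucket implementation: the names kFieldSize, columnIndexSize, usedSize, and fits are hypothetical, and the sketch assumes that every integer field occupies 4 bytes, as the offsets in the diagrams suggest.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the bucket layout; not the casacore API.
// Assumption: every stored integer field is 4 bytes wide.
constexpr std::uint32_t kFieldSize = 4;

// Bytes taken by the index of one column holding nrval values:
// one count field, nrval row numbers, and nrval data offsets.
std::uint32_t columnIndexSize(std::uint32_t nrval) {
    return kFieldSize * (1 + 2 * nrval);
}

// Total bytes in use: the index-offset field, the data part,
// and the index of every column.
std::uint32_t usedSize(std::uint32_t dataLength,
                       const std::vector<std::uint32_t>& valuesPerColumn) {
    std::uint32_t total = kFieldSize + dataLength;
    for (std::uint32_t nrval : valuesPerColumn) {
        total += columnIndexSize(nrval);
    }
    return total;
}

// True if the contents fit in a bucket of the given fixed size;
// whatever is left over is the "free" area of the first diagram.
bool fits(std::uint32_t bucketSize, std::uint32_t dataLength,
          const std::vector<std::uint32_t>& valuesPerColumn) {
    assert(bucketSize >= kFieldSize);
    return usedSize(dataLength, valuesPerColumn) <= bucketSize;
}
```

For example, a bucket with 100 data bytes and two columns holding 1 and 3 values would use 4 + 100 + 12 + 28 = 144 bytes under these assumptions.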
Class ISMColumn does the actual writing of data in a bucket and uses the relevant ISMBucket functions.
Motivation
ISMBucket encapsulates the data of a bucket.
Member Description
ISMBucket (ISMBase* parent, const char* bucketStorage)
Create a bucket with the given parent. When bucketStorage is non-zero, reconstruct the object from it. The bucket keeps the pointer to its parent but does not own it.
~ISMBucket()
uInt getInterval (uInt colnr, uInt rownr, uInt bucketNrrow, uInt& start, uInt& end, uInt& offset) const
Get the row-interval for given column and row. It sets the start and end of the interval to which the row belongs and the offset of its current value. It returns the index where the row number can be put in the bucket index.
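The interval lookup can be illustrated with a binary search over the sorted row numbers of one column. This is only a sketch of the idea, not the casacore code: the Interval struct and findInterval function are hypothetical, std::vector stands in for Block, and the sketch assumes that every column has a value stored for row 0 of the bucket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type; the real function uses output parameters.
struct Interval {
    std::uint32_t start;   // first row of the interval (bucket-relative)
    std::uint32_t end;     // last row of the interval (inclusive)
    std::uint32_t offset;  // offset of the interval's value in the data part
    std::size_t index;     // where rownr sits, or would be inserted, in the index
};

// rowIndex holds the sorted bucket-relative rows with a stored value,
// offIndex the matching offsets in the data part.
Interval findInterval(const std::vector<std::uint32_t>& rowIndex,
                      const std::vector<std::uint32_t>& offIndex,
                      std::uint32_t rownr, std::uint32_t bucketNrrow) {
    assert(!rowIndex.empty() && rowIndex.size() == offIndex.size());
    assert(rowIndex.front() == 0 && rownr < bucketNrrow);
    // First entry whose row number is >= rownr.
    auto it = std::lower_bound(rowIndex.begin(), rowIndex.end(), rownr);
    std::size_t index = static_cast<std::size_t>(it - rowIndex.begin());
    bool found = (it != rowIndex.end() && *it == rownr);
    // The value in effect is the one stored at the last row <= rownr.
    std::size_t inx = found ? index : index - 1;
    Interval result;
    result.start = rowIndex[inx];
    // The interval runs up to the row before the next stored value,
    // or to the last row of the bucket.
    std::uint32_t next = (inx + 1 == rowIndex.size()) ? bucketNrrow
                                                      : rowIndex[inx + 1];
    result.end = next - 1;
    result.offset = offIndex[inx];
    result.index = index;
    return result;
}
```

With values stored at rows 0, 4, and 9 of a 12-row bucket, row 5 falls in the interval 4 to 8, and a new value for row 5 would be placed at index 2.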
Bool canAddData (uInt leng) const
Is the bucket large enough to add a value?
void addData (uInt colnr, uInt rownr, uInt index, const char* data, uInt leng)
Add the data to the data part. It updates the bucket index at the given index. An exception is thrown if the bucket is too small.
Bool canReplaceData (uInt newLeng, uInt oldLeng) const
Is the bucket large enough to replace a value?
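The two capacity checks can be sketched as simple size comparisons. The BucketUsage struct and the exact accounting below are assumptions made for illustration (4-byte fields, one row number and one offset per new index entry); the actual ISMBucket bookkeeping may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for one bucket; not the casacore members.
struct BucketUsage {
    std::uint32_t bucketSize;  // fixed total size of the bucket in bytes
    std::uint32_t dataLeng;    // bytes currently used by the data part
    std::uint32_t indexLeng;   // bytes currently used by the index part
};

// Assumption: every stored integer field is 4 bytes wide.
constexpr std::uint32_t kFieldSize = 4;

// Adding a value costs its data bytes plus one new index entry
// (a row number and an offset). The leading kFieldSize accounts
// for the index-offset field at the start of the bucket.
bool canAddData(const BucketUsage& b, std::uint32_t leng) {
    return kFieldSize + b.dataLeng + leng + b.indexLeng + 2 * kFieldSize
           <= b.bucketSize;
}

// Replacing a value changes only the data part; the index entry is reused.
bool canReplaceData(const BucketUsage& b, std::uint32_t newLeng,
                    std::uint32_t oldLeng) {
    assert(oldLeng <= b.dataLeng);
    return kFieldSize + b.dataLeng - oldLeng + newLeng + b.indexLeng
           <= b.bucketSize;
}
```

A caller would typically test the relevant check first and split the bucket when it fails, rather than rely on the exception thrown by addData or replaceData.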
void replaceData (uInt& offset, const char* data, uInt newLeng, uInt fixedLength)
Replace a data item. When its length is variable (indicated by fixedLength=0), the old value will be removed and the new one appended at the end. An exception is thrown if the bucket is too small.
const char* get (uInt offset) const
Get a pointer to the data for the given offset.
uInt getLength (uInt fixedLength, const char* data) const
Get the length of the data value. It is fixedLength when non-zero, otherwise read it from the data value.
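The distinction between fixed and variable length can be shown with a small sketch. It assumes, purely for illustration, that a variable-length value starts with a 4-byte length field in native byte order giving the total length of the value including that field; the source does not state the actual encoding. The names valueLength and makeVariableValue are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Return fixedLength when it is non-zero; otherwise read the length
// from the start of the value (assumed 4-byte native length prefix).
std::uint32_t valueLength(std::uint32_t fixedLength, const char* data) {
    if (fixedLength != 0) {
        return fixedLength;
    }
    assert(data != nullptr);
    std::uint32_t leng = 0;
    std::memcpy(&leng, data, sizeof leng);  // avoids unaligned access
    return leng;
}

// Helper for the example: build a variable-length value whose prefix
// holds the total length (prefix plus payload).
std::string makeVariableValue(const std::string& payload) {
    std::uint32_t total =
        static_cast<std::uint32_t>(sizeof(std::uint32_t) + payload.size());
    std::string value(sizeof total, '\0');
    std::memcpy(&value[0], &total, sizeof total);
    value += payload;
    return value;
}
```

Under this assumed encoding, a value built from the payload "abc" has length 7, while a fixed-length column ignores the data pointer entirely.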
uInt& getOffset (uInt colnr, uInt rownr)
Get access to the offset of the data for the given column and row. The returned reference allows the offset to be changed (used for example by replaceData).
Block<uInt>& rowIndex (uInt colnr)
Get access to the index information for the given column. This is used by ISMColumn when putting the data.
Return the row numbers with a stored value.
Block<uInt>& offIndex (uInt colnr)
Get access to the index information for the given column. This is used by ISMColumn when putting the data.
Return the offsets of the values stored in the data part.
uInt& indexUsed (uInt colnr)
Get access to the index information for the given column. This is used by ISMColumn when putting the data.
Return the number of values stored.
uInt split (ISMBucket*& left, ISMBucket*& right, Block<Bool>& duplicated, uInt bucketStartRow, uInt bucketNrrow, uInt colnr, uInt rownr, uInt lengToAdd)
Split the bucket in the middle. It returns the row number where the bucket was split and the new left and right bucket. The caller is responsible for deleting the newly created buckets. When possible a simple split is done.
The starting values in the right bucket may be copies of the values in the left bucket. The duplicated Block contains a switch per column indicating if the value is copied.
Bool simpleSplit (ISMBucket* left, ISMBucket* right, Block<Bool>& duplicated, uInt& splitRownr, uInt rownr)
Determine whether a simple split is possible. If so, do it. This is possible if the new row is at the end of the last bucket, which will often be the case.
A simple split means adding a new bucket for the new row. If the old bucket already contains values for that row, those values are moved to the new bucket.
This function is only called by split, which created the left and right bucket.
uInt getSplit (uInt totLeng, const Block<uInt>& rowLeng, const Block<uInt>& cumLeng)
Return the index where the bucket should be split to get two parts with almost identical length.
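The idea of a balanced split can be sketched as choosing the row boundary that minimizes the difference in length between the two halves. This is a simplified illustration, not the getSplit algorithm itself: balancedSplit is a hypothetical name, it takes only per-row lengths, and it ignores the extra space the right bucket needs for duplicated starting values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// rowLeng[i] is the number of data bytes stored for row i.
// Returns the first row of the right half; rows [0, result) go left.
std::size_t balancedSplit(const std::vector<std::uint32_t>& rowLeng) {
    assert(rowLeng.size() >= 2);  // a single row cannot be split
    std::uint64_t total =
        std::accumulate(rowLeng.begin(), rowLeng.end(), std::uint64_t{0});
    std::uint64_t left = 0;
    std::size_t best = 1;
    std::uint64_t bestDiff = total;  // no split can be worse than this
    // Try every boundary between two rows, so that a row is never split.
    for (std::size_t i = 1; i < rowLeng.size(); ++i) {
        left += rowLeng[i - 1];
        std::uint64_t right = total - left;
        std::uint64_t diff = left > right ? left - right : right - left;
        if (diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
    }
    return best;
}
```

Because only row boundaries are considered, both halves always hold whole rows, in line with the rule that a row is entirely contained in one bucket.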
void shiftLeft (uInt index, uInt nr, Block<uInt>& rowIndex, Block<uInt>& offIndex, uInt& nused, uInt leng)
Remove nr items from data and index part by shifting to the left. The rowIndex, offIndex, and nused get updated. The caller is responsible for removing data when needed (e.g. ISMIndColumn removes the indirect arrays from its file).
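The index side of this operation can be sketched as follows. The function shiftIndexLeft is hypothetical and covers only the index bookkeeping: std::vector stands in for Block, and removal of the bytes from the data part is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Remove nr index entries starting at position index by moving the
// entries behind them to the left. Only the first nused entries of
// rowIndex and offIndex are in use, mirroring a Block with a use count.
void shiftIndexLeft(std::size_t index, std::size_t nr,
                    std::vector<std::uint32_t>& rowIndex,
                    std::vector<std::uint32_t>& offIndex,
                    std::size_t& nused) {
    assert(nused <= rowIndex.size() && nused <= offIndex.size());
    assert(index + nr <= nused);
    if (nr == 0) {
        return;  // nothing to remove
    }
    // Copying forward is safe because the destination starts before
    // the source range.
    std::copy(rowIndex.begin() + index + nr, rowIndex.begin() + nused,
              rowIndex.begin() + index);
    std::copy(offIndex.begin() + index + nr, offIndex.begin() + nused,
              offIndex.begin() + index);
    nused -= nr;
}
```

After the call, the entries beyond nused are stale and must not be read; the vectors keep their capacity so later additions can reuse the space.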
void copy (const ISMBucket& that)
Copy the contents of that bucket to this bucket. This is used after a split operation.
static char* readCallBack (void* owner, const char* bucketStorage)
Callback function when BucketCache reads a bucket. It creates an ISMBucket object and converts the raw bucketStorage to that object. It returns the pointer to the ISMBucket object, which becomes part of the cache. The object gets deleted by the deleteCallBack function.
static void writeCallBack (void* owner, char* bucketStorage, const char* bucket)
Callback function when BucketCache writes a bucket. It converts the ISMBucket bucket object to the raw bucketStorage.
static char* initCallBack (void* owner)
Callback function when BucketCache adds a new bucket to the data file. This function creates an empty ISMBucket object. It returns the pointer to the ISMBucket object, which becomes part of the cache. The object gets deleted by the deleteCallBack function.
static void deleteCallBack (void*, char* bucket)
Callback function when BucketCache removes a bucket from the cache. This function deletes the ISMBucket bucket object.
void show (ostream& os) const
Show the layout of the bucket.
ISMBucket (const ISMBucket&)
Forbid copy constructor.
ISMBucket& operator= (const ISMBucket&)
Forbid assignment.
void removeData (uInt offset, uInt leng)
Remove a data item with the given length. If the length is zero, its variable length is read first.
uInt insertData (const char* data, uInt leng)
Insert a data value by appending it to the end. It returns the offset of the data value.
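The interplay of insertData and removeData can be sketched with a small model of the data part. The DataPart struct is hypothetical and tracks the offsets of a single column for brevity. It assumes that removal closes the gap, so the offsets of values stored behind the removed item have to be decreased; the source does not spell out this step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a data part; not the ISMBucket members.
struct DataPart {
    std::vector<char> bytes;             // the stored data values
    std::vector<std::uint32_t> offsets;  // offsets of the stored values

    // Append a value and return the offset at which it was stored.
    std::uint32_t insertData(const char* data, std::uint32_t leng) {
        std::uint32_t offset = static_cast<std::uint32_t>(bytes.size());
        bytes.insert(bytes.end(), data, data + leng);
        return offset;
    }

    // Remove leng bytes at offset and close the gap. Values stored
    // behind the removed item move leng bytes to the left, so their
    // offsets are adjusted. Removing the item's own index entry is
    // left to the caller.
    void removeData(std::uint32_t offset, std::uint32_t leng) {
        assert(static_cast<std::size_t>(offset) + leng <= bytes.size());
        bytes.erase(bytes.begin() + offset, bytes.begin() + offset + leng);
        for (std::uint32_t& off : offsets) {
            if (off > offset) {
                off -= leng;
            }
        }
    }
};
```

Replacing a variable-length value then amounts to a removeData call followed by an insertData call, which matches the remove-and-append behaviour described for replaceData.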
uInt copyData (ISMBucket& other, uInt colnr, uInt toRownr, uInt fromIndex, uInt toIndex) const
Copy a data item from this bucket to the other bucket.
void read (const char* bucketStorage)
Read the data from the storage into this bucket.
void write (char* bucketStorage) const
Write the bucket into the storage.