BucketCache.h

Classes

Global Functions -- Define the type of the static read and write function.
BucketCache -- Cache for buckets in a part of a file

Define the type of the static read and write function.

Interface

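typedef char* (*BucketCacheToLocal) (void* ownerObject, const char* canonical)
typedef void (*BucketCacheFromLocal) (void* ownerObject, char* canonical, const char* local)
typedef char* (*BucketCacheAddBuffer) (void* ownerObject)
typedef void (*BucketCacheDeleteBuffer) (void* ownerObject, char* buffer)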

Description

Synopsis

The BucketCache class needs a way to convert its data from local to canonical format and vice-versa. This is done by callback functions defined at construction time.

The ToLocal callback function has to allocate a buffer of the correct size and copy/convert the canonical data in the input buffer to this buffer. A pointer to this newly allocated buffer has to be returned. The BucketCache class keeps this pointer in the cache block.

The FromLocal callback function has to copy/convert the data from the buffer in local format to the buffer in canonical format. It should NOT delete the buffer; that has to be done by the DeleteBuffer function.

The AddBuffer callback function has to create (and initialize) a buffer to be added to the file and cache. When the file gets extended, BucketCache only registers the new size, but does not write anything. When a bucket located between the actual file size and the new file size is read, the AddBuffer callback function is called to create a buffer and possibly initialize it.

The DeleteBuffer callback function has to delete the buffer allocated by the ToLocal function.

The functions get a pointer to the owner object, which was provided at construction time. The callback function has to cast this to the correct type and can use it thereafter.

C++ supports pointers to member functions, but they are awkward to use. Therefore pointers to static member functions are used (which are simple pointers to functions). A pointer to the owner object is also passed to let the static function call the correct member function (when needed).
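
The static-function-plus-owner-pointer pattern described above can be sketched as follows. This is only an illustration: the class name TileStore and its members are hypothetical and not part of BucketCache.h. The callback shapes follow the ToLocal and DeleteBuffer functions used in the BucketCache example.

```cpp
#include <cstring>

// Hypothetical owner class; TileStore and its members are illustrative
// names only and do not exist in BucketCache.h.
class TileStore {
public:
    explicit TileStore(unsigned bucketSize)
        : bucketSize_(bucketSize), nrConverted_(0) {}

    // Static function with the shape of a ToLocal callback:
    // char* (void* ownerObject, const char* canonical).
    static char* toLocalCallBack(void* ownerObject, const char* canonical) {
        // Cast the opaque owner pointer back to its real type ...
        TileStore* self = static_cast<TileStore*>(ownerObject);
        // ... and forward to the ordinary member function.
        return self->toLocal(canonical);
    }

    // Static function with the shape of a DeleteBuffer callback.
    static void deleteCallBack(void* /*ownerObject*/, char* buffer) {
        delete [] buffer;
    }

    unsigned nrConverted() const { return nrConverted_; }

private:
    // Member function doing the real work; here a plain copy.
    char* toLocal(const char* canonical) {
        char* local = new char[bucketSize_];
        std::memcpy(local, canonical, bucketSize_);
        ++nrConverted_;
        return local;
    }

    unsigned bucketSize_;
    unsigned nrConverted_;
};
```

An object of such a class would pass `this` as ownerObject and the static functions as callbacks when constructing the cache.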

Example

See class BucketCache.

Member Description

typedef void (*BucketCacheFromLocal) (void* ownerObject, char* canonical, const char* local)

typedef void (*BucketCacheDeleteBuffer) (void* ownerObject, char* buffer)


class BucketCache

Interface

Public Members
BucketCache (BucketFile* file, Int64 startOffset, uInt bucketSize, uInt nrOfBuckets, uInt cacheSize, void* ownerObject, BucketCacheToLocal readCallBack, BucketCacheFromLocal writeCallBack, BucketCacheAddBuffer addCallBack, BucketCacheDeleteBuffer deleteCallBack)
~BucketCache()
Bool flush (uInt fromSlot = 0)
void clear (uInt fromSlot = 0, Bool doFlush = True)
void resize (uInt cacheSize)
void resync (uInt nrBucket, uInt nrOfFreeBucket, Int firstFreeBucket)
uInt nBucket() const
uInt cacheSize() const
void setDirty()
char* getBucket (uInt bucketNr)
void extend (uInt nrBucket)
uInt addBucket (char* data)
void removeBucket()
void get (char* buf, uInt length, Int64 offset)
void put (const char* buf, uInt length, Int64 offset)
Int firstFreeBucket() const
uInt nFreeBucket() const
void initStatistics()
void showStatistics (ostream& os) const
Private Members
BucketCache (const BucketCache&)
BucketCache& operator= (const BucketCache&)
void setLRU()
void getSlot (uInt bucketNr)
void writeBucket (uInt slotNr)
void readBucket (uInt slotNr)
void initializeBuckets (uInt bucketNr)
void checkOffset (uInt length, Int64 offset) const

Description

Prerequisite

Etymology

BucketCache implements a cache for buckets in (a part of) a file.

Synopsis

A cache may allow more efficient quasi-random IO. It can, for instance, be used when a limited number of blocks in a file have to be accessed again and again.

The class BucketCache provides such a cache. It can be used on a consecutive part of a file as long as that part is not simultaneously accessed in another way (including another BucketCache object).

BucketCache stores the data as given. It uses callback functions to allocate/delete buffers and to convert the data to/from local format.

When a new bucket is needed and all slots in the cache are used, BucketCache will remove the least recently used bucket from the cache. If that bucket's dirty flag is set, it is written to the file first.
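
The replacement policy can be illustrated with a small model. This is a simplified sketch, not the actual implementation: the names Slot and LruSlots are hypothetical, a use counter stands in for the LRU bookkeeping, and "writing" a dirty bucket is only counted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of cache slot replacement.
struct Slot {
    int           bucketNr;  // -1 means the slot is unused
    bool          dirty;     // true if the bucket must be written before reuse
    unsigned long lastUsed;  // value of the use counter at the last access
};

class LruSlots {
public:
    explicit LruSlots(std::size_t nslots)
        : slots_(nslots, Slot{-1, false, 0}), clock_(0), nrWritten_(0) {}

    // Return the slot holding bucketNr. If the bucket is not cached,
    // reuse the least recently used slot (unused slots come first).
    std::size_t acquire(int bucketNr) {
        std::size_t victim = 0;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].bucketNr == bucketNr) {
                slots_[i].lastUsed = ++clock_;
                return i;
            }
            if (slots_[i].lastUsed < slots_[victim].lastUsed) {
                victim = i;
            }
        }
        Slot& s = slots_[victim];
        if (s.bucketNr >= 0 && s.dirty) {
            ++nrWritten_;  // stand-in for writing the bucket to the file
        }
        s.bucketNr = bucketNr;
        s.dirty    = false;
        s.lastUsed = ++clock_;
        return victim;
    }

    void setDirty(std::size_t slot) { slots_[slot].dirty = true; }
    unsigned nrWritten() const { return nrWritten_; }

private:
    std::vector<Slot> slots_;
    unsigned long     clock_;
    unsigned          nrWritten_;
};
```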

BucketCache maintains a list of free buckets. Initially this list is empty. When a bucket is removed, it is added to the free list. The addBucket function takes buckets from the free list before extending the file.
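
A minimal sketch of such a free list is given below. It assumes that each free bucket records the number of the next free bucket and that -1 ends the chain; how the chain is actually stored is not stated here, and the class name FreeList is hypothetical. The member names mirror the public interface of BucketCache.

```cpp
#include <vector>

// Hypothetical model of the free bucket list; -1 marks the end of the chain.
class FreeList {
public:
    explicit FreeList(unsigned nrBuckets)
        : next_(nrBuckets, -1), firstFree_(-1), nFree_(0) {}

    // Removing a bucket puts it at the beginning of the free list.
    void removeBucket(unsigned bucketNr) {
        next_[bucketNr] = firstFree_;
        firstFree_ = static_cast<int>(bucketNr);
        ++nFree_;
    }

    // Adding a bucket reuses a free bucket if there is one;
    // otherwise the file is extended with one bucket.
    unsigned addBucket() {
        if (firstFree_ >= 0) {
            unsigned nr = static_cast<unsigned>(firstFree_);
            firstFree_ = next_[nr];
            --nFree_;
            return nr;
        }
        next_.push_back(-1);
        return static_cast<unsigned>(next_.size() - 1);
    }

    int      firstFreeBucket() const { return firstFree_; }
    unsigned nFreeBucket() const { return nFree_; }
    unsigned nBucket() const { return static_cast<unsigned>(next_.size()); }

private:
    std::vector<int> next_;      // next free bucket for each bucket number
    int              firstFree_; // head of the free list, -1 if empty
    unsigned         nFree_;     // number of buckets in the free list
};
```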

Since it is possible to handle only a part of a file by a BucketCache object, it is also possible to have multiple BucketCache objects on the same file (as long as they access disjoint parts of the file). Each BucketCache object can have its own bucket size. This can, for example, be used to have tiled arrays with different tile shapes in the same file.

Statistics are kept to show how efficiently the cache is working. It is possible to initialize and show the statistics.

Motivation

A cache may reduce IO traffic considerably. Furthermore it is more efficient to keep a cache in local format. In that way conversion to/from local format only has to be done when data gets read/written. It also allows for precalculations.

Example

  // Define the callback function for reading a bucket.
  char* bToLocal (void*, const char* data)
  {
    char* ptr = new char[32768];
    memcpy (ptr, data, 32768);
    return ptr;
  }
  // Define the callback function for writing a bucket.
  void bFromLocal (void*, char* data, const char* local)
  {
    memcpy (data, local, 32768);
  }
  // Define the callback function for initializing a new bucket.
  char* bAddBuffer (void*)
  {
    char* ptr = new char[32768];
    for (uInt i=0; i<32768; i++) {
      ptr[i] = 0;
    }
    return ptr;
  }
  // Define the callback function for deleting a bucket.
  void bDeleteBuffer (void*, char* buffer)
  {
    delete [] buffer;
  }

  void someFunc()
  {
    // Open the filebuf.
    BucketFile file(...);
    file.open();
    uInt i;
    // Create a cache for the part of the file starting at offset 512
    // consisting of 1000 buckets. The cache consists of 10 buckets.
    // Each bucket is 32768 bytes.
    BucketCache cache (&file, 512, 32768, 1000, 10, 0,
                       bToLocal, bFromLocal, bAddBuffer, bDeleteBuffer);
    // Write all buckets into the file.
    for (i=0; i<100; i++) {
      char* buf = new char[32768];
      cache.addBucket (buf);
    }
    // Flush the cache to write all buckets in it.
    cache.flush();
    // Read all buckets from the file.
    for (i=0; i<1000; i++) {
      char* buf = cache.getBucket(i);
      ...
    }
    cout << cache.nBucket() << endl;
  }

To Do

Member Description

BucketCache (BucketFile* file, Int64 startOffset, uInt bucketSize, uInt nrOfBuckets, uInt cacheSize, void* ownerObject, BucketCacheToLocal readCallBack, BucketCacheFromLocal writeCallBack, BucketCacheAddBuffer addCallBack, BucketCacheDeleteBuffer deleteCallBack)

Create the cache for (a part of) a file. The file part used starts at startOffset. Its length is bucketSize*nrOfBuckets bytes. When the file is smaller, the remainder is indicated as an extension similarly to the behaviour of function extend.
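
The layout implied by this description can be written down as simple arithmetic. The sketch below assumes that bucket i starts at startOffset + i*bucketSize, which follows from the stated start and length but is not spelled out here; the function names are hypothetical.

```cpp
#include <cstdint>

// Assumed layout: bucket i starts at startOffset + i*bucketSize.
// The multiplication is done in 64 bits to avoid overflow for large files.
std::int64_t bucketOffset(std::int64_t startOffset,
                          std::uint32_t bucketSize,
                          std::uint32_t bucketNr) {
    return startOffset + static_cast<std::int64_t>(bucketNr) * bucketSize;
}

// First byte after the file part handled by the cache:
// startOffset + bucketSize*nrOfBuckets.
std::int64_t cachedAreaEnd(std::int64_t startOffset,
                           std::uint32_t bucketSize,
                           std::uint32_t nrOfBuckets) {
    return bucketOffset(startOffset, bucketSize, nrOfBuckets);
}
```

With the values of the example (startOffset 512, bucketSize 32768, 1000 buckets), the cached part ends at byte offset 32768512.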

~BucketCache()

Bool flush (uInt fromSlot = 0)

Flush the cache from the given slot on. By default the entire cache is flushed. When the entire cache is flushed, possible remaining uninitialized buckets will be initialized first. A True status is returned when buckets had to be written.

void clear (uInt fromSlot = 0, Bool doFlush = True)

Clear the cache from the given slot on. By default the entire cache is cleared. It will remove the buckets in the cleared part. If wanted and needed, the buckets are flushed to the file before removing them. It can be used to enforce rereading buckets from the file.

void resize (uInt cacheSize)

Resize the cache. When the cache gets smaller, the buckets in the last slots are removed from the cache. It does not take "least recently used" into account.

void resync (uInt nrBucket, uInt nrOfFreeBucket, Int firstFreeBucket)

Resynchronize the object (after another process updated the file). It clears the cache (so all data will be reread) and sets the new sizes.

uInt nBucket() const

Get the current number of buckets in the file.

uInt cacheSize() const

Get the current cache size (in buckets).

void setDirty()

Set the dirty bit for the current bucket.

char* getBucket (uInt bucketNr)

Make another bucket current. When no more cache slots are available, the one least recently used is flushed. The data in the bucket is converted using the ToLocal callback function. When the bucket does not exist yet in the file, it gets added and initialized using the AddBuffer callback function. A pointer to the data in converted format is returned.

void extend (uInt nrBucket)

Extend the file with the given number of buckets. The buckets get initialized when they are acquired (using getBucket) for the first time.
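
The deferred initialization can be modelled with two counters, one for the buckets actually present in the file and one for the registered size. This is a hypothetical bookkeeping sketch (the class name LazyExtent and its members are illustrative); it does no IO and only shows when the AddBuffer callback would be needed.

```cpp
// Hypothetical bookkeeping model of a lazily extended file.
class LazyExtent {
public:
    explicit LazyExtent(unsigned nrInFile)
        : actualNr_(nrInFile), curNr_(nrInFile) {}

    // Extending only registers the new size; nothing is written.
    void extend(unsigned nrBucket) { curNr_ += nrBucket; }

    // True if the bucket lies between the actual and the registered size,
    // so it has to be created via AddBuffer instead of being read.
    bool needsAddBuffer(unsigned bucketNr) const {
        return bucketNr >= actualNr_ && bucketNr < curNr_;
    }

    // Initialize the given bucket and all uninitialized buckets before it.
    void initializeBuckets(unsigned bucketNr) {
        if (needsAddBuffer(bucketNr)) {
            actualNr_ = bucketNr + 1;
        }
    }

    unsigned nBucket() const { return curNr_; }
    unsigned nInitialized() const { return actualNr_; }

private:
    unsigned actualNr_;  // buckets that really exist in the file
    unsigned curNr_;     // registered number of buckets
};
```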

uInt addBucket (char* data)

Add a bucket to the file and make it the current one. When no more cache slots are available, the one least recently used is flushed. When no free buckets are available, the file will be extended with one bucket. It returns the new bucket number. The buffer must have been allocated on the heap. It becomes part of the cache; its contents are not copied. Thus the buffer should hereafter NOT be used for other purposes. It will be deleted later via the DeleteBuffer callback function.

void removeBucket()

Remove the current bucket; i.e. add it to the beginning of the free bucket list.

void get (char* buf, uInt length, Int64 offset)

Get a part from the file outside the cached area. It is checked if that part is indeed outside the cached file area.

void put (const char* buf, uInt length, Int64 offset)

Put a part into the file outside the cached area. It is checked if that part is indeed outside the cached file area.
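
The check made by get and put can be sketched as an interval test. This is an assumption about what "outside the cached area" means (no byte of the requested range may fall in the cached part); the function name and the half-open interval convention are hypothetical.

```cpp
#include <cstdint>

// Return true if the byte range [offset, offset+length) does not overlap
// the cached file part [areaStart, areaEnd).
bool isOutsideCachedArea(std::int64_t areaStart, std::int64_t areaEnd,
                         std::int64_t offset, std::uint32_t length) {
    return offset + static_cast<std::int64_t>(length) <= areaStart
        || offset >= areaEnd;
}
```

For a cache starting at offset 512, a 512-byte header at offset 0 lies outside the cached area, while a 513-byte range at offset 0 does not.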

Int firstFreeBucket() const

Get the bucket number of the first free bucket. -1 = no free buckets.

uInt nFreeBucket() const

Get the number of free buckets.

void initStatistics()

(Re)initialize the cache statistics.

void showStatistics (ostream& os) const

Show the statistics.

BucketCache (const BucketCache&)

Copy constructor is not possible.

BucketCache& operator= (const BucketCache&)

Assignment is not possible.

void setLRU()

Set the LRU information for the current slot.

void getSlot (uInt bucketNr)

Get a cache slot for the bucket.

void writeBucket (uInt slotNr)

Write a bucket.

void readBucket (uInt slotNr)

Read a bucket.

void initializeBuckets (uInt bucketNr)

Initialize the bucket buffer. The uninitialized buckets before this bucket are also initialized.

void checkOffset (uInt length, Int64 offset) const

Check if the offset of a non-cached part is correct.