PagedArray.h

Classes

PagedArray -- A Lattice that is read from or written to disk.

template <class T> class PagedArray : public Lattice<T>

Interface

Public Members
PagedArray()
PagedArray (const TiledShape& shape, const String& filename)
explicit PagedArray (const TiledShape& shape)
PagedArray (const TiledShape& shape, Table& file)
PagedArray (const TiledShape& shape, Table& file, const String& columnName, uInt rowNum)
explicit PagedArray (const String& filename)
explicit PagedArray (Table& file)
PagedArray (Table& file, const String& columnName, uInt rowNum)
PagedArray (const PagedArray<T>& other)
~PagedArray()
PagedArray<T>& operator= (const PagedArray<T>& other)
virtual Lattice<T>* clone() const
virtual Bool isPersistent() const
virtual Bool isPaged() const
virtual Bool isWritable() const
virtual IPosition shape() const
virtual String name (Bool stripPath=False) const
void resize (const TiledShape& newShape)
const String& tableName() const
Table& table()
const Table& table() const
const String& columnName() const
static String defaultColumn()
const ROTiledStManAccessor& accessor() const
uInt rowNumber() const
static uInt defaultRow()
IPosition tileShape() const
virtual uInt advisedMaxPixels() const
virtual void setMaximumCacheSize (uInt howManyPixels)
virtual uInt maximumCacheSize() const
virtual void setCacheSizeInTiles (uInt howManyTiles)
virtual void setCacheSizeFromPath (const IPosition& sliceShape, const IPosition& windowStart, const IPosition& windowLength, const IPosition& axisPath)
virtual void clearCache()
virtual void showCacheStatistics (ostream& os) const
virtual T getAt (const IPosition& where) const
virtual void putAt (const T& value, const IPosition& where)
virtual Bool ok() const
virtual LatticeIterInterface<T>* makeIter (const LatticeNavigator& navigator, Bool useRef) const
virtual Bool doGetSlice (Array<T>& buffer, const Slicer& section)
virtual void doPutSlice (const Array<T>& sourceBuffer, const IPosition& where, const IPosition& stride)
virtual IPosition doNiceCursorShape (uInt maxPixels) const
virtual Bool lock (FileLocker::LockType, uInt nattempts)
virtual void unlock()
virtual Bool hasLock (FileLocker::LockType) const
virtual void resync()
virtual void flush()
virtual void tempClose()
virtual void reopen()
Private Members
void setTableType()
void makeArray (const TiledShape& shape)
void makeTable (const String& filename, Table::TableOption option)
static String defaultComment()
ArrayColumn<T>& getRWArray()
void makeRWArray()
void doReopen() const
void tempReopen() const
See Also
ArrayLattice - a memory based Lattice.

Description

Review Status

Reviewed By:
Peter Barnes
Date Reviewed:
1999/10/30
Programs:
Demos:
Tests:

Prerequisite

Etymology

"Demand paging" is a technique used to implement virtual memory in computer operating systems. In this scheme, code or data are read from disk to memory only as needed by a process, and are read in fixed-sized chunks called "pages". PagedArrays are somewhat the same -- though without the automatic features found in virtual memory demand paging. However, PagedArrays do allow the user to access chunks of the disk in a flexible way that can match the requirements of many algorithms.

Synopsis

At the time of writing, typical scientific computers provide sufficient memory for storing and manipulating 2-dimensional astronomical images, which have an average size of around 8 MBytes. Astronomy is increasingly using arrays of three or more dimensions, which can be larger by one or two orders of magnitude. PagedArrays provide a convenient way of accessing these large arrays without requiring all the data to be read into real or virtual memory.

When you construct a PagedArray you do not read any data into memory. Instead a disk file (ie. a Table) is created, in a place you specify, to hold the data. This means you need to have enough disk space to hold the array. Constructing a PagedArray is equivalent to opening a file.

Because the data is stored on disk it can be saved after the program, function, or task that created the PagedArray has finished. This saved array can then be read again at a later stage.

So there are two reasons for using a PagedArray:

  1. To provide for arrays that are too large for the computer's memory.
  2. To provide a way of saving arrays to disk for later access.

To access the data in a PagedArray you can either:

  1. Use a LatticeIterator
  2. Use the getSlice and putSlice member functions
  3. Use the parenthesis operator or getAt and putAt functions
These access methods are given in order of preference. Some examples of these access methods are in the documentation for the Lattice class as well as below.

In nearly all cases you access the PagedArray by reading a "slice" of the PagedArray into an AIPS++ Array. Because the slice is stored in memory it is important that the slice you read is not too big compared to the physical memory on your computer. Otherwise your computer will page excessively and performance will be poor.

To overcome this you may be tempted to access the PagedArray a pixel at a time. This will use little memory but the overhead of accessing a large data set by separately reading each pixel from disk will also lead to poor performance.

In general the best way to access the data in PagedArrays is to use a LatticeIterator with a cursor size that "fits" nicely into memory. Not only do the LatticeIterator classes provide a relatively simple way to read/write all the data, but they also optimally set up the cache that is associated with each PagedArray.

If the LatticeIterator classes do not access the data the way you want, you can use the getSlice and putSlice member functions. These functions do not set up the cache for you, and improved performance may be obtained by tweaking the cache using the setCacheSizeFromPath member function.

More Details

In order to utilise PagedArrays fully and understand many of the member functions and data access methods in this class, you need to be familiar with some of the concepts involved in the implementation of PagedArrays.

Each PagedArray is stored in one cell of a Table as an indirect Array (see the documentation for the Tables module for more information). This means that multiple PagedArrays can be stored in one Table. To specify which PagedArray you are referring to in a given Table you need to specify the cell using its column name and row number during construction. If a cell is not specified, the default column name (as given by the defaultColumn function) and row number (as given by the defaultRow function) are used. This ability to store multiple PagedArrays is used in the PagedImage class, where the image is stored in one cell and a mask is optionally stored in another column in the same row.

There are currently a number of limitations when storing multiple PagedArrays in the same Table.

Each PagedArray is stored on disk using the tiled cell storage manager (TiledCellStMan). This stores the data in tiles which are regular subsections of the PagedArray. For example a PagedArray of shape [1024,1024,4,128] may have a tile shape of [32,16,4,16]. The data in each tile is stored as a unit on the disk. This means that there is no preferred axis when accessing multi-dimensional data.
The tile shape can be specified when constructing a new PagedArray but not when reading an old one as it is intrinsic to the way the data is stored on disk. It is NOT recommended that you specify the tile shape unless you can control the lifetime of the PagedArray (this includes the time it spends on disk), or can guarantee the access pattern. For example if you know that a PagedArray of shape [512,512,4,32] will always be sliced plane by plane you may prefer to specify a tile shape of [512,64,1,1] rather than the default of [32,16,4,16].
Tiles can be cached by the tiled storage manager so that it does not need to read the data from disk every time you access a pixel in a different tile. In order to cache the correct tiles you should tell the storage manager what section of the PagedArray you will be accessing. This is done using the setCacheSizeFromPath member function. Alternatively you can set the size of the cache using the setCacheSizeInTiles member function.
By default there is no limit on how much memory the tile cache can consume. This can be changed using the setMaximumCacheSize member function. The tiled storage manager always tries to cache enough tiles to ensure that each tile is read from disk only once, so setting the maximum cache size will trade off memory usage for disk I/O. Setting the cache size is illustrated in example 5 below.
The showCacheStatistics member function is provided to allow you to evaluate the performance of the tile cache.
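
The effect of the tile shape on the amount of disk I/O can be estimated with simple arithmetic. The following standalone sketch does not call the tiled storage manager; Shape4, pixelsPerTile and tilesTouched are hypothetical names introduced for illustration only. It counts how many tiles a slice overlaps, under the assumption that a tile is always read from disk as a whole.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-dimensional shape type (not the AIPS++ IPosition class).
using Shape4 = std::array<std::size_t, 4>;

// Number of pixels stored in one tile.
std::size_t pixelsPerTile(const Shape4& tile)
{
    std::size_t n = 1;
    for (std::size_t extent : tile) {
        n *= extent;
    }
    return n;
}

// Number of tiles overlapped by a slice that starts at 'start' and has
// shape 'sliceShape'. Every extent of sliceShape must be at least 1.
std::size_t tilesTouched(const Shape4& start, const Shape4& sliceShape,
                         const Shape4& tile)
{
    std::size_t n = 1;
    for (std::size_t i = 0; i < 4; ++i) {
        const std::size_t first = start[i] / tile[i];
        const std::size_t last = (start[i] + sliceShape[i] - 1) / tile[i];
        n *= last - first + 1;   // tiles crossed along axis i
    }
    return n;
}
```

For a [512,512,4,32] array sliced plane by plane, a [512,512,1,1] slice overlaps 8 tiles of shape [512,64,1,1] but 512 tiles of shape [32,16,4,16]. Both tile shapes hold 32768 pixels, so under this model the default tile shape reads 64 times more pixels than the plane contains.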

Example

All the examples in this section are available in dPagedArray.cc

Example 1:

Create a PagedArray of Floats of shape [1024,1024,4,256] in a file called "myData_tmp.array" and initialize it to zero. This will create a directory on disk called "myData_tmp.array" that contains files that exceed 1024*1024*4*256*4 (= 4 GBytes) in size.
    const IPosition arrayShape(4,1024,1024,4,256);
    const String filename("myData_tmp.array");
    PagedArray<Float> diskArray(arrayShape, filename);
    cout << "Created a PagedArray of shape " << diskArray.shape() 
      << " (" << diskArray.shape().product()/1024/1024*sizeof(Float) 
      << " MBytes)" << endl
      << "in the table called " << diskArray.tableName() << endl;
    diskArray.set(0.0f);
    // Using the set function is an efficient way to initialize the PagedArray
    // as it uses a PagedArrIter internally. Note that the set function is
    // defined in the Lattice class that PagedArray is derived from. 
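
The size quoted for this example can be checked with a few lines of standalone arithmetic. The sketch below does not use the AIPS++ classes; the function name bytesOnDisk is hypothetical and the shape is held in a plain std::array. It shows that the pixel data alone needs 2^32 bytes, one more than the largest value a 32-bit unsigned integer can hold, which is why 64-bit arithmetic is used here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of PagedArray: number of bytes needed for
// the pixel data of an array with the given 4-dimensional shape.
std::uint64_t bytesOnDisk(const std::array<std::uint64_t, 4>& shape,
                          std::uint64_t bytesPerPixel)
{
    std::uint64_t pixels = 1;
    for (std::uint64_t extent : shape) {
        pixels *= extent;   // 64-bit product, so 2^30 pixels do not overflow
    }
    return pixels * bytesPerPixel;
}
```

For the shape [1024,1024,4,256] and 4-byte Floats, bytesOnDisk returns 4294967296, that is 4096 MBytes.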
    

Example 2:

Read the PagedArray produced in Example 1 and put a Gaussian profile into each spectral channel.
    PagedArray<Float> diskArray("myData_tmp.array");
    IPosition shape = diskArray.shape();
    // Construct a Gaussian Profile to be 10 channels wide and centred on
    // channel 16. Its height is 1.0.
    Gaussian1D<Float> g(1.0f, 16.0f, 10.0f);
    // Create a vector to cache a sampled version of this profile.
    Vector<Float> profile(shape(3));
    indgen(profile);
    profile.apply(g);
    // Now put this profile into every spectral channel in the paged array. This
    // is best done using an iterator.
    LatticeIterator<Float> iter(diskArray, 
                             TiledLineStepper(shape, diskArray.tileShape(), 3));
    for (iter.reset(); !iter.atEnd(); iter++) {
       iter.woCursor() = profile;
    }
    

Example 3:

Now multiply the I-polarization data by 10.0 in this PagedArray. The I-polarization data occupies 1 GByte of RAM which is too big to read into the memory of most computers. So an iterator is used to get suitable sized chunks.
    Table t("myData_tmp.array", Table::Update);
    PagedArray<Float> da(t);
    const IPosition latticeShape = da.shape();
    // The extents need an explicit type; Int is assumed here.
    const Int nx = latticeShape(0);
    const Int ny = latticeShape(1);
    const Int npol = latticeShape(2);
    const Int nchan = latticeShape(3);
    IPosition cursorShape = da.niceCursorShape();
    cursorShape(2) = 1;
    LatticeStepper step(latticeShape, cursorShape);
    step.subSection(IPosition(4,0), IPosition(4,nx-1,ny-1,0,nchan-1));
    LatticeIterator<Float> iter(da, step);
    for (iter.reset(); !iter.atEnd(); iter++) {
       iter.rwCursor() *= 10.0f;
    }
    

Example 4:

Use a direct call to getSlice to access a small central region of the V-polarization in spectral channel 0 only. The region is small enough to not warrant constructing iterators and setting up LatticeNavigators. In this example the call to the getSlice function is unnecessary but is done for illustration purposes anyway.
    SetupNewTable maskSetup("mask_tmp.array", TableDesc(), Table::New);
    Table maskTable(maskSetup);
    PagedArray<Bool> maskArray(IPosition(4,1024,1024,4,256), maskTable);
    maskArray.set(False);
    COWPtr<Array<Bool> > maskPtr;
    maskArray.getSlice(maskPtr, IPosition(4,240,240,3,0),
    		      IPosition(4,32,32,1,1), IPosition(4,1));
    maskPtr.rwRef() = True;
    maskArray.putSlice(*maskPtr, IPosition(4,240,240,3,1));
    

Example 5:

In this example the data in the PagedArray will be accessed a row at a time while setting the cache size to different values. The comments illustrate the results when running on an Ultra 1/140 with 64MBytes of memory.
    PagedArray<Float> pa(IPosition(4,128,128,4,32));
    const IPosition latticeShape = pa.shape();
    cout << "The tile shape is:" << pa.tileShape() << endl;
    // The tile shape is:[32, 16, 4, 16]
      
    // Setup to access the PagedArray a row at a time
    const IPosition sliceShape(4,latticeShape(0), 1, 1, 1);
    const IPosition stride(4,1);
    Array<Float> row(sliceShape);
    IPosition start(4, 0);
      
    // Set the cache size to enough pixels for one tile only. This uses
    // 128kBytes of cache memory and takes 125 secs.
    pa.setCacheSizeInTiles (1);
    Timer clock;
    for (start(3) = 0; start(3) < latticeShape(3); start(3)++) {
      for (start(2) = 0; start(2) < latticeShape(2); start(2)++) {
        for (start(1) = 0; start(1) < latticeShape(1); start(1)++) {
          pa.getSlice(row,  start, sliceShape, stride);
        }
      }
    }
    clock.show();
    pa.showCacheStatistics(cout);
    pa.clearCache();
      
    // Set the cache size to enough pixels for one row of tiles (ie. 4).
    // This uses 512 kBytes of cache memory and takes 10 secs.
    pa.setCacheSizeInTiles (4);
    clock.mark();
    for (start(3) = 0; start(3) < latticeShape(3); start(3)++) {
      for (start(2) = 0; start(2) < latticeShape(2); start(2)++) {
        for (start(1) = 0; start(1) < latticeShape(1); start(1)++) {
          pa.getSlice(row,  start, sliceShape, stride);
        }
      }
    }
    clock.show();
    pa.showCacheStatistics(cout);
    pa.clearCache();
      
    // Set the cache size to enough pixels for one plane of tiles
    // (ie. 4*8). This uses 4 MBytes of cache memory and takes 2 secs.
    pa.setCacheSizeInTiles (4*8);
    clock.mark();
    for (start(3) = 0; start(3) < latticeShape(3); start(3)++) {
      for (start(2) = 0; start(2) < latticeShape(2); start(2)++) {
        for (start(1) = 0; start(1) < latticeShape(1); start(1)++) {
          pa.getSlice(row,  start, sliceShape, stride);
        }
      }
    }
    clock.show();
    pa.showCacheStatistics(cout);
    pa.clearCache();
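
The influence of the cache size in this example can be reproduced with a small standalone model. The sketch below does not use PagedArray; countTileReads is a hypothetical function that replays the row-at-a-time access pattern through a first in first out tile cache and counts how often a tile would have to be fetched. It assumes that every cache miss costs exactly one tile read and ignores everything else the storage manager does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>

// Hypothetical 4-dimensional shape type (not the AIPS++ IPosition class).
using Shape4 = std::array<std::size_t, 4>;

// Count the tile reads needed to fetch every row (a full line along axis 0)
// of the lattice, looping over axis 1 fastest and axis 3 slowest, with a
// FIFO cache that holds at most 'cacheTiles' tiles.
std::size_t countTileReads(const Shape4& lattice, const Shape4& tile,
                           std::size_t cacheTiles)
{
    if (cacheTiles == 0) {
        cacheTiles = 1;                      // the model needs one slot
    }
    Shape4 nTiles;                           // tiles along each axis
    for (std::size_t i = 0; i < 4; ++i) {
        nTiles[i] = (lattice[i] + tile[i] - 1) / tile[i];
    }
    std::deque<std::size_t> fifo;            // load order, oldest in front
    std::unordered_set<std::size_t> resident;
    std::size_t reads = 0;
    for (std::size_t c = 0; c < lattice[3]; ++c) {
        for (std::size_t p = 0; p < lattice[2]; ++p) {
            for (std::size_t y = 0; y < lattice[1]; ++y) {
                // One row crosses every tile along axis 0.
                for (std::size_t tx = 0; tx < nTiles[0]; ++tx) {
                    const std::size_t id = tx + nTiles[0] *
                        (y / tile[1] + nTiles[1] *
                         (p / tile[2] + nTiles[2] * (c / tile[3])));
                    if (resident.count(id) != 0) {
                        continue;            // hit: FIFO order is unchanged
                    }
                    ++reads;                 // miss: fetch the tile
                    if (fifo.size() == cacheTiles) {
                        resident.erase(fifo.front());
                        fifo.pop_front();    // evict the oldest tile
                    }
                    fifo.push_back(id);
                    resident.insert(id);
                }
            }
        }
    }
    return reads;
}
```

With the lattice shape [128,128,4,32] and tile shape [32,16,4,16] used in this example, the model gives 65536 tile reads for a cache of 1 tile, 4096 for 4 tiles and 64 for 32 tiles. The lattice has 64 tiles in total, so with 32 cached tiles each tile is read exactly once, which agrees with the ordering of the timings quoted in the comments.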
    

Motivation

Arrays of data are sometimes much too large to hold in random access memory. PagedArrays, especially in combination with LatticeIterator, provide convenient access to such large data sets.

Template Type Argument Requirements (T)

To Do

Member Description

PagedArray()

The default constructor creates a PagedArray that is useless for just about everything, except that it can be assigned to with the assignment operator.

PagedArray (const TiledShape& shape, const String& filename)

Construct a new PagedArray with the specified shape. A new Table with the specified filename is constructed to hold the array. The Table will remain on disk after the PagedArray goes out of scope or is deleted.

explicit PagedArray (const TiledShape& shape)

Construct a new PagedArray with the specified shape. A scratch Table is created in the current working directory to hold the array. This Table will be deleted automatically when the PagedArray goes out of scope or is deleted.

PagedArray (const TiledShape& shape, Table& file)

Construct a new PagedArray, with the specified shape, in the default row and column of the supplied Table.

PagedArray (const TiledShape& shape, Table& file, const String& columnName, uInt rowNum)

Construct a new PagedArray, with the specified shape, in the specified row and column of the supplied Table.

explicit PagedArray (const String& filename)

Reconstruct from a pre-existing PagedArray in the default row and column of the supplied Table with the supplied filename.

explicit PagedArray (Table& file)

Reconstruct from a pre-existing PagedArray in the default row and column of the supplied Table.

PagedArray (Table& file, const String& columnName, uInt rowNum)

Reconstruct from a pre-existing PagedArray in the specified row and column of the supplied Table.

PagedArray (const PagedArray<T>& other)

The copy constructor which uses reference semantics. Copying by value doesn't make sense, because it would require the creation of a temporary (but possibly huge) file on disk.

~PagedArray()

The destructor flushes the PagedArray's contents to disk.

PagedArray<T>& operator= (const PagedArray<T>& other)

The assignment operator with reference semantics. As with the copy constructor assigning by value does not make sense.

virtual Lattice<T>* clone() const

Make a copy of the object (reference semantics).

virtual Bool isPersistent() const

A PagedArray is always persistent.

virtual Bool isPaged() const

A PagedArray is always paged to disk.

virtual Bool isWritable() const

Is the PagedArray writable?

virtual IPosition shape() const

Returns the shape of the PagedArray.

virtual String name (Bool stripPath=False) const

Return the current Table name. By default this includes the full path. The path preceding the file name can be stripped off on request.

void resize (const TiledShape& newShape)

Function to resize the PagedArray. The old contents are lost. Usage of this function is NOT currently recommended (see the More Details section above).

const String& tableName() const

Returns the current table name (ie. filename) of this PagedArray.

Table& table()
const Table& table() const

Return the current table object.

const String& columnName() const

Returns the current Table column name of this PagedArray.

static String defaultColumn()

Returns the default TableColumn name for a PagedArray.

const ROTiledStManAccessor& accessor() const

Returns an accessor to the tiled storage manager.

uInt rowNumber() const

Returns the current row number of this PagedArray.

static uInt defaultRow()

Returns the default row number for a PagedArray.

IPosition tileShape() const

Returns the current tile shape for this PagedArray.

virtual uInt advisedMaxPixels() const

Returns the maximum recommended number of pixels for a cursor. This is the number of pixels in a tile.

virtual void setMaximumCacheSize (uInt howManyPixels)

Set the maximum allowed cache size for all Arrays in this column of the Table. The actual value used may be smaller. A value of zero means that there is no maximum.

virtual uInt maximumCacheSize() const

Return the maximum allowed cache size (in pixels) for all Arrays in this column of the Table. The actual cache size may be smaller. A value of zero means that no maximum is currently defined.

virtual void setCacheSizeInTiles (uInt howManyTiles)

Set the actual cache size for this Array to be big enough for the indicated number of tiles. This cache is not shared with PagedArrays in other rows and is always clipped to be less than the maximum value set using the setMaximumCacheSize member function. Tiles are cached using a first in first out algorithm.

virtual void setCacheSizeFromPath (const IPosition& sliceShape, const IPosition& windowStart, const IPosition& windowLength, const IPosition& axisPath)

Set the actual cache size for this Array to "fit" the indicated path. This cache is not shared with PagedArrays in other rows and is always less than the maximum value. The sliceShape is the cursor or slice that you will be requiring (with each call to {get,put}Slice). The windowStart and windowLength delimit the range of pixels that will ultimately be accessed. The axisPath is described in the documentation for the LatticeStepper class.

virtual void clearCache()

Clears and frees up the tile cache. The maximum allowed cache size is unchanged from when setMaximumCacheSize was last called.

virtual void showCacheStatistics (ostream& os) const

Generate a report on how the cache is doing. This is reset every time clearCache is called.

virtual T getAt (const IPosition& where) const

Return the value of the single element located at the argument IPosition. Note that Lattice::operator() can also be used.

virtual void putAt (const T& value, const IPosition& where)

Put the value of a single element.

virtual Bool ok() const

A function which checks for internal consistency. Returns False if something nasty has happened to the PagedArray. In that case it also throws an exception.

virtual LatticeIterInterface<T>* makeIter (const LatticeNavigator& navigator, Bool useRef) const

This function is used by the LatticeIterator class to generate an iterator of the correct type for a specified Lattice. Not recommended for general use.

virtual Bool doGetSlice (Array<T>& buffer, const Slicer& section)

Do the actual getting of an array of values.

virtual void doPutSlice (const Array<T>& sourceBuffer, const IPosition& where, const IPosition& stride)

Do the actual putting of an array of values.

virtual IPosition doNiceCursorShape (uInt maxPixels) const

Get the best cursor shape.

virtual Bool lock (FileLocker::LockType, uInt nattempts)
virtual void unlock()
virtual Bool hasLock (FileLocker::LockType) const

Handle the (un)locking.

virtual void resync()

Resynchronize the PagedArray object with the lattice file. This function is only useful if no read-locking is used, ie. if the table lock option is UserNoReadLocking or AutoNoReadLocking. In that case the table system does not acquire a read-lock, and thus does not synchronize itself automatically.

virtual void flush()

Flush the data (but do not unlock).

virtual void tempClose()

Temporarily close the lattice. It will be reopened automatically on the next access.

virtual void reopen()

Explicitly reopen the temporarily closed lattice.

void setTableType()

Set the data in the TableInfo file.

void makeArray (const TiledShape& shape)

Make the ArrayColumn.

void makeTable (const String& filename, Table::TableOption option)

Make a Table to hold this PagedArray.

static String defaultComment()

The default comment for PagedArray Columns.

ArrayColumn<T>& getRWArray()

Get the writable ArrayColumn object. It is created when needed.

void makeRWArray()

Create the writable ArrayColumn object. It reopens the table for write when needed.

void doReopen() const
void tempReopen() const

Do the reopen of the table (if not open already).