Compressed Arrays¶
zfp‘s compressed arrays are C++ classes that implement random-accessible single- and multi-dimensional floating-point arrays whose storage size, specified in number of bits per array element, is set by the user. Such arbitrary storage is achieved via zfp‘s lossy fixed-rate compression mode, by partitioning each d-dimensional array into blocks of 4d values and compressing each block to a fixed number of bits. The more smoothly the array values vary along each dimension, the more accurately zfp can represent them. In other words, these arrays are not suitable for representing data where adjacent elements are not correlated. Rather, the expectation is that the array represents a regularly sampled and predominantly continuous function, such as a temperature field in a physics simulation.
The rate, measured in number of bits per array element, can be specified in fractions of a bit (but see FAQs #12 and #18 for limitations). Note that array dimensions need not be multiples of four; zfp transparently handles partial blocks on array boundaries.
The C++ templated array classes are implemented entirely as header files that call the zfp C library to perform compression and decompression. These arrays cache decompressed blocks to reduce the number of compression and decompression calls. Whenever an array value is read, the corresponding block is first looked up in the cache, and if found the uncompressed value is returned. Otherwise the block is first decompressed and stored in the cache. Whenever an array element is written (whether actually modified or not), a “dirty bit” is set with its cached block to indicate that the block must be compressed back to persistent storage when evicted from the cache.
This section documents the public interface to the array classes, including base classes and member accessor classes like proxy references/pointers and iterators.
Array Classes¶
Currently there are six array classes for 1D, 2D, and 3D arrays, each of which can represent single- or double-precision values. Although these arrays store values in a form different from conventional single- and double-precision floating point, the user interacts with the arrays via floats and doubles.
The array classes can often serve as direct substitutes for C/C++
single- and multi-dimensional floating-point arrays and STL vectors, but
have the benefit of allowing fine control over storage size. All classes
below belong to the zfp
namespace.
Base Class¶
-
class
array
¶ Virtual base class for common array functionality.
-
array::
array
(const array &a)¶ Copy constructor. Performs a deep copy. NOTE: Deep copies are not yet fully implemented.
-
array::
~array
()¶ Destructor.
-
array &
array::
operator=
(const array &a)¶ Assignment operator–performs a deep copy of the whole array.
-
double
array::
rate
() const¶ Return rate in bits per value.
-
double
array::
set_rate
(double rate)¶ Set desired compression rate in bits per value. Return the closest rate supported. See FAQ #12 and FAQ #18 for discussions of the rate granularity. This method destroys the previous contents of the array.
-
virtual void
array::
clear_cache
() const¶ Empty cache without compressing modified cached blocks, i.e. discard any cached updates to the array.
-
virtual void
array::
flush_cache
() const¶ Flush cache by compressing all modified cached blocks back to persistent storage and emptying the cache. This method should be called before writing the compressed representation of the array to disk, for instance.
-
size_t
array::
compressed_size
() const¶ Return number of bytes of storage for the compressed data. This amount does not include the small overhead of other class members or the size of the cache. Rather, it reflects the size of the memory buffer returned by
compressed_data()
.
-
uchar *
array::
compressed_data
() const¶ Return pointer to compressed data for read or write access. The size of the buffer is given by
compressed_size()
.
Common Methods¶
The following methods are common to 1D, 2D, and 3D arrays, but are implemented in the array class specific to each dimensionality rather than in the base class.
-
size_t
array::
size
() const¶ Total number of elements in array, e.g. nx × ny × nz for 3D arrays.
-
size_t
array::
cache_size
() const¶ Return the cache size in number of bytes.
-
void
array::
set_cache_size
(size_t csize)¶ Set minimum cache size in bytes. The actual size is always a power of two bytes and consists of at least one block. If csize is zero, then a default cache size is used, which requires the array dimensions to be known.
-
void
array::
get
(Scalar *p) const¶ Decompress entire array and store at p, for which sufficient storage must have been allocated. The uncompressed array is assumed to be contiguous (with default strides) and stored in the usual “row-major” order, i.e. with x varying faster than y and y varying faster than z.
-
void
array::
set
(const Scalar *p)¶ Initialize array by copying and compressing data stored at p. The uncompressed data is assumed to be stored as in the
get()
method.
-
const Scalar &
array::
operator[]
(uint index) const¶ Return scalar stored at given flat index (inspector). For a 3D array,
index = x + nx * (y + ny * z)
.
-
reference
array::
operator[]
(uint index)¶ Return proxy reference to scalar stored at given flat index (mutator). For a 3D array,
index = x + nx * (y + ny * z)
.
-
iterator
array::
begin
()¶ Return iterator to beginning of array.
-
iterator
array::
end
()¶ Return iterator to end of array. As with STL iterators, the end points to a virtual element just past the last valid array element.
1D, 2D, and 3D Arrays¶
Below are classes and methods specific to each array dimensionality. Since the classes and methods share obvious similarities regardless of dimensionality, only one generic description for all dimensionalities is provided.
-
template<typename
Scalar
>
classarray3
: public array¶ This is a 1D/2D/3D array that inherits basic functionality from the generic
array
base class. The template argument,Scalar
, specifies the floating type returned for array elements. The suffixesf
andd
can also be appended to each class to indicate float or double type, e.g.array1f
is a synonym forarray1<float>
.
-
array1::
array1
()¶
-
array2::
array2
()¶
-
array3::
array3
()¶ Default constructor. Creates an empty array.
-
array1::
array1
(uint n, double rate, const Scalar *p = 0, size_t csize = 0)¶
-
array2::
array2
(uint nx, uint ny, double rate, const Scalar *p = 0, size_t csize = 0)¶
-
array3::
array3
(uint nx, uint ny, uint nz, double rate, const Scalar *p = 0, size_t csize = 0)¶ Constructor of array with dimensions n (1D), nx × ny (2D), or nx × ny × nz (3D) using rate bits per value, at least csize bytes of cache, and optionally initialized from flat, uncompressed array p. If csize is zero, a default cache size is chosen.
-
uint
array2::
size_x
() const¶
-
uint
array2::
size_y
() const¶
-
uint
array3::
size_x
() const¶
-
uint
array3::
size_y
() const¶
-
uint
array3::
size_z
() const¶ Return array dimensions.
-
void
array1::
resize
(uint n, bool clear = true)¶
-
void
array2::
resize
(uint nx, uint ny, bool clear = true)¶
-
void
array3::
resize
(uint nx, uint ny, uint nz, bool clear = true)¶ Resize the array (all previously stored data will be lost). If clear is true, then the array elements are all initialized to zero.
-
const Scalar &
array1::
operator()
(uint i) const¶
-
const Scalar &
array2::
operator()
(uint i, uint j) const¶
-
const Scalar &
array3::
operator()
(uint i, uint j, uint k) const¶ Return scalar stored at multi-dimensional index given by i, j, and k (inspector).
-
reference
array3::
operator()
(uint i, uint j, uint k)¶ Return proxy reference to scalar stored at multi-dimensional index given by i, j, and k (mutator).
Caching¶
As mentioned above, the array classes maintain a software write-back cache of at least one uncompressed block. When a block in this cache is evicted (e.g. due to a conflict), it is compressed back to permanent storage only if it was modified while stored in the cache.
The size cache to use is specified by the user, and is an important parameter that needs careful consideration in order to balance the extra memory usage, performance, and quality (recall that data loss is incurred only when a block is evicted from the cache and compressed). Although the best choice varies from one application to another, we suggest allocating at least two “layers” of blocks, e.g. 2 × (nx / 4) × (ny / 4) blocks for 3D arrays, for applications that stream through the array and perform stencil computations such as gathering data from neighboring elements. This allows limiting the cache misses to compulsory ones. If the csize parameter provided to the constructor is set to zero bytes, then this default of two layers is used.
The cache size can be set during construction, or can be set at a later
time via array::set_cache_size()
. Note that if csize = 0, then
the array dimensions must have already been specified for the default size
to be computed correctly. When the cache is resized, it is first flushed
if not already empty. The cache can also be flushed explicitly if desired
by calling array::flush_cache()
. To empty the cache without
compressing any cached data, call clear_cache()
. To query the
byte size of the cache, use array::cache_size()
.
By default a direct-mapped cache is used with a hash function that maps
block indices to cache lines. A faster but more collision prone hash
can be enabled by defining the preprocessor macro
ZFP_WITH_CACHE_FAST_HASH
.
A two-way skew-associative cache is enabled by defining the preprocessor
macro ZFP_WITH_CACHE_TWOWAY
.
References¶
-
template<typename
Scalar
>
classarray1::
reference
¶
-
template<typename
Scalar
>
classarray2::
reference
¶
-
template<typename
Scalar
>
classarray3::
reference
¶
Array indexing operators must return lvalue references that
alias array elements and serve as vehicles for assigning values to those
elements. Unfortunately, zfp cannot simply return a standard C++ reference
(e.g. float&
) to an uncompressed array element since the element in
question may exist only in compressed form or as a transient cached entry that
may be invalidated (evicted) at any point.
To address this, zfp provides proxies for references and pointers that act much like regular references and pointers, but which refer to elements by array and index rather than by memory address. When assigning to an array element through such a proxy reference or pointer, the corresponding element is decompressed to cache (if not already cached) and immediately updated.
zfp references may be freely passed to other functions and they remain valid during the lifetime of the corresponding array element. One may also take the address of a reference, which yields a proxy pointer. When a reference appears as an rvalue in an expression, it is implicitly converted to a value.
The following operators are defined for zfp references. They act on the referenced array element in the same manner as operators defined for conventional C++ references.
-
reference
reference::
operator=
(const reference &ref)¶ Assignment (copy) operator. The referenced element, elem, is assigned the value stored at the element referenced by ref. Return
*this
.
-
reference
reference::
operator=
(Scalar val)¶
-
reference
reference::
operator+=
(Scalar val)¶
-
reference
reference::
operator-=
(Scalar val)¶
-
reference
reference::
operator*=
(Scalar val)¶
-
reference
reference::
operator/=
(Scalar val)¶ Assignment and compound assignment operators. For a given operator
op
, update the referenced element, elem, via elemop
val. Return*this
.
-
pointer
reference::
operator&
()¶ Return pointer to the referenced array element.
Finally, zfp proxy references serve as a building block for implementing proxy pointers and iterators.
Pointers¶
-
template<typename
Scalar
>
classarray1::
pointer
¶
-
template<typename
Scalar
>
classarray2::
pointer
¶
-
template<typename
Scalar
>
classarray3::
pointer
¶
Similar to references, zfp supports proxies for pointers to individual array elements. From the user’s perspective, such pointers behave much like regular pointers to uncompressed data, e.g. instead of
float a[ny][nx]; // uncompressed 2D array of floats
float* p = &a[0][0]; // point to first array element
p[nx] = 1; // set a[1][0] = 1
*++p = 2; // set a[0][1] = 2
one would write
zfp::array2<float> a(nx, ny, rate); // compressed 2D array of floats
zfp::array2<float>::pointer p = &a(0, 0); // point to first array element
p[nx] = 1; // set a(0, 1) = 1
*++p = 2; // set a(1, 0) = 2
However, even though zfp‘s proxy pointers point to individual scalars, they are associated with the array that those scalars are stored in, including the array’s dimensionality. Pointers into arrays of different dimensionality have incompatible type. Moreover, pointers to elements in different arrays are incompatible. For example, one cannot take the difference between pointers into two different arrays.
Unlike zfp‘s proxy references, its proxy pointers support traversing
arrays using conventional pointer arithmetic. In particular, unlike the
iterators below, zfp‘s pointers are oblivious to the
fact that the compressed arrays are partitioned into blocks, and the pointers
traverse arrays element by element as though the arrays were flattened in
standard C row-major order. That is, if p
points to the first
element of a 3D array a(nx, ny, nz)
, then
a(i, j, k) == p[i + nx * (j + ny * k)]
. In other words, pointer
indexing follows the same order as flat array indexing
(see array::operator[]()
).
A pointer remains valid during the lifetime of the scalar that it points to.
Like conventional pointers, proxy pointers can be passed to other functions
and manipulated there, for instance by passing the pointer by reference via
pointer&
.
The following operators are defined for proxy pointers. Below p refers to the pointer being acted upon.
-
pointer
pointer::
operator=
(const pointer &q)¶ Assignment operator. Assigns q to p.
-
reference
pointer::
operator*
() const¶ Dereference operator. Return proxy reference to the value pointed to by p.
-
reference
pointer::
operator[]
(ptrdiff_t d) const¶ Index operator. Return reference to the value stored at
p[d]
.
-
pointer &
pointer::
operator++
()¶
-
pointer &
pointer::
operator--
()¶ Pre increment (decrement) pointer, e.g.
++p
. Return reference to the incremented (decremented) pointer.
-
pointer
pointer::
operator++
(int)¶
-
pointer
pointer::
operator--
(int)¶ Post increment (decrement) pointer, e.g.
p++
. Return a copy of the pointer before it was incremented (decremented).
-
pointer
pointer::
operator+=
(ptrdiff_t d)¶
-
pointer
pointer::
operator-=
(ptrdiff_t d)¶ Increment (decrement) pointer by d. Return a copy of the incremented (decremented) pointer.
-
pointer
pointer::
operator+
(ptrdiff_t d) const¶
-
pointer
pointer::
operator-
(ptrdiff_t d) const¶ Return a copy of the pointer incremented (decremented) by d.
-
ptrdiff_t
pointer::
operator-
(const pointer &q) const¶ Return difference p - q. Defined only for pointers within the same array.
-
bool
pointer::
operator==
(const pointer &q) const¶
-
bool
pointer::
operator!=
(const pointer &q) const¶ Pointer comparison. Return true (false) if p and q do (do not) point to the same array element.
Iterators¶
-
template<typename
Scalar
>
classarray1::
iterator
¶
-
template<typename
Scalar
>
classarray2::
iterator
¶
-
template<typename
Scalar
>
classarray3::
iterator
¶
Iterators provide a mechanism for sequentially traversing a possibly multi-dimensional array without having to track array indices or bounds. They are also the preferred mechanism, compared to nested index loops, for initializing arrays, because they are guaranteed to visit the array one block at a time. This allows all elements of a block to be initialized together and ensures that the block is not compressed to memory before it has been fully initialized, which might otherwise result in poor compression and, consequently, larger errors than when the entire block is initialized as a whole. Note that the iterator traversal order differs in this respect from traversal by pointers.
The order of blocks visited is row-major (as in C), and the elements within a block are also visited in row-major order, i.e. first by x, then by y, and finally by z. All 4d values in a block are visited before moving on to the next block.
The iterators provided by zfp are sequential forward iterators, except for 1D array iterators, which are random access iterators. The reason why higher dimensional iterators do not support random access is that this would require very complicated index computations, especially for arrays with partial blocks. zfp iterators are STL compliant and can be used in STL algorithms that support forward and random access iterators.
All Iterators¶
Per STL mandate, the iterators define several types:
-
type
iterator::
value_type
¶ The scalar type associated with the array that the iterator points into.
-
type
iterator::
difference_type
¶ Difference between two iterators in number of array elements.
-
type
iterator::
reference
¶
-
type
iterator::
pointer
¶ The reference and pointer type associated with the iterator’s parent array class.
-
type
iterator::
iterator_category
¶ Type of iterator:
std::random_access_iterator_tag
for 1D arrays;std::forward_iterator_tag
for all other arrays.
The following operations are defined on iterators:
-
iterator
iterator::
operator=
(const iterator &it)¶ Assignment (copy) operator. Make the iterator point to the same element as it.
-
reference
iterator::
operator*
() const¶ Dereference operator. Return reference to the value pointed to by the iterator.
-
iterator &
iterator::
operator++
()¶ Pre increment. Return a reference to the incremented iterator.
-
iterator
iterator::
operator++
(int)¶ Post increment. Return the value of the iterator before being incremented.
-
bool
iterator::
operator==
(const iterator &it) const¶
-
bool
iterator::
operator!=
(const iterator &it) const¶ Return true (false) if the two iterators do (do not) point to the same element.
-
uint
iterator::
i
() const¶
-
uint
iterator::
j
() const¶
-
uint
iterator::
k
() const¶ Return array index of element pointed to by the iterator.
iterator::i()
is defined for all arrays.iterator::j()
is defined only for 2D and 3D arrays.iterator::k()
is defined only for 3D arrays.
1D Array Iterators¶
The following operators are defined only for 1D arrays:
-
iterator &
iterator::
operator--
()¶ Pre decrement. Return a reference to the decremented iterator.
-
iterator
iterator::
operator--
(int)¶ Post decrement. Return the value of the iterator before being decremented.
-
iterator
iterator::
operator+=
(difference_type d)¶
-
iterator
iterator::
operator-=
(difference_type d)¶ Increment (decrement) iterator d times. Return value of incremented (decremented) iterator.
-
iterator
iterator::
operator+
(difference_type d) const¶
-
iterator
iterator::
operator-
(difference_type d) const¶ Return a new iterator that has been incremented (decremented) by d.
-
difference_type
iterator::
operator-
(const iterator &it) const¶ Return difference between this iterator and it in number of elements. The iterators must refer to elements in the same array.
-
bool
iterator::
operator<=
(const iterator &it) const¶
-
bool
iterator::
operator>=
(const iterator &it) const¶
-
bool
iterator::
operator<
(const iterator &it) const¶
-
bool
iterator::
operator>
(const iterator &it) const¶ Return true if the two iterators satisfy the given relationship. For two iterators, p and q, within the same array, p < q if and only if q can be reached by incrementing p one or more times.