Class CompressedRecordArray
java.lang.Object
com.ibm.j9ddr.corereaders.tdump.zebedee.util.CompressedRecordArray
- All Implemented Interfaces:
Serializable
This class represents an array of records which are stored in a compressed format whilst
still allowing random access to them. Each record in turn is simply an array of ints. Each
record must be the same length. To implement this we divide the array of records up into
blocks. There is an index and a bit stream. The index gives the start of each block in the
bit stream. Each block contains a set of records stored in an encoded format. A header at
the beginning defines the encoding used. The encoding is chosen dynamically to give the
best compression. Deltas (i.e. the differences between values in adjacent records) are stored
rather than the values themselves, which gives good results for certain types of data.
The number of records per block is
configurable and there is a space/time trade-off to be made because a large number of
records per block will give better compression at the cost of more time to extract each
record (because you have to start at the beginning of the block and then uncompress each
record in turn until you reach the one you want).
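As a rough illustration of the block-and-delta idea described above, here is a minimal sketch. It is not the actual implementation: the class name DeltaBlockSketch is made up, it stores plain int deltas instead of a dynamically chosen bit-level encoding, and it assumes each block restarts its deltas from zero so that any block can be decoded on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the real CompressedRecordArray. */
class DeltaBlockSketch {
    private final int blockSizeLog2;
    private final int recordSize;
    // One int[] per block, holding the delta of each record from the one before it.
    private final List<int[]> blocks = new ArrayList<>();
    private int[] current;          // block currently being filled
    private final int[] previous;   // last record added, used to compute deltas
    private int count;              // total number of records added

    DeltaBlockSketch(int blockSizeLog2, int recordSize) {
        this.blockSizeLog2 = blockSizeLog2;
        this.recordSize = recordSize;
        this.previous = new int[recordSize];
    }

    void add(int[] record) {
        int slot = count & ((1 << blockSizeLog2) - 1);   // position within the block
        if (slot == 0) {
            // Start a new block; deltas restart from zero (an assumption of this sketch),
            // so the first record of a block is effectively stored verbatim.
            current = new int[recordSize << blockSizeLog2];
            blocks.add(current);
            Arrays.fill(previous, 0);
        }
        for (int i = 0; i < recordSize; i++) {
            current[slot * recordSize + i] = record[i] - previous[i];
            previous[i] = record[i];
        }
        count++;
    }

    void get(int recordNumber, int[] record) {
        int[] block = blocks.get(recordNumber >>> blockSizeLog2);   // which block
        int slot = recordNumber & ((1 << blockSizeLog2) - 1);       // which record in it
        Arrays.fill(record, 0);
        // Walk from the start of the block, applying deltas until we reach the record.
        for (int s = 0; s <= slot; s++) {
            for (int i = 0; i < recordSize; i++) {
                record[i] += block[s * recordSize + i];
            }
        }
    }

    public static void main(String[] args) {
        DeltaBlockSketch sketch = new DeltaBlockSketch(2, 3);   // 4 records per block
        for (int n = 0; n < 10; n++) {
            sketch.add(new int[] {n, n * 100, -n});
        }
        int[] out = new int[3];
        sketch.get(7, out);
        System.out.println(Arrays.toString(out));
    }
}
```

The loop in get shows where the space/time trade-off comes from: the larger the block, the more deltas have to be applied on average before the requested record is reached.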
I wrote a test to measure the performance on some real-life data (in fact, this data is the reason I wrote this class in the first place). The data consists of a file containing z/OS fpos_t objects obtained by calling fgetpos sequentially for every block (4060 bytes) in an svcdump. Each fpos_t object is actually an array of 8 ints containing obscure info about the disk geometry or something, but the important thing is that it changes in a reasonably regular fashion and so is a good candidate for compression via deltas. The original file had a length of 3401088 bytes. Here are the results, which suggest that a block size of 32 (a log2 of 5) is a good choice. The time is that taken to write the data and then read it back again to check:
| log2 | block size (records) | memory usage (bytes) | time (ms) |
|---|---|---|---|
| 0 | 1 | 4191388 | 782 |
| 1 | 2 | 2706992 | 691 |
| 2 | 4 | 1217920 | 621 |
| 3 | 8 | 790472 | 620 |
| 4 | 16 | 516772 | 721 |
| 5 | 32 | 340448 | 942 |
| 6 | 64 | 334304 | 1362 |
| 7 | 128 | 334304 | 2223 |
| 8 | 256 | 340448 | 3966 |
| 9 | 512 | 355808 | 7470 |
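To make the comparison with the original file concrete, the following sketch computes the fraction of the original length that a few rows of the table occupy. The class and method names are made up for illustration, and it assumes the memory usage column and the file length are both in bytes. For a block size of 32 the result is roughly 0.10, i.e. about a tenth of the original size.

```java
/** Hypothetical helper for reading the results table; not part of the library. */
class CompressionRatio {
    /** Fraction of the original length still in use after compression. */
    static double ratio(long memoryUsage, long originalLength) {
        return (double) memoryUsage / originalLength;
    }

    public static void main(String[] args) {
        long original = 3401088L;                  // file length quoted in the text
        int[] log2 = {0, 3, 5, 6};                 // selected rows from the table
        long[] usage = {4191388L, 790472L, 340448L, 334304L};
        for (int i = 0; i < log2.length; i++) {
            // The block size is 2 raised to the power log2.
            System.out.printf("log2=%d blockSize=%d ratio=%.3f%n",
                    log2[i], 1 << log2[i], ratio(usage[i], original));
        }
    }
}
```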
Constructor Summary
| Constructor | Description |
|---|---|
| CompressedRecordArray(int blockSizeLog2, int recordSize) | Create a new CompressedRecordArray. |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| void | add(int[] record) | Add a new record. |
| void | close() | Close this CompressedRecordArray. |
| void | get(int recordNumber, int[] record) | Get the given record number. |
| static void | main | This method is provided to test the CompressedRecordArray. |
| int | memoryUsage() | Give a rough estimate of how many bytes of storage we use. |
-
Constructor Details
-
CompressedRecordArray
public CompressedRecordArray(int blockSizeLog2, int recordSize)
Create a new CompressedRecordArray. A size of 5 for blockSizeLog2 gives good results.
- Parameters:
blockSizeLog2 - the number of records in each block, expressed as a power of 2 (so 5 means 32 records per block)
recordSize - the number of ints in each record
-
-
Method Details
-
add
public void add(int[] record)
Add a new record. Data is copied from the given array.
- Parameters:
record - an array of ints which forms the record to be added
-
close
public void close()
Close this CompressedRecordArray. This must be called before any reading is done, and no more records may be added afterwards.
-
get
public void get(int recordNumber, int[] record)
Get the given record number. To save on GC overhead, the user supplies the int array to copy the record into.
- Parameters:
recordNumber - the sequential number of the record to read
record - the array to copy the record into
-
memoryUsage
public int memoryUsage()
Give a rough estimate of how many bytes of storage we use. This is the actual storage allocated, so it may be more than what is in use at any one time.
-
main
-