An encoder/decoder collection for a sequence of integers

Encoding lists of integers efficiently is important for many applications in different fields. Adjacency lists of large graphs are usually encoded to save space and to improve decoding speed. Inverted indexes of Information Retrieval systems keep the lists of postings compressed in order to exploit the memory hierarchy. Secondary indexes of DBMSs are stored similarly to inverted indexes in IR systems.

We have collected a set of state-of-the-art integer encoders into an extensible C++ library with a particular focus on efficiency.

The library currently implements the following encoders:

  • Gamma
  • Delta
  • Variable Byte
  • Binary Interpolative
  • Simple 9
  • Simple 16
  • PForDelta
  • OPTPForDelta
  • VSEncoding (including lots variants)

The library is meant to be extensible and, indeed, we invite contributions to be sent in order to be included in the next releases of the library.

Current performance measurements

In addition to providing a set of freely available implementation of encoders for integers and list of integers, in general, another goal we have is to provide a framework within which people can compare their own solutions against the state-of-the-art ones. For this reason we provide some datasets (available in the download page) for comparing different implementations.

We have conducted some experiments on a Linux machine equipped with an Intel Core i5-2500 (3.3GHz) processor and 8GB of RAM. The dataset we have used is the list of postings for an index built on .gov2, one of the document collections available in the TREC’s Terabyte Track.

The following table reports results of the different methods currently implemented in terms of Million Integers decoded per Second (mis) and in terms of bit-per-integer (bpi).



\begin{tabular}{l|r|r} % Column formatting, @{} suppresses leading/trailing space

\textbf{Encoder} & \textbf{mis} & \textbf{bpi}\\
\hline \hline
{\sc Gamma} & $144.82$ & $5.05$ \\
{\sc Delta} & $140.74$ & $4.70$ \\
{\sc Variable Byte} & $226.15$ & $8.97$ \\
{\sc Binary Interpolative} & $103.27$ & $3.95$ \\
{\sc Simple 9} & $571.93$ & $5.41$ \\
{\sc Simple 16} & $575.90$ & $5.17$ \\
{\sc PForDelta} & $990.22$ & $6.16$ \\
{\sc OPTPForDelta} & $842.96$ & $5.36$ \\
{\sc VSEncodingBlocks} & $915.36$ & $4.95$ \\
{\sc VSE-R} & $167.21$ & $4.55$ \\
{\sc VSEncodingSimple v1} & $1,701.11$ & $6.40$ \\
{\sc VSEncodingSimple v2} & $1,812.66$ & $6.79$ \\