The PGM-index

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes while providing the same worst-case query time guarantees.

Website | Documentation | Paper | Python wrapper | A³ Lab

Building the code

To download and build the library, run the following commands:

git clone https://github.com/gvinciguerra/PGM-index.git
cd PGM-index
cmake . -DCMAKE_BUILD_TYPE=Release
make -j8

Now you can run the unit tests via:

./test/tests

Minimal example

#include <vector>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include "pgm_index.hpp"

int main(int argc, char **argv) {
    // Generate some random data
    std::vector<int> dataset(1000000);
    std::generate(dataset.begin(), dataset.end(), std::rand);
    dataset.push_back(42);
    std::sort(dataset.begin(), dataset.end());

    // Construct the PGM-index
    const int epsilon = 128; // space-time trade-off parameter
    PGMIndex<int, epsilon> index(dataset);

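    // Query the PGM-index: it returns an approximate range of positions for q
    auto q = 42;
    auto approx_range = index.find_approximate_position(q);
    auto lo = dataset.begin() + approx_range.lo;
    auto hi = dataset.begin() + approx_range.hi;
    // Binary search only inside that range (42 was inserted above, so it is found)
    std::cout << *std::lower_bound(lo, hi, q) << std::endl;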

    return 0;
}
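
The query pattern in the example above (get an approximate range, then binary search inside it) can be illustrated without the library. The following is a minimal sketch, not the PGM-index implementation: it fits a single straight line through the first and last key, whereas the PGM-index uses a piecewise model. The names ToyLinearIndex, approximate_range and toy_contains are hypothetical and exist only for this illustration. Here epsilon is measured from the data after fitting, rather than chosen up front as in the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy single-segment index (hypothetical, NOT the PGM-index): one line maps
// a key to an approximate position in a sorted array.
struct ToyLinearIndex {
    struct Range { std::size_t lo, hi; };  // half-open range [lo, hi)

    double slope = 0.0;
    double intercept = 0.0;
    std::size_t epsilon = 0;  // largest |predicted - true position| seen
    std::size_t n = 0;

    explicit ToyLinearIndex(const std::vector<int> &keys) : n(keys.size()) {
        assert(!keys.empty() && std::is_sorted(keys.begin(), keys.end()));
        double first = keys.front(), last = keys.back();
        // Line through (first, 0) and (last, n - 1)
        slope = last > first ? (n - 1) / (last - first) : 0.0;
        intercept = -slope * first;
        // Measure the worst prediction error over all indexed keys
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t p = predict(keys[i]);
            epsilon = std::max(epsilon, p > i ? p - i : i - p);
        }
    }

    // Predicted position, clamped to [0, n - 1]
    std::size_t predict(int key) const {
        double p = slope * key + intercept;
        if (p <= 0.0) return 0;
        if (p >= static_cast<double>(n - 1)) return n - 1;
        return static_cast<std::size_t>(p);
    }

    // Range that contains every occurrence of key, if key was indexed
    Range approximate_range(int key) const {
        std::size_t pos = predict(key);
        std::size_t lo = pos > epsilon ? pos - epsilon : 0;
        std::size_t hi = std::min(n, pos + epsilon + 1);
        return {lo, hi};
    }
};

// Membership test: binary search restricted to the approximate range
bool toy_contains(const std::vector<int> &keys, const ToyLinearIndex &index, int key) {
    auto r = index.approximate_range(key);
    auto lo = keys.begin() + r.lo;
    auto hi = keys.begin() + r.hi;
    auto it = std::lower_bound(lo, hi, key);
    return it != hi && *it == key;
}
```

With a sorted std::vector<int> keys, constructing ToyLinearIndex index(keys) and calling toy_contains(keys, index, 42) searches at most 2 * epsilon + 1 positions instead of the whole array. A single line fits skewed data poorly, so the measured epsilon can grow large; that is the motivation for using several segments with a fixed error bound.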

License

This project is licensed under the terms of the GNU General Public License v3.0.

If you use the library, please include a link to the website and cite the following paper:

Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB, 13(8): 1162-1175, 2020.

@article{Ferragina:2020pgm,
  Author = {Paolo Ferragina and Giorgio Vinciguerra},
  Title = {The {PGM-index}: a fully-dynamic compressed learned index with provable worst-case bounds},
  Year = {2020},
  Volume = {13},
  Number = {8},
  Pages = {1162--1175},
  Doi = {10.14778/3389133.3389135},
  Url = {https://pgm.di.unipi.it},
  Issn = {2150-8097},
  Journal = {{PVLDB}}}