Copyright 2013, 2014 IST Austria
Ulrich Bauer, Michael Kerber, Jan Reininghaus
Hubert Wagner, Primoz Skraba
- [https://drive.google.com/uc?id=0B7Yz6TPEpiGEMGFNQ3FPX3ltelk&export=download PHAT, v1.3.0]
- [https://drive.google.com/uc?id=0B7Yz6TPEpiGENE9KUnhUSFdFQUk&export=download PHAT, v1.2.1]
- [https://drive.google.com/uc?id=0B7Yz6TPEpiGERGZFbjlXaUt1ZWM&export=download benchmark data]
This software library contains methods for computing the persistence pairs of a
filtered cell complex represented by an ordered boundary matrix with Z2 coefficients.
For an introduction to persistent homology, see the textbook [1]
. This software package
contains code for several algorithmic variants:
- The "standard" algorithm (see
[1]
, p.153) - The "row" algorithm from
[2]
(called pHrow in there) - The "twist" algorithm, as described in
[3]
(default algorithm) - The "chunk" algorithm presented in
[4]
The last two algorithms exploit the special structure of the boundary matrix to take shortcuts in the computation. The chunk algorithm makes use of multiple CPU cores if it is compiled with OpenMP support.
All algorithms are implemented as functor classes that manipulate a given boundary_matrix
(to be defined below) object to reduced form.
From this reduced form one can then easily extract the persistence pairs.
Alternatively, one can use the compute_persistence_pairs function
which takes an algorithm as a template parameter, reduces the given boundary_matrix
and stores the resulting pairs in a given persistence_pairs
object.
The boundary_matrix
class takes a "Representation" class as template parameter. This representation defines
how columns of the matrix are represented and how low-level operations
(e.g., column additions) are performed. The right choice of the representation
class can be as important for the performance of the program as choosing the
algorithm. We provide the following choices of representation classes:
vector_vector
: Each column is represented as a sortedstd::vector
of integers, containing the indices of the non-zero entries of the column. The matrix itself is astd::vector
of such columns.vector_heap
: Each column is represented as a heapifiedstd::vector
of integers, containing the indices of the non-zero entries of the column. The matrix itself is astd::vector
of such columns.vector_set
: Each column is astd::set
of integers, with the same meaning as above. The matrix is stored as astd::vector
of such columns.vector_list
: Each column is a sortedstd::list
of integers, with the same meaning as above. The matrix is stored as astd::vector
of such columns.sparse_pivot_column
: The matrix is stored as in the vector_vector representation. However, when a column is manipulated, it is first converted into astd::set
, using an extra data field called the "pivot column". When another column is manipulated later, the pivot column is converted back to thestd::vector
representation. This can lead to significant speed improvements when many columns are added to a given pivot column consecutively. In a multicore setup, there is one pivot column per thread.heap_pivot_column
: The same idea as in the sparse version. Instead of astd::set
, the pivot column is represented by astd::priority_queue
.full_pivot_column
: The same idea as in the sparse version. However, instead of astd::set
, the pivot column is expanded into a bit vector of size n (the dimension of the matrix). To avoid costly initializations, the class remembers which entries have been manipulated for a pivot column and updates only those entries when another column becomes the pivot.bit_tree_pivot_column
(default representation): Similar to thefull_pivot_column
but the implementation is more efficient. Internally it is a bit-set with fast iteration over nonzero elements, and fast access to the maximal element.
There are two ways to interface with the library:
-
using files: write the boundary matrix / filtration into a file "input" (see below for the file format). compile
src/phat.cpp
and run it:phat [--ascii] input output
read the resulting persistence pairs into your program
-
using the C++ library interface: include all headers found in
src/phat.cpp
define a boundary matrix object, e.g.phat::boundary_matrix< bit_tree_pivot_column > boundary_matrix;
set the number of columns:boundary_matrix.set_num_cols(...);
initialize each column using
boundary_matrix.set_col(...);
boundary_matrix.set_dim(...);
Define an object to hold the result:
phat::persistence_pairs pairs;
Run an algorithm like this:
phat::compute_persistence_pairs< phat::twist_reduction >( pairs, boundary_matrix );
Examine the result:
pairs.get_num_pairs()
pairs.get_pair(...)
A simple example that demonstrates this functionality can be found in src/simple_example.cpp
; another example, demonstrating how to compute persistent relative homology, can be found in src/relative_example.cpp
.
The library supports input and output in ascii and binary format
through the methods [load|save]_[ascii|binary]
in the classes boundary_matrix
and persistence_pairs
. The file formats are defined as follows:
-
boundary_matrix
- ascii: The file represents the filtration of the cell complex, containing one cell per line (empty lines and lines starting with "#" are ignored). A cell is given by a sequence of integers, separated by spaces, where the first integer denotes the dimension of the cell, and all following integers give the indices of the cells that form its boundary (the index of a cell is its position in the filtration, starting with 0). A sample filesingle_triangle.dat
can be found in the examples folder. -
boundary_matrix
- binary: In binary format, the file is simply interpreted as a sequence of 64 bit signed integer numbers. The first number is interpreted as the number of cells of the complex. The descriptions of the cells is expected to follow, with the first number representing the dimension of the cell, the next number, say N, representing the size of the boundary, followed by N numbers denoting the indices of the boundary cells. A sample filesingle_triangle.bin
can be found in the examples folder. -
persistence_pairs
- ascii: The file contains the persistence pairs, sorted by birth index. The first integer in the file is equal to the number of pairs. It is followed by pairs of integers encode the respective birth and death indices. A sample filesingle_triangle_persistence_pairs.dat
can be found in the examples folder. -
persistence_pairs
- binary: Same as ascii format, see above. Only now the integers are encoded as 64bit signed integers. A sample filesingle_triangle_persistence_pairs.bin
can be found in the examples folder.
- Visual Studio 2008 and 2012 (2010 untested)
- GCC version 4.4. and higher
- H.Edelsbrunner, J.Harer: Computational Topology, An Introduction. American Mathematical Society, 2010, ISBN 0-8218-4925-5
- V.de Silva, D.Morozov, M.Vejdemo-Johansson: Dualities in persistent (co)homology. Inverse Problems 27, 2011
- C.Chen, M.Kerber: Persistent Homology Computation With a Twist. 27th European Workshop on Computational Geometry, 2011.
- U.Bauer, M.Kerber, J.Reininghaus: Clear and Compress: Computing Persistent Homology in Chunks. [http://arxiv.org/pdf/1303.0477.pdf arXiv:1303.0477]