This repository provides a Julia library that implements a relaxed maximum entropy distribution (RelEnt) and the PAC-based RelEnt accelerated pattern discovery algorithm (Reap). It estimates a relaxed maximum entropy distribution by discovering sets of higher-order feature interactions (i.e., patterns) in Boolean data. For multiple groups in the data, Reap highlights differences and commonalities between the groups by leveraging associations between patterns and subsets of groups.
The code is a from-scratch implementation of the algorithms described in the following paper:
Sebastian Dalleiger and Jilles Vreeken. 2020.
The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery.
(ICDM '20), pp. 978–983, https://doi.org/10.1109/ICDM50108.2020.00112
Please consider citing the paper.
Contributions are welcome.
To install the library from the REPL:
julia> using Pkg; Pkg.add(url="https://github.com/sdall/reap.git")
To install the library from the command line:
julia -e 'using Pkg; Pkg.add(url="https://github.com/sdall/reap.git")'
To set up the command line interface (CLI) located in bin/reap.jl:
- Clone the repository:
git clone https://github.com/sdall/reap
- Install the required dependencies including the library:
julia -e 'using Pkg; Pkg.add(path="./reap"); Pkg.add.(["Comonicon", "CSV", "GZip", "JSON"])'
For example, to fit a relaxed maximum entropy pattern distribution from a given pattern set:
julia> using Reap: reap_estimate, reap, patterns
julia> p = reap_estimate(X, patternset; max_factor_size=5)
For example, to discover a relaxed maximum entropy pattern distribution from a given dataset:
julia> p = reap(X)
julia> patterns(p)
To see the full list of options:
help?> reap_estimate
help?> reap
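The notion of a pattern used in the calls above can be illustrated without the library. The following is a minimal sketch in plain Julia, assuming a Boolean matrix with one row per sample and one column per feature; the toy matrix and the helper `frequency` are hypothetical and not part of Reap, and the input type that `reap` and `reap_estimate` actually expect should be checked via the help entries above.

```julia
# Toy Boolean data (illustrative only): rows are samples, columns are features.
X = Bool[1 1 0 1;
         1 1 0 0;
         0 1 1 0;
         1 1 1 1;
         0 0 1 0]

# A pattern is a set of feature (column) indices. Its empirical frequency is
# the fraction of rows in which all of its features are present together.
# `frequency` is a hypothetical helper, not a function exported by Reap.
frequency(X::AbstractMatrix{Bool}, pattern) =
    count(row -> all(row[pattern]), eachrow(X)) / size(X, 1)

println(frequency(X, [1, 2]))     # features 1 and 2 co-occur in 3 of 5 rows
println(frequency(X, [1, 2, 4]))  # features 1, 2 and 4 co-occur in 2 of 5 rows
```

Frequencies like these are the kind of statistics a pattern-based model is typically fitted to; how Reap selects and uses patterns is described in the paper.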
A typical usage of the command line interface is:
chmod +x bin/reap.jl
bin/reap.jl dataset.dat.gz dataset.labels.gz > output.json
The output contains patterns and executiontime in seconds (cf. --measure-time for details).
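The resulting file can be inspected from Julia with JSON.jl, which is among the CLI dependencies installed above. This is a hedged sketch: it assumes the top level of output.json is a JSON object, and only the field names patterns and executiontime are taken from the description above. The layout of the individual pattern entries is not specified here, so the helper `summarize_output` (a hypothetical name) only counts them.

```julia
using JSON  # JSON.jl, installed together with the other CLI dependencies

# Summarize a result file written by bin/reap.jl.
# Assumption: the top level is a JSON object. Missing fields yield
# a count of 0 and `nothing` for the time rather than an error.
function summarize_output(path::AbstractString)
    result = JSON.parsefile(path)
    pats = get(result, "patterns", nothing)
    secs = get(result, "executiontime", nothing)
    npatterns = pats === nothing ? 0 : length(pats)
    return (npatterns = npatterns, seconds = secs)
end

# Only runs if the CLI has already produced the file.
if isfile("output.json")
    println(summarize_output("output.json"))
end
```

Using `get` with a default keeps the sketch usable even if a field such as executiontime is absent from a particular run.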
For further information regarding usage, available options, or input format, please see the documentation:
bin/reap.jl --help