This repository provides a Julia library that implements the Significant Pattern Association (Spass) algorithm. By leveraging a binomial redundancy test for a sequentially updating maximum entropy null model, Spass provides an efficient method for discovering concise sets of statistically significantly non-redundant higher-order feature interactions (i.e., patterns). To highlight commonalities and differences between groups, Spass statistically associates each pattern with a subset of groups.
The code is a from-scratch implementation of the algorithms described in the following paper:
Sebastian Dalleiger and Jilles Vreeken. 2022.
Discovering Significant Patterns under Sequential False Discovery Control.
(KDD '22), pp. 263–272. https://doi.org/10.1145/3534678.3539398
Please consider citing the paper.
Contributions are welcome.
To install the library from the REPL:
julia> using Pkg; Pkg.add(url="https://github.com/sdall/spass.git")
To install the library from the command line:
julia -e 'using Pkg; Pkg.add(url="https://github.com/sdall/spass.git")'
To set up the command line interface (CLI) located in bin/spass.jl:
- Clone the repository:
git clone https://github.com/sdall/spass
- Install the required dependencies including the library:
julia -e 'using Pkg; Pkg.add(path="./spass"); Pkg.add.(["Comonicon", "CSV", "GZip", "JSON"])'
A typical usage of the library is:
julia> using Spass: spass, FDR, FWER, patterns
julia> p = spass(FWER, X; alpha = 0.01)
julia> patterns(p)
For more information, please see the documentation:
help?> spass
A typical usage of the command line interface is:
chmod +x bin/spass.jl
bin/spass.jl dataset.dat.gz dataset.labels.gz --alpha=0.01 --fdr > output.json
The output contains patterns and executiontime in seconds (cf. --measure-time for details).
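As a rough sketch of how the CLI result might be consumed from Julia, the snippet below loads output.json with JSON.jl (one of the dependencies installed above). The helper name read_spass_output is hypothetical and not part of Spass. It assumes the top level of the file is a JSON object with the patterns and executiontime entries mentioned above, and it makes no assumption about the structure of the individual pattern entries.

```julia
using JSON  # JSON.jl, installed together with the CLI dependencies

# Hypothetical helper (not part of Spass): load the CLI output and return
# the two top-level entries named in this README.
# Assumption: the file holds a JSON object with the keys "patterns" and
# "executiontime"; either entry is returned as `nothing` if it is missing.
function read_spass_output(path::AbstractString)
    result = JSON.parsefile(path)
    return (patterns = get(result, "patterns", nothing),
            executiontime = get(result, "executiontime", nothing))
end

# Only inspect the file if the CLI has already produced it.
if isfile("output.json")
    out = read_spass_output("output.json")
    println("execution time (s): ", out.executiontime)
    println("has patterns entry: ", out.patterns !== nothing)
end
```

Missing keys are returned as nothing instead of raising an error, since this sketch does not know whether executiontime is always written; see bin/spass.jl --help for the authoritative description of the output.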
For more information regarding usage, additional options, or input format, please see the provided documentation:
bin/spass.jl --help