(Russian лягушка [lʲɪˈɡuʂkə]: frog)
Lyagushka is a Rust command-line tool inspired by the Fatum Project's 'Zhaba' algorithm (Russian 'жаба': toad), which it extends for greater versatility.
The algorithm analyzes a one-dimensional dataset of integers to identify clusters of closely grouped "attractor" points and significant "void" gaps between these clusters. For each cluster or gap, it calculates a z-score that measures its statistical significance relative to the dataset's mean density and mean distance between points. The results, including attractors, voids, and their z-scores, are output as a JSON string.
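The README does not specify exactly which quantity is scored or which standard deviation is used. As a generic illustration only, here is a minimal sketch of a standard z-score, z = (x - mean) / standard deviation, applied to a list of per-segment values such as densities or gap lengths. The function name `z_scores` and the choice of the population standard deviation are assumptions, not lyagushka's API.

```rust
/// Hypothetical helper: standard z-scores, i.e. how many standard deviations
/// each value lies from the mean of all values.
/// Assumption: uses the population standard deviation (divide by n).
/// Returns all zeros when the values do not vary, and an empty Vec for empty input.
pub fn z_scores(values: &[f64]) -> Vec<f64> {
    if values.is_empty() {
        return Vec::new();
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    values
        .iter()
        .map(|v| if std_dev == 0.0 { 0.0 } else { (v - mean) / std_dev })
        .collect()
}
```

With this convention, values above the mean get positive scores and values below it get negative ones. That is consistent with the signs in the example output further down (a populated attractor scoring positive, an empty void scoring negative), but whether lyagushka scores densities in exactly this way is not stated here.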
With a Rust and Cargo environment set up, simply run:
$ cargo build --release
To also build a Python wheel, you need Maturin set up. Simply run:
$ maturin build --release
$ pip install target/wheels/lyagushka-1.1.0*.whl
filename.txt (optional): A file containing a newline-separated list of integers to analyze. If not provided, the program reads input from stdin.
factor: A floating-point multiplier applied to the mean density (for attractors) and the mean gap size (for voids) to set the detection thresholds.
min_cluster_size: An integer specifying the minimum number of contiguous points a group must contain to be considered a cluster.
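The README does not describe how clusters are delimited, so the following is only one simplified interpretation of how `factor` and `min_cluster_size` could act as thresholds, not lyagushka's implementation. It assumes that a void is a gap between neighbouring sorted values of at least `factor` times the mean gap, and that an attractor is a maximal run of points whose internal gaps stay below the mean gap, containing at least `min_cluster_size` points and having at least `factor` times the mean density. The names `detect` and `Region` are hypothetical.

```rust
/// Hypothetical type: a contiguous region with inclusive bounds and the points inside it.
#[derive(Debug, PartialEq)]
pub struct Region {
    pub start: i64,
    pub end: i64,
    pub elements: Vec<i64>,
}

/// Simplified, hypothetical detection pass (assumed rules, see lead-in).
/// Returns (attractors, voids).
pub fn detect(data: &[i64], factor: f64, min_cluster_size: usize) -> (Vec<Region>, Vec<Region>) {
    let mut sorted = data.to_vec();
    sorted.sort_unstable();
    let mut attractors = Vec::new();
    let mut voids = Vec::new();
    let n = sorted.len();
    // At least two distinct values are needed to define a mean gap and a mean density.
    if n < 2 || sorted[0] == sorted[n - 1] {
        return (attractors, voids);
    }
    let total_span = (sorted[n - 1] - sorted[0]) as f64;
    // The mean of the n - 1 consecutive gaps telescopes to (max - min) / (n - 1).
    let mean_gap = total_span / (n - 1) as f64;
    // Mean density: points per unit of the covered range.
    let mean_density = n as f64 / total_span;

    // Voids (assumed rule): neighbours separated by at least `factor` mean gaps.
    for w in sorted.windows(2) {
        if (w[1] - w[0]) as f64 >= factor * mean_gap {
            voids.push(Region { start: w[0], end: w[1], elements: Vec::new() });
        }
    }

    // Attractor candidates (assumed rule): maximal runs whose internal gaps are below the mean gap.
    let mut run_start = 0;
    for i in 1..=n {
        if i == n || (sorted[i] - sorted[i - 1]) as f64 >= mean_gap {
            let run = &sorted[run_start..i];
            // A run of identical values is treated as spanning one unit to avoid dividing by zero.
            let span = (run[run.len() - 1] - run[0]).max(1) as f64;
            let density = run.len() as f64 / span;
            if run.len() >= min_cluster_size && density >= factor * mean_density {
                attractors.push(Region {
                    start: run[0],
                    end: run[run.len() - 1],
                    elements: run.to_vec(),
                });
            }
            run_start = i;
        }
    }
    (attractors, voids)
}
```

Under these assumptions, `detect(&data, 1.5, 6)` plays the role of the `lyagushka random_values.txt 1.5 6` invocation described in the usage section. Raising `factor` makes both thresholds stricter at once, which is why a single parameter can govern attractors and voids together.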
The tool outputs a JSON string that includes details about the identified attractors and voids, along with their respective z-scores. Here's an example of the JSON output format:
[
  //...
  {
    "elements": [722, 722, 722, 725, 725, 726, 726, 726],
    "start": 722,
    "end": 726,
    "span_length": 4,
    "num_elements": 8,
    "centroid": 724.0,
    "z_score": 1.19528
  },
  {
    "elements": [],
    "start": 732,
    "end": 740,
    "span_length": 8,
    "num_elements": 0,
    "centroid": 736.0,
    "z_score": -1.13359
  },
  //...
]
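In both example entries, `span_length` equals `end - start` and `centroid` equals the midpoint `(start + end) / 2` (the attractor's centroid is 724.0, whereas the mean of its elements would be 724.25). That rule is inferred from these two entries only. The sketch below mirrors one entry as a Rust struct under that assumption; the `Segment` type, its constructor and the hand-written `to_json` (used to avoid external crates) are hypothetical and not part of lyagushka.

```rust
/// Hypothetical type mirroring the fields of one entry in the JSON output.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub elements: Vec<i64>,
    pub start: i64,
    pub end: i64,
    pub span_length: i64,
    pub num_elements: usize,
    pub centroid: f64,
    pub z_score: f64,
}

impl Segment {
    /// Builds a segment. Assumption: span_length and centroid derive from
    /// start/end only, as inferred from the README example.
    pub fn new(elements: Vec<i64>, start: i64, end: i64, z_score: f64) -> Self {
        let num_elements = elements.len();
        Segment {
            elements,
            start,
            end,
            span_length: end - start,
            num_elements,
            centroid: (start + end) as f64 / 2.0,
            z_score,
        }
    }

    /// Serialises the segment by hand, in the same key order as the example output.
    pub fn to_json(&self) -> String {
        let elems: Vec<String> = self.elements.iter().map(|e| e.to_string()).collect();
        format!(
            "{{\"elements\": [{}], \"start\": {}, \"end\": {}, \"span_length\": {}, \"num_elements\": {}, \"centroid\": {:.1}, \"z_score\": {:.5}}}",
            elems.join(", "),
            self.start,
            self.end,
            self.span_length,
            self.num_elements,
            self.centroid,
            self.z_score
        )
    }
}
```

For example, `Segment::new(vec![], 732, 740, -1.13359)` reproduces the void entry above: span length 8, zero elements and centroid 736.0.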
To analyze a dataset from a file, provide the filename as the first argument, followed by the factor and minimum cluster size:
$ lyagushka random_values.txt 1.5 6
(That is: attractor clusters must contain at least 6 numbers with at least 1.5 times the mean density, and void gaps must be at least 1.5 times the mean gap size wide.)
Alternatively, you can pipe a list of integers into the tool and pass only the factor and minimum cluster size as arguments:
$ cat random_values.txt | lyagushka 0.5 2
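Both invocations differ only in whether a filename precedes the two numeric parameters. A minimal sketch of how such input handling could look, assuming one integer per line with blank lines skipped: `parse_integers`, `read_input` and `split_args` are hypothetical helpers, not lyagushka's code.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: parses one integer per line, skipping blank lines.
/// A non-integer line yields an InvalidData error.
pub fn parse_integers<R: BufRead>(reader: R) -> io::Result<Vec<i64>> {
    let mut values = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        let value = trimmed
            .parse::<i64>()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        values.push(value);
    }
    Ok(values)
}

/// Hypothetical helper: reads from `path` when given, otherwise from stdin.
pub fn read_input(path: Option<&str>) -> io::Result<Vec<i64>> {
    match path {
        Some(p) => parse_integers(BufReader::new(File::open(p)?)),
        None => parse_integers(io::stdin().lock()),
    }
}

/// Hypothetical helper: splits the arguments (program name excluded) into an
/// optional filename, the factor and the minimum cluster size.
pub fn split_args(args: &[String]) -> Result<(Option<&str>, f64, usize), String> {
    let (file, rest) = match args.len() {
        2 => (None, args),
        3 => (Some(args[0].as_str()), &args[1..]),
        n => return Err(format!("expected 2 or 3 arguments, got {n}")),
    };
    let factor = rest[0].parse::<f64>().map_err(|e| e.to_string())?;
    let min_cluster_size = rest[1].parse::<usize>().map_err(|e| e.to_string())?;
    Ok((file, factor, min_cluster_size))
}
```

In a `main` function, these could be combined as `let args: Vec<String> = std::env::args().skip(1).collect();`, followed by `split_args(&args)` and `read_input(file)`. Deciding by argument count keeps the filename optional without needing a flag.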