Short for Sympatry Graphs with Fuzzy Logic.
This is software to automatically construct a sympatry network, which is a digraph based on data points from observations of ecological distributions to calculate overlaps using fuzzy logic.
See paper (upcoming!) for more details.
In short, for each species we produce a large matrix of numbers between 0 and 1, which denotes the observed area of influence of said species. Each observation strengthens the entries of the matrix: each entry goes from 0.0 up to 1.0, depending on how close the observed data are to the points.
Then, for each pair of species, we calculate the area of overlap and produce a (directed) edge from species A to species B whose weight is the weight of the overlap area over the area of A (i.e. how much of A is inside B). This is done in a highly efficient manner.
See wikipedia's entry on fuzzy logic for a more in-depth explanation.
There are many models for fuzzy logic, depending on what one means when talking about the intersection. If property A is 70% true and property B is 50% true, how true is property (A and B)? Consider the following two: * (A and B) is 35% true (product). This mimics independent events in probability. * (A and B) is 50% true (minimum). Also called the Zadeh operators.
From either of those (and many more) we could create a model for fuzzy logic.
The standard way of defining the distribution area of a species is with propincuity, which is also available in this program (for comparison only. It's slow, and there are probably faster alternatives out there).
In propincuity, one takes a minimum spanning tree of the vertices, sets the radius to the average edge length, and then considers the area of distribution by moving a circle with said radius along all the edges of the spanning tree, like the following picture (radius adjusted for clarity).
First, you need to make sure you have the following dependencies:
g++ (or clang++)
eigen3
boost
cmake
libpng
png++
Make sure you have a recent version of each. In particular, your C++
compiler should have C++14
support (gcc >= 5.2 and clang >= 3.9 should be fine), a recent version of eigen3 (at least 3.2) and a recent version of boost (at least 1.59). Let us know if you encounter any issues.
There is some evidence that this program performs much better with clang than GCC. It also performs better with clang 5.0 than 4.0.
We are in the process of making all this available as a web service. If you are interested, let us know. We'll work faster and we'll be more motivated if we see there is some interest.
For example, to install in debian/ubuntu, do sudo apt-get install build-essentials libboost-all-dev cmake git eigen3 libpng
In a terminal, type:
sudo pacman -S cmake boost git eigen3 png++
After the dependencies are satisfied, just do the usual cmake dance:
cd /path/to/wherever
git clone https://github.com/mraggi/FuzzyLogicEcology.git
cd FuzzyLogicEcology
mkdir build
cd build
cmake ..
make
Here is a dummy example text file that the program reads (for a real example, see lobatae.txt or QuercusOaxaca.txt or rojos.txt)
Species SomeInfo Latitude Longitude MoreInfo
rubra 99 50.3 58.8 blah
palustris 99 50.3 58.8 blah
Suppose the above file is saved as myspecies.txt
. The columns SomeInfo
and MoreInfo
will be ignored. Columns are separated by tabs, spaces, or a combination of the two. Columns cannot contain spaces, so for example, if you have a column called Name
and then in that column an entry is Abies balsamea
, this will be detected as two different columns (and header row should be adjusted accordingly, for example, by adding Family
before Name
).
Then run the program as follows:
./fuzzylogic -I0 --grid=2500 --memory=2GB myspecies.txt
The I0 means that the name with which the species will be identified is in column 0. If there is more than one column with the name, just pass all the columns as arguments (i.e. -I0 -I2). Note that the columns start at 0, not 1.
To build the Quercus Oaxaca Graph using default options (i.e. for an example), see:
./quercus_build_graph.sh
Of course, more detailed options can be found by typing
./fuzzylogic --help
Basic options:
-h [ --help ] produce help message
-I [ --name-columns ] arg Name column indexes (starting at 0).
-x [ --latitude ] arg Latitude (or x) column index. Default: deduce
from input.
-y [ --longitude ] arg Latitude (or y) column index. Default: deduce
from input.
-g [ --grid ] arg (=2500) Grid size. A larger grid means more accurate the
calculations (but slower).
-m [ --memory ] arg Maximum amount of memory (in bytes) to use. Leave
blank or at 0 to use all available memory (not
recommended!). Can use KB, MB, GB.
-i [ --input-file ] arg Input file. If not given, the program will read
from STDIN
-v [ --influence ] arg (=1) Influence radius (in km). When doing the
exponential decay model, C=1/(2*v^2) (in
e^(-Cx^2))
-s [ --save-image ] arg In order to create an image, specify the (exact)
name of the species.
-f [ --fuzzy-min ] Use minimum instead of product as the fuzzy logic
model of intersection (warning: SLOW)
-p [ --propincuity ] Use Propincuity to calculate areas instead of
exponential decay (warning: SLOW. Not
recommended!)
-o [ --output-file ] arg sagemath output file name.
--matrix-file arg matrix output file name.
One way would be to use not only geographical information, but to apply a PCA reduction to your environmental variables in order to produce output of dimension 3. See the example notebook PCA.ipynb (requires scikit-learn) to convert your data before feeding it to the software.
This work was done by a group of researchers at Escuela Nacional de Estudios Superiores unidad Morelia, Universidad Nacional Autónoma de México with a research grant PAPIIT IA106316. For the complete list of authors, see the (upcoming) paper. All provided data was provided by César Andrés Torres Miranda.
If you find it useful, or if you have any comments, suggestions, feature requests, etc. feel free to email us at mraggi@gmail.com.