/findingaleph

Research project involving biomathematics and an optimization problem.

Primary LanguageFortranMIT LicenseMIT

This file describes how to compile and run findingaleph to solve the problem
described in [article]. If you are interested only in reading the resulting
files, check readpfile.m.

0 - Files in this package
-------------------------

Included in this package are the following files and file structure:

- LICENSE file

- README.txt: this file

- cleanpfile.py: python script used to clean up the pefiles (when the
  optimization solver failed to solve a given problem, the result is written in
  the pefile as '**********', which is not read properly by MATLAB. We
  preprocessed these files using python to replace these characters for 'Inf'.
  (The original files are backed up)

- compile.sh: bash script used to compile the whole findingaleph program.

- findingaleph.f90: main program.

- pmodule.f90: module for reading/writing input and output files.

- readbacterial.m: MATLAB script for manipulating the output of the optimization
  solver on bacterial data.

- readpfile.m: MATLAB script for manipulating the output of the optimization
  solver on random data

- runall.sh: bash script used to run the optimization solver on a set of
  problems defined by random data (pefiles)

- tests.m: dummy MATLAB script where statistical tests can be executed for a
  batch of outputs (treats a chunk of one pefile at a time)
  
The input and results files can be found as follows:

* input: folder containing the following files:
    *- matrizes01.txt -- matrizes10.txt, each containing the random data
       generated by randprobvector.m (original code)
    *- pairs.mat, pairs.txt, containing the original bacterial data 
    
* results: folder containing the following files:
    *- pefile01.txt -- pefile10.txt, each containing the result of the
       optimization problem solved by findingaleph. These results are stored
       as lines of 20 numbers in f10.7 format delimited by whitespaces
       (more details in readpfile.m). Each pefile has 10.000.000 (10 million)
       lines.
    *- pefilepairs.txt, containing the results of the same optimization problem
       applied to the bacterial data read from pairs.txt (stored in the same
       format as the pefiles above)

1 - Requirements
----------------

This software was used on Ubuntu Linux 64-bit, with gcc/gfortran version 5.4.0,
and Galahad version 2.60004 (21/October/2015). 

To run it, you need gcc/gfortran >= 4.9.0.

1.1 - Galahad
-------------
Before compiling, it is necessary to install Galahad, a "thread-safe library of
Fortran 2003 packages for solving nonlinear optimization problems" [1]. Galahad
is available at

http://www.galahad.rl.ac.uk/

and it is free of charge for academic purposes (but the user must register
before downloading). Galahad comes with a few auxiliary packages, but we only
need to install ARCHDefs. SIFDecode and CUTEst are not needed to run the Galahad
solver on a problem written in Fortran (our case).

Before installing, check the README files for instructions. 

1.2 - HSL-Galahad
-----------------
A library of performance-improving HSL codes are also needed for our package.
This is available at

http://www.hsl.rl.ac.uk/download/hsl-galahad/2.60000.2/d/

and, again, one must register before downloading it. To install it, follow the
instructions provided by the Galahad documentation.

2 - Installation
----------------

After installing Galahad and the HSL libraries, you can use the bash script
compile.sh (you might have to give it execution permissions) to compile
findingaleph:

$ ./compile.sh

This includes compiling and linking the module pmodule.f90, responsible for
reading and writing the input and output files, and linking the correct
libraries for compilation, including BLAS, Lapack, Metis and HSL-Galahad.

3 - Usage
---------

Given some 4x4 real matrix P, we must find some real vector
x = (x(1), x(2), x(3), x(4)) and some real 4x4 matrix Epsilon such that x and
Epsilon solve the following optimization problem:

minimize sum(Epsilon(i,j)^2)
subject to:
	x(4) = 1;
	diag(x)*(P+Epsilon) is persymmetric.

In our code, x is a 20-element vector with Epsilon stored in x as follows:

x(4+j+(i-1)*4) = Epsilon(i,j), for all i=1,...,4, j=1,...,4

and such that Epsilon is stored by rows:

x(5)  x(6)  x(7)  x(8)
x(9)  x(10) x(11) x(12)
x(13) x(14) x(15) x(16)
x(17) x(18) x(19) x(20)

The constraints of the optimization problem above are implemented accordingly.

3.1 - Bacterial data
--------------------

If you want to run the optimization solver for the bacterial data, contained in
file pairs.txt, you must, after compiling, create an empty file called
pefilepairs.txt and execute the following command:

$ ./findingaleph pairs

There are 1428 problems. The result is written in file pefilepairs.txt, with
each line containing x(1)--x(4) in the first four entries, and Epsilon in the
last 16 entries, such that for each line in the file,

for i = 1:4
    for j = 1:4
    	Epsilon(i,j) = line(4+j+(i-1)*4)
    end
end

3.2 - Random data
-----------------

If you want to run the optimization solver for the random data, contained in
files matrizes01.txt through matrizes10.txt, you must, after compiling, create
an empty file called pefileXX.txt (where XX is the 01 -- 10) and execute

$ ./findingaleph pmatf

There are 100 million problems. Results are written in file pefileXX.txt, with
each line containing x(1)--x(4) in the first four entries, and Epsilon in the
last 16 entries, such that for each line in the file,

for i = 1:4
    for j = 1:4
    	Epsilon(i,j) = line(4+j+(i-1)*4)
    end
end

4 - Results
-----------

To read the results and produce any statistic or analysis, one can run
readpfile.m or readbacterial.m. This reads the relevant results files, and
performs whatever operations the user sets up in tests.m

Example:

>>> [meannormdif, meanerrors, outC, inC] = readpfile(pefile01.txt,
                                                     matrizes01.txt,
						     100000)

where outC and inC are cells produced by MATLAB's textscan function.

References
----------

[1] N. I. M. Gould, D. Orban, and Ph. L. Toint, "GALAHAD, a library of
thread-safe Fortran 90 packages for large-scale nonlinear optimization",
ACM Trans. Math. Software 29(4):353-372, 2004.
Download link: http://www.galahad.rl.ac.uk/doc/galahad.pdf
(included in the docs directory)

[2] Documentation for Lancelot-Simple.
Download link: http://www.galahad.rl.ac.uk/doc/lancelot_simple.pdf
(included in the docs directory)

--------------------------
Revised December 9, 2018.
Melissa Weber Mendonca
melissa.mendonca@ufsc.br