/HOMER

Helper Of My Eternal Retrievals

Primary LanguagePythonOtherNOASSERTION

                                HOMER
                  Helper Of My Eternal Retrievals
===============================================================================


Author :       Michael D. Himes    University of Central Florida
Contact:       mhimes@knights.ucf.edu

Advisor:       Joseph Harrington   University of Central Florida

Contributors:  David C. Wright
                 - containerized the code
               Zaccheus Scheffer
                 - assisted with refactoring the codebase that led to HOMER


Acknowledgements
----------------
This research was supported by the NASA Fellowship Activity under NASA Grant 
80NSSC20K0682.  We gratefully thank Nvidia Corporation for the Titan Xp GPU 
that was used throughout development of the software.


Summary
=======
HOMER is a Bayesian inverse modeling code.  Given some data and 
uncertainties, the posterior distribution is determined for some model. 

HOMER uses the Large-selection Interface for Sampling Algorithms (LISA) for 
its Bayesian framework.  LISA allows for both MCMC and nested sampling 
algorithms.  For details, see the LISA User Manual at 
https://exosports.github.io/LISA/doc/LISA_User_Manual.html.

HOMER's forward model is a neural network (NN) surrogate model trained by 
MARGE.  For details on MARGE, see the MARGE User Manual at 
https://exosports.github.io/MARGE/doc/MARGE_User_Manual.html.

HOMER comes with complete documentation as well as a user manual to assist 
in its usage.  Users can find the latest HOMER User Manual at 
https://exosports.github.io/HOMER/doc/HOMER_User_Manual.html.

HOMER is an open-source project that welcomes improvements from the community 
to be submitted via pull requests on Github.  To be accepted, such improvements 
must be generally consistent with the existing coding style, and all changes 
must be updated in associated documentation.

HOMER is released under the Reproducible Research Software License.  Users are free to 
use the software for personal reasons.  Users publishing results from HOMER 
(or modified versions of it) in a peer-reviewed journal are required to publicly
release all necessary code/data to reproduce the published work.  Modified 
versions of HOMER are required to also be released open source under the same 
Reproducible Research License.  For more details, see the text of the license.


Files & Directories
===================
HOMER contains various files and directories, described here.

doc/            - Contains documentation for HOMER.  The User Manual contains 
                  the information in the README with more detail, as well as a 
                  walkthrough for setup and running an example.
environment.yml - Contains all required packages (w/ versions) for HOMER.
example/        - Contains example of executing HOMER.
                  Includes example functions for each sampling algorithm.
                  See README within the directory for more details.
HOMER.py        - The executable driver for HOMER. Accepts a configuration file.
lib/            - Contains the classes and functions of HOMER.
  bestfit.py    - Contains functions to plot the best-fit spectrum.
  compost.py    - Contains functions to compare posterior distributions.
  func.py       - Contains example functions to be evaluated by an MCMC sampler.
  NN.py         - Contains a function to load a MARGE NN model.
  plotter.py    - Contains functions to plot results of the inference.
  utils.py      - Contains utiity functions.
Makefile        - Handles setting up BART, and creating a TLI file.
modules/        - Contains submodules for data generation, quantile calculation.
  datasketches  - Streaming quantiles package.
  LISA          - Bayesian sampler package.
README          - This file!


Note that all .py files have complete documentation; consult a specific file 
for more details.


Installation
============
After recursively cloning the repo, users must compile one of LISA's modules. 
From HOMER's directory,
    make lisa

If the user wishes to compute median and 1-2-3sigma spectra, they will require 
the datasketches package.  From the HOMER directory, enter
    make datasketches
into the terminal.  Note that this package is NOT required to run HOMER.

To install both packages easily, do
    make all


Executing HOMER
===============
HOMER is controlled via a configuration file.  After setting up the 
HOMER configuration file, run HOMER via
    ./HOMER.py <path/to/configuration file>

Note that HOMER requires a trained MARGE model.  For details, see the MARGE 
User Manual (http://planets.ucf.edu/bart-docs/MARGE_user_manual.pdf).


Setting Up a HOMER Configuration File
=====================================
A HOMER configuration file contains many options, which are detailed in this 
section.  For an example, see config.cfg in the example subdirectory, and the 
associated README with instructions on execution.


Directories
-----------
inputdir  : str.  Directory containing HOMER inputs.
outputdir : str.  Directory containing HOMER outputs.


Run Parameters
--------------
quantiles   : bool. Determines whether to compute quantiles.
                    If the Datasketches library is not installed, this setting 
                    has no effect.
onlyplot    : bool. Determines whether to skip the inference.
                    Reproduces plots \& calculations related to posterior.
plot_PT     : bool. Determines whether to compute the explored 
                    pressure--temperature profiles using the formulation of 
                    Line et al. (2013).  Presently, this requires the model 
                    parameters to be ordered as 
                    log(kappa), log(gamma1), log(gamma2), alpha, beta, 
                    followed by any other parameters.
credregion  : bool. Determines whether to calculate the 68, 95, and 99\% 
                    credible regions \& uncertainties.
compost     : bool. Determines whether to compare HOMER's posterior to 
                    another.
compfile    : str.  Path to posterior .npy file to compare with HOMER.
compname    : str.  Name of the other posterior for plot legends.
compsave    : str.  File name prefix for the saved comparison plots.
compshift   : floats. Shifts all values of a particular parameter by a 
                    set amount in the posterior to be compared, such as 
                    for factor-of-10 unit conversions in a log space.
                    Format: val1 val2 val3 val4 ...
                    E.g., compshift = 1 0 0 would increase all of the first 
                    parameter's values by 1 and leave the other parameters 
                    alone.
postshift   : floats. Same as `compshift`, but for HOMER's posterior.


Data Normalization Parameters
-----------------------------
ilog        : bool. Determines whether the NN takes the logarithm of the 
                    inputs.
                    Alternatively, specify comma-, space-, or newline-separated 
                    integers to selectively take the log of certain inputs.
olog        : bool. Determines whether the NN predicts the log of the 
                    outputs.
                    Alternatively, specify comma-, space-, or newline-separated 
                    integers to selectively take the log of certain outputs.
normalize   : bool. Determines whether to standardize the data by its 
                    mean and standard deviation.
scale       : bool. Determines whether to scale the data to be within a 
                    range.
scalelims   : ints. Range to scale the data to.
                    Format: low, high
fmean       : str.  Path to .NPY file of mean training input and output 
                    values.
                    Format: [inp0, inp1, ..., outp0, outp1, ...]
                    If relative path, assumed to be with respect to the 
                    input directory.
fstdev      : str.  Path to .NPY file of standard deviation of 
                    inputs/outputs.
                    See `fmean` for format \& path description.
fmin        : str.  Path to .NPY file of minima of inputs/outputs.
                    See `fmean` for format \& path description.
fmax        : str.  Path to .NPY file of maxima of inputs/outputs.
                    See `fmean` for format \& path description.


Neural Network (NN) Parameters
------------------------------
weight_file: str. File containing NN model & weights.
                  NOTE: MUST end in .h5
inD  : int. Dimensionality of the input  to the NN.
outD : int. Dimensionality of the output of the NN.


Bayesian Sampler Parameters
---------------------------
alg         : str.  Bayesian sampling algorithm.  Options:
                      demc                 ter Braak (2006)
                      snooker      ter Braak & Vrugt (2008)
                      multinest         Feroz et al. (2008)
                      ultranest              Buchner (2014, 2016, 2019)
func        : strs. Function and file to evaluate at each iteration of 
                    the MCMC.
                    Format: 
                      function file
                    or, if the file is in a different directory,
                      function file path/to/file/
                    Note: omit the '.py' from `file`.
pnames      : strs. Name of each free parameter. Can include LaTeX 
                    formatting.
pinit       : floats. Initial parameters for the MCMC.
pmin        : floats. Minima for free parameters.
pmax        : floats. Maxima for free parameters.
pstep       : floats. Step size for free parameters. 
                      This will change throughout the MCMC due to the 
                      differential evolution algorithm used.
data        : floats. Values to be fit via MCMC. 
                    Format: Separate each value by an indented newline.
                    Alternatively, specify a .NPY file.
uncert      : floats. Uncertainties on values to be fit via MCMC. 
                      Same format (indented newlines or .NPY)


Additional Sampler Parameters
-----------------------------
Only required if your setup requires the parameter.

filters     : strs. Paths to filter bandpasses associated with each datum. 
                    Separate each by indented newlines.
                    X-axis values must have the same type of units as `xvals`,
                    but may be separated by a constant multiplicative factor
                    (e.g., mm vs um, ft vs inch), see `filtconv` argument.
starspec    : str.  Path to .NPY file of the stellar spectrum.
                    More generally, this is an array of scaling factors that 
                    can be used when calculating the model.
factor      : str.  Path to .NPY file of scaling factor by which 
                    to modify de-normalized predictions. 
                    E.g., unit conversion.
wn          : bool. In astro context, determines whether the X-axis units are 
                    spatial frequency (True) or wavelength (False).
                    More generally, when wn = False, the X-axis values are 
                    transformed to their inverse when plotting and 
                    band-integrating models.  
wnfact      : float. Multiplication factor when computing the inverse of 
                     `xvals`.
                     E.g., if `xvals` is in cm-1 and the desired inverse units 
                     are um, then wnfact = 1e4.
filtconv    : float. Multiplication factor to convert the filter 
                     X-value units to the `xvals` units.
                     E.g., if filter X values are in nm, but `xvals` are in um, 
                     this argument would be set to 1e-3.


MCMC Parameters
---------------
Not required for nested sampling algorithms.

flog        : str.  Path to MCMC log file. 
                    If relative, with respect to input dir.
nchains     : int.  Number of parallel samplers.
niter       : int.  Number of total iterations.
burnin      : int.  Number of burned iterations from the beginning of 
                    chains.
thinning    : int.  Thinning factor for posterior.


Plotting Parameters
-------------------
xvals      : str.  Path to .NPY file containing the x-axis values 
                   associated with a prediction.
                   If relative, path is with respect to `inputdir`.
xlabel     : str.  X-axis label for plots.
ylabel     : str.  Y-axis label for plots.
fpress     : str.  Path to text file containing the pressures of each 
                   layer of the atmosphere, for plotting T(p) profiles.
                   If relative, with respect to `inputdir`.
PTargs     : str.  Path to .txt file containing values necessary to 
                   calculate the temperature--pressure profile.
                   Currently, only option is Line et al. (2013) method.
                   Format: R_star (m), T_star (K), T_int (K), 
                           sma (m), grav (cm s-2)
                   If plot_PT is False, this argument does nothing.
savefile   : str.  (optional) Prefix for MCMC plots to be saved.


Versions
========
HOMER was developed on a Unix/Linux machine using the following 
versions of packages:
 - Python 3.7.2
 - Keras 2.2.4
 - Numpy 1.16.2
 - Matplotlib 3.0.2
 - mpi4py 3.0.3
 - Scipy 1.2.1
 - sklearn 0.20.2
 - Tensorflow 1.13.1
 - CUDA 9.1.85
 - cuDNN 7.5.00
 - ONNX 1.6.0
 - keras2onnx 1.6.1
 - onnx2keras 0.0.18
 - pymultinest 2.10
 - ultranest 2.2.2

See the supplied environment.yml file.


Be kind
=======
Please cite this paper if you found this package useful for your research:

Himes et al. 2022, PSJ, 3, 91
https://iopscience.iop.org/article/10.3847/PSJ/abe3fd/meta

@ARTICLE{2022PSJ.....3...91H,
       author = {{Himes}, Michael D. and {Harrington}, Joseph and {Cobb}, Adam D. and {G{\"u}ne{\c{s}} Baydin}, At{\i}l{\i}m and {Soboczenski}, Frank and {O'Beirne}, Molly D. and {Zorzan}, Simone and {Wright}, David C. and {Scheffer}, Zacchaeus and {Domagal-Goldman}, Shawn D. and {Arney}, Giada N.},
        title = "{Accurate Machine-learning Atmospheric Retrieval via a Neural-network Surrogate Model for Radiative Transfer}",
      journal = {\psj},
     keywords = {Exoplanet atmospheres, Bayesian statistics, Posterior distribution, Convolutional neural networks, Neural networks, 487, 1900, 1926, 1938, 1933},
         year = 2022,
        month = apr,
       volume = {3},
       number = {4},
          eid = {91},
        pages = {91},
          doi = {10.3847/PSJ/abe3fd},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2022PSJ.....3...91H},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Thanks!