libEnsemble is a Python library to coordinate the concurrent evaluation of dynamic ensembles of calculations. The library is developed to use massively parallel resources to accelerate the solution of design, decision, and inference problems and to expand the class of problems that can benefit from increased concurrency levels.
libEnsemble aims for the following:
- Extreme scaling
- Resilience/fault tolerance
- Monitoring/killing of tasks (and recovering resources)
- Portability and flexibility
- Exploitation of persistent data/control flow
The user selects or supplies a function that generates simulation input as well as a function that performs and monitors the simulations. For example, the generation function may contain an optimization routine to generate new simulation parameters on the fly based on the results of previous simulations. Examples and templates of such functions are included in the library.
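For illustration, a simulation function is an ordinary Python function that receives a batch of generated points and returns computed values, as in the following minimal sketch (the quadratic objective and the 'x'/'f' field names here are placeholders following the bundled examples, not requirements):

    import numpy as np

    def my_sim(H, persis_info, sim_specs, libE_info):
        """Minimal simulation function sketch: evaluate a toy quadratic
        at each input point received from the generator."""
        out = np.zeros(len(H['x']), dtype=sim_specs['out'])
        out['f'] = np.sum(H['x'] ** 2, axis=1)  # placeholder objective
        return out, persis_info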
libEnsemble employs a manager/worker scheme that can run on various communication media (including MPI, multiprocessing, and TCP); interfacing with user-provided executables is also supported. Each worker can control and monitor any level of work, from small subnode tasks to huge many-node simulations. An executor interface is provided to ensure that scripts are portable, resilient, and flexible; it also enables automatic detection of the nodes and cores in a system and can split up tasks automatically if resource data isn't supplied.
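As a sketch of that executor workflow (API names follow the 0.7-series documentation and may differ in other releases; the application path and arguments are placeholders), an application is registered with the MPI executor in the calling script, and a worker-side simulation function submits and polls the resulting task:

    import time
    from libensemble.executors.mpi_executor import MPIExecutor
    from libensemble.executors.executor import Executor

    # In the calling script: register a user application (path is a placeholder)
    exctr = MPIExecutor()
    exctr.register_calc(full_path='/path/to/my_app', calc_type='sim')

    # In the user simulation function, running on a worker
    def run_app_sim(H, persis_info, sim_specs, libE_info):
        exctr = Executor.executor          # executor set up in the calling script
        task = exctr.submit(calc_type='sim', num_procs=4,
                            app_args='input.txt', stdout='out.txt')
        while not task.finished:           # simple polling loop
            time.sleep(1)
            task.poll()
        # ... read results, fill the output array, and return as usual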
Required dependencies:
For running libEnsemble with mpi4py parallelism:
- A functional MPI 1.x/2.x/3.x implementation, such as MPICH, built with shared/dynamic libraries
- mpi4py v2.0.0 or above
Optional dependency:
From v0.2.0, libEnsemble has the option of using the Balsam job manager. Balsam is required in order to run libEnsemble on the compute nodes of some supercomputing platforms that do not support launching tasks from compute nodes. As of v0.5.0, libEnsemble can also be run on launch nodes using multiprocessing.
The example simulation and generation functions and tests require the following:
- SciPy
- petsc4py
- DFO-LS
- Tasmanian
- NLopt
- PETSc (can optionally be installed by pip along with petsc4py)
PETSc and NLopt must be built with shared libraries enabled and present in sys.path (e.g., via setting the PYTHONPATH environment variable). NLopt should produce a file nlopt.py if Python is found on the system. See the NLopt documentation for information about building NLopt with shared libraries. NLopt may also require SWIG to be installed on certain systems.
libEnsemble can be installed or accessed from a variety of sources.
Install libEnsemble and its dependencies from PyPI using pip:
pip install libensemble
Install libEnsemble with Conda from the conda-forge channel:
conda config --add channels conda-forge
conda install -c conda-forge libensemble
Install libEnsemble using the Spack distribution:
spack install py-libensemble
libEnsemble is included in the xSDK Extreme-scale Scientific Software Development Kit from xSDK version 0.5.0 onward. Install the xSDK and load the environment with
spack install xsdk
spack load -r xsdk
The codebase, tests and examples can be accessed in the GitHub repository. If necessary, you may install all optional dependencies (listed above) at once with
pip install libensemble[extras]
A tarball of the most recent release is also available.
The provided test suite includes both unit and regression tests and is run regularly on Travis CI.
The test suite requires the mock, pytest, pytest-cov, and pytest-timeout packages to be installed and can be run from the libensemble/tests directory of the source distribution by running
./run-tests.sh
Further options are available. To see a complete list of options, run
./run-tests.sh -h
If you have the source distribution, you can download (but not install) the testing prerequisites and run the tests with
python setup.py test
in the top-level directory containing the setup script.
Coverage reports are produced separately for unit tests and regression tests under the relevant directories. For parallel tests, coverage is combined over all processes. Furthermore, a combined coverage report is created at the top level, which can be viewed at libensemble/tests/cov_merge/index.html after run-tests.sh completes. The Travis CI coverage results are available online at Coveralls.
Note:
The executor tests can be run by using the direct-launch or Balsam executors. Balsam integration with libEnsemble is now tested via test_balsam_hworld.py.
The examples directory contains example libEnsemble calling scripts, simulation functions, generation functions, allocation functions, and libEnsemble submission scripts.
The default manager/worker communications mode is MPI. The user script is launched as
mpiexec -np N python myscript.py
where N is the number of processors. This will launch one manager and N-1 workers.
If running in local mode, which uses Python's multiprocessing module, the local comms option and the number of workers must be specified. The script can then be run as a regular Python script:
python myscript.py
These options may be specified via the command line by using the parse_args() convenience function within libEnsemble's tools module.
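For instance, a calling script along the following lines can be run with either comms mode. This is a sketch using the bundled six_hump_camel simulation function and uniform_random_sample generator; exact names may differ between releases (e.g., parse_args names its second return value is_master in older versions):

    import numpy as np
    from libensemble.libE import libE
    from libensemble.sim_funcs.six_hump_camel import six_hump_camel
    from libensemble.gen_funcs.sampling import uniform_random_sample
    from libensemble.tools import parse_args, add_unique_random_streams

    # Reads --comms and --nworkers from the command line
    nworkers, is_manager, libE_specs, _ = parse_args()

    sim_specs = {'sim_f': six_hump_camel, 'in': ['x'], 'out': [('f', float)]}
    gen_specs = {'gen_f': uniform_random_sample,
                 'out': [('x', float, (2,))],
                 'user': {'gen_batch_size': 50,
                          'lb': np.array([-3.0, -2.0]),
                          'ub': np.array([3.0, 2.0])}}
    persis_info = add_unique_random_streams({}, nworkers + 1)
    exit_criteria = {'sim_max': 100}

    H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria,
                                persis_info=persis_info, libE_specs=libE_specs)
    if is_manager:
        print(H[['x', 'f']][:5])

Such a script could then be launched, for example, with python myscript.py --comms local --nworkers 4, or with mpiexec as shown above.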
See the user guide for more information.
Support:
- The best way to receive support is to email questions to libEnsemble@lists.mcs.anl.gov.
- Communicate (and establish a private channel, if desired) at the libEnsemble Slack page.
- Join the libEnsemble mailing list for updates about new releases.
Further Information:
- Documentation is provided by ReadtheDocs.
- A visual overview of libEnsemble is given in this poster.
Citation:
- Please use the following to cite libEnsemble in a publication:
@techreport{libEnsemble,
author = {Stephen Hudson and Jeffrey Larson and Stefan M. Wild and
David Bindel and John-Luke Navarro},
title = {{libEnsemble} Users Manual},
institution = {Argonne National Laboratory},
number = {Revision 0.7.1},
year = {2020},
url = {https://buildmedia.readthedocs.org/media/pdf/libensemble/latest/libensemble.pdf}
}