
The mildly ominous parameter exploration toolkit 🛵 - Run huge simulations on distributed clusters, because why not? 🤯

Primary language: Python. License: MIT.

Python 3.6 + 3.7. Code style: black.

mopet 🛵

The mildly ominous parameter exploration toolkit

Isn't it strange that, although parameter explorations are a crucial part of computational modeling, there are almost no Python tools available for making your life easier? mopet is here to help! You can run extensive grid searches in parallel (powered by ray) and store huge amounts of data in an HDF file (powered by pytables) for later analysis - or whatever your excuse is for buying yet another hard disk.

Installation 💻

The easiest way to get going is to install the PyPI package using pip:

pip install mopet

Alternatively, you can also clone this repository and install all dependencies with

git clone https://github.com/caglorithm/mopet.git
cd mopet/
pip install -r requirements.txt
pip install .

Example usage 🐝

Setting up an exploration is as easy as can be!

# first we define a toy evaluation function
def distance_from_circle(params):
    # let's simply calculate the distance of
    # the x-y parameters to the unit circle
    distance = abs((params["x"] ** 2 + params["y"] ** 2) - 1)

    # we package the result in a dictionary
    result = {"result": distance}
    return result
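
Before launching a full exploration, it can help to call the evaluation function by hand on a few points. The sketch below repeats the function so that it runs on its own; the three test points are picked purely for illustration. Note that the function returns |x² + y² - 1|, which is zero exactly on the unit circle.

```python
def distance_from_circle(params):
    # same toy evaluation function as above
    distance = abs((params["x"] ** 2 + params["y"] ** 2) - 1)
    return {"result": distance}


# hand-picked points (illustrative only)
on_circle = distance_from_circle({"x": 1.0, "y": 0.0})
at_origin = distance_from_circle({"x": 0.0, "y": 0.0})
outside = distance_from_circle({"x": 2.0, "y": 2.0})

print(on_circle, at_origin, outside)
```

Every call takes a plain dictionary of parameter values and returns a plain dictionary of results, so the function can be tested without mopet installed.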

Let's set up the exploration by defining the parameters to explore and passing the evaluation function from above:

import numpy as np
import mopet

explore_params = {"x": np.linspace(-2, 2, 21), "y": np.linspace(-2, 2, 21)}
ex = mopet.Exploration(distance_from_circle, explore_params)
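
To get a feeling for the size of the search, here is a minimal sketch of the grid these two parameter ranges span, assuming every combination of the listed values is evaluated once. It uses only itertools and numpy and is not how mopet builds its runs internally; the names `names` and `grid` are made up for this illustration.

```python
import itertools

import numpy as np

explore_params = {"x": np.linspace(-2, 2, 21), "y": np.linspace(-2, 2, 21)}

# Cartesian product of all parameter values, one dictionary per combination
names = list(explore_params)
grid = [
    dict(zip(names, values))
    for values in itertools.product(*explore_params.values())
]

print(len(grid))  # 21 * 21 combinations
```

Each entry of `grid` has the same shape as the `params` dictionary the evaluation function receives, and the grid grows multiplicatively with every parameter you add.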

The exploration runs in parallel and is handled by ray. You can also use a private cluster or cloud infrastructure; see here for more info.

ex.run()
>> 100%|██████████| 441/441 [426.57it/s]
ex.load_results()

An overview of the runs and their parameters is given as a pandas DataFrame, available as ex.df. Here we load the result, which is simply a float, directly into the DataFrame. However, if the result were a time series (a numpy.ndarray), we could process it at this stage and extract some scalar value, for example the amplitude of the data or the dominant frequency. Using some fancy pivoting, we can create a 2D matrix with the results as entries:

# collect the scalar result of every run into a new column
ex.df["result"] = None
for r in ex.df.index:
    ex.df.loc[r, "result"] = ex.results[r]["result"]

# rows are y values, columns are x values
pivoted = ex.df.pivot_table(values="result", index="y", columns="x", aggfunc="first")
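
If your evaluation function returns a time series instead of a float, the reduction to a scalar could look roughly like the sketch below. The helpers `dominant_frequency` and `amplitude` are hypothetical and not part of mopet; they assume evenly sampled data with a known time step `dt`, and the sine wave only stands in for a real simulation output.

```python
import numpy as np


def dominant_frequency(timeseries, dt):
    """Frequency (in 1/time units) with the largest spectral power, mean removed."""
    centered = timeseries - np.mean(timeseries)
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(len(timeseries), d=dt)
    return freqs[np.argmax(power)]


def amplitude(timeseries):
    """Half the peak-to-peak range of the data."""
    return (np.max(timeseries) - np.min(timeseries)) / 2


# stand-in for a simulation output: a 10 Hz sine with amplitude 3 and an offset
dt = 0.001
t = np.arange(1000) * dt
signal = 3.0 * np.sin(2 * np.pi * 10 * t) + 0.5

freq = dominant_frequency(signal, dt)
amp = amplitude(signal)
print(freq, amp)
```

In the loop above, you would then store something like `dominant_frequency(ex.results[r]["result"], dt)` in the DataFrame column instead of the raw value.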

Let's plot the results!

import matplotlib.pyplot as plt

# show the pivoted matrix as an image, with the axes spanning the explored range
plt.imshow(pivoted,
           extent=[min(ex.df.x), max(ex.df.x),
                   min(ex.df.y), max(ex.df.y)], origin="lower")
plt.colorbar(label="Distance from unit circle")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

More information 📓

Inspired by 🤔

mopet is inspired by pypet, a wonderful Python parameter exploration toolkit. I have been using pypet for a very long time and I'm grateful for its existence! Unfortunately, the project is not maintained anymore and has run into several compatibility issues, which was the primary reason why I built mopet.

Built With 💞

mopet is built on other amazing open source projects:

  • ray - A fast and simple framework for building and running distributed applications.
  • pytables - A Python package to manage extremely large amounts of data.
  • tqdm - A Fast, Extensible Progress Bar for Python and CLI
  • pandas - Flexible and powerful data analysis / manipulation library for Python
  • numpy - The fundamental package for scientific computing with Python