This repository contains information about running MatFlow on the Computational Shared Facility (CSF) at the University of Manchester.
Included are:
- A software definition file
- A set of example task schemas
- Some example workflows
- Some Jupyter notebooks demonstrating use of the MatFlow API on completed workflows. Click the Binder link above and navigate to `/workflows/jupyter_notebooks` to explore these.
- Add `export HDF5_USE_FILE_LOCKING=FALSE` to your `.bash_profile`. This is to allow MatFlow to work on the scratch filesystem. See this issue.
- To allow access to the internet so we can install MatFlow, first load the proxy module: `module load tools/env/proxy2`. We only need to do this once, when installing Python packages from the web. However, if you want to use MatFlow's cloud archiving facility (i.e. copying your workflow results to Dropbox), you will need to make sure the proxy module is always loaded. You can do this by adding the `module load` line to a `.modules` file in your home directory.
- Load Anaconda to give us access to `pip`: `module load apps/binapps/anaconda3/2019.07`
- Now install MatFlow and some extensions, using `pip`. This may take several minutes. You may receive a warning about the scripts path not being on your `PATH` (see next step): `pip install --user matflow matflow-damask matflow-formable matflow-mtex matflow-defdap`
- Make sure the following path is on your `$PATH` environment variable: `~/.local/bin`. This can be done in your `.bash_profile` file like this: `PATH=$PATH:~/.local/bin`.
- Run `matflow validate` to check the installation (you may get a warning about the MTEX extension; this is fine).
- Add the `software.yml` and `task_schemas.yml` files from this repository to your MatFlow software sources and task schemas sources, respectively. These files are already in the `jf01` group shared RDS space under the path `/mnt/eps01-rds/jf01-home01/shared/matflow`. To register them with MatFlow, edit the MatFlow `config.yml` file, which, after running `matflow validate` for the first time, resides at `~/.matflow/config.yml` (i.e. in your home directory). Add the path `/mnt/eps01-rds/jf01-home01/shared/matflow/task_schemas.yml` to the `task_schema_sources` list in the config file, and add the path `/mnt/eps01-rds/jf01-home01/shared/matflow/software.yml` to the `software_sources` list.
- Now run `matflow validate` again. This time there should be no warnings.
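After the registration step above, the relevant part of your `~/.matflow/config.yml` should contain entries along these lines (a sketch; keep any entries already present in these lists):

```yaml
task_schema_sources:
  - /mnt/eps01-rds/jf01-home01/shared/matflow/task_schemas.yml
software_sources:
  - /mnt/eps01-rds/jf01-home01/shared/matflow/software.yml
```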
Note: when connecting to the CSF to submit workflows, do not use X11 forwarding (the `-X` flag of the `ssh` command).
Often, preparation and processing jobs are not computationally expensive, and can be run as serial jobs in the short queue on the CSF. We can set default scheduler options for the preparation and processing jobs by adding this to the MatFlow config file:

```yaml
default_preparation_run_options:
  l: short
default_processing_run_options:
  l: short
default_iterate_run_options:
  l: short
```

In this case, all preparation and processing jobs will use the short queue by default. This can be overridden from within a workflow if necessary.
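As a sketch of such an override, an individual task in a workflow file might request different scheduler options for its preparation job. The task name, method, and the exact per-task key names below are illustrative assumptions, not taken from a real schema:

```yaml
tasks:
  - name: simulate_volume_element_loading  # hypothetical task name
    method: taylor                         # hypothetical method
    preparation_run_options:               # assumed per-task override key
      l: short                             # illustrative scheduler option for this task only
```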
Run the command `matflow go workflow.yml`, where `workflow.yml` is the name of the workflow file.
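For orientation, a minimal workflow file might look something like the following; the workflow name, task name, method, and inputs are hypothetical and must correspond to task schemas you have registered:

```yaml
name: my_workflow                  # hypothetical workflow name
tasks:
  - name: generate_microstructure  # hypothetical task; must match a registered task schema
    method: random                 # hypothetical method
    inputs:
      grid_size: [8, 8, 8]         # hypothetical input parameter
```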
Run the command `matflow kill workflow/directory/path`, where `workflow/directory/path` is the path to the workflow directory that is generated by MatFlow. This command will delete all running and queued jobs associated with the workflow.
We can get MatFlow to copy (a subset of) the workflow files to a Dropbox account after the workflow completes.
- Add an "archive location" to the MatFlow config file. An archive location looks like this:

  ```yaml
  archive_locations:
    dropbox:
      cloud_provider: dropbox
      path: /sims
  ```

  In this case, this tells MatFlow to use the path `/sims` inside your Dropbox directory structure. The path you specify here must exist.
- You can then add an extra key to any of your workflow files to tell MatFlow to use this archive location: `archive: dropbox`. If you want to exclude certain files, you can also add a key `archive_excludes` to your workflow, which is a list of glob-style patterns to exclude. Task schemas can also include `archive_excludes`.
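Put together, the archive-related keys in a workflow file might look like this; the glob patterns below are illustrative examples, not recommended defaults:

```yaml
archive: dropbox        # must match an archive location name in the config file
archive_excludes:       # glob-style patterns for files to exclude from the archive
  - "*.log"             # illustrative pattern
  - "scratch_files/*"   # illustrative pattern
```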
The first time you submit a workflow that uses this archive location, you will be prompted to authorize hpcflow to connect to your Dropbox account.
As of MatFlow v0.2.21, you can run an archive on a complete workflow like this: `matflow archive /path/to/workflow/directory dropbox`. In this case, we choose the archive named `dropbox` in our `config.yml` file. Any archive defined in the `config.yml` file can be chosen. File patterns will be excluded from the archive according to the `archive_excludes` patterns in the corresponding task schema definitions, plus any `archive_excludes` patterns included in the original workflow submission.
In general, you can associate arbitrary metadata with a workflow in the workflow YAML file by using the `metadata` key. Additionally, as of MatFlow v0.2.21, you can specify default metadata that should be applied to all generated workflows. Default metadata is merged with any metadata specified in the workflow YAML file; a metadata item specified in the workflow YAML file will overwrite the same key specified in `default_metadata`.
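As a sketch of the merging behaviour, suppose both the config file and a workflow file specify metadata; the keys `project` and `owner` below are illustrative:

```yaml
# In the MatFlow config file:
default_metadata:
  project: my_project   # illustrative key/value
  owner: alice          # illustrative key/value

# In the workflow YAML file:
metadata:
  owner: bob            # overwrites the default value for this key

# Effective metadata for the generated workflow:
#   project: my_project
#   owner: bob
```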