DO NOT FORK. This repository is meant to be used through the "Use this template" feature.
- Click on "Use this template".
- Give a new name to your Gladier project (e.g. gladier_amazing_science; the recommendation is to use all lowercase with underscore separation for repo names).
- Wait until the first run of CI finishes (GitHub Actions will process the template and commit to your new repo).
- Then clone your new Gladier project and start tweaking it.
NOTE: WAIT until the first CI run on GitHub Actions finishes before cloning your new project.
Hello traveler, this is your tutorial into the world of automation.
This repository is designed to be a guide through the complexities of Gladier and its tools. Full automation of an experiment requires an infrastructure setup step that cannot be skipped. This is usually done once, and it then allows the user to focus on tool development.
Three main operations are included in the simple clients:
- Process
- Transfer
- Publish
To run this tutorial, we advise creating a new environment with your favorite tool. For simplicity, we will use miniconda, although any development tool with access to pip should suffice.
conda create -n gladier-test python pip
conda activate gladier-test
pip install gladier
Our first example can be found at example_client/example_client.py. It creates and executes a flow with one operation.
./example_client/example_client.py
To execute a remote function (the equivalent of a lambda), we use the funcX service to register, retrieve, and execute functions. The service requires that a small client instance be deployed on the "processing" machine. We have a client running on one of our example machines. The location of this client is defined by:
'funcx_endpoint_compute': '4b116d3c-1703-4f8f-9f6f-39921e5864df'
Note 1: The remote machine is not necessarily remote. Different parts of the flow can be executed on different machines by changing the funcx_endpoint_compute value for each tool. This will be explored in a later section.
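As a rough illustration of Note 1, the sketch below resolves a funcx_endpoint_compute value per tool from a lookup table. It uses plain dictionaries only and does not call Gladier; the tool names, the TOOL_ENDPOINTS table, the helper functions, and the placeholder UUID are all hypothetical, and only the first endpoint UUID comes from this tutorial.

```python
# Endpoint UUID quoted in this tutorial.
DEFAULT_ENDPOINT = "4b116d3c-1703-4f8f-9f6f-39921e5864df"

# Placeholder only: replace with the UUID of your own second endpoint.
PLACEHOLDER_ENDPOINT = "00000000-0000-0000-0000-000000000000"

# Hypothetical mapping from tool name to the machine that should run it.
TOOL_ENDPOINTS = {
    "simple_tool": DEFAULT_ENDPOINT,
    "heavy_tool": PLACEHOLDER_ENDPOINT,
}


def endpoint_for(tool_name):
    """Return the endpoint for a tool, falling back to the default one."""
    return TOOL_ENDPOINTS.get(tool_name, DEFAULT_ENDPOINT)


def tool_input(tool_name):
    """Build the input fragment carrying the endpoint for one tool."""
    return {"funcx_endpoint_compute": endpoint_for(tool_name)}


for name in ("simple_tool", "heavy_tool", "unlisted_tool"):
    print(name, tool_input(name))
```

Keeping the mapping in one place makes it easy to move a single tool to another machine without touching the rest of the flow input.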
Note 2: Gladier reads the Python function imported by SimpleTool and automatically registers it with the funcX service, or re-registers it when the local function definition changes. The UUID of this function is automatically populated in the flow definition.
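One way to picture the "re-register on change" behavior in Note 2 is a checksum comparison. The sketch below is an assumption about how such change detection could work, not Gladier's actual implementation; function_checksum and needs_registration are hypothetical helpers, and simple_function_v2 stands in for an edited version of the function.

```python
import hashlib


def function_checksum(func):
    """Hash the compiled body of a function (bytecode, constants, names).

    Suitable for simple functions; nested code objects are not handled.
    """
    code = func.__code__
    digest = hashlib.sha256()
    digest.update(code.co_code)
    digest.update(repr(code.co_consts).encode())
    digest.update(repr(code.co_names).encode())
    return digest.hexdigest()


def needs_registration(func, stored_checksum):
    """True when the function differs from the version registered earlier."""
    return function_checksum(func) != stored_checksum


def simple_function(**data):
    return "hello"


def simple_function_v2(**data):
    # Stand-in for the same function after a local edit.
    return "hello again"


stored = function_checksum(simple_function)
print(needs_registration(simple_function, stored))     # False
print(needs_registration(simple_function_v2, stored))  # True
```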
During execution, Gladier takes care of creating the flow definition and registering it with the Globus service:
Flow created with ID: ddb9d6be-d48f-40df-a663-6bcc6db1bb76 https://app.globus.org/flows/ddb9d6be-d48f-40df-a663-6bcc6db1bb76
It also creates the run after the .run() call:
Run started with ID: 6fa0969a-2778-4f7f-95d5-d365e89aca32 https://app.globus.org/runs/6fa0969a-2778-4f7f-95d5-d365e89aca32
Running the client again will not register a new flow with the Globus service, but it will generate a new run instance.
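The "one flow, many runs" behavior can be pictured with a small in-memory stand-in. This is only a sketch under stated assumptions: FakeFlowsService and its methods are hypothetical, do not talk to Globus, and do not reflect how Gladier or the Globus Flows service store flows internally.

```python
import hashlib
import json
import uuid


class FakeFlowsService:
    """In-memory stand-in: flows are cached by definition, runs are always new."""

    def __init__(self):
        self._flows = {}  # definition checksum -> flow id
        self.runs = []    # list of (flow id, run id) pairs

    def get_or_create_flow(self, definition):
        """Reuse the flow id when the same definition was seen before."""
        key = hashlib.sha256(
            json.dumps(definition, sort_keys=True).encode()
        ).hexdigest()
        if key not in self._flows:
            self._flows[key] = str(uuid.uuid4())
        return self._flows[key]

    def start_run(self, flow_id):
        """Every call produces a fresh run id for the given flow."""
        run_id = str(uuid.uuid4())
        self.runs.append((flow_id, run_id))
        return run_id


service = FakeFlowsService()
definition = {"StartAt": "SimpleTool", "States": {"SimpleTool": {"End": True}}}

first_flow = service.get_or_create_flow(definition)
first_run = service.start_run(first_flow)

# "Running the client again": same definition, so same flow, but a new run.
second_flow = service.get_or_create_flow(definition)
second_run = service.start_run(second_flow)

print(first_flow == second_flow)
print(first_run != second_run)
```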
The SimpleTool and its driving function simple_function are kept in a single file in the tools folder. We advise creating one Python file per "action" in the flows. This makes development, debugging, and error tracing much simpler.
The example_client.py file is likewise kept separate from the other clients in the folder and contains only one GladierBaseClient. This prevents instances from being created with the 'wrong' flow definition and avoids common confusion about 'what is running'.
Our second example can be found at example_client/example_client_transfer.py. It transfers a file from our remote server to your workstation.
In order to allow the transfer, the first step is to introduce the workstation in the flow. The client then creates and executes a flow with one operation.
./example_client/example_client_transfer.py
Our third example can be found at example_client/example_client_publish.py. It performs a simple operation: publishing some data to a Globus index.
As before, some setup is required: creating a Globus index and a way to visualize it.
pip install globus-search-cli
globus-search login
globus-search index create example-index gladier-example-index
The result is a new search index on the globus-search database which will serve as a "repository" for the flow data.
{
  "@datatype": "GSearchIndex",
  "@version": "2017-09-01",
  "creation_date": "2022-04-27 21:04:30",
  "description": "gladier-example-index",
  "display_name": "example-index",
  "id": "563c3d98-6fa8-4ef5-83e2-0f378efe0a5f",
  "is_trial": true,
  "max_size_in_mb": 1,
  "num_entries": 0,
  "num_subjects": 0,
  "size_in_mb": 0,
  "status": "open",
  "subscription_id": null
}
The search index id 563c3d98-6fa8-4ef5-83e2-0f378efe0a5f will be used so the flow knows where to send metadata.
To execute our simple publish client:
./example_client/example_client_publish.py
To check whether the data reached the index, visit https://acdc.alcf.anl.gov/globus-tutorial/563c3d98-6fa8-4ef5-83e2-0f378efe0a5f
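The index id ties these steps together, so it can be handy to pull it out of the JSON output programmatically rather than copying it by hand. The sketch below parses the output shown earlier and rebuilds the check link. The helper names are hypothetical, and the assumption that the check page URL is always the base URL followed by the index id is inferred from the single link above.

```python
import json

# Output of `globus-search index create`, copied from this tutorial.
INDEX_CREATE_OUTPUT = """
{
  "@datatype": "GSearchIndex",
  "@version": "2017-09-01",
  "creation_date": "2022-04-27 21:04:30",
  "description": "gladier-example-index",
  "display_name": "example-index",
  "id": "563c3d98-6fa8-4ef5-83e2-0f378efe0a5f",
  "is_trial": true,
  "max_size_in_mb": 1,
  "num_entries": 0,
  "num_subjects": 0,
  "size_in_mb": 0,
  "status": "open",
  "subscription_id": null
}
"""

# Base of the check page link used in this tutorial.
CHECK_BASE_URL = "https://acdc.alcf.anl.gov/globus-tutorial"


def index_id_from_output(raw):
    """Extract the search index id from the CLI's JSON output."""
    return json.loads(raw)["id"]


def check_url(index_id):
    """Assumed pattern: base URL followed by the index id."""
    return f"{CHECK_BASE_URL}/{index_id}"


index_id = index_id_from_output(INDEX_CREATE_OUTPUT)
print(index_id)
print(check_url(index_id))
```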
./scripts/create_setup.py
conda create -n gladier pip
conda activate gladier
git clone https://github.com/globus-gladier/gladier-client-template
cd gladier-client-template
pip install -r requirements.txt
python setup.py develop