cunningham-lab/neurocaas

Developer Interface Workflow


One of the main bottlenecks in the developer workflow at the moment is bash scripting on the remote EC2 instance. I will add a set of scripts to automate this process from a template: given a desired set of inputs and outputs, automatically write the script that transfers the data and otherwise sets up the local environment.

Setup is done. Working on #22 to determine next steps for this.

#22 is done. Once the dust has settled, it looks like the best way forward is to set up a template that gives a variable name for each piece of data we want to analyze, together with the location in io-dir that it should be output to, like so:
    {
        "main_items": {
            "data": "inputs/videos/",
            "config": "configs/"
        },
        "supp_items": {
            "index": "configs/"
        }
    }

This will assign variable names that should be easy to manipulate and use in scripts for all data that we work with subsequently. I.e., if I now want to reference the data I just fetched from S3 in a script, I should just be able to call:

    /bin/bash ~/run.sh "$data" "$config" "$index"

and find the relevant data at those paths.
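
For concreteness, here is a minimal sketch of what a generated setup script could look like, assuming the template above is parsed with jq and the script is sourced so the exports persist; the S3 fetch step is left as a comment placeholder, since the bucket layout is not specified here:

    #!/bin/bash
    # Hypothetical generated setup (a sketch, not the shipped tooling):
    # for each entry in the template, create the output location under
    # io-dir and export a variable of that name. Source this file so the
    # exports persist in the calling shell.
    TEMPLATE="$HOME/io-dir/template.json"

    for section in main_items supp_items; do
        for name in $(jq -r ".${section} | keys[]" "$TEMPLATE"); do
            dest=$(jq -r ".${section}.${name}" "$TEMPLATE")
            mkdir -p "$HOME/io-dir/$dest"
            # (fetch the corresponding object from S3 into this directory,
            # e.g. with aws s3 cp, before running the analysis)
            export "$name"="$HOME/io-dir/$dest"
        done
    done

After sourcing this, $data, $config, and $index each resolve to the directory the corresponding item was fetched into, which is what the run.sh call above relies on.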

There are some details to work out here, re: how we reference these data in tests, and how we reference them in a way that makes sense (should data and config be preset?), but the general idea is here.

It also occurs to me that automatic scripting is just the software-level portion of the blueprint. This leads to a workflow where developers can successively specify portions of their blueprint: first the Docker container where their analyses are run, then their inputs and their organization, then the hardware it will be run on; finally, they will be ready to submit it.
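
As an illustration of those successive portions, the blueprint could accumulate sections like the following (the field names and values here are illustrative placeholders, not the actual NeuroCAAS blueprint schema):

    {
        "software": {
            "image": "neurocaas/example-analysis:latest",
            "main_items": {"data": "inputs/videos/", "config": "configs/"},
            "supp_items": {"index": "configs/"}
        },
        "hardware": {
            "instance_type": "p2.xlarge",
            "disk_size_gb": 100
        }
    }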

The main content of this software-level portion can be type and parameter checking.
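
A minimal sketch of what that checking could look like, assuming the template is validated with jq before the analysis launches (the required keys and the error message are assumptions):

    #!/bin/bash
    # Hypothetical pre-launch check: every required key must exist in the
    # template and hold a string path. jq -e sets a nonzero exit status
    # when the expression evaluates to false or null.
    TEMPLATE="$HOME/io-dir/template.json"

    for key in .main_items.data .main_items.config; do
        jq -e "${key} | type == \"string\"" "$TEMPLATE" > /dev/null \
            || { echo "template error: ${key} must be a string path" >&2; exit 1; }
    done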

The software-level portion of the blueprint should be the thing that provides continuity for developers from being inside the Docker container to moving out of it. Designing this workflow of moving out of the Docker container is the next thing to do.

Start inside the Docker container, then save the image to a blueprint. Test parameters can be saved to the blueprint too, for clarity. These steps can integrate with the neurocaas_contrib/local LocalEnv API.
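
A sketch of the "save image to blueprint" step, assuming it reduces to a docker commit plus a blueprint edit (the container and image names here are hypothetical, and the real LocalEnv API presumably wraps something along these lines):

    #!/bin/bash
    # Hypothetical save step: snapshot the running development container
    # to an image, then record that image name in the blueprint with jq.
    docker commit neurocaasdevcontainer neurocaas/dev-analysis:latest
    jq --arg img "neurocaas/dev-analysis:latest" \
        '.software.image = $img' blueprint.json > blueprint.tmp \
        && mv blueprint.tmp blueprint.json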

Once the test parameters pass on the local machine, you can run the same thing on a remote instance. These steps can integrate with the neurocaas_contrib/local RemoteEnv API (there is some more work to do on #31 before this can happen).

Saving the image to the blueprint is done. Next up are the test parameters and LocalEnv integration.

CLI buildout is proceeding. Some intermediate todos to keep track of:

  • The default behavior of launching a container via setup-development-container is slightly confusing, and should be signposted more clearly.
  • Implement history: save the last n active container and image names (a sketch follows this list).
  • Coordinate with the localenv build.
  • Add container and image delete methods that reference the container and image histories.
  • Command crowding: organize commands as iae, remote-interactive, and remote. This is still fine given good documentation.
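
One way the history item could work, as a sketch (the history file location and the cutoff of five entries are assumptions):

    #!/bin/bash
    # Hypothetical history mechanism: append the name of the container or
    # image that just became active, then truncate to the last 5 entries.
    HIST="$HOME/.neurocaas_contrib/history"
    mkdir -p "$(dirname "$HIST")"
    echo "$1" >> "$HIST"
    tail -n 5 "$HIST" > "$HIST.tmp" && mv "$HIST.tmp" "$HIST"

A delete method could then offer the entries of this file as candidates, which is what the fourth item above refers to.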

Incorporate test methods next (test the container, run the analysis), along with an update to the README.
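
For the container test, the underlying mechanism is presumably along these lines (a sketch using plain docker; the container name is hypothetical):

    # Run an arbitrary command inside the running development container,
    # here just checking that io-dir is populated as expected.
    docker exec neurocaasdevcontainer /bin/bash -c "ls ~/io-dir"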

I underestimated the power of the infrastructure we already had. We can ease much of the scripting burden by assuming that the developer will write a script that takes data and config as input. The need to manage the scripting then goes away, because all of it is handled locally, under the assumption of local reads and writes (a sketch of such a developer script follows the list below). This resolves the original intention of this issue. Closing now, with the understanding that several other issues remain:

  • A method to tidy up old images.
  • Input type checking.
  • Incorporation of test-container with an arbitrary command, not just running the analysis.
  • Integration with the remote workflow (#31).
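
The developer script assumed above could be as simple as the following sketch, where analyze.py stands in for the developer's actual analysis code:

    #!/bin/bash
    # run.sh: the one script a developer now writes. It takes local paths
    # to the data and config as arguments, reads them, and writes results
    # locally; the surrounding tooling handles all transfer to and from S3.
    datapath="$1"
    configpath="$2"
    resultdir="$HOME/io-dir/results"
    mkdir -p "$resultdir"
    python analyze.py --data "$datapath" --config "$configpath" --out "$resultdir"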

Update the README and see how important these methods seem.