Retrieve_BIP_Phenotypes

Docker container for retrieving phenotypic data from the Brassica Information Portal (BIP). It uses the BIP API, stores the output in .csv format and writes the Sequence IDs to a .txt file for further queries.

Primary language: Ruby


Retrieve-Brassica-Phenotypes

Docker container for retrieving phenotypic data from the Brassica Information Portal (BIP). It runs a Ruby script that uses the BIP API to retrieve the accession name, Sequence ID and trait measurements, and it stores the output in .csv format. The resulting header is based on the TASSEL-5 Phenotype Format (version 4):

<Trait>,Seq_id,<Trait_name1>,<Trait_name2>,...,<Trait_nameN>
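The Ruby snippet below is a sketch of that layout. It uses the standard `csv` library to write a file with the header shown above. The helper names and the row hash shape are illustrative only and are not taken from the actual script. The same goes for the assumption that the first column holds the accession name. Missing trait values are simply left empty here.

```ruby
require "csv"

# Header row as described above: "<Trait>", "Seq_id", then one column per trait.
def phenotype_header(trait_names)
  ["<Trait>", "Seq_id"] + trait_names
end

# Hypothetical row layout: accession name, Sequence ID, then trait values in
# the same order as trait_names. A trait with no measurement becomes an empty field.
def phenotype_csv(trait_names, rows)
  CSV.generate do |csv|
    csv << phenotype_header(trait_names)
    rows.each do |row|
      values = trait_names.map { |name| row[:traits][name] }
      csv << [row[:accession], row[:seq_id], *values]
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  # Made-up example data, for illustration only.
  traits = ["plant_height", "leaf_length"]
  rows = [
    { accession: "Acc_A", seq_id: "SEQ_001", traits: { "plant_height" => 52.1, "leaf_length" => 7.4 } },
    { accession: "Acc_B", seq_id: "SEQ_002", traits: { "plant_height" => 48.0 } }
  ]
  puts phenotype_csv(traits, rows)
end
```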

The Sequence IDs are also written to a Seq_names.txt file for further queries. The script additionally creates Sequence_IDs_log.txt, which shows the user any accessions that had no Sequence ID or more than one.

If an accession has no Sequence ID, it is skipped but recorded in the log. If it has multiple Sequence IDs, the first one is written to Seq_names.txt and is then used in subsequent downloads.
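The selection rule above can be sketched as follows. This is not the script's own code. Several details are assumptions made for illustration:

- the names `select_seq_ids` and `write_outputs`;
- the input shape, an accession name mapped to an array of Sequence IDs;
- the one-ID-per-line layout of Seq_names.txt;
- the wording of the log messages.

```ruby
# Hypothetical sketch of the Sequence ID rule; the real script may differ.
# seq_ids_by_accession: { "accession name" => ["seq_id", ...] } (assumed shape)
def select_seq_ids(seq_ids_by_accession)
  seq_names = []
  log = []
  seq_ids_by_accession.each do |accession, ids|
    ids = Array(ids) # treat nil as "no Sequence IDs"
    if ids.empty?
      # No Sequence ID: skip the accession but record it in the log.
      log << "#{accession}: no Sequence ID, skipped"
      next
    end
    # Multiple Sequence IDs: keep the first one and note the choice in the log.
    log << "#{accession}: #{ids.size} Sequence IDs, using #{ids.first}" if ids.size > 1
    seq_names << ids.first
  end
  [seq_names, log]
end

# Writes one entry per line (assumed layout) to the two files named above.
def write_outputs(seq_names, log, dir = ".")
  File.write(File.join(dir, "Seq_names.txt"), seq_names.map { |id| "#{id}\n" }.join)
  File.write(File.join(dir, "Sequence_IDs_log.txt"), log.map { |line| "#{line}\n" }.join)
end

if __FILE__ == $PROGRAM_NAME
  names, log = select_seq_ids({ "Acc_A" => ["SEQ_001"], "Acc_B" => [], "Acc_C" => ["SEQ_002", "SEQ_003"] })
  puts names.inspect
  puts log
end
```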

For security reasons, please renew your BIP API key after your download.

User Instructions for use with Docker only

Example command:

(Note that you need a BIP user account to obtain an API key.)

docker run -v '/absolute/path/to/where/output/files/should/be/stored/':/tmp cyverseuk/retrieve_bip_phenotypes <BIP_trial_name> <your_BIP_API_key>
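If you call the container from a Ruby script, a hypothetical wrapper like the one below builds the same command as an argument array. `docker_run_args` and `IMAGE` are illustrative names, not part of this repository. The wrapper does two things:

- It expands the output path with `File.expand_path`, since the command above expects an absolute host path.
- It writes the image name in lowercase, because Docker rejects repository names that contain uppercase letters.

```ruby
# Hypothetical wrapper; IMAGE and docker_run_args are illustrative names.
IMAGE = "cyverseuk/retrieve_bip_phenotypes"

# Builds the argument list for `docker run`, mounting the output folder at /tmp.
def docker_run_args(output_dir, trial_name, api_key)
  host_dir = File.expand_path(output_dir) # turn a relative path into an absolute one
  ["docker", "run", "-v", "#{host_dir}:/tmp", IMAGE, trial_name, api_key]
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 3
  # Usage: ruby run_bip.rb <output_dir> <BIP_trial_name> <your_BIP_API_key>
  # system with several arguments runs docker directly, without a shell.
  system(*docker_run_args(*ARGV)) or abort("docker run failed")
end
```

Passing the arguments to `system` as an array bypasses the shell, so trial names containing spaces need no extra quoting. The API key still appears on the command line and in the process list, which is one more reason to renew it after the download.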

This Docker image is used with the AGAVE API and CyVerseUK so that the output can be integrated into further CyVerse workflows.

User Instructions for use on the CyVerse Discovery environment

In the Discovery Environment:

(1) Select “Apps”.
(2) Search for “Retrieve-Brassica-Phenotypes” in the search bar and click on the app.
(3) If you want to use a different output folder, change it now.
(4) Click on “Parameters”.
(5) Enter the BIP trial name exactly as it is registered in BIP, and your BIP API key.
(6) Click “Launch Analysis”.

The steps are also visualised in Fig. 1 below.

Fig. 1: Steps for launching Retrieve-Brassica-Phenotypes in the Discovery Environment.

User Instructions for use with AGAVE and CyVerse

You don't need to pull this image; Condor does this in the background for you. You need:

- a CyVerse account and a BIP account;
- optionally, the cyverse-sdk client, which makes querying easier;
- a RunApp.json file containing the following:

{
  "name"    : "Retrieve_BIP_Phenotypes",
  "appId"   : "Retrieve_BIP_Phenotypes-0.0.0",
  "archive" : true,
  "parameters": {
    "param_1"   : "<the_BIP_trial_name_to_be_queried>",
    "param_2"   : "<your_BIP_API_key>"
  }
}
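Rather than editing RunApp.json by hand, you could generate it. The sketch below is hypothetical: `run_app_json` and the file name `make_run_app.rb` are not part of the app. It also assumes two things about AGAVE's job submission format: the field name is `parameters`, and `archive` takes a boolean (Table 1 lists `archive` as a boolean). Optional attributes described further down, such as `archiveSystem` or `memoryPerNode`, can be passed in through `extra`.

```ruby
require "json"

# Hypothetical helper that builds the job description shown above.
# `extra` can carry optional attributes such as "archiveSystem" or "memoryPerNode".
def run_app_json(trial_name, api_key, extra = {})
  job = {
    "name" => "Retrieve_BIP_Phenotypes",
    "appId" => "Retrieve_BIP_Phenotypes-0.0.0",
    "archive" => true,
    "parameters" => { "param_1" => trial_name, "param_2" => api_key }
  }
  JSON.pretty_generate(job.merge(extra))
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  # Usage: ruby make_run_app.rb <BIP_trial_name> <your_BIP_API_key>
  File.write("RunApp.json", run_app_json(ARGV[0], ARGV[1]) + "\n")
end
```

Generating the file avoids quoting mistakes in hand-written JSON. Keep in mind that the resulting RunApp.json contains your API key, so don't share or commit it.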

Then, after creating an up-to-date AGAVE API token, run

jobs-submit -W -F RunApp.json

Optional: you can specify an output location other than the default CyVerseUK storage system by adding the following attribute to RunApp.json:

"archiveSystem": "data.iplantcollaborative.org",

Archiving to this system makes the output available to further tools and workflows in CyVerse US and in the Discovery Environment, which is currently not directly connected to the CyVerseUK system. This is likely to change in the future. Once it does, no DE-specific archiveSystem will need to be specified in RunApp.json.

Note: this app currently runs with default parameters, so for big jobs you need to allocate more memory. You do this by adding further attributes to the job submission JSON. Table 1 lists these attributes; it is taken from the AGAVE API developer documentation on job submission (http://developer.agaveapi.co/#job-submission).

| Name | Value(s) | Description |
|---|---|---|
| name | string | Descriptive name of the job. This will be slugified and used as one component of directory names in certain situations. |
| appId | string | The unique name of the application being run by this job. This must be a valid application that the calling user has permission to run. |
| batchQueue | string | The batch queue on the execution system to which this job is submitted. Defaults to the app’s defaultQueue property if specified. Otherwise a best-fit algorithm is used to match the job parameters to a queue on the execution system with sufficient capabilities to run the job. |
| nodeCount | integer | The number of nodes to use when running this job. Defaults to the app’s defaultNodes property, or 1 if no default is specified. |
| processorsPerNode | integer | The number of processors this application should utilize while running. Defaults to the app’s defaultProcessorsPerNode property, or 1 if no default is specified. If the application is not of executionType PARALLEL, this should be 1. |
| memoryPerNode | string | The maximum amount of memory needed per node for this application to run, given in `####.#[E\|P\|T\|G]B` format. Defaults to the app’s defaultMemoryPerNode property if it exists. GB are assumed if no magnitude is specified. |
| maxRunTime | string | The estimated compute time needed for this application to complete, given in hh:mm:ss format. This value must be less than or equal to the max run time of the queue to which this job is assigned. |
| notifications* | JSON array | An array of one or more JSON objects describing an event and a URL to which the service will POST when the given event occurs. For more on notifications, see the section on webhooks in the AGAVE API documentation. |
| archive* | boolean | Whether the output from this job should be archived. If true, all new files created by this application’s execution will be archived to the archivePath in the user’s default storage system. |
| archiveSystem* | string | System to which the job output should be archived. Defaults to the user’s default storage system if not specified. |
| archivePath* | string | Location where the job output should be archived. A relative or absolute path may be specified. If not specified, a unique folder will be created in the user’s home directory of the archiveSystem at ‘archive/jobs/job-$JOB_ID’. |

Table 1. The optional and required attributes common to all job submissions. Optional fields are marked with an asterisk.
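Before adding the resource attributes from Table 1 to RunApp.json, a small check of their documented formats can catch typos early. The sketch below is hypothetical. `resource_attributes` and both patterns are illustrative readings of the `####.#[E|P|T|G]B` and hh:mm:ss formats described above. The values used are placeholders, since this README does not state how much memory a big trial needs.

```ruby
# Hypothetical pre-submission check of two formats documented in Table 1.
MEMORY_FORMAT = /\A\d+(\.\d+)?([EPTG]B)?\z/   # e.g. "16GB", "1.5TB", or "8" (GB assumed)
RUN_TIME_FORMAT = /\A\d{2}:[0-5]\d:[0-5]\d\z/ # hh:mm:ss, read strictly as two-digit hours

# Returns the attributes ready to merge into the job JSON, or raises ArgumentError.
def resource_attributes(memory_per_node:, max_run_time:)
  unless MEMORY_FORMAT.match?(memory_per_node)
    raise ArgumentError, "memoryPerNode must look like ####.#[E|P|T|G]B, got #{memory_per_node.inspect}"
  end
  unless RUN_TIME_FORMAT.match?(max_run_time)
    raise ArgumentError, "maxRunTime must be hh:mm:ss, got #{max_run_time.inspect}"
  end
  { "memoryPerNode" => memory_per_node, "maxRunTime" => max_run_time }
end

if __FILE__ == $PROGRAM_NAME
  # Placeholder values; choose them to fit your trial and the target queue.
  p resource_attributes(memory_per_node: "16GB", max_run_time: "04:00:00")
end
```

Remember that maxRunTime must also stay within the limit of the queue the job is assigned to, which this format check cannot know.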