/arvados_cwl_tester

Primary LanguagePythonApache License 2.0Apache-2.0

Framework for testing Common Workflow Language on Arvados - arvados_cwl_tester

Note: This is not the final, official version

Thank you for interest in arvados-cwl-tester framework. Please note that the current version is not the final, official version. While the frameworkk is usable , the work is still in progress. The official version will be released soon.

Introduction

arvados_cwl_tester framework is dedicated to people, that develop pipelines in CWL and run them on Arvados. It allows to create easy and reproducible tests for CWL CommandLineTools and Workflows and run them in parallel on Arvados, using pytest.

arvados_cwl_tester API allows to write tests using python code and organize them in python script.

Every test runs process on Arvados in temporary subproject, that will be removed after some time automaticly that that will keep your testing space clean and tidy.

Get started

Setup environment

  1. Create testing environment
virtualenv venv
source venv/bin/activate
  1. Install arvados-cwl-tester
git clone https://github.com/arvados/arvados_cwl_tester.git
cd arvados_cwl_tester
pip install .

Write your first test

Define all tests you need in test_<your_name>.py file. Here you can see an example how this file can look like:

from arvados_cwl_tester import create_ouputs_dict, check_if_collection_output_not_empty, basic_arvados_test


def test_single_step():

# run the test and define in variable test_run to use it later
    test_run = basic_arvados_test(
        "ardev-xxxxxxxxxxxxx",
        "Hello World!",
        "./components/single_step/single_step.cwl",
        {
            "name": "example.txt"
            }
            )

# Check if output collection is not empty - for example in cases when File[] or Directory[] is the output
    assert check_if_collection_output_not_empty(test_run)

# Check if there is a specific file in outputs
    assert 'example.txt' in create_ouputs_dict(test_run)

# Be sure if output has some specific size you expect
    assert create_ouputs_dict(test_run)['example.txt']["size"] > 0

Execute the test

Run in commandline:

pytest -k single_step

Arvados-cwl-tester Functions

basic_arvados_test()

This is basic function that you can use to test cwl on arvados. It returns 'Process' object, that is required as input to other testing functions.

def basic_arvados_test(target_project: str, test_name: str, cwl_path: str, inputs_dictionary: dict=None) -> Process:
    """
    Run process, return process object (class Process)
    Check if project is finished, check if project is completed.
    Arguments:
        target_project: str, uuid of project when process will be executed. Example: arkau-ecds9343fdscdsdcd
        test_name: str, name of the test
        cwl_path: str, path to cwl file that will be executed
        inputs_dictionary: dict, containing cwl inputs. This is optional, because sometimes cwl doesn't require input.
    Returns:
        class Process
    """

create_ouputs_dict()

def create_ouputs_dict(process: Process) -> dict:
    """
    Create dictionary with outputs from process
    Arguments:
        process: class Process
    Returns:
        Dictionary containing outputs filenames as keys and dictionaries as values, with following fields: 'size', 'basename' and 'location'' 
    """

check_if_collection_output_not_empty()

def check_if_collection_output_not_empty(process: Process):
    """
    Checks if output collection in provided process is not an empty collection
    Arguments:
        process: class Process
    Returns:
        boolean - True if collection contains some files, false if is empty
    """

variables.json

Arvados CWL Tester contains implementation that allows to store variables in json file named ./test/variables.json. Those variables can be imported inside different python testing scripts if you want to organize your tests in seperated python files.

For example::

{
  "testing_projects": {
    "ardev": "ardev-*******************81",
    "arind": "arind-*******************92",
    "arkau": "arkau-*******************82"
  },
  "resources": {
    "directories": {
      "fastq_collection": {
        "class": "Directory",
        "path": "keep:********************************6185"
      }
    },
    "files": {
      "reference_genome": {
        "class": "File",
        "path": "keep:********************************6184/Homo_sapiens_assembly38.fasta",
        "secondaryFiles": [
          {
            "class": "File",
            "path": "keep:********************************6183/Homo_sapiens_assembly38.fasta.fai"
          }
        ]
      },
      "intervals": {
        "class": "File",
        "path": "keep:********************************6182/wgs_calling_regions.hg38.bed"
      }
    }
  }
}

Import FILES, DIRECTORIES and UUIDS in python script and use them in your tests:

from arvados_cwl_tester import FILES, DIRECTORIES, UUIDS, basic_arvados_test

def test_single_step():
    test_run = basic_arvados_test(
        UUIDS["akau"],
        "I am using testing variables!",
        "./my_cwl.cwl",
        {
            "fastq_collection": DIRECTORIES["fastq_collection"],
            "intervals": FILES["intervals"],
            "reference_genome": FILES["reference_genome"]
        }
        )

How to execute tests in parallel

Run multiple tests in parallel - it will execute your tests as separated processes on arvados and you will save time.

pytest --workers 10 --tests-per-worker auto

workers defines how many tetst will be executed in parallel

Development of the library

To activate development environment:

bash setup.sh
source venv/bin/activate