sheepdog_testing_suite

This repository gives BioLockJ developers test pipelines (including the input data, configuration files, metadata files, etc.) to verify the functionality of BioLockJ throughout the development cycle.

How to use this test suite

Quick Start

Get BioLockJ. See BioLockJ installation for developers.
Fork and clone this repository: git clone https://github.com/<username>/sheepdog_testing_suite.git
Set your environment variables.
Build the MockMain project: cd ${SHEP}/MockMain; ant
Add the TestBioLockJ wrapper script to your path:
(macOS)
echo "export PATH=$PATH:${SHEP}/MockMain/resources" >> ~/.bash_profile
source ~/.bash_profile
(Ubuntu)
echo "export PATH=$PATH:${SHEP}/MockMain/resources" >> ~/.bashrc
source ~/.bashrc
Run the example test set: testBiolockj ${SHEP}/MockMain/resources/testList.txt
This should print some output to the screen that starts with something like:
Reading test list from: /Users/ieclabau/git/sheepdog_testing_suite/MockMain/resources/exampleTestList.txt
and ends with something like:
Total test runtime: 00 hours : 00 minutes : 05 seconds

See the MockMain user guide to learn more about what you see in the example and how to expand from it.
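
If the example test set does not start, the two Quick Start steps most likely to be missing are the SHEP variable and the PATH entry. The sketch below checks both. It assumes that SHEP holds the path to your clone of this repository; check_sheepdog_setup is a hypothetical helper name and is not part of the repository.

```shell
#!/usr/bin/env bash
# Pre-flight check for the Quick Start steps (a sketch, not part of the repository).
# check_sheepdog_setup is a hypothetical helper name.
check_sheepdog_setup() {
    local repo="$1"    # path to your clone, i.e. the value of ${SHEP}
    local status=0

    # The build step runs ant inside ${SHEP}/MockMain, so that folder must exist.
    if [ -n "$repo" ] && [ -d "$repo/MockMain" ]; then
        echo "OK: found $repo/MockMain"
    else
        echo "MISSING: MockMain folder (is SHEP set to your clone?)"
        status=1
    fi

    # The wrapper script is found through PATH, so the resources folder must be listed there.
    case ":$PATH:" in
        *":$repo/MockMain/resources:"*)
            echo "OK: $repo/MockMain/resources is on PATH" ;;
        *)
            echo "MISSING: $repo/MockMain/resources is not on PATH"
            status=1 ;;
    esac

    return "$status"
}
```

After sourcing the function, call it as check_sheepdog_setup "${SHEP}". It returns a non-zero status if either check fails.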

Now that you've gotten started, get good at it! Build up your reference pipelines to get acquainted with this level of automated testing so you can use routine comprehensive tests in your normal dev process.

Build up your reference pipelines

Look at the testBiolockj script that you just ran. Look at the testList.txt file that you ran it on (Excel is recommended). Look at one of the individual config files listed in testList.txt. Look at the pipelines that were created in ${SHEP}/MockMain/pipelines/. See how these things relate to each other. This pattern is the backbone of this test suite.
Create your NOT_IN_GIT_user.properties file following the instructions in dependencies.
Not all tests use this file. Depending on the test you want to run, you may need to set up other dependencies as well, so go ahead and skim that whole page.
Find another existing testList.txt and runThisTestSet.sh pair (several runThisTestSet.sh scripts are listed in ${SHEP}/test/run_local_testSets.sh). Review the config files listed in that testList, and make sure you have the dependencies that the tests require. Run the test set.
Create a folder called pipelines next to the testList file. This folder will be ignored by git. It's just for you.
Once the tests are done (and passing), move the pipelines you just generated from ${SHEP}/MockMain/pipelines to your new pipelines folder. You now have a collection of pipelines that you can reference.
Select another testList and repeat.
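
The "create a pipelines folder, then move the generated pipelines into it" steps can be sketched as a small shell function. archive_pipelines is a hypothetical helper name, not something this repository provides, and it moves everything it finds in the source folder, so only run it once the tests are done and passing.

```shell
#!/usr/bin/env bash
# Move freshly generated pipelines into a local reference folder (a sketch).
# archive_pipelines is a hypothetical helper name.
archive_pipelines() {
    local generated="$1"    # e.g. "${SHEP}/MockMain/pipelines"
    local test_dir="$2"     # the folder that holds the testList file
    local dest="$test_dir/pipelines"
    local moved=0
    local p

    # Create the pipelines folder next to the testList file if it is not there yet.
    mkdir -p "$dest" || return 1

    for p in "$generated"/*; do
        [ -e "$p" ] || continue    # nothing matched: the source folder is empty
        mv "$p" "$dest"/ && moved=$((moved + 1))
    done

    echo "Moved $moved item(s) to $dest"
}
```

For example, archive_pipelines "${SHEP}/MockMain/pipelines" /path/to/the/testList/folder, where the second argument is a placeholder for whichever test folder you just ran.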

Automated testing

The MockMain project is a Java program that takes a list of tests as input, runs BioLockJ for each test, and reports the output. See the MockMain user guide for more details on using the program. In short: most folders of tests have a testList.txt table and a runThisTestSet.sh script. Calling the script runs all the tests listed in the test list and creates a new table (testList_results.txt) giving the results of each test. After you are comfortable handling individual test sets (see Build up your reference pipelines), you can move on to running Routine Comprehensive Tests.
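
As a rough illustration of that pattern, the sketch below runs one test set and prints its results table. It makes two assumptions that the text above does not confirm: that runThisTestSet.sh can be launched with bash from inside its own folder, and that testList_results.txt is written next to testList.txt. run_test_set is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Run one test set and show its results table (a sketch under the stated assumptions).
# run_test_set is a hypothetical helper name.
run_test_set() {
    local test_dir="$1"    # a folder holding testList.txt and runThisTestSet.sh

    if [ ! -f "$test_dir/runThisTestSet.sh" ]; then
        echo "No runThisTestSet.sh in $test_dir" >&2
        return 2
    fi

    # Assumption: the script can be run with bash from inside its own folder.
    ( cd "$test_dir" && bash ./runThisTestSet.sh ) || return 1

    # Assumption: the results table is created next to the test list.
    if [ -f "$test_dir/testList_results.txt" ]; then
        cat "$test_dir/testList_results.txt"
    else
        echo "No testList_results.txt found in $test_dir" >&2
        return 1
    fi
}
```

If the results table ends up somewhere else on your system, adjust the last check accordingly; the MockMain user guide is the authority on where output goes.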

When you make changes to BioLockJ (whether fixing a bug, refactoring the framework or adding new features), you should run a testList. Pick an existing testList that runs full pipelines, or make a custom one to include more extensive testing of the components that might be affected by your work. Run the testList before you start working using the current master version of BioLockJ. Run it periodically as you work. Most importantly, run all reasonable tests after you have finalized your changes, but before you submit the pull request to merge your work into the master branch.

If you create a new feature, make tests that prove your feature is working and add them to this repository so any future changes that break your feature are discovered quickly.

Be an issue tracking master

Along the way there will surely be times when BioLockJ fails because you did not set up something it needed. In those times, did BioLockJ give you an appropriate error message? Did that info lead you to the solution? If you encountered challenges, users will too! Be familiar with our issue tracking, and make sure these frustrations are in our issue collection. Take a moment to review the tags, read a few issues, get familiar with the interface. Look at newly created issues on a regular basis.

What's here

data_*
There are three data_ directories. They all use the same file structure and the same file names; the only difference is the size. In each folder there is an input folder and a validation folder. The input folder has input data for test pipelines. The validation folder has the expectation files to use for pipelines. Pipelines that use the dynamic variable $SHEP_DATA to give the path to the input data will also need to use the same variable to give the path to the expectation files describing the output of the pipeline.

data_big
These inputs are the full-size files. For sequences, the repository does not store the sequence files themselves, only scripts to download them.
data_small
These input files are as small as possible while still being big enough to see that components of BioLockJ are working correctly.
data_tiny
These inputs are designed to run fast to 'test the plumbing' before switching back to a more meaningful test.
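
Because the three data_ folders share one layout, switching a test between sizes can come down to changing one variable. The sketch below assumes that SHEP_DATA is meant to hold the path to one of the data_ folders; that is an inference from the description above and should be checked against the dependencies instructions. select_data_size is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Point SHEP_DATA at one of the three data_ folders (a sketch).
# Assumption: SHEP_DATA holds the path to a data_ folder. select_data_size is hypothetical.
select_data_size() {
    local repo="$1"    # path to your clone, i.e. the value of ${SHEP}
    local size="$2"    # tiny, small or big

    case "$size" in
        tiny|small|big) ;;
        *)
            echo "Unknown size: $size (expected tiny, small or big)" >&2
            return 1 ;;
    esac

    SHEP_DATA="$repo/data_$size"
    export SHEP_DATA

    # Every data_ folder has the same two subfolders, so both paths follow from one variable.
    echo "input:      $SHEP_DATA/input"
    echo "validation: $SHEP_DATA/validation"
}
```

Call it as select_data_size "${SHEP}" tiny for a quick plumbing check, then switch to small or big for a more meaningful run. Call it directly rather than inside $( ), otherwise the exported variable does not reach your shell.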

dependencies
Many pipelines require parameters or resources that have to be provided in a computer-specific way. This folder has templates and instructions for dealing with these requirements.

MockMain
Java project that automates the testing process.

test
Configuration files to run test pipelines. Most folders have several configuration files, a testList.txt file that lists the tests along with the expected output, and a runThisTestSet.sh script which launches the MockMain test system using the testList.txt file. These are aggregated into the Routine Comprehensive Tests.

Why no metadata folder?

Pipeline resources like metadata, primers, barcodes, etc. are either the 'real' files for a given dataset or custom-made variants for a particular test. If they are real, they live with the data that they accurately describe; test pipelines that use those folders may have to ignore them as input files. If they are made for a particular test, they live next to that test and have a name that makes it obvious which test.properties file(s) they are there for.