bjorn

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

This is the code repository for bjorn, a suite of tools for processing SARS-CoV-2 sequences to support large-scale genomic surveillance. It relies on external tools such as pangolin, UShER, and GNU parallel.

Installation

cd bjorn
docker build -t bjorn_container .

Usage

Launching the container

docker run -v {data_dir}:/data -v {temp_dir}:/temp -it bjorn_container

Running bjorn on a provision of new sequences (see the example config file)

./bjorn.sh {config.json} {provision.xz} [/data] [/temp]

(Existing sequence db in datadir will be auto-detected according to config.)

Processing a data provision from GISAID's jsonl format to tsv

cat {provision.xz} | ./readseqs.sh {provision_decoder} {provision_parser} {treeinfo_dir} {tempdir} {work_groups} {workers_per_group} > {provision.tsv}
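To inspect a provision before running it through the pipeline, a minimal Python sketch like the one below can help. It is not part of bjorn and only assumes what the command above suggests: that the provision is an xz-compressed JSON Lines file with one record per line. The function name `iter_records` is hypothetical, and no particular field names are assumed.

```python
import json
import lzma


def iter_records(path):
    """Yield one dict per non-empty line of an xz-compressed JSON Lines file.

    Assumption: the provision holds one JSON object per line (jsonl).
    """
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)
```

For example, `next(iter_records("provision.xz")).keys()` lists the fields of the first record without decompressing the whole file.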

Identifying changed records

./fastdiff.sh {old_records.tsv} {new_records.tsv} {deletes_out.tsv} {insertions_out.tsv} {tempdir}
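The idea behind this step can be sketched in a few lines of Python. This is an illustration only, not how fastdiff.sh is implemented: it assumes, based on the argument names, that records present only in the old file count as deletes and records present only in the new file count as insertions, so a modified record shows up in both outputs. The function name `diff_records` is hypothetical.

```python
def diff_records(old_lines, new_lines):
    """Return (deletes, insertions) between two collections of TSV lines.

    A line only in old_lines is a delete; a line only in new_lines is an
    insertion. A modified record therefore appears once in each list.
    """
    old, new = set(old_lines), set(new_lines)
    deletes = sorted(old - new)
    insertions = sorted(new - old)
    return deletes, insertions
```

Holding both files in memory as sets is fine for small inputs; for full-size provisions a sort-and-merge approach on disk would likely be needed instead.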

Analyzing sequences (alignment and mutation- and lineage-calling)

./analysis.sh {provision.tsv} {workers} {subworkers} {blocksize} {treeinfo_dir} {geoinfo_dir} > {analysed_sequences.tsv}

Exporting to outbreak.info's jsonl format

parallel -j{workers} --block {blocksize} --pipepart "./norm_jsonl_output.py -i /dev/stdin -o /dev/stdout -u {unknown_value} -g {geoinfo_dir}" :::: {analysed_sequences.tsv} | gzip -c > {out.jsonl.gz}
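In the command above, `--pipepart` with `--block` makes GNU parallel split the input file into chunks of roughly `{blocksize}` that end on record (by default, line) boundaries, and feed each chunk to one job's standard input. The following Python sketch illustrates that splitting idea under simplified assumptions; it is not GNU parallel's actual algorithm, and the function name `block_ranges` is hypothetical.

```python
import os


def block_ranges(path, blocksize):
    """Return (start, end) byte ranges of about blocksize bytes each.

    Every range is extended forward so that it ends right after a newline,
    which keeps each TSV record inside a single block.
    """
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as handle:
        while start < size:
            end = min(start + blocksize, size)
            if end < size:
                handle.seek(end)
                handle.readline()  # read on to the end of the current line
                end = handle.tell()
            ranges.append((start, end))
            start = end
    return ranges
```

Because blocks are extended to the next line end, a block can be somewhat larger than `blocksize`, which is worth keeping in mind when choosing `{blocksize}` relative to the longest record.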