bsmn/bsmn-pipeline

document how to generate sample list file

```bash
#!/bin/bash
# Generate the BSMN reference brain sample lists from syn7871084 on Synapse.

query="select name, id, sample_id_biorepository, sample_id_original, experiment_id, grant, group, assay, processingKit from syn7871084"

# Drop the header line and the three leading columns that precede the
# selected ones, then print: group, sample_id (sample_id_biorepository-
# sample_id_original-experiment_id-grant), file, synapse_id, assay, processingKit.
to_sample_list() {
    tail -n+2 \
        | cut -f4- \
        | awk -F'\t' -v OFS='\t' '{print $7, $3"-"$4"-"$5"-"$6, $1, $2, $8, $9}' \
        | sort
}

printf "group\tsample_id\tfile\tsynapse_id\tassay\tprocessingKit\n" > tmp.header.txt

# All FASTQ files, then one list per assay type.
synapse query "$query where fileFormat='fastq'" | to_sample_list > tmp.fastq.txt

{ cat tmp.header.txt; grep 10X tmp.fastq.txt; } > Samples.10X_WGS_fastq.txt
# Rows containing '-535-' or '-797-' go to the shallow WGS list instead.
{ cat tmp.header.txt; grep wholeGenomeSeq tmp.fastq.txt | grep -v -e 10X -e '-535-' -e '-797-'; } > Samples.regular_WGS_fastq.txt
{ cat tmp.header.txt; grep -e '-535-' -e '-797-' tmp.fastq.txt; } > Samples.shallow_WGS_fastq.txt
{ cat tmp.header.txt; grep exomeSeq tmp.fastq.txt; } > Samples.WES_fastq.txt
{ cat tmp.header.txt; grep targetedSeq tmp.fastq.txt; } > Samples.Targeted_fastq.txt

# BAM files from the Vaccarino group, excluding 10X data.
{
    cat tmp.header.txt
    synapse query "$query where group='Vaccarino' and fileFormat='bam'" | to_sample_list | grep -v 10X
} > Samples.regular_WGS_bam.txt

rm tmp.fastq.txt tmp.header.txt
```
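To make the column mapping concrete, here is a small sketch that runs the same `tail | cut | awk | sort` step on one made-up row of tab-separated query output. The function name `to_sample_list` and every value in the input are illustrative only; the three placeholder leading columns stand in for whatever the query prints before the selected columns.

```bash
#!/bin/bash
# Illustrative only: the reformatting step from the script, wrapped in a
# function (the name to_sample_list is hypothetical) and fed made-up data.
to_sample_list() {
    tail -n+2 \
        | cut -f4- \
        | awk -F'\t' -v OFS='\t' '{print $7, $3"-"$4"-"$5"-"$6, $1, $2, $8, $9}' \
        | sort
}

# Fake query output: a header line, three placeholder leading columns,
# then name, id, sample_id_biorepository, sample_id_original,
# experiment_id, grant, group, assay, processingKit.
printf 'm1\tm2\tm3\tname\tid\tbio\torig\texp\tgrant\tgroup\tassay\tkit\n1\t1\tx\tsample.fastq.gz\tsyn0000000\tB1\tO1\tE1\tG1\tgroupA\twholeGenomeSeq\tkitX\n' \
    | to_sample_list
# Prints one tab-separated line:
# groupA  B1-O1-E1-G1  sample.fastq.gz  syn0000000  wholeGenomeSeq  kitX
```

Because sample_id is the four ID fields joined by hyphens, patterns such as `'-535-'` can match a single ID component inside it (or anywhere else in the row).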

This shell script is what I used to generate the sample lists for the BSMN reference brain data. Of the columns, my pipeline uses only sample_id, file, and synapse_id, and their order doesn't matter. Of course, this is quite specific to the BSMN reference brain samples.
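The pipeline's own parsing code isn't part of this issue. As a hypothetical sketch of what "only sample_id, file, and synapse_id are used, in any order" means for a consumer, the helper below (`select_columns` is an invented name) looks columns up by header name rather than by position, so reordering or adding columns does not change its output, and a missing column is reported as an error.

```bash
#!/bin/bash
# Hypothetical helper: print the named columns of a tab-separated file,
# located by header name, so column order in the file does not matter.
# Usage: select_columns FILE NAME...
select_columns() {
    local file=$1
    shift
    awk -F'\t' -v OFS='\t' -v want="$*" '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) idx[$i] = i
            n = split(want, names, " ")
            for (j = 1; j <= n; j++) {
                if (!(names[j] in idx)) {
                    print "missing column: " names[j] > "/dev/stderr"
                    exit 1
                }
            }
            next
        }
        {
            # Emit the requested columns in the requested order.
            line = $(idx[names[1]])
            for (j = 2; j <= n; j++) line = line OFS $(idx[names[j]])
            print line
        }' "$file"
}

# Example with made-up data whose columns are in a different order.
tmp=$(mktemp)
printf 'synapse_id\tgroup\tfile\tsample_id\nsyn0000000\tgroupA\tsample.fastq.gz\tB1-O1-E1-G1\n' > "$tmp"
select_columns "$tmp" sample_id file synapse_id
# Prints (tab-separated): B1-O1-E1-G1  sample.fastq.gz  syn0000000
rm -f "$tmp"
```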

Could you add this as an executable script in this repository and document it in the README? Then we can close this issue.

@attilagk, it would be great if you could verify for @bintriz that this is sufficient.
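One possible way to check the split (hypothetical helpers, not part of the script or the pipeline): because the lists are built with independent greps, a row could in principle land in two lists (for example, an exome file whose row contains `-535-` matches both the shallow-WGS and the WES grep) or in none. The sketch below assumes synapse_id is column 4, as in the header the script writes, and that a copy of tmp.fastq.txt is kept (the script deletes it).

```bash
#!/bin/bash
# Hypothetical sanity checks for the generated sample lists.
# Assumes tab-separated files with synapse_id in column 4.

# Print synapse IDs that appear in more than one list (headers skipped).
duplicated_ids() {
    awk -F'\t' 'FNR > 1 { n[$4]++ }
        END { for (id in n) if (n[id] > 1) print id }' "$@" | sort
}

# Print synapse IDs present in ALL (a headerless file such as a kept
# copy of tmp.fastq.txt) but absent from every list file given after it.
missing_ids() {
    local all=$1
    shift
    awk -F'\t' 'FILENAME == ARGV[1] { seen[$4] = 1; next }
        FNR > 1 { listed[$4] = 1 }
        END { for (id in seen) if (!(id in listed)) print id }' "$all" "$@" | sort
}

# Example, after running the script with its rm of tmp.fastq.txt skipped:
#   duplicated_ids Samples.*_fastq.txt
#   missing_ids tmp.fastq.txt Samples.*_fastq.txt
```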