document how to generate sample list file
kdaily commented
document how to generate sample list file
bintriz commented
#!/bin/bash
# Query the Synapse table for FASTQ files, drop the header row and the leading
# columns, then rearrange the fields to match the header written further down:
# group, sample_id (four ID fields joined with "-"), file, synapse_id, assay, processingKit.
synapse query "select name, id, sample_id_biorepository, sample_id_original, experiment_id, grant, group, assay, processingKit from syn7871084 where fileFormat='fastq'" \
|tail -n+2 \
|cut -f4- \
|awk -F"\t" '{print $7"\t"$3"-"$4"-"$5"-"$6"\t"$1"\t"$2"\t"$8"\t"$9}' \
|sort > tmp.fastq.txt
printf "group\tsample_id\tfile\tsynapse_id\tassay\tprocessingKit\n" > tmp.header.txt
# Split the FASTQ rows into one sample list per data type, each with the header prepended.
# Samples matching -535- or -797- go to the shallow WGS list and are excluded from regular WGS.
{ cat tmp.header.txt; grep 10X tmp.fastq.txt; } > Samples.10X_WGS_fastq.txt
{ cat tmp.header.txt; grep wholeGenomeSeq tmp.fastq.txt |grep -v -e 10X -e '-535-' -e '-797-'; } > Samples.regular_WGS_fastq.txt
{ cat tmp.header.txt; grep -e '-535-' -e '-797-' tmp.fastq.txt; } > Samples.shallow_WGS_fastq.txt
{ cat tmp.header.txt; grep exomeSeq tmp.fastq.txt; } > Samples.WES_fastq.txt
{ cat tmp.header.txt; grep targetedSeq tmp.fastq.txt; } > Samples.Targeted_fastq.txt
rm tmp.fastq.txt
# Build the BAM sample list the same way, restricted to the Vaccarino group and excluding 10X.
{ cat tmp.header.txt
synapse query "select name, id, sample_id_biorepository, sample_id_original, experiment_id, grant, group, assay, processingKit from syn7871084 where group='Vaccarino' and fileFormat='bam'" \
|tail -n+2 \
|cut -f4- \
|awk -F"\t" '{print $7"\t"$3"-"$4"-"$5"-"$6"\t"$1"\t"$2"\t"$8"\t"$9}' \
|sort \
|grep -v 10X
} > Samples.regular_WGS_bam.txt
rm tmp.header.txt
This shell script is what I used to get the sample lists for the BSMN ref brain data. Among the columns, my pipeline only uses sample_id, file, and synapse_id. The order of the columns doesn't matter. Of course, this is pretty specific to the BSMN ref brain samples.
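To illustrate why the column order doesn't matter, here is a minimal sketch of reading a sample list by header name rather than by position. The function name `extract_sample_columns` is hypothetical and is not part of the pipeline; it only assumes a tab-separated file whose first line is a header containing sample_id, file, and synapse_id.

```bash
#!/bin/bash
# Hypothetical helper (an illustration, not the pipeline's actual code):
# print sample_id, file, and synapse_id from a tab-separated sample list,
# locating each column by its header name so the column order is irrelevant.
extract_sample_columns() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            # Map each header name to its column position.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("sample_id" in col) || !("file" in col) || !("synapse_id" in col)) {
                print "missing required column" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Data rows: emit only the three columns of interest.
        { print $(col["sample_id"]), $(col["file"]), $(col["synapse_id"]) }
    ' "$1"
}

# Example call, assuming the file was produced by the script above:
#   extract_sample_columns Samples.WES_fastq.txt
```

Extra columns such as group, assay, and processingKit are simply ignored, so the same list files work whether or not they are present.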
kdaily commented
Can you put this in an executable script in this repository, and document it in the README? Then we can close.