The sequencing reads from a multiplexed library are grouped into directories named after the corresponding barcodes. The first step in analysing these reads is to rename the barcode directories after their sample names or IDs. The next step is to concatenate the reads of each sample into a single fastq file, which should likewise be named after the sample name or ID. Any mislabelling of the reads would lead to misleading results.
Here, I have compiled a bash script that automates this whole process. It not only saves time but also removes the risk of mislabelling the sequencing reads by hand.
#!/bin/bash
#metadata
metadata=barcodeNames.csv
# text formatting
Red="$(tput setaf 1)"
Green="$(tput setaf 2)"
Bold=$(tput bold)
reset="$(tput sgr0)" # turns off all attributes
# Read the metadata file line by line; each comma-separated column is a field
while IFS=, read -r field1 field2
do
    echo "${Red}${Bold}Processing ${reset}: ${field1}"
    echo ""
    echo "Renaming ${field1} directory as ${field2}"
    mv "${field1}" "${field2}"
    echo "Concatenating ${field2} reads"
    # Stop if the directory is missing, so the later steps never run in the wrong place
    cd "${field2}" || exit 1
    # The inputs are gzip-compressed; decompress while concatenating so the output is plain FASTQ
    gunzip -c *fastq.gz > "${field2}.fastq"
    echo "Moving ${field2}.fastq into parent directory"
    mv "${field2}.fastq" ../
    cd ../
    echo "${Green}${Bold}Completed ${reset}: ${field1}"
    echo ""
done < "${metadata}"
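The metadata file (barcodeNames.csv in the script) is a CSV file listing the barcodes and the corresponding sample names, one pair per line: the barcode in the first column and the sample name in the second.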
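Because the script renames directories in place, it can help to check the metadata file before running it. The following is a minimal sketch and not part of the original script: check_metadata is a hypothetical helper that assumes exactly two columns, no header row, and no commas inside names. It reports rows with a missing field, barcodes whose directory does not exist, and duplicate sample names. The example rows in the comment are made up for illustration.

```shell
#!/bin/bash
# check_metadata FILE: sanity-check a two-column CSV (barcode,sample name).
# Hypothetical helper; assumes no header row and no commas inside names.
# Example rows (made up for illustration):
#   barcode01,sampleA
#   barcode02,sampleB
check_metadata() {
    local file="$1" barcode sample status=0 seen=""
    while IFS=, read -r barcode sample || [ -n "$barcode" ]; do
        sample="${sample%$'\r'}"                 # tolerate Windows line endings
        [ -z "${barcode}${sample}" ] && continue # skip blank lines
        if [ -z "$barcode" ] || [ -z "$sample" ]; then
            echo "Incomplete row: '${barcode},${sample}'" >&2
            status=1
            continue
        fi
        if [ ! -d "$barcode" ]; then
            echo "Missing directory: ${barcode}" >&2
            status=1
        fi
        # Two barcodes mapped to one sample name would collide on rename
        case ",${seen}," in
            *",${sample},"*)
                echo "Duplicate sample name: ${sample}" >&2
                status=1 ;;
        esac
        seen="${seen},${sample}"
    done < "$file"
    return "$status"
}

# Example usage, run from the directory holding the barcode directories:
# check_metadata barcodeNames.csv && echo "Metadata looks consistent"
```

The function only reads the file and the directory listing, so it can be run repeatedly without changing anything on disk.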
Note that if you create this CSV file on a Windows computer, you will need to convert it to Unix format before using it on a Linux computer, because Windows uses \r\n as the line ending while Unix uses \n. Without the conversion, the stray \r ends up at the end of each sample name. For the conversion, install the dos2unix package (shown here for Debian or Ubuntu) as follows:
sudo apt install dos2unix
Then run
dos2unix barcodeNames.csv
This converts the Windows line endings to Unix ones.
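If installing a package is not an option, the check and the conversion can also be sketched with standard tools. This is an illustrative alternative rather than part of the original workflow; has_crlf and strip_crlf are hypothetical helper names, and barcodeNames.csv is the file name used in the script above.

```shell
#!/bin/bash
# has_crlf FILE: succeed if any line in FILE ends with a carriage return
has_crlf() {
    grep -q $'\r$' "$1"
}

# strip_crlf FILE: print FILE with all carriage returns removed
strip_crlf() {
    tr -d '\r' < "$1"
}

# Example usage (writes a new file and leaves the original untouched):
# if has_crlf barcodeNames.csv; then
#     strip_crlf barcodeNames.csv > barcodeNames.unix.csv
# fi
```

Writing to a separate file keeps the original intact; the converted copy would then need to be renamed to the name the script expects.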
Keep the script and the metadata file in the same directory that contains the barcode directories.
Then make the script executable if it is not already (chmod +x barcodesRenamed.sh) and run it as follows:
./barcodesRenamed.sh
The script renames the barcode directories, concatenates the reads sample-wise, and collects the concatenated fastq files in the parent directory, next to the script and the metadata file. These files are then ready for QC and any downstream analysis.
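Before moving on to QC, one quick sanity check is to confirm that each concatenated file holds as many reads as the input files it was built from. The sketch below is an assumption-laden illustration, not part of the original script: count_reads is a hypothetical helper, it assumes four lines per FASTQ record, and it relies on GNU gzip, whose -cdf options decompress gzip input and pass plain text through unchanged.

```shell
#!/bin/bash
# count_reads FILE...: print the number of FASTQ records in the given files.
# Assumes four lines per record. "gzip -cdf" decompresses gzip input and
# copies non-gzip input through unchanged (GNU gzip behaviour).
count_reads() {
    local lines
    lines="$(gzip -cdf -- "$@" | wc -l)"
    echo $(( lines / 4 ))
}

# Example usage (sampleA is a made-up sample name):
# if [ "$(count_reads sampleA/*fastq.gz)" -eq "$(count_reads sampleA.fastq)" ]; then
#     echo "sampleA: read counts match"
# fi
```

Because the helper accepts both compressed and uncompressed input, the same call works on the per-barcode files and on the concatenated output.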
The screenshot shows the directory contents before and after executing the above script: