aggreen/MTB-CNN

Convert accession codes to fasta files and identify loci

Closed this issue · 1 comments

Hi! Do you have any scripts on how you obtained the fasta files from the accession codes in master_table_resistance.csv? And then how you identified the loci for each resistance gene within each fasta file?

Hello! First, let me apologize for the very late response. I have just returned from a leave of absence and am catching up on what I missed.

I have created a new directory in input_data called prep_fasta_files. It contains our code that creates the fasta files from input vcf files by comparing them to the reference h37rv.fasta. The code extracts the region of interest from each isolate genome based on the parameters you provide and outputs an alignment of all isolates for which you provided a vcf. You will have to create the vcf files yourself by downloading the read data for the accession codes and processing it through a variant calling pipeline.
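To give a rough idea of the approach, here is a minimal sketch of applying vcf variants to a reference region and writing one aligned record per isolate. It is not the code in prep_fasta_files: the function names (`parse_snps`, `extract_region`, `build_alignment`) are hypothetical, and it assumes plain-text vcf content, 1-based inclusive coordinates, and single-base substitutions only. Indels, multi-allelic sites, and genotype or filter fields are ignored, which keeps every isolate at the same length as the reference region.

```python
def parse_snps(vcf_text):
    """Return {1-based position: alt base} for single-base substitutions."""
    snps = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and vcf header lines
        fields = line.split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        # Assumption: keep only simple SNPs so reference coordinates are preserved.
        if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
            snps[pos] = alt
    return snps


def extract_region(reference, snps, start, end):
    """Apply SNPs to reference[start..end] (1-based, inclusive)."""
    bases = list(reference[start - 1:end])
    for pos, alt in snps.items():
        if start <= pos <= end:
            bases[pos - start] = alt
    return "".join(bases)


def build_alignment(reference, isolate_vcfs, start, end):
    """Return fasta text with one equal-length record per isolate."""
    records = []
    for name, vcf_text in isolate_vcfs.items():
        seq = extract_region(reference, parse_snps(vcf_text), start, end)
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"
```

In practice, `reference` would be the sequence read from h37rv.fasta, `isolate_vcfs` would map each isolate name to the text of its vcf, and `start` and `end` would be the coordinates of the locus of interest.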

I hope this is helpful. Please reach out if you have further questions.