Mykrobe-tools/mykrobe

Bad assumption about file name and chromosome name

Closed this issue · 1 comments

reference_set_name = ".".join(
    os.path.basename(args.reference_set).split(".")[:-1])
try:
    reference_set = ReferenceSet.objects.get(name=reference_set_name)
except DoesNotExist:
    reference_set = ReferenceSet.create_and_save(name=reference_set_name)
# Hack

⚠️ Any time you see # Hack, you know there are good times ahead.

When we add variants to the MongoDB database with mykrobe variants add, the lines above assume that the file name prefix (the base name minus its final extension) is the same as the name of the chromosome in that file.

For example, I have a file called h37rv.fa, so reference_set_name gets set to h37rv. However, the chromosome name in that file is NC_000962.3, so later on the command fails with:

KeyError: 'Reference NC_000962.3 cannot be found in reference set 6191efc47f6ea7585aa56abd (h37rv). Please add it to the database.'
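
To make the mismatch concrete, here is a minimal sketch that isolates the name derivation from the snippet above. The function name and the path are hypothetical and only there for illustration; the expression itself is copied from the quoted code.

```python
import os


def reference_set_name_from_path(path):
    # Same expression as the quoted code: base name with the final extension dropped.
    return ".".join(os.path.basename(path).split(".")[:-1])


# Hypothetical path, used only to illustrate the behaviour.
print(reference_set_name_from_path("/data/refs/h37rv.fa"))  # h37rv
```

The result depends only on the file name, so it never sees the NC_000962.3 header inside the file.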

The simplest fix would be to take the chromosome name from the reference file itself rather than from the file name, but that assumes the file contains only one chromosome. I suspect this is fine, but I wanted to run it by you, @martinghunt and @iqbal-lab.
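
Roughly what I have in mind, as a sketch only: this is not the existing mykrobe code, the function names are hypothetical, and it assumes an uncompressed FASTA file. It takes the first whitespace-delimited token of each header line as the sequence name and fails loudly if the single-chromosome assumption does not hold.

```python
def fasta_sequence_names(path):
    """Return the name of every record in a plain-text FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # The name is the first whitespace-delimited token after ">".
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def single_chromosome_name(path):
    """Return the only sequence name in the file, or raise if there is not exactly one."""
    names = fasta_sequence_names(path)
    if len(names) != 1:
        raise ValueError(
            "Expected exactly one sequence in %s, found %d" % (path, len(names))
        )
    return names[0]
```

For a file whose only header starts with >NC_000962.3, single_chromosome_name would return NC_000962.3 regardless of what the file is called. Raising on multi-sequence files makes the one-chromosome assumption explicit instead of silently picking the first record.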

I can add this fix to #138

I think this is ok. AFAIK many places in the code assume one chromosome.