Nucleotide-specific fields are empty when importing AMRFinderPlus results based on nucleotide sequences
I have the following results from AMRFinderPlus, produced by passing --name.
--name NAME
Text to be added as the first column "name" to all rows of the report, for example it can be an assembly name
Clearly "Protein identifier" is NA; however, when the data is processed by hamronization, NA values are passed for the nucleotide-specific nucleotide_field_mapping columns such as Contig id, Start, Stop, etc.
I was wondering whether, in this line, the if statement should include a clause for whether the user passed --name, since index 0 might not always be the "Protein identifier" column.
Name	Protein identifier	Contig id	Start	Stop	Strand	Gene symbol	Sequence name	Scope	Element type	Element subtype	Class	Subclass	Method	Target length	Reference sequence length	% Coverage of reference sequence	% Identity to reference sequence	Alignment length	Accession of closest sequence	Name of closest sequence	HMM id	HMM description
Thauera-sp_2A1	NA	NZ_SSXV01000004.1	210683	213541	+	clpK	heat shock survival AAA family ATPase ClpK	plus	STRESS	HEAT	NA	NA	BLASTX	953	949	100.00	94.02	953	ASF80763.1	heat shock survival AAA family ATPase ClpK	NA	NA
Thauera-sp_2A1	NA	NZ_SSXV01000008.1	103983	104330	-	merT	mercuric transport protein MerT	plus	STRESS	METAL	MERCURY	MERCURY	BLASTX	116	116	100.00	93.97	116	AAA98222.1	mercuric transport protein MerT	NA	NA
Thauera-sp_2A1	NA	NZ_SSXV01000143.1	103856	104203	-	merT	mercuric transport protein MerT	plus	STRESS	METAL	MERCURY	MERCURY	BLASTX	116	116	100.00	93.97	116	AAA98222.1	mercuric transport protein MerT	NA	NA
This is quite misleading when two copies of a resistance gene match the same reference, because the contig id, start, and stop values that distinguish them are dropped. See rows 2 and 3 in this example.
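To illustrate why the lost coordinates matter, here is a small sketch using trimmed copies of the two merT rows from the report. The dictionary keys (gene, accession, contig, start, stop) are names made up for this example and are not hamronization field names.

```python
# Trimmed copies of the two merT hits from the report (data rows 2 and 3).
# The key names are hypothetical and chosen only for this illustration.
hits = [
    {"gene": "merT", "accession": "AAA98222.1",
     "contig": "NZ_SSXV01000008.1", "start": 103983, "stop": 104330},
    {"gene": "merT", "accession": "AAA98222.1",
     "contig": "NZ_SSXV01000143.1", "start": 103856, "stop": 104203},
]

# Without coordinates, both hits reduce to the same (gene, accession) pair.
without_coords = {(h["gene"], h["accession"]) for h in hits}

# With contig id, start, and stop retained, the two copies stay distinct.
with_coords = {
    (h["gene"], h["accession"], h["contig"], h["start"], h["stop"])
    for h in hits
}

print(len(without_coords), len(with_coords))
```

Once the coordinates are NA, nothing is left to tell the two copies apart.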
Thanks for pointing this out. It looks like Name was not previously in the output, so I'll update it now.
Thanks @fmaguire!
In AMRFinderPlus there is an option, --name, that prints the sample name in the output.
You might want to consider another approach (maybe specifying the column name?) to cover both behaviors, whichever option users choose.
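A minimal sketch of the column-name idea, assuming the report is tab-separated with a header row. The helpers read_report and is_nucleotide_hit are hypothetical and are not part of hamronization; the sample reports are trimmed to a few columns for illustration.

```python
import csv
import io


def read_report(text):
    """Parse a tab-separated report into dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def is_nucleotide_hit(row):
    # Hypothetical check: look "Protein identifier" up by name rather than
    # by position, so an optional leading "Name" column cannot shift it.
    return row.get("Protein identifier", "NA") == "NA"


# Trimmed to a few columns of the report in this issue, for illustration only.
WITH_NAME = (
    "Name\tProtein identifier\tContig id\tStart\tStop\n"
    "Thauera-sp_2A1\tNA\tNZ_SSXV01000008.1\t103983\t104330\n"
)
WITHOUT_NAME = (
    "Protein identifier\tContig id\tStart\tStop\n"
    "NA\tNZ_SSXV01000008.1\t103983\t104330\n"
)

for report in (WITH_NAME, WITHOUT_NAME):
    for row in read_report(report):
        print(is_nucleotide_hit(row), row["Contig id"], row["Start"], row["Stop"])
```

Because the lookup is by header, the same check gives the same answer whether or not --name was passed.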
I updated the bug report to describe the behavior.