pha4ge/hAMRonization

Nucleotide-specific fields are empty when importing AMRFinderPlus results based on nucleotide sequences

Opened this issue · 2 comments

I have the following results from AMRFinderPlus, produced by passing --name.

--name NAME
    Text to be added as the first column "name" to all rows of the report, for example it can be an assembly name

Clearly, "Protein identifier" is NA. However, when the data is processed by hAMRonization, NA values are passed for the nucleotide_field_mapping columns such as Contig id, Start, Stop, etc.

I was wondering whether the if statement in this line should also account for whether the user passed --name, since index 0 might not always be the "Protein identifier" column.

Name    Protein identifier  Contig id   Start   Stop    Strand  Gene symbol Sequence name   Scope   Element type    Element subtype Class   Subclass    Method  Target length   Reference sequence length   % Coverage of reference sequence    % Identity to reference sequence    Alignment length    Accession of closest sequence   Name of closest sequence    HMM id  HMM description
Thauera-sp_2A1  NA  NZ_SSXV01000004.1   210683  213541  +   clpK    heat shock survival AAA family ATPase ClpK  plus    STRESS  HEAT    NA  NA  BLASTX  953 949 100.00  94.02   953 ASF80763.1  heat shock survival AAA family ATPase ClpK  NA  NA
Thauera-sp_2A1  NA  NZ_SSXV01000008.1   103983  104330  -   merT    mercuric transport protein MerT plus    STRESS  METAL   MERCURY MERCURY BLASTX  116 116 100.00  93.97   116 AAA98222.1  mercuric transport protein MerT NA  NA
Thauera-sp_2A1  NA  NZ_SSXV01000143.1   103856  104203  -   merT    mercuric transport protein MerT plus    STRESS  METAL   MERCURY MERCURY BLASTX  116 116 100.00  93.97   116 AAA98222.1  mercuric transport protein MerT NA  NA

This is quite misleading when two copies of a resistance gene match the same reference, since the important information in the Contig id, Start, and Stop columns is dropped. See rows 2 and 3 in this example.
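
To make the point concrete, here is a small sketch using only the two merT rows from the report above. The dictionary keys (`gene`, `accession`, `contig`, `start`, `stop`) are names made up for this illustration, not hAMRonization fields.

```python
# The two merT hits from the report above, reduced to the relevant columns.
# Key names are hypothetical and chosen only for this illustration.
hits = [
    {"gene": "merT", "accession": "AAA98222.1",
     "contig": "NZ_SSXV01000008.1", "start": 103983, "stop": 104330},
    {"gene": "merT", "accession": "AAA98222.1",
     "contig": "NZ_SSXV01000143.1", "start": 103856, "stop": 104203},
]

# Without location columns, both hits collapse into the same identity.
without_location = {(h["gene"], h["accession"]) for h in hits}

# With contig, start and stop, the two copies stay distinct.
with_location = {
    (h["gene"], h["accession"], h["contig"], h["start"], h["stop"])
    for h in hits
}

print(len(without_location), len(with_location))
```

Once Contig id, Start and Stop are replaced by NA, nothing in the remaining columns tells the two copies apart.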

Thanks for pointing this out. It looks like Name was not previously in the output, so I'll update/modify now.

Thanks @fmaguire !

In AMRFinderPlus there is an option --name that prints the sample name in the output.
You might want to consider another approach (maybe specifying the column name?) to cover both behaviors, whichever option users choose.
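
One way to do this, as a rough sketch only: read the header and look columns up by name, so the presence or absence of the leading Name column does not shift anything. The function names `is_nucleotide_report` and `read_report`, the output keys, and the trimmed-down header used in the sample data are all assumptions for illustration; this is not how hAMRonization is actually implemented.

```python
import csv
import io

# Columns that only carry values for nucleotide-based hits
# (names taken from the report header above).
NUCLEOTIDE_COLUMNS = ("Contig id", "Start", "Stop", "Strand")


def is_nucleotide_report(header):
    """Return True if the header contains the nucleotide coordinate columns."""
    return all(col in header for col in NUCLEOTIDE_COLUMNS)


def read_report(handle):
    """Yield one dict per row, looking columns up by name, not by index."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    has_name = "Name" in header  # only present when --name was passed
    nucleotide = is_nucleotide_report(header)
    for row in reader:
        record = {
            "sample": row["Name"] if has_name else None,
            "gene_symbol": row["Gene symbol"],
        }
        # Keep the location only when the row actually has one.
        if nucleotide and row["Contig id"] != "NA":
            record["contig"] = row["Contig id"]
            record["start"] = int(row["Start"])
            record["stop"] = int(row["Stop"])
            record["strand"] = row["Strand"]
        yield record


# Sample input with a trimmed header (an assumption to keep the example short).
WITH_NAME = (
    "Name\tProtein identifier\tContig id\tStart\tStop\tStrand\tGene symbol\n"
    "Thauera-sp_2A1\tNA\tNZ_SSXV01000008.1\t103983\t104330\t-\tmerT\n"
)
WITHOUT_NAME = (
    "Protein identifier\tContig id\tStart\tStop\tStrand\tGene symbol\n"
    "NA\tNZ_SSXV01000008.1\t103983\t104330\t-\tmerT\n"
)

records_with = list(read_report(io.StringIO(WITH_NAME)))
records_without = list(read_report(io.StringIO(WITHOUT_NAME)))
```

Both inputs give the same contig, start and stop, because no lookup depends on which column happens to be at index 0.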

I updated the bug report to describe the behavior.