pha4ge/hAMRonization

Srst2 parser implementation


Here is what the parser currently produces for the following single-entry report:

Sample DB gene allele coverage depth diffs uncertainty divergence length maxMAF clusterid seqid annotation
Dummy ResFinder oqxA oqxA 100.0 75.852 1snp 0.152 660 0.037 470 1995 oqxA_1_V00622; V00622; fluoroquinolone

The metadata passed is the following:
metadata = {"analysis_software_version": "0.0.1", "reference_database_version": "2019-Jul-28", "input_file_name": "Dummy", "reference_database_id": "resfinder"}

This is the current output:

    assert result.input_file_name == 'Dummy'
    assert result.gene_symbol == 'oqxA'
    assert result.gene_name == 'oqxA'
    assert result.reference_database_id == 'ResFinder'
    assert result.reference_database_version == '2019-Jul-28'
    assert result.reference_accession == '1995'
    assert result.analysis_software_name == 'srst2'
    assert result.analysis_software_version == '0.0.1'
    assert result.coverage_percentage == 100
    assert result.reference_gene_length == 660
    assert result.coverage_depth == 75.852

My question concerns reference_database_id: it is currently a required metadata field, yet its value is (correctly!) parsed from the report file itself. Note that the parsed result is 'ResFinder', not the 'resfinder' passed in the metadata. I suggest removing it from the required metadata fields.
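
To illustrate the proposal, here is a minimal, self-contained sketch. It is not the hAMRonization implementation: `REQUIRED_METADATA`, `parse_report`, and the dict-based result are hypothetical names made up for this example. It also assumes the report is tab-delimited and that the `uncertainty` column is empty in the row above, which is how the 13 values line up with the 14 headers.

```python
import csv
import io

# Hypothetical list of required metadata fields for this sketch;
# reference_database_id is deliberately absent.
REQUIRED_METADATA = (
    "analysis_software_version",
    "reference_database_version",
    "input_file_name",
)

HEADER = ["Sample", "DB", "gene", "allele", "coverage", "depth", "diffs",
          "uncertainty", "divergence", "length", "maxMAF", "clusterid",
          "seqid", "annotation"]
# Assumption: the uncertainty field is empty in the example row.
ROW = ["Dummy", "ResFinder", "oqxA", "oqxA", "100.0", "75.852", "1snp",
       "", "0.152", "660", "0.037", "470", "1995",
       "oqxA_1_V00622; V00622; fluoroquinolone"]
REPORT = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"


def parse_report(handle, metadata):
    """Parse a tab-delimited report, taking the database id from its DB column."""
    missing = [key for key in REQUIRED_METADATA if key not in metadata]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))

    results = []
    for row in csv.DictReader(handle, delimiter="\t"):
        results.append({
            "input_file_name": metadata["input_file_name"],
            "gene_symbol": row["gene"],
            "gene_name": row["allele"],
            # Read from the report, not from the metadata.
            "reference_database_id": row["DB"],
            "reference_database_version": metadata["reference_database_version"],
            "reference_accession": row["seqid"],
            "analysis_software_name": "srst2",
            "analysis_software_version": metadata["analysis_software_version"],
            "coverage_percentage": float(row["coverage"]),
            "reference_gene_length": int(row["length"]),
            "coverage_depth": float(row["depth"]),
        })
    return results


# The metadata no longer needs to carry reference_database_id.
metadata = {
    "analysis_software_version": "0.0.1",
    "reference_database_version": "2019-Jul-28",
    "input_file_name": "Dummy",
}
result = parse_report(io.StringIO(REPORT), metadata)[0]
```

With this shape, the caller supplies only what the report cannot provide, and the database id always matches what the report says.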

Seems reasonable!