nextstrain/seasonal-flu

select_strains.py fails if metadata age column has missing values

philipshirk opened this issue · 1 comments

Current Behavior
When I run the select_strains.py script on a metadata file that has missing values in the age column, it fails with the error:

Traceback (most recent call last):
  File "scripts/select_strains.py", line 292, in <module>
    metadata = parse_metadata(args.segments, args.metadata, date_format=args.date_format)
  File "scripts/select_strains.py", line 207, in parse_metadata
    if age_str[-1]=='y':
IndexError: string index out of range
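
The failure can be reproduced in isolation. This is a minimal sketch, assuming the missing age reaches line 207 as an empty string; the variable name age_str comes from the traceback, while both function names are hypothetical and not part of select_strains.py:

```python
def last_char_is_year_suffix(age_str):
    # Mirrors the check shown on line 207 of the traceback.
    return age_str[-1] == 'y'


def reproduce():
    # An empty string has no last character, so indexing [-1] raises IndexError.
    try:
        last_char_is_year_suffix("")
    except IndexError as err:
        return str(err)
    return None
```

A non-empty value such as "34y" passes the check, which is why the error only appears for rows with a missing age.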

Expected behavior
Missing values in the metadata should be permitted and the build should still run, as in the ncov workflow.

How to reproduce
Steps to reproduce the current behavior:

  1. Download data from a proprietary database (in a format I believe to be consistent with Nextstrain)
  2. augur parse
  3. Manually edit "region" data in Python to match Nextstrain's naming conventions
  4. scripts/construct-recency-from-submission-date.py
  5. augur filter
  6. scripts/select_strains.py

Possible solution
Line 207 in select_strains.py currently assumes that every age entry is a non-empty string. Adding an if statement that skips empty values before indexing could be one solution.
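
One way such a guard might look is sketched below. This is not the code in select_strains.py: parse_age is a hypothetical helper, and the handling of an 'm' (months) suffix and of bare numbers is an assumption; only the 'y' suffix check appears in the traceback.

```python
import math


def parse_age(age_str):
    """Return the age in years as a float, or None if it is missing or unparseable.

    Hypothetical helper: only the 'y' suffix is confirmed by the traceback;
    the 'm' suffix and bare-number handling are assumptions.
    """
    # Treat None and NaN (e.g. from a pandas-loaded table) as missing.
    if age_str is None:
        return None
    if isinstance(age_str, float) and math.isnan(age_str):
        return None

    age_str = str(age_str).strip()
    if not age_str:
        # Empty value: this is the case that currently raises IndexError.
        return None

    try:
        if age_str[-1] == 'y':
            return float(age_str[:-1])
        if age_str[-1] == 'm':
            # Assumed convention: months, converted to years.
            return float(age_str[:-1]) / 12.0
        # Assumed convention: a bare number is an age in years.
        return float(age_str)
    except ValueError:
        return None
```

Returning None rather than raising lets the caller decide whether to keep the strain with an unknown age or drop it.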

Your environment: if running Nextstrain locally

  • Operating system: centos linux
  • Version (e.g. auspice 2.7.0): nextstrain.cli 3.0.3

@philipshirk This issue should be resolved by the refactored workflow, which no longer uses the select_strains.py script. Please reopen the issue and let us know if any age-related parsing issues remain.