shandley/hecatomb

taxonomy improvement

mihinduk opened this issue · 4 comments

Hi Mike,

In working with the SIV data, I realized that there are spaces in taxonomic fields. For example, for families:
Verrucomicrobia subdivision 3
Verrucomicrobia subdivision 6

This will make pulling reads by family more difficult. Could all spaces in taxonomy fields be replaced with underscores in the next update?

Thank you,
Kathie

It's a tab-separated file so spaces shouldn't be a problem. How were you planning on parsing the files?

I was trying to write a helper script to pull reads from a family of interest. I had planned on a shell script, but I could do it differently.

If you're using awk, you'll just need to pass -F '\t' to change the field separator to tabs instead of whitespace.
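As a minimal sketch of that suggestion, the snippet below builds a small tab-separated sample and pulls the read IDs for one family. The three-column layout (seq_id, family, count), the read IDs, and the position of the family column are assumptions made for illustration, not the actual hecatomb table format, so the column number would need adjusting for the real file.

```shell
# Build a tiny tab-separated sample table.
# Hypothetical layout: seq_id <TAB> family <TAB> count
tmp_table=$(mktemp)
printf 'seq_id\tfamily\tcount\n' > "$tmp_table"
printf 'read1\tVerrucomicrobia subdivision 3\t5\n' >> "$tmp_table"
printf 'read2\tVerrucomicrobia subdivision 6\t2\n' >> "$tmp_table"
printf 'read3\tSiphoviridae\t9\n' >> "$tmp_table"

# Family of interest; the embedded spaces are kept as-is.
family='Verrucomicrobia subdivision 3'

# -F '\t' splits fields on tabs only, so spaces stay inside a field.
# $2 is the assumed family column; print the read ID on an exact match.
matches=$(awk -F '\t' -v fam="$family" '$2 == fam { print $1 }' "$tmp_table")
echo "$matches"

rm -f "$tmp_table"
```

Because the comparison is an exact string match on the whole field, "Verrucomicrobia subdivision 3" and "Verrucomicrobia subdivision 6" stay distinct without any renaming of the taxonomy fields.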

Hi Mike,
Uploading 2021_05_18_Viral_Baltimore_full_classification_table_ICTV2020.txt…

I just created an updated taxonomy database with Baltimore classification, which I am having trouble uploading here. This is the latest ICTV release. I will email it to you and Rob.