pha4ge/hAMRonization

Input file format and structure validation

cimendes opened this issue · 3 comments

In each parser, I suggest adding validation of the expected file format (tsv, txt, json, ...) and structure (fields present), with a clean exit when validation fails.

We are kinda limited in how we can do this because the parsers are designed to handle streaming data.

Validity is already checked implicitly when the data is put into the hamronised format, which raises an error if there is an issue. I don't really think wrapping that in a try/except to give a cleaner error message adds much (just obfuscation), but if you think that's better it can easily be done.

I would like to get some more opinions on this. It does add some obfuscation, but I think a custom error message would be helpful for a user, as the generic KeyError doesn't actually explain much. Personally I would prefer a more informative error message, but I'm good with whatever you decide on!
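
For illustration only, here is a minimal sketch of the kind of check being discussed. It is not the hAMRonization code: `InputFormatError`, `parse_rows`, and the column names are hypothetical, and it assumes a tab-separated report with a header line. Because only the header is inspected before rows are yielded one at a time, it should still work with streaming input.

```python
import csv
import io


class InputFormatError(ValueError):
    """Hypothetical error for a report that lacks expected columns."""


def parse_rows(handle, required_fields):
    """Yield one dict per row, failing early if expected columns are absent."""
    # DictReader reads only the header line when fieldnames is first accessed,
    # so the rest of the stream is still consumed lazily.
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in required_fields if name not in header]
    if missing:
        # Name the missing columns instead of surfacing a bare KeyError later.
        raise InputFormatError(
            "input is missing expected column(s): " + ", ".join(missing)
        )
    for row in reader:
        yield row


# Usage with an in-memory report (assumed column names):
report = io.StringIO("gene\tidentity\nblaTEM\t99.5\n")
for row in parse_rows(report, ["gene", "identity"]):
    print(row["gene"], row["identity"])
```

The trade-off raised above still applies: the custom exception hides the original traceback unless it is chained or logged.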

Created a new issue with a plan to improve this (adding a debug flag for the full traceback and a logging library to handle the different levels).
Will close and redirect to #56.
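
A rough sketch of what such a plan could look like, using only the standard library's `argparse` and `logging`. The `run` wrapper, the `--debug` flag name, and the logger name are assumptions made for illustration, not what #56 specifies.

```python
import argparse
import logging

logger = logging.getLogger("parser_sketch")  # hypothetical logger name


def run(task, argv):
    """Run task(); return an exit code, logging a traceback only with --debug."""
    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument("--debug", action="store_true")
    args = arg_parser.parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)
    try:
        task()
    except (KeyError, ValueError) as err:
        if args.debug:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("parsing failed")
        else:
            logger.error(
                "parsing failed: %r (rerun with --debug for the traceback)", err
            )
        return 1
    return 0
```

With this shape, the default output stays short for users, while the full traceback remains one flag away for anyone debugging a parser.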