Ecogenomics/GTDBNCBI

Database and file structure integrity test

donovan-h-parks opened this issue · 0 comments

It is important that the database and genome directory structure be in sync. In particular, every genome listed in the database should be present in the genome directory structure, with all of its expected files in that directory. Similarly, all user genomes should appear in the database. Some NCBI genomes will not show up in the database, but these will be genomes that are missing called proteins. A script or hidden command in the GTDB code base to sanity check that all of this is in order would be good.
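
A rough sketch of what such a check could look like is below. It is not based on the actual GTDB schema or directory layout, and it rests on several assumptions: each genome lives in a directory named after its genome ID directly under a single root, the expected files are `<genome_id>_genomic.fna` and `<genome_id>_protein.faa`, and the list of genome IDs has already been pulled from the database. The function name `check_integrity` and the suffixes are hypothetical.

```python
from pathlib import Path

# Hypothetical suffixes: the issue does not list the files expected per genome.
PROTEIN_SUFFIX = "_protein.faa"
EXPECTED_SUFFIXES = ("_genomic.fna", PROTEIN_SUFFIX)


def check_integrity(db_genome_ids, genome_root, expected_suffixes=EXPECTED_SUFFIXES):
    """Compare genome IDs from the database against directories on disk.

    Assumes one directory per genome, named by genome ID, directly under
    genome_root (an assumption, not the documented GTDB layout).
    """
    root = Path(genome_root)
    db_ids = set(db_genome_ids)
    on_disk = {p.name for p in root.iterdir() if p.is_dir()}

    report = {"missing_dirs": [], "missing_files": {}, "not_in_db": {}}

    # Every genome in the database needs a directory with all expected files.
    for gid in sorted(db_ids):
        if gid not in on_disk:
            report["missing_dirs"].append(gid)
            continue
        absent = [s for s in expected_suffixes
                  if not (root / gid / (gid + s)).is_file()]
        if absent:
            report["missing_files"][gid] = absent

    # Directories with no database record. The value records whether a
    # protein file exists: False may be a legitimate NCBI genome without
    # called proteins, True deserves a closer look.
    for gid in sorted(on_disk - db_ids):
        has_proteins = (root / gid / (gid + PROTEIN_SUFFIX)).is_file()
        report["not_in_db"][gid] = has_proteins

    return report
```

An empty report (no missing directories, no missing files, and no unlisted genomes that have protein files) would indicate that the two are in sync under these assumptions. User genomes would need stricter handling, since any user genome absent from the database is an error regardless of its files.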