leylabmpi/Struo2

error and uncertainty with GTDB_metadata_filter.R

slschnorr opened this issue · 3 comments

Sorry, I came across another issue that I am not sure how to resolve.
Running the GTDB_metadata_filter.R script threw this error:

Reading in file: https://data.gtdb.ecogenomic.org/releases/release202/202.0/ar122_metadata_r202.tar.gz Error in fread(url, sep = "\t", check.names = TRUE) : embedded nul in string: 'ar122_metadata_r202.tsv\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\00000664\00002241\00001753\000024736501\014037146135\0015426\0 0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0ustar \0uqpchaum\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0dataadmin\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0accession' Execution halted

Also, the R.utils package has to be installed separately in the conda struo2 env. Is that expected?
Is there documentation for the parameters of this R filtering script? What are the defaults, and which settings should be changed for custom filtering? I'm sorry, as I said, I've been having trouble following the current README documentation. Thanks a lot for your help.

It's due to a weird character in that line. Maybe changing the comment character for fread would help. I'll have to check.

Actually, it's because the file is a tarball (.tar.gz), not a plain compressed table. For now, it can be fixed by first downloading the file and extracting it. I'll update the code to handle tarballs.
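To illustrate why a tarball trips up a plain table reader, here is a minimal sketch in Python (not the language of the Struo2 script, and not the fix that was committed). The string in the error message looks like a tar header: the member name, NUL padding, and the `ustar` magic, followed by the first column name. The sketch builds a tiny stand-in tarball in memory, shows that removing only the gzip layer leaves that header in front of the table, and then reads the TSV member through the tar layer. The helper names `make_demo_tarball` and `read_tsv_from_tarball`, the member name `demo_metadata.tsv`, and the two-column table are all hypothetical stand-ins.

```python
import csv
import gzip
import io
import tarfile


def make_demo_tarball(member_name, tsv_text):
    """Build a small .tar.gz in memory holding one TSV member (stand-in data)."""
    payload = tsv_text.encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_tsv_from_tarball(tarball):
    """Open the tar layer as well as the gzip layer, then parse the first .tsv member."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        member = next(m for m in tar.getmembers() if m.name.endswith(".tsv"))
        handle = tar.extractfile(member)
        text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
        return list(csv.DictReader(text, delimiter="\t"))


# Hypothetical two-column table; real GTDB metadata has many more columns.
demo = make_demo_tarball(
    "demo_metadata.tsv",
    "accession\tgtdb_taxonomy\nG1\td__Archaea\n",
)

# Removing only the gzip layer leaves a raw tar stream. Its first 512 bytes
# are a header block: the member name, NUL padding, and other fields.
raw = gzip.decompress(demo)
print(raw[:40])

# Reading through the tar layer yields the actual table rows.
rows = read_tsv_from_tarball(demo)
print(rows[0]["accession"])
```

For the real file, the equivalent manual workaround is the one described above: download the `.tar.gz`, extract it, and use the extracted `.tsv`.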

I've added r-r.utils to the conda env yaml.

OK, the R script should now be able to handle tarballs. Reopen if you still have problems.