KwanLab/Autometa

autometa-taxonomy-lca raises TypeError when --cache is not specified

Closed this issue · 3 comments

Current Behavior

Running autometa-taxonomy-lca without specifying --cache raises a TypeError.

I think the following lines might cause the issue.

def __init__(self, dbdir: str, verbose: bool = False, cache: str = None):
    super().__init__(dbdir, verbose=verbose)
    self.verbose = verbose
    self.dbdir = dbdir
    self.cache = cache
    self.disable = False if verbose else True
    if self.cache and not os.path.exists(self.cache):
        logger.info(f"Created LCA cache dir: {self.cache}")
        os.makedirs(self.cache)

Without --cache, self.cache is None, which would cause a TypeError when it is passed to os.path.exists().
Assigning a default path for the cache directory might fix it.

Steps to Reproduce

autometa-taxonomy-lca --blast data/blastp.tsv --dbdir ~/home/bigdata/autometa/databases/ncbi --lca-output data/lca.tsv --verbose

[03/09/2022 11:00:48 AM DEBUG] autometa.taxonomy.ncbi: Processing nodes from /rhome/ctsai085/bigdata/autometa/databases/ncbi/nodes.dmp
[03/09/2022 11:01:04 AM DEBUG] autometa.taxonomy.ncbi: nodes loaded
[03/09/2022 11:01:04 AM DEBUG] autometa.taxonomy.ncbi: Processing names from /rhome/ctsai085/bigdata/autometa/databases/ncbi/names.dmp
[03/09/2022 11:01:19 AM DEBUG] autometa.taxonomy.ncbi: names loaded
[03/09/2022 11:01:19 AM DEBUG] autometa.taxonomy.ncbi: Processing nodes from /rhome/ctsai085/bigdata/autometa/databases/ncbi/merged.dmp
[03/09/2022 11:01:20 AM DEBUG] autometa.taxonomy.ncbi: merged loaded
Traceback (most recent call last):
  File "/rhome/ctsai085/.conda/envs/metagenome/bin/autometa-taxonomy-lca", line 10, in <module>
    sys.exit(main())
  File "/rhome/ctsai085/.conda/envs/metagenome/lib/python3.8/site-packages/autometa/taxonomy/lca.py", line 775, in main
    lca = LCA(dbdir=args.dbdir, verbose=args.verbose, cache=args.cache)
  File "/rhome/ctsai085/.conda/envs/metagenome/lib/python3.8/site-packages/autometa/taxonomy/lca.py", line 86, in __init__
    self.tour_fp = os.path.join(self.cache, "tour.pkl.gz")
  File "/rhome/ctsai085/.conda/envs/metagenome/lib/python3.8/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
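
The last two frames of the traceback can be reproduced in isolation. This is only a minimal sketch: it assumes the cache value is None (as it is when --cache is omitted) and does not touch Autometa itself.

```python
import os

# Assumption: this mirrors the value of args.cache when --cache is not given.
cache = None

try:
    # Same call as lca.py line 86 in the traceback, with cache set to None.
    os.path.join(cache, "tour.pkl.gz")
except TypeError as err:
    message = str(err)
    print(f"TypeError: {message}")
```

The error comes from os.path.join() itself, which rejects any argument that is not a str, bytes, or os.PathLike object.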

Expected Behavior

A default path for the cache directory should be created when --cache is not specified.

Environment Information

autometa-configure --dryrun --debug

I installed autometa from conda and do not have autometa-configure. I tried autometa-config --dryrun --debug, but it does not accept those arguments.

Hello @chtsai0105, thank you again for bringing another hiccup to our attention! If you would like to submit a pull request, an easy fix would be to change the default from None to an empty string, i.e. "". For now, to save space, we do not require the user to create a cache for the LCA process.
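
To illustrate why an empty-string default avoids the TypeError, here is a small sketch. The helper name cache_file is hypothetical and is not part of Autometa; it only shows how os.path.join() treats "" as the first component.

```python
import os


def cache_file(cache: str = "", name: str = "tour.pkl.gz") -> str:
    """Hypothetical helper: build a cache file path from an optional cache dir."""
    # With cache == "", os.path.join returns just the file name (a relative path).
    return os.path.join(cache, name)


relative = cache_file()
print(relative)
```

With "" as the directory, the joined path is just the bare file name, so anything written to it would presumably land in the current working directory.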

I can submit a pull request to fix this soon (or you are welcome to contribute, if you are not busy). Thanks again for pointing this out.


I installed autometa from conda and do not have autometa-configure. I tried autometa-config --dryrun --debug, but it does not accept those arguments.

Thank you as well for pointing this out, we need to update our new issue template 😅 👍

Hi @WiscEvan, I've reviewed the code again and made some fixes:

lca = LCA(dbdir=args.dbdir, verbose=args.verbose, cache=args.cache)

def __init__(self, dbdir: str, verbose: bool = False, cache: str = None):

First, args.cache is passed explicitly to the LCA class in the main function, so it would still be None when the user does not specify --cache, even if we set the default to "" inside LCA.
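
A short sketch of this point: an explicitly passed None overrides the function's own default. The parser and the make_lca function below are hypothetical stand-ins (the real argument definitions are not shown in this thread), and the sketch assumes --cache is declared without a default.

```python
import argparse


def make_lca(dbdir: str, cache: str = ""):
    """Hypothetical stand-in for LCA.__init__ with an empty-string default."""
    return cache


parser = argparse.ArgumentParser()
parser.add_argument("--dbdir", required=True)
# Assumption: no default is given, so argparse stores None when --cache is omitted.
parser.add_argument("--cache")
args = parser.parse_args(["--dbdir", "databases/ncbi"])

# The keyword argument is passed explicitly, so the "" default is never used.
received = make_lca(dbdir=args.dbdir, cache=args.cache)
print(repr(received))  # None
```

Under that assumption, the default would have to be changed on the parser side as well (for example via the default= parameter of add_argument) for the empty string to reach the class.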

if self.cache and not os.path.exists(self.cache):
    logger.info(f"Created LCA cache dir: {self.cache}")
    os.makedirs(self.cache)

Second, the if statement here is not what causes the error. When the cache is None, the condition short-circuits on self.cache, so os.path.exists() is never called and the block is skipped.

self.tour_fp = os.path.join(self.cache, "tour.pkl.gz")
self.tour = None
self.level_fp = os.path.join(self.cache, "level.pkl.gz")
self.level = None
self.occurrence_fp = os.path.join(self.cache, "occurrence.pkl.gz")
self.occurrence = None
self.sparse_fp = os.path.join(self.cache, "precomputed_lcas.pkl.gz")
self.sparse = None

What really causes the error is these lines, which try to join None with a string. My fix is to move all the os.path.join() calls under the if statement and leave the rest where they are. I still need to evaluate whether the change causes any side effects.
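
One way the idea could look is sketched below. CachePaths and CACHE_FILES are hypothetical names used for illustration, not the actual LCA class or the merged fix. This variant still defines every *_fp attribute, setting it to None when no cache is given, so the attributes always exist.

```python
import os
import tempfile
from typing import Optional

# File names taken from the lines quoted above; the mapping itself is illustrative.
CACHE_FILES = {
    "tour_fp": "tour.pkl.gz",
    "level_fp": "level.pkl.gz",
    "occurrence_fp": "occurrence.pkl.gz",
    "sparse_fp": "precomputed_lcas.pkl.gz",
}


class CachePaths:
    """Hypothetical stand-in for the cache handling in LCA.__init__."""

    def __init__(self, cache: Optional[str] = None):
        self.cache = cache
        if self.cache:
            # Create the cache dir only when one was requested.
            os.makedirs(self.cache, exist_ok=True)
        for attr, filename in CACHE_FILES.items():
            # Join only when a cache dir exists; otherwise record None.
            path = os.path.join(self.cache, filename) if self.cache else None
            setattr(self, attr, path)
        self.tour = self.level = self.occurrence = self.sparse = None


no_cache = CachePaths()
print(no_cache.tour_fp)  # None

with tempfile.TemporaryDirectory() as tmp:
    with_cache = CachePaths(cache=os.path.join(tmp, "lca_cache"))
    cache_dir_created = os.path.isdir(with_cache.cache)
```

Any method that reads or writes these paths would then need to handle the None case, which is the kind of side effect that still needs checking.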

First of all, thanks for contributing a PR! 🥳 🎉

Unfortunately, these changes will break some of the other methods in the LCA class. I will submit suggested changes on your PR so we can get this merged in 👍