"update" caches public suffix list to wrong directory
Hi!
First off, I love tldextract, thanks for building it!
The way we use tldextract is slightly special, but it used to be fully supported by the public API. Our Docker containers don't have internet access, so when we build them, we cache the latest public suffix list. When our applications use tldextract, we configure it to use that cache so that it never needs an internet connection.
Upon upgrading to any 3.* version of tldextract, I noticed that the cache was no longer being used to look up information from the public suffix list.
Problem reproduction steps
First, run the command: tldextract --update --private_domains
Then create a basic test file:
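import os
from tldextract import TLDExtract
# Point the extractor at the cache directory populated by the update command.
extractor = TLDExtract(cache_dir=os.environ["TLDEXTRACT_CACHE"])
# This lookup is expected to be served from the cache, without network access.
extractor("www.google.com")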
Now, create a conditional breakpoint here, where the condition is that namespace equals publicsuffix.org-tlds.
Expected behaviour
When running the above program, the breakpoint should be hit, but no KeyError should be thrown.
Actual behaviour
The breakpoint is hit once during the __call__(…), and a KeyError is thrown immediately because the cache file can't be found.
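To see where the update step actually wrote its files, it can help to list the contents of the cache directory. The snippet below is a small diagnostic sketch using only the standard library; summarize_cache is a hypothetical helper name, and it makes no assumption about the directory layout tldextract uses.

```python
import os


def summarize_cache(cache_dir):
    """Map each subdirectory (relative to cache_dir) to the sorted file names in it."""
    summary = {}
    for root, _dirs, files in os.walk(cache_dir):
        if files:
            # Report paths relative to the cache root so the output is easy to compare.
            summary[os.path.relpath(root, cache_dir)] = sorted(files)
    return summary
```

Calling summarize_cache(os.environ["TLDEXTRACT_CACHE"]) after the update command should show which subdirectory the downloaded list ended up in.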
Explanation
The method run_and_cache accepts a namespace, which is used to calculate the cache file path. But when the file is downloaded, it uses the hardcoded namespace "urls", which places the file in the wrong location.
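The mismatch can be illustrated with a simplified, namespace-keyed disk cache. This is only a sketch of the failure mode, not tldextract's actual implementation: cache_path, cache_set, cache_get and demo are hypothetical helpers, and the MD5-based file name and the "suffix-list" key are assumptions made for the example.

```python
import hashlib
import json
import os
import tempfile


def cache_path(cache_dir, namespace, key):
    """Hypothetical helper: the namespace is part of the file path."""
    digest = hashlib.md5(repr(key).encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, namespace, digest + ".json")


def cache_set(cache_dir, namespace, key, value):
    """Write value to the file derived from (namespace, key)."""
    path = cache_path(cache_dir, namespace, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(value, f)


def cache_get(cache_dir, namespace, key):
    """Read the value back, raising KeyError when the file is missing."""
    path = cache_path(cache_dir, namespace, key)
    if not os.path.isfile(path):
        raise KeyError(f"namespace: {namespace} key: {key!r}")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def demo():
    """Write under "urls", then look up under "publicsuffix.org-tlds"."""
    with tempfile.TemporaryDirectory() as cache_dir:
        key = "suffix-list"  # assumed key, for illustration only
        cache_set(cache_dir, "urls", key, ["com", "org"])
        try:
            cache_get(cache_dir, "publicsuffix.org-tlds", key)
            missed = False
        except KeyError:
            missed = True
        # The same key is found when the namespaces match.
        found = cache_get(cache_dir, "urls", key)
        return missed, found
```

In this sketch, the lookup under "publicsuffix.org-tlds" misses even though the data is on disk, because it was stored under "urls". Writing and reading with the same namespace avoids the miss.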
I'll write a PR that fixes this problem.