vacanza/holidays

Feature request: add option to warm-up country import cache

Closed this issue · 4 comments

Hello,

Thanks for this library, nice work!

I'm using it in a web application that receives lots of requests, and recently had to debug an issue of thread contention caused by the importlib.import_module here: https://github.com/vacanza/python-holidays/blob/600ac6940a1d2112153773c9241d4f32066acedf/holidays/registry.py#L213

The threads were getting stuck trying to acquire the import lock, the call stack (obtained with py-spy) looked like this:

Thread 1525 (active): "AnyIO worker thread"
    _get_module_lock (<frozen importlib._bootstrap>:185)
    __enter__ (<frozen importlib._bootstrap>:170)
    _find_and_load (<frozen importlib._bootstrap>:1173)
    _gcd_import (<frozen importlib._bootstrap>:1204)
    import_module (importlib/__init__.py:126)
    get_entity (holidays/registry.py:211)
    __call__ (holidays/registry.py:193)
    country_holidays (holidays/utils.py:187)

The fix for it was to call country_holidays in my code at import time for every country i care about, thus avoiding the importlib.import_module from being called from inside the request lifecycle as the entity attribute will be already loaded.

What do you think of adding an option to do that by default?

For example, something like:

if os.getenv('PYTHON_HOLIDAYS_CACHE_WARMUP_ENABLED') == 'true':
    # preload it here

Hi @eliasdorneles,
I know we don't have an issue template yet so I blame myself only for the fact I can't figure out what exactly caused the issue and what the preferred fix would be. In general, I'd avoid introducing env variable dependency but the PH code is totally open for contributions that fix existing users headaches.

Having said that, I'd like to request some clarification on this (multi-threading?) issue. It'd be ideal to have a PR (draft is okay) to understand the problem and assess the fix implementation approach.

Thank you!

Hello,

Right, I understand, I will try to get a minimal example that shows the problem.
I believe it's an issue related to importlib.import_module not being thread-safe.

The web application I mentioned (which I unfortunately cannot share the code as-is) is a FastAPI app, with endpoints defined as "sync" (using regular def and not async def), so the requests are served by anyio worker threads.

I will come back with more details later!

@arkid15r I believe that this PR should fix the problem, #1794. I'll leave it in draft until I can confirm it works and characterize the potential perf impact.

Fixed by #1794