karlicoss/HPI

possible duplicate logs due to caching logger along with the logging level

purarue opened this issue · 9 comments

I think because of some of the magic that cachew does to hook into logger, its sending messages twice?

Not sure:

[ ~ ] $ LOGGING_LEVEL_HPI=debug hpi query my.browser.export
[DEBUG   2023-09-14 17:12:57,337 my.browser.export __init__.py:793 ] [my.browser.export:history] using inferred type <class 'browserexport.model.Visit'>
[DEBUG   2023-09-14 17:12:57,337 my.browser.export __init__.py:793 ] [my.browser.export:history] using inferred type <class 'browserexport.model.Visit'>
[DEBUG   2023-09-14 17:12:57,337 my.browser.export __init__.py:958 ] using /home/sean/.cache/cachew/my.browser.export:history for db cache
[DEBUG   2023-09-14 17:12:57,337 my.browser.export __init__.py:958 ] using /home/sean/.cache/cachew/my.browser.export:history for db cache
[DEBUG   2023-09-14 17:12:57,339 my.browser.export __init__.py:962 ] new hash: {"cachew": "0.13.0", "schema": "[Column('url', String(), table=None), Column('dt', IsoDateTime(), table=None), Column('_metadata_is_null', Boolean(), table=None), Column('metadata_title', String(), table=None), Column('metadata_description', String(), table=None), Column('metad
ata_preview_image', String(), table=None), Column('metadata_duration', Integer(), table=None)]", "dependencies": "['/home/sean/data/browsing/browsing-1694545629.jsonl.gz', '/home/sean/data/browsing/chrome-20230913061701.sqlite', '/home/sean/data/browsing/chrome-20230914061715.sqlite', '/home/sean/data/browsing/firefox-20230913061700.sqlite', '/home/sean/data/browsing/firefox-20230914061714.sqlite']"}
[DEBUG   2023-09-14 17:12:57,339 my.browser.export __init__.py:962 ] new hash: {"cachew": "0.13.0", "schema": "[Column('url', String(), table=None), Column('dt', IsoDateTime(), table=None), Column('_metadata_is_null', Boolean(), table=None), Column('metadata_title', String(), table=None), Column('metadata_description', String(), table=None), Column('metadata_preview_image', String(), table=None), Column('metadata_duration', Integer(), table=None)]", "dependencies": "['/home/sean/data/browsing/browsing-1694545629.jsonl.gz', '/home/sean/data/browsing/chrome-20230913061701.sqlite', '/home/sean/data/browsing/chrome-20230914061715.sqlite', '/home/sean/data/browsing/firefox-20230913061700.sqlite', '/home/sean/data/browsing/firefox-20230914061714.sqlite']"}
[DEBUG   2023-09-14 17:12:57,348 my.browser.export __init__.py:994 ] old hash: {"cachew": "0.13.0", "schema": "[Column('url', String(), table=None), Column('dt', IsoDateTime(), table=None), Column('_metadata_is_null', Boolean(), table=None), Column('metadata_title', String(), table=None), Column('metadata_description', String(), table=None), Column('metadata_preview_image', String(), table=None), Column('metadata_duration', Integer(), table=None)]", "dependencies": "['/home/sean/data/browsing/browsing-1694545629.jsonl.gz', '/home/sean/data/browsing/chrome-20230913061701.sqlite', '/home/sean/data/browsing/chrome-20230914061715.sqlite', '/home/sean/data/browsing/firefox-20230913061700.sqlite', '/home/sean/data/browsing/firefox-20230914061714.sqlite']"}
[DEBUG   2023-09-14 17:12:57,348 my.browser.export __init__.py:994 ] old hash: {"cachew": "0.13.0", "schema": "[Column('url', String(), table=None), Column('dt', IsoDateTime(), table=None), Column('_metadata_is_null', Boolean(), table=None), Column('metadata_title', String(), table=None), Column('metadata_description', String(), table=None), Column('metadata_preview_image', String(), table=None), Column('metadata_duration', Integer(), table=None)]", "dependencies": "['/home/sean/data/browsing/browsing-1694545629.jsonl.gz', '/home/sean/data/browsing/chrome-20230913061701.sqlite', '/home/sean/data/browsing/chrome-20230914061715.sqlite', '/home/sean/data/browsing/firefox-20230913061700.sqlite', '/home/sean/data/browsing/firefox-20230914061714.sqlite']"}
[DEBUG   2023-09-14 17:12:57,348 my.browser.export __init__.py:1003] hash matched: loading from cache
[DEBUG   2023-09-14 17:12:57,348 my.browser.export __init__.py:1003] hash matched: loading from cache
[INFO    2023-09-14 17:12:58,004 my.browser.export __init__.py:1005] my.browser.export:history: loading 1221312 objects from cachew (sqlite /home/sean/.cache/cachew/my.browser.export:history)
[INFO    2023-09-14 17:12:58,004 my.browser.export __init__.py:1005] my.browser.export:history: loading 1221312 objects from cachew (sqlite /home/sean/.cache/cachew/my.browser.export:history)
^C

this is on cachew 0.13.0:

$ pip freeze | grep cachew   
cachew==0.13.0

Not urgent or anything, just thought Id track it

If you're not able to reproduce let me know

huh, I forgot to pull after latest HPI_LOGS fix (had c283e54), and it didn't happen at first, but yeah reproduces after updating

Doesn't look like it's happening for other modules though, so I wonder if it's something about setup_browserexport_logger

ah yeah, seems to be
is probably adding two handlers, will see if I can prevent that

I wonder if it should have something to do with NOTSET level, similar to

HPI/my/core/logging.py

Lines 117 to 119 in 2a46341

if logger.level == logging.NOTSET:
# if it's already set, the user requested a different logging level, let's respect that
logger.setLevel(lvl)

#309

I was trying to do something fancy and grabbing the level from the computed logger, just simplified it to creating the logger in each module and passing the log level integer

ah you beat me!
so yeah I guess the issue is that make_logger is using lru_cache, but it's caching along with the level -- I think this is the actual issue?

HPI/my/core/logging.py

Lines 207 to 208 in 2a46341

@lru_cache(None) # cache so it's only initialized once
def make_logger(name: str, *, level: LevelIsh = None) -> logging.Logger:

I think it should be cached by the logger name (perhaps with a helper function wrapped in lru_cache instead), and after that just call setLevel?

Yeah, I think that would work....? always get a bit confused with how the getLogger singleton works internally in python and how setting it up multiple times works

Took me a while to grasp as well! I think the main issue is when you call proper setup multiple times (e.g. adding handlers & formatters) -- then it could end up with duplication.
Logger level can be changed at any time, it's just changed on the singleton object.
Confusingly there is also a handler logging level (which would be trickier to change), but I specifically rewrote logging helper to avoid messing with the handlers, that way it's much easier to reason about

I'll keep the issue for now, will PR a proper fix for caching a bit later -- it works, just need to double check and cleanup :)