Plugin name disappeared from log messages
egberts opened this issue · 8 comments
- [x ] I have read the Filing Issues and subsequent “How to Get Help” sections of the documentation.
- [ x] I have searched the issues (including closed ones) and believe that this is not a duplicate.
Issue
the plugin name disappeared from its log message, albeit 3-years ago in this patch. And not only did name
field disappear from the log format, the name
field got hard coded with 'py.warning'
for all Pelican logs.
Rationale
Having a path/filename column alone is quite the rather useless if many plugins share the same filename (of which THEY ARE ALLOW TO SO, such as __init__
, __main__
, init
), so plug-in name would be very useful addition (that we used to have) to distinguish apart the same common filenames between Pelican and also between MANY plugins.
name
field item of this logging
module contains whatever it is declared in logger = logging.getLogger('name'). Pelican starts it out with
pelican` as a logger name.
Logger name
:
- used primarily to reflect and separate "daemons/tasks", "subsystems", "submodules", and ... "plugins".
- Logger "provides" nesting of Python callstack as a default.
- We can repurpose this name to reflect our plugins along with no. 2 above.
Our Pelican plugin makes use of the logger = logging.getLogger(__name__)
so that plugin-specific names can appear with the log message as well.
Is this something that we can quickly fix by introducing the (name)
into this LOG_FORMAT
?
Perhaps, like:
LOG_FORMAT = "%(name)s: %(message)s"
so that we can get something like:
DEBUG pelican.plugins._utils: Loading plugin _utils.py:104
`m.htmlsanity`
DEBUG pelican.plugins._utils: Loading plugin _utils.py:104
`tableize`
DEBUG pelican: Registering plugin `m.htmlsanity` __init__.py:78
DEBUG pelican: Registering plugin __init__.py:78
`pelican.plugins.tableize`
INFO pelican.plugins.tableize.tableize: tableize tableize.py:584
plugin registered for Pelican, using new 4.0
plugin variant
DEBUG pelican.plugins.tableize.tableize: tableize.py:249
tableize_pelican_initialized_all signal
WARNING pelican.plugins.tableize.tableize: tableize.py:271
TABLEIZE_PLUGIN does not exist in
pelican.settings; plugin's built-in defaults
Alternative Output of Plugin Name
Alternatively, we could prepend the plugin name before the filename like in:
DEBUG Loading plugin `m.htmlsanity` _utils.py:104/plugins._utils
DEBUG Loading plugin `tableize` _utils.py:104/plugins._utils
DEBUG Registering plugin `m.htmlsanity` __init__.py:78/pelican
DEBUG Registering plugin `pelican.plugins.tableize` __init__.py:78/pelican.plugins.tableize
INFO tableize plugin registered for Pelican, using tableize.py:584/pelican.plugins.tableize.tableize
new 4.0 plugin variant
DEBUG tableize_pelican_initialized_all signal tableize.py:249
WARNING TABLEIZE_PLUGIN does not exist in tableize.py:271/pelican.plugins.tableize.tableize
pelican.settings; plugin's built-in defaults
I've not determine how to implement this alternative, but I do believe @MinchinWeb can help us there, if this is the more desirable way to do this.
Alternative No. 2 - Shorter name
We could shorten this 'name' even further to just the last two dotted names or anything starting with plugin.
:
- if one word, as-is
- if two word, as-is
- if three word, then
-
- if starts with pelican, cut out
pelican
- if starts with pelican, cut out
-
- if starts with pelican.plugins, cut out
pelican.plugins
.
- if starts with pelican.plugins, cut out
Examples are:
pelican
stays aspelican
pelican.generator
as-ispelican.writer
as-ispelican.plugins.m_htmlsanity
intom_htmlsanity
(that's a plugin bug, for right now it ispelican
)pelican.plugins.tableize.memoizer
becomestableize.memoizer
pelican.plugins.tableize.memoizer.jinja
becomestableize.memoizer.jinja
- and it grows after
pelican.plugins
gets snipped out for deeper nesting.
Comment made on its commit in question here.
HOOK'EM LOG-NAME
BUG: py.warnings
is the currently assigned log name
(until I had my own pelican.py
executable took over properly as pelican
name.)
NOTE: Notice that m.sanity
did not call logger = logging.setLogger(__name__)
? It got the pelican
because it is a legacy (non-4.0) plugin.
NOTE: Notice that using __name__
in logger = logging.setLogger(__name__)
will get you the full multi-dotted name like pelican.plugins.tableize.tableize
? but using your own string results like my_fancy_tableize
results in a simple non-dotted my_fancy_tableize
name? This multi-dotted nature can be disabled by log.py
doing logger = logging.setLogger('pelican')
.
Declaring logger
class variable as a global scope but within a plugin should be safe, scoping-wise:
# my plugin
import logging
# logger = logging.getLogger(__name__)
logger = logging.getLogger('my_fancy_plugin_name')
Platform Used
- OS version and name: Linux 6.1.0-21-amd64 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
- Python version: 3.11.2
- Pelican version: HEAD (513abbf)
- Link to theme: m.css
- Links to plugins: pelican-plugins
- Link to your site: n/a
- Link to your source: n/a
To preface everything else, let me just say that getting logging right is hard. The default terminal is often only 80 characters wide, and a large site can generate thousands of lines of logging as it is; the volume can make it hard to find the details you need.
For reference, the switch to logging with rich
as PR #2897 .
Moving to the new (i.e. current) logging system I found was better for displaying where it was coming from, as it added the calling file to the right margin. I don't think the old system displayed filenames unless it hit an error (and I think the new system still does that).
As for the proposals, for No 1, I've implemented a variation of this with my own plugins, where I will prefix a shortname of the plugin to the log message is displays. The actual code is:
from .constants import LOG_PREFIX # LOG_PREFIX = "[Microblog]"
logger = logging.getLogger(__name__)
def some_function():
# the rest of the function
logger.warning(
'%s micropost "%s" longer than expected (%s > %s).'
% (
LOG_PREFIX,
relative_filename,
post_len,
settings["MICROBLOG_MAX_LENGTH"],
)
There may be a case for providing a Pelican hook for providing the logging prefix; I'm not sure what I think about trying to automatically determine one. In any case, I think logging from Pelican doesn't been to specifically be prefixed.
As for No 2, I think this is too much to put on the right margin, because it would make the main logging message take up many more lines on narrow (and even "normal") terminals.
As for No 2, I think this is too much to put on the right margin, because it would make the main logging message take up many more lines on narrow (and even "normal") terminals.
Well, for starter, we wasted 12 columns just for the timestamp, so there is something to be said about precious real estate space. Do we even need timestamp??? Honestly, Pelican is inherently a sequential processor, nothing special about multiple sub-tasking nor sub-processes. A single line having a single date timestamp is more than sufficient, for all sense and purposes:
datetime = <insert favorite time routine>
logger.info(datetime)
Heck, put two: one for start, one for end of Pelican session.
Also, doing EXISTING plugin development (oh let's say, upgrading to 4.0 plugin scheme), the filename is inherently useless for many.
During plugin development, this class-nesting of 'name' seems a lot more practical (in terms of development) than anything else (but not wanting to diminish the filename/line-number scheme here).
So, are we to be deigned in keeping it harder to do new (or old) plugin (re-)development?
There may be a case for providing a Pelican hook for providing the logging prefix; I'm not sure what I think about trying to automatically determine one. In any case, I think logging from Pelican doesn't been to specifically be prefixed.
Logger name:
- used primarily to reflect and separate "daemons/tasks", "subsystems", "submodules", and ... --> "plugins". <--
Notice the word "plugins"?
It is already in place via LOG_FORMAT
by using something like a string value like %(message)s: %(name)s
, or %(name)s: %(message)s
. This fix is essentially a one-liner fix. We could just leave it at that if we supplied this as a "helpful hint" toward a not-so-editable developer page.
The only reason why I looked into other column placement is because of excessive wraparounds, which we could fix perhaps by removing the timestamp column. Again, do we really, REALLY need a timestamp column? (asking for a co-worker).
Nothing I suggested would require changes to the existing code base for all plugins, just leveraging the logging
to its fullest.
Don't get me wrong, I absolutely love the RichHandler stuff, just the timestamp, if made a configurable CLI option may make this easier to read longer output.
Maybe picking up the terminal width and passing that as well, but I dunno as to its extensibility of RichFormat, what do you think, @MinchinWeb ? Something like in pelican/log.py
here:
terminal_width = 80
try:
terminal_width = os.get_terminal_size()
if terminal_width < 80:
terminal_width = 80
except:
# Must be a virtual terminal like PyCharm console or VSCode console
terminal_width = 80
This working snippet of code actually works in both PyCharm/VSCode as well as xterm/terminal ....
This snippet may obviate the need to pursue any further logging change except for:
- A helpful hint for plugin developers to update
LOG_FORMAT =
to'%(name)s: %(message)s'
inpelican/log.py
.
Also the official Python logging
states that our LOG_FORMAT
is defaulted to:
'%(levelname)s %(name)s %(message)s'
Use the specified format string for the handler. Defaults to attributes levelname, name and message separated by colons.
Reference
Perhaps we could support LOG_FORMAT
at the configuration settings file?
Do we even need timestamp???
On trivial websites, per log line timestamps aren't that important. As the website grows, the timestamps become more helpful to show that pelican
is still working away (I've run sites that have >10,000 pages in them; site generation time was >20 minutes).
Personally, I think they're worthwhile to keep.
Also, doing EXISTING plugin development (oh let's say, upgrading to 4.0 plugin scheme), the filename is inherently useless for many.
If you've moved your plugin to a folder, and it's a single file named __init__.py
, yes, the logged filename is less useful. Personally, I've taken to prefixing my plugin log lines to help. For all but my most trivial plugins, I have the functionality in a file other than __init__.py
which has the bonus of providing a more useful logged filename.
Maybe picking up the terminal width [...]
I think rich
has a function for this. Since we're already using the library, my default would be to (re-)use that function. But I'm still not convinved either of the alternatives you've proposed would fit well on narrow terminals.
Also the official Python
logging
states that ourLOG_FORMAT
is defaulted to'%(levelname)s %(name)s %(message)s'
rich
(and here) is using a different default.
Perhaps we could support
LOG_FORMAT
at the configuration settings file?
This seems reasonable.
LOG_FORMAT
! Let's work with that angle of some kind while retaining Rich
capability set.
I will prototype that while keeping Rich
as the default log wrapper.
Do we need to be mindful of any other current but unresolved opinions that are ancillary to logging?