getpelican/pelican

Plugin name disappeared from log messages

egberts opened this issue · 8 comments

  • [x ] I have read the Filing Issues and subsequent “How to Get Help” sections of the documentation.
  • [ x] I have searched the issues (including closed ones) and believe that this is not a duplicate.

Issue

the plugin name disappeared from its log message, albeit 3-years ago in this patch. And not only did name field disappear from the log format, the name field got hard coded with 'py.warning' for all Pelican logs.

Rationale

Having a path/filename column alone is quite the rather useless if many plugins share the same filename (of which THEY ARE ALLOW TO SO, such as __init__, __main__, init), so plug-in name would be very useful addition (that we used to have) to distinguish apart the same common filenames between Pelican and also between MANY plugins.

name field item of this logging module contains whatever it is declared in logger = logging.getLogger('name'). Pelican starts it out with pelican` as a logger name.

Logger name:

  1. used primarily to reflect and separate "daemons/tasks", "subsystems", "submodules", and ... "plugins".
  2. Logger "provides" nesting of Python callstack as a default.
  3. We can repurpose this name to reflect our plugins along with no. 2 above.

Our Pelican plugin makes use of the logger = logging.getLogger(__name__) so that plugin-specific names can appear with the log message as well.

Is this something that we can quickly fix by introducing the (name) into this LOG_FORMAT?

Perhaps, like:

    LOG_FORMAT = "%(name)s: %(message)s"

so that we can get something like:

           DEBUG    pelican.plugins._utils: Loading plugin         _utils.py:104
                    `m.htmlsanity`                                              
           DEBUG    pelican.plugins._utils: Loading plugin         _utils.py:104
                    `tableize`                                                  
           DEBUG    pelican: Registering plugin `m.htmlsanity`    __init__.py:78
           DEBUG    pelican: Registering plugin                   __init__.py:78
                    `pelican.plugins.tableize`                                  
           INFO     pelican.plugins.tableize.tableize: tableize  tableize.py:584
                    plugin registered for Pelican, using new 4.0                
                    plugin variant                                              
           DEBUG    pelican.plugins.tableize.tableize:           tableize.py:249
                    tableize_pelican_initialized_all signal                     
           WARNING  pelican.plugins.tableize.tableize:           tableize.py:271
                    TABLEIZE_PLUGIN does not exist in                           
                    pelican.settings; plugin's built-in defaults       

Alternative Output of Plugin Name

Alternatively, we could prepend the plugin name before the filename like in:

           DEBUG    Loading plugin `m.htmlsanity`                 _utils.py:104/plugins._utils
           DEBUG    Loading plugin `tableize`                     _utils.py:104/plugins._utils
           DEBUG    Registering plugin `m.htmlsanity`             __init__.py:78/pelican
           DEBUG    Registering plugin `pelican.plugins.tableize` __init__.py:78/pelican.plugins.tableize
           INFO     tableize plugin registered for Pelican, using  tableize.py:584/pelican.plugins.tableize.tableize
                    new 4.0 plugin variant                                              
           DEBUG    tableize_pelican_initialized_all signal        tableize.py:249
           WARNING  TABLEIZE_PLUGIN does not exist in              tableize.py:271/pelican.plugins.tableize.tableize
                    pelican.settings; plugin's built-in defaults       

I've not determine how to implement this alternative, but I do believe @MinchinWeb can help us there, if this is the more desirable way to do this.

Alternative No. 2 - Shorter name

We could shorten this 'name' even further to just the last two dotted names or anything starting with plugin.:

  • if one word, as-is
  • if two word, as-is
  • if three word, then
    • if starts with pelican, cut out pelican
    • if starts with pelican.plugins, cut out pelican.plugins.

Examples are:

  • pelican stays as pelican
  • pelican.generator as-is
  • pelican.writer as-is
  • pelican.plugins.m_htmlsanity into m_htmlsanity (that's a plugin bug, for right now it is pelican)
  • pelican.plugins.tableize.memoizer becomes tableize.memoizer
  • pelican.plugins.tableize.memoizer.jinja becomes tableize.memoizer.jinja
  • and it grows after pelican.plugins gets snipped out for deeper nesting.

Comment made on its commit in question here.

HOOK'EM LOG-NAME

BUG: py.warnings is the currently assigned log name (until I had my own pelican.py executable took over properly as pelican name.)

NOTE: Notice that m.sanity did not call logger = logging.setLogger(__name__)? It got the pelican because it is a legacy (non-4.0) plugin.

NOTE: Notice that using __name__ in logger = logging.setLogger(__name__) will get you the full multi-dotted name like pelican.plugins.tableize.tableize? but using your own string results like my_fancy_tableize results in a simple non-dotted my_fancy_tableize name? This multi-dotted nature can be disabled by log.py doing logger = logging.setLogger('pelican').

Declaring logger class variable as a global scope but within a plugin should be safe, scoping-wise:

# my plugin
import logging
# logger = logging.getLogger(__name__)
logger = logging.getLogger('my_fancy_plugin_name')

Platform Used

  • OS version and name: Linux 6.1.0-21-amd64 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
  • Python version: 3.11.2
  • Pelican version: HEAD (513abbf)
  • Link to theme: m.css
  • Links to plugins: pelican-plugins
  • Link to your site: n/a
  • Link to your source: n/a

To preface everything else, let me just say that getting logging right is hard. The default terminal is often only 80 characters wide, and a large site can generate thousands of lines of logging as it is; the volume can make it hard to find the details you need.

For reference, the switch to logging with rich as PR #2897 .

Moving to the new (i.e. current) logging system I found was better for displaying where it was coming from, as it added the calling file to the right margin. I don't think the old system displayed filenames unless it hit an error (and I think the new system still does that).

As for the proposals, for No 1, I've implemented a variation of this with my own plugins, where I will prefix a shortname of the plugin to the log message is displays. The actual code is:

from .constants import LOG_PREFIX  # LOG_PREFIX = "[Microblog]"

logger = logging.getLogger(__name__)


def some_function():
            # the rest of the function

            logger.warning(
                '%s micropost "%s" longer than expected (%s > %s).'
                % (
                    LOG_PREFIX,
                    relative_filename,
                    post_len,
                    settings["MICROBLOG_MAX_LENGTH"],
                )

(code from https://github.com/MinchinWeb/minchin.pelican.readers.microblog/blob/master/minchin/pelican/readers/microblog/generator.py )

There may be a case for providing a Pelican hook for providing the logging prefix; I'm not sure what I think about trying to automatically determine one. In any case, I think logging from Pelican doesn't been to specifically be prefixed.

As for No 2, I think this is too much to put on the right margin, because it would make the main logging message take up many more lines on narrow (and even "normal") terminals.

As for No 2, I think this is too much to put on the right margin, because it would make the main logging message take up many more lines on narrow (and even "normal") terminals.

Well, for starter, we wasted 12 columns just for the timestamp, so there is something to be said about precious real estate space. Do we even need timestamp??? Honestly, Pelican is inherently a sequential processor, nothing special about multiple sub-tasking nor sub-processes. A single line having a single date timestamp is more than sufficient, for all sense and purposes:

datetime = <insert favorite time routine>
logger.info(datetime)

Heck, put two: one for start, one for end of Pelican session.

Also, doing EXISTING plugin development (oh let's say, upgrading to 4.0 plugin scheme), the filename is inherently useless for many.

During plugin development, this class-nesting of 'name' seems a lot more practical (in terms of development) than anything else (but not wanting to diminish the filename/line-number scheme here).

So, are we to be deigned in keeping it harder to do new (or old) plugin (re-)development?

There may be a case for providing a Pelican hook for providing the logging prefix; I'm not sure what I think about trying to automatically determine one. In any case, I think logging from Pelican doesn't been to specifically be prefixed.

Logger name:

  • used primarily to reflect and separate "daemons/tasks", "subsystems", "submodules", and ... --> "plugins". <--

Notice the word "plugins"?

It is already in place via LOG_FORMAT by using something like a string value like %(message)s: %(name)s, or %(name)s: %(message)s. This fix is essentially a one-liner fix. We could just leave it at that if we supplied this as a "helpful hint" toward a not-so-editable developer page.

The only reason why I looked into other column placement is because of excessive wraparounds, which we could fix perhaps by removing the timestamp column. Again, do we really, REALLY need a timestamp column? (asking for a co-worker).

Nothing I suggested would require changes to the existing code base for all plugins, just leveraging the logging to its fullest.

Don't get me wrong, I absolutely love the RichHandler stuff, just the timestamp, if made a configurable CLI option may make this easier to read longer output.

Maybe picking up the terminal width and passing that as well, but I dunno as to its extensibility of RichFormat, what do you think, @MinchinWeb ? Something like in pelican/log.py here:

terminal_width = 80
try:
    terminal_width = os.get_terminal_size()
    if terminal_width < 80:
        terminal_width = 80
except:
    # Must be a virtual terminal like PyCharm console or VSCode console
    terminal_width = 80

This working snippet of code actually works in both PyCharm/VSCode as well as xterm/terminal ....

This snippet may obviate the need to pursue any further logging change except for:

  • A helpful hint for plugin developers to update LOG_FORMAT = to '%(name)s: %(message)s' in pelican/log.py.

Also the official Python logging states that our LOG_FORMAT is defaulted to:

'%(levelname)s %(name)s %(message)s'

Use the specified format string for the handler. Defaults to attributes levelname, name and message separated by colons.

Reference

Perhaps we could support LOG_FORMAT at the configuration settings file?

Do we even need timestamp???

On trivial websites, per log line timestamps aren't that important. As the website grows, the timestamps become more helpful to show that pelican is still working away (I've run sites that have >10,000 pages in them; site generation time was >20 minutes).

Personally, I think they're worthwhile to keep.

Also, doing EXISTING plugin development (oh let's say, upgrading to 4.0 plugin scheme), the filename is inherently useless for many.

If you've moved your plugin to a folder, and it's a single file named __init__.py, yes, the logged filename is less useful. Personally, I've taken to prefixing my plugin log lines to help. For all but my most trivial plugins, I have the functionality in a file other than __init__.py which has the bonus of providing a more useful logged filename.

Maybe picking up the terminal width [...]

I think rich has a function for this. Since we're already using the library, my default would be to (re-)use that function. But I'm still not convinved either of the alternatives you've proposed would fit well on narrow terminals.

Also the official Python logging states that our LOG_FORMAT is defaulted to '%(levelname)s %(name)s %(message)s'

rich (and here) is using a different default.

Perhaps we could support LOG_FORMAT at the configuration settings file?

This seems reasonable.

LOG_FORMAT! Let's work with that angle of some kind while retaining Rich capability set.

I will prototype that while keeping Rich as the default log wrapper.

Do we need to be mindful of any other current but unresolved opinions that are ancillary to logging?