provide public access to _content etc
sebbASF opened this issue · 8 comments
- I have searched the issues (including closed ones) and believe that this is not a duplicate.
- I have searched the documentation and believe that my question is not covered.
- I am willing to lend a hand to help implement this feature.
Feature Request
Plugins regularly need to access _content.
However pylint (rightly) complains that this is using protected-access.
AFAICT it is expected that plugins may need to access _content (and _summary), so the protected status is just a nuisance when using pylint.
Renaming would cause lots of issues, but it would be possible to provide a public accessor for use by plugins.
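For illustration, here is a minimal sketch of what such an accessor might look like. The Content class below is a stand-in, not Pelican's actual class, and the name raw_content is hypothetical:

```python
class Content:
    """Stand-in for pelican.contents.Content (illustration only)."""

    def __init__(self, content):
        # Existing protected attribute; its name stays unchanged.
        self._content = content

    @property
    def raw_content(self):
        """Hypothetical public, read-only view of the protected field."""
        return self._content


page = Content("<p>Hello</p>")
# A plugin can read this without triggering pylint's protected-access.
print(page.raw_content)
```

If plugins also need to write the value, a matching setter could be added in the same way.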
Would any of these Pelican signals help you deal with access to each document's content, specifically content signals?
Something like:
signals.content_object_init.connect(my_plugin_content_processor)
About 80% of the plugins use this signal.
A content handler would look like this:
from pelican import signals
from pelican.contents import Content, Article, Page


def my_content_object_init(content_class):
    # Description:
    #   First signal handler to provide the actual content of any
    #   article/page/static file.
    #
    # arg1 : content_class:Content
    #
    # article of Article(Content) class provides the following variable member items:
    #   allowed_statuses:tuple, author:Author, authors:list, category:Category,
    #   content:str, date:SafeDatetime, date_format:str, default_status:str,
    #   default_template:str, filename:str, get_content:partial, get_summary:partial,
    #   in_default_lang:bool, lang:str, locale_date:str, mandatory_properties:tuple,
    #   metadata:dict, private:str, reader:str, relative_dir:str,
    #   relative_source_path:str, save_as:str, settings:dict, slug:str,
    #   source_path:str, status:str, summary:str, tags:list, template:str,
    #   timezone:ZoneInfo, title:str, translations:list, url:str, url_format:dict
    #
    # Callstack
    #   signals.content_object_init.send()
    #   Content.__init__()
    #   Article.__init__()
    #   Readers.read_file()
    #   ArticlesGenerator.generate_context()
    #   Pelican.run()
    #
    # 4th article-related signal
    # 3rd signal in ArticlesGenerator.generate_context()
    # Still inside read_file()
    # First signal appearance having a content provided by Markdown.read_file()
    #
    # Hooked using signals.content_object_init.connect(my_content_object_init)
    #
    print('my_content_object_init called')
    print('my_content_object_init: content: {0!s}'.format(content_class.content))
    # Skip anything that is not an Article or a Page (e.g. static files)
    if not isinstance(content_class, (Article, Page)):
        return
    # Do your article/page processing here
    return
You can hook up the handler above by connecting it to the content_object_init signal:
# This is how a Pelican plugin works.
# register() is the well-established function name that Pelican's plugin
# loader looks for; it is how this plugin gets recognized, initialized,
# and has its handlers connected by the Pelican app.
import logging

logger = logging.getLogger(__name__)


def register():
    logger.info(
        'MY plugin registered for Pelican, using new 4.0 plugin variant')
    signals.content_object_init.connect(my_content_object_init)
I don't see how that helps.
This request is about getting public access to the protected field _content which is part of the Content object, not about getting access to the Content object.
I have reviewed all the signals (as of v4.9.1), and I am not yet convinced that content needs to be made available as 'unprotected' access outside of the signals.content_object_init signal. Of course, I am not the designer, but the current Pelican design resonates with me.
While Python (or JetBrains' PyCharm IDE) may be able to access the protected ._content attribute, ideally a plugin should only use the public .content attribute, which is handed to your plugin's content processor function once it is hooked via the signals.content_object_init handler.
Is there a particular signal stage at which you need content access? I have listed all the signals used in Pelican v4.9.1 in chronological order:
# All signals are listed here as of Pelican v4.9.1
signals.initialized.connect()
signals.get_generators.connect()
signals.readers_init() # Article class
signals.generator_init() #ArticlesGenerator class
signals.article_generator_init.connect()
signals.readers_init()
signals.readers_init() # Page class
signals.generator_init() # PagesGenerator
signals.page_generator_init()
signals.readers_init()
signals.generator_init()
signals.readers_init() # Static class
signals.generator_init() # StaticGenerator
signals.static_generator_init()
signals.article_generator_preread.connect()
signals.article_generator_context.connect()
signals.content_object_init.connect()
signals.article_generator_pretaxonomy.connect()
signals.article_generator_finalized.connect()
signals.page_generator_preread.connect()
signals.page_generator_context.connect()
signals.content_object_init.connect()
signals.page_generator_finalized.connect()
signals.static_generator_preread.connect()
signals.static_generator_context.connect()
signals.content_object_init.connect()
signals.static_generator_finalized.connect()
signals.all_generators_finalized.connect()
signals.get_writers()
signals.feed_generated()
signals.feed_written()
signals.article_generator_write_article.connect()
signals.content_written()
signals.article_writer_finalized.connect()
signals.page_generator_write_page.connect()
signals.content_written()
signals.page_writer_finalized()
signals.content_written()
signals.pelican_finalized()
Here are some plugins that reference _content:
Got it. I think I may have a fix, but no time to test it.
Right off the bat, I can tell you that this particular plugin should be easily fixable by replacing the article_generator_finalized signal with signals.content_object_init.connect(parse_content):
def register():
    signals.initialized.connect(get_excludes)
    signals.content_object_init.connect(parse_content)
    signals.page_generator_context.connect(set_definitions)
Upgrading the protected content._content into a normal content.content.
def parse_content(content):
    # vvvvv NEW CODE vvvvv
    # Only process Article or Page subclass contents
    if not isinstance(content, (Article, Page)):
        return
    # ^^^^^ NEW CODE ^^^^^
    # resume normal code
    soup = bs4.BeautifulSoup(content._content, 'html.parser')
    ...
Note that the check covers both Article and Page; adjust it as needed.
Oh yeah, remove the parse_articles function and its loop over the articles entirely, since the signal now fires once per document.
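To make the difference in handler shape concrete, here is a rough sketch. The process() helper and the stand-in objects are hypothetical and exist only so the snippet runs without Pelican installed:

```python
from types import SimpleNamespace


def process(item):
    """Placeholder for the plugin's real per-document work (hypothetical)."""
    item.seen = True


# Before: a handler for article_generator_finalized receives the generator
# and has to loop over its articles itself.
def parse_articles(generator):
    for article in generator.articles:
        process(article)


# After: a handler for content_object_init receives a single content object
# per call, so the loop disappears.
def parse_content(content):
    process(content)


# Stand-in objects that mimic only the attributes used above.
docs = [SimpleNamespace(seen=False) for _ in range(3)]
parse_articles(SimpleNamespace(articles=docs[:2]))
parse_content(docs[2])
print(all(d.seen for d in docs))
```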
Your suggested change does not solve the issue. It does not change the line where _content is referenced:
soup = bs4.BeautifulSoup(content._content, 'html.parser')
Oops, my bad. I meant: please replace ALL instances of ._content with .content, if you haven't already. Did that work as well?
_content is a protected variable; in short, it is meant to be treated as read-only, and functions outside the class are discouraged from accessing it.
We are going round in circles.
._content and .content don't always return the same value, otherwise plugins would not need to use _content.
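One way the two can diverge, sketched with a toy class. This assumes the public property post-processes the stored text before returning it (here, by expanding a link placeholder); the class, the placeholder handling, and the URL are illustrative and not taken from Pelican's source:

```python
class ToyContent:
    """Toy model, not Pelican's code: the public property post-processes
    the stored text, so the two attributes can return different values."""

    def __init__(self, content, siteurl):
        self._content = content
        self._siteurl = siteurl

    @property
    def content(self):
        # Assumed behaviour for illustration: expand a link placeholder.
        return self._content.replace("{static}", self._siteurl)


doc = ToyContent('<a href="{static}/a.pdf">a</a>', "https://example.com")
print(doc._content)  # the stored text, placeholder intact
print(doc.content)   # the processed text, placeholder expanded
```

In this toy, .content also has no setter, so a plugin that wants to modify the stored text has nothing public to assign to.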