kurtmckee/feedparser

Unexpected Author if itunes:owner is present

H4kor opened this issue · 3 comments

H4kor commented

When a feed has both itunes:author and itunes:owner, the owner name is set as author.

Feed:

<rss>
    <channel>
        <itunes:author>Author</itunes:author>
        <itunes:owner>
            <itunes:name>Owner</itunes:name>
            <itunes:email>owner@example.com</itunes:email>
        </itunes:owner>
    </channel>
</rss>

Output:

>>> feedparser.parse("foo.xml")["feed"]["author"]
'Owner'

Expected:

>>> feedparser.parse("foo.xml")["feed"]["author"]
'Author'

I can provide a patch, if this is something that should be changed. Just wanted to be sure this isn't intended behavior.

anula commented

+1 to this one.

In addition, the information from the author field seems to be completely gone - at least I can't find it anywhere. This seems like a bug given that data is lost.

This is also not reflected in the documentation: https://feedparser.readthedocs.io/en/latest/reference-feed-author.html - no mention of the "owner" field there.

@H4kor I don't think this is correct behavior, and @anula thanks for identifying the documentation deficiency with the current implementation!

Before work is done on a patch, I'd like to get feedback about possible solutions. I don't think that the concept of an "owner" is addressed by any of the core specifications that feedparser already supports. Therefore there are several paths that could be taken:

  1. Append itunes:owner to a list of feed authors
  2. Create a new top-level owners key, which will be a list similar to the authors key
  3. Create a key with an itunes prefix, like feed["itunes_owners"], which will be a list similar to the authors key
  4. Create a nested dictionary specific for itunes, like feed["itunes"]["owners"], which will be a list similar to the authors key

The numbers of the ideas above don't indicate ordering of preference; they exist only for reference in conversation.

Are there additional solutions? Are there any issues that would disqualify any of the possible solutions above? I'm very open to feedback.

zifot commented

Hey, couple of thoughts.

First, I would propose to implement (4) as a start and a minimum. I think dedicated "itunes" key would be useful for any further itunes specific attributes, should you want to expose more. On top of that, it would not break current behavior of those "itunes owners" not landing in .authors.

Also, if you can stomach some more nesting, you could even consider something like feed['extensions]['itunes'] "namespace". That way you would end up with approach that communicates notion of custom sets of attributes even stronger, with clear expectations that there is a number of such sets (if there is "itunes", there are/will be others).

And to complicate things a little bit further (but also, paradoxically, simplify things on the decisions level), feedparser could have a config switch that would turn on / off generalization from those "extensions" into the main attributes. That is, if I say nothing, then those "itunes owners" stay within feed['extensions']['itunes']['owners'']. But if I, say, call .parse(merge_extensions=["itunes"], then feedparser could apply, additionally, something like (1) from your post above.

Did I already mention that those specific merge heuristics could be configurable / extensible by feedparser users? :) Something like:

feedparser.parse(merge_extensions={'itunes': MyMerger()})

Default feedparser merging behavior would then also be implemented as such "merger" (not particularly fond of this naming, but that's my first hot take).