ungerik/go-rss

XML parser does not take XML namespaces into account

zeisss opened this issue · 3 comments

Feed: http://feeds.5by5.tv/changelog

Here, the go-rss package fails to parse the link tag of the channel items. It is always empty.

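package main

import (
	"fmt"
	"log"

	rss "github.com/ungerik/go-rss"
)

func main() {
	// Check the error instead of discarding it, so a failed fetch or parse
	// is not mistaken for an empty link.
	feed, err := rss.Read("http://feeds.5by5.tv/changelog")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%v\n", feed.Item[0].Link) // This is empty!
}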

I guess the xml parser gets confused because there are two link tags in that stream (snippet):

...
<item>
...
<link>http://5by5.tv/changelog/107</link>
<atom:link rel="payment" type="text/html" href="https://flattr.com/submit/auto?url=http%3A%2F%2F5by5.tv%2Fchangelog%2F107&user_id=danbenjamin"/>
</item>
...

Any idea how this could be fixed?

I have added an AtomLink struct field to parse that.

This breaks serialization with this library, since link and atom:link conflict according to the xml package:
rss.Item field "Link" with tag "link" conflicts with field "AtomLink" with tag "http://www.w3.org/2005/Atom/ link"

A quick search didn't turn up an obvious solution, and my first idea, adding a Text string `xml:",chardata"` field to AtomLink, wouldn't help, since it looks like both link and atom:link are valid. I don't know how to solve this, so since I'm the only one with this problem for now, I use a fork from before this commit, which works for my use case.

Well, it was mentioned on the golang-nuts list; it looks like a bug in the xml package (http://grokbase.com/t/gg/golang-nuts/136pra4gax/go-nuts-name-conflict-while-decoding-with-package-encoding-xml).