rometools/rome

parse "http://main_test.geekpark.net/rss.rss" failed

allentown521 opened this issue · 14 comments

Caused by: org.jdom2.input.JDOMParseException: Error on line 116: At line 116, column 598: not well-formed (invalid token);

But other RSS clients that do not use Rome parse the same feed without problems.

极客公园 Tue, 24 Nov 2020 08:46:03 +0800 <title>

The problem seems to be a Chinese exclamation mark (!) in the title tag. Can this be solved? The Inoreader service reads the feed normally, which suggests that it handles this situation.

I opened this feed in both Chrome and Firefox, and both report an error with the UTF-8 in the feed. I don't think this is a problem with Rome; Inoreader may well be silent about UTF-8 issues.
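To check the raw bytes without relying on a browser, something like the following sketch could report where the first malformed UTF-8 sequence starts. The class and method names (Utf8Check, firstMalformedOffset) are hypothetical and chosen only for illustration; the code uses JDK classes only and is not part of Rome.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Rome: strict UTF-8 validation of raw feed bytes.
public class Utf8Check {

    /** Returns the byte offset of the first malformed UTF-8 sequence, or -1 if there is none. */
    public static int firstMalformedOffset(byte[] bytes) {
        // REPORT makes the decoder stop at bad input instead of silently replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            // endOfInput = true: a truncated sequence at the end counts as malformed.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // position is left at the start of the bad sequence
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars and keep going
        }
    }
}
```

The returned value is a byte offset into the response body, which is not the same thing as the line and column that the XML parser reports.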

You can use String(byte[] bytes, Charset charset) to convert the feed from bytes to a string before passing it to Rome.

You can wrap the string in a StringReader(), as follows:

SyndFeedInput syndFeedInput = new SyndFeedInput();
SyndFeed syndFeed = syndFeedInput.build(new StringReader(feedString));
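For the bytes-to-string step, a minimal sketch could look like the one below. The class and method names (FeedDecoder, readAll, decodeLenient) are hypothetical, and the sketch assumes the feed is meant to be UTF-8. It relies on the documented behaviour of the String(byte[], Charset) constructor, which replaces malformed input with the charset's replacement character (U+FFFD for UTF-8) instead of failing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Rome: read raw bytes and decode them leniently.
public class FeedDecoder {

    /** Reads the whole stream into a byte array without interpreting the bytes. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Decodes as UTF-8 (an assumption); malformed sequences become U+FFFD. */
    public static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The resulting string can then be handed to new StringReader(...) as in the snippet above. The damaged characters show up as U+FFFD in the parsed feed rather than aborting the parse.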

I don't know how you are fetching the URL, but you want to read the response as raw bytes.

I just tried this code and it works for me now:

import java.net.URL;

import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

final URL url = new URL("http://main_test.geekpark.net/rss.rss");
final SyndFeedInput syndFeedInput = new SyndFeedInput();
final SyndFeed syndFeed = syndFeedInput.build(new XmlReader(url));

The feed XML also no longer reports an error in either Chrome or Firefox, so it was well-formed when I tested it this time. It was not well-formed when I tested it earlier, though, and I noticed that the error was reported in different parts of the feed each time I reloaded it, which suggests that the bad content was being dynamically generated.

Some test cases would be great. Also, did you try this constructor for XmlReader:

public XmlReader(final InputStream is, final boolean lenient)

so:

final SyndFeed syndFeed = syndFeedInput.build(new XmlReader(url.openStream(), true));

I wasn't talking about a test case; I was wondering whether you had tried the 'lenient' flag for XmlReader(). Actually, I don't know what your original code looked like.

OK, the problem is that the bad content has since rolled off the feed (I just checked again), and I did not keep a copy, so I can't really dig into this any further. However, I'm happy to help if the bad content reappears. I would suggest you close this issue.
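If the bad content does reappear, keeping the raw bytes would make it possible to build a test case later. A rough sketch, using only the JDK, might look like this. The class name FeedSnapshot, its methods, and the file naming scheme are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Rome: keep an exact copy of a feed response.
public class FeedSnapshot {

    /** Writes the raw, undecoded feed bytes to a timestamped file and returns its path. */
    public static Path save(byte[] rawFeed, Path directory) {
        try {
            Files.createDirectories(directory);
            Path target = directory.resolve("feed-" + System.currentTimeMillis() + ".xml");
            Files.write(target, rawFeed); // bytes are stored as-is, bad sequences included
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a saved snapshot back, for example to replay it in a test. */
    public static byte[] load(Path snapshot) {
        try {
            return Files.readAllBytes(snapshot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Saving the bytes before any decoding matters here, because decoding to a String first could already replace the malformed sequences that caused the failure.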