parse "http://main_test.geekpark.net/rss.rss" failed

Question


allentown521 opened this issue 4 years ago · 14 comments

Caused by: org.jdom2.input.JDOMParseException: Error on line 116: At line 116, column 598: not well-formed (invalid token);

However, other RSS clients that do not use Rome parse this feed normally.

Answer 1 · 2020-11-25T07:59:59.000Z

极客公园 Tue, 24 Nov 2020 08:46:03 +0800 <title>

The problem seems to be a Chinese (full-width) exclamation mark （！） in the title tag. Can this be solved? The Inoreader service handles the feed normally, which suggests that it copes with this situation.

Answer 2 · 2020-11-30T16:44:33.000Z

I opened this feed in both Chrome and Firefox, and both tell me there is an error with the UTF-8 in the feed. I don't think this is a problem with Rome; Inoreader may well be silent about UTF-8 issues.
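One way to confirm an encoding problem like this outside a browser is to decode the raw feed bytes strictly and see where decoding fails. The following is a minimal sketch using only the JDK; the class name `Utf8Check` and the sample bytes are made up for illustration and are not taken from the actual feed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /**
     * Returns the byte offset of the first malformed UTF-8 sequence,
     * or -1 if the whole array decodes cleanly.
     */
    static int firstMalformedOffset(byte[] bytes) {
        // REPORT makes the decoder stop at bad input instead of replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On error, the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the first two bytes of a three-byte character
        // (EF BC ..), cut short by an ASCII letter.
        byte[] sample = {'a', 'b', (byte) 0xEF, (byte) 0xBC, 'c'};
        System.out.println(firstMalformedOffset(sample));
    }
}
```

A result of -1 would suggest the bytes are valid UTF-8 and that the parse error has another cause, such as a character that XML does not allow.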

Answer 3 · 2020-12-01T01:05:09.000Z

If the problem has been identified and there are ways to work around it, why not try to solve it? Rome's compatibility should keep getting better, right?

Answer 4 · 2020-12-01T01:18:06.000Z

You can use String(byte[] bytes, Charset charset) to convert the feed from bytes to a string before passing it to Rome.
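As a rough illustration of what that constructor does with bad input: it does not throw on malformed bytes but substitutes the charset's replacement character (U+FFFD for UTF-8). The sketch below uses only the JDK; the class name `DecodeFeedBytes` and the sample bytes are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DecodeFeedBytes {
    /** Decodes raw feed bytes as UTF-8; malformed bytes become U+FFFD. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample: 0xFF can never appear in valid UTF-8.
        byte[] sample = {'a', (byte) 0xFF, 'b'};
        String decoded = decodeUtf8(sample);
        // Check for the replacement character to see whether anything was substituted.
        System.out.println(decoded.indexOf('\uFFFD') >= 0);
    }
}
```

Because U+FFFD is itself a legal XML character, this step can hide malformed UTF-8 from the parser, but it does nothing about characters that decode correctly and are still not allowed in XML.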

Answer 5 · 2020-12-01T02:30:19.000Z

Hi, I only see SyndFeedInput.build(Reader/File/InputSource/Document). My code is: val romeFeed = input.build(XmlReader(response.body()?.byteStream())). So should I convert the bytes to a String, then to a Document, and then pass that to the build method?

Answer 6 · 2020-12-01T03:19:51.000Z

You can wrap the string in a StringReader(), as follows:

SyndFeedInput syndFeedInput = new SyndFeedInput();
SyndFeed syndFeed = syndFeedInput.build(new StringReader(feedString));

I don't know how you are getting the url, but you want to read bytes.
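To tell whether a failure comes from Rome or from the XML itself, the decoded string can also be fed to a plain XML parser. This is a sketch that swaps in the JDK's built-in `DocumentBuilder` in place of Rome, so it only checks well-formedness; the class name `WellFormedCheck` and the sample documents are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns null if the XML is well-formed, otherwise "line:column: message". */
    static String firstXmlError(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A full-width exclamation mark (U+FF01) is a legal XML character.
        System.out.println(firstXmlError("<rss><title>hello\uFF01</title></rss>"));
        // A control character such as U+0001 is not allowed in XML 1.0.
        System.out.println(firstXmlError("<rss><title>a\u0001b</title></rss>"));
    }
}
```

If this check also reports an error at the same position, the feed content is the likely culprit rather than Rome.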

Answer 7 · 2020-12-01T18:11:39.000Z

Hi, that does not seem to help. I changed the code to:

val romeFeed = input.build(StringReader(String(response.body()?.bytes()!!, Charset.forName("UTF-8"))))

The exception is still thrown:

Caused by: org.jdom2.input.JDOMParseException: Error on line 754: At line 754, column 598: not well-formed (invalid token)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233)
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150)
Caused by: org.apache.harmony.xml.ExpatParser$ParseException: At line 754, column 598: not well-formed (invalid token)
    at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:509)
    at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:494)
    at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:315)
    at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:273)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:217)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233)
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150)

By the way, I downloaded the latest version of Chrome (87), and it also handles the feed and displays it normally.

Answer 8 · 2020-12-01T18:45:48.000Z

I just tried this code and it works for me now:

import java.net.URL;

import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

final URL url = new URL("http://main_test.geekpark.net/rss.rss");
final SyndFeedInput syndFeedInput = new SyndFeedInput();
final SyndFeed syndFeed = syndFeedInput.build(new XmlReader(url));

The feed XML no longer reports an error in either Chrome or Firefox, so it was correct when I tested it this time. It was not correct when I tested it before, and I noticed that the error was reported in different parts of the feed each time I reloaded it, which suggests that the bad content was being dynamically generated.

Answer 9 · 2020-12-02T01:16:36.000Z

Indeed, the problematic content has been replaced by newer items and cannot be tested now. Could you use unit tests to cover this case? After all, the problem is relatively clear. Thanks!

Answer 10 · 2020-12-02T02:09:27.000Z

Some test cases would be great, and did you try this constructor for XmlReader:

public XmlReader(final InputStream is, final boolean lenient)

so:

final SyndFeed syndFeed = syndFeedInput.build(new XmlReader(url.openStream(), true));

Answer 11 · 2020-12-02T02:49:02.000Z

I have never used it before; I always use OkHttp instead. Do you think it will make a difference?

Answer 12 · 2020-12-02T15:19:33.000Z

I wasn't talking about a test case; I was wondering whether you had tried the 'lenient' flag for XmlReader(). Actually, I don't know what your original code looked like.

Answer 13 · 2020-12-03T01:27:03.000Z

I use XmlReader(final InputStream is); lenient is true by default. Code here:

fun parseFeedResponse(feedUrl: String): SyndFeed {
    try {
        createCall(feedUrl).execute().use { response ->
            val input = SyndFeedInput()
            return input.build(XmlReader(response.body()?.byteStream()))
        }
    } catch (t: Throwable) {
        throw t
    }
}

Answer 14 · 2020-12-03T16:32:18.000Z

OK, the problem is that the bad content has rolled off the feed (I just checked again) and I did not keep a copy, so I can't really dig into this any further. However, I'm happy to help if the bad content reappears. I would suggest you close this issue.
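Since the failing content was transient, one option for next time is to write the raw response bytes to disk whenever parsing fails, so the exact input can later be used in a test case. The sketch below uses only the JDK; the class name `FeedSnapshot`, its methods, and the file naming are hypothetical and not part of Rome.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FeedSnapshot {
    /** Writes the raw bytes, exactly as received, to a new file in dir. */
    static Path save(byte[] rawFeed, Path dir) {
        try {
            Files.createDirectories(dir);
            // createTempFile picks a unique name, so snapshots never overwrite each other.
            Path file = Files.createTempFile(dir, "feed-", ".xml");
            return Files.write(file, rawFeed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a saved snapshot back, for example inside a unit test. */
    static byte[] load(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "feed-snapshots");
        Path saved = save(new byte[] {'<', 'r', 's', 's', '/', '>'}, dir);
        System.out.println("saved snapshot to " + saved);
    }
}
```

Saving bytes rather than a decoded string matters here, because decoding would already replace any malformed sequences and lose the evidence.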