lwindolf/liferea

feed parsing is broken

rich-coe opened this issue · 3 comments

I'm getting errors on a number of my feeds.
There's no indication in the UI of what's wrong or how to fix it.
At one time I had made a change to show the parser output in the UI when parsing failed.
I have --debug-parsing enabled and I get the following output:

18:13:32 PARSING: SAX parser error : Extra content at the end of the document
18:13:32 PARSING: xml_parse_feed(): could not parse feed "New Subscription"!

Add error information to the user page when an error occurs and make it useful.

If possible, be more tolerant of minor issues like control chars and 8-bit chars in XML data.

v1.15.0 103cb6a

I've debugged some of this.
If a subscription has a script, there appear to be extra chars added to the end of the XML data.
This upsets the XML parser, resulting in the 'Extra content at the end of the document' error.
This happens even with a simple script that just copies stdin to stdout.
I observed that for an input of 70161 chars, the length after the script runs is 70170.
This implies to me that the fread() is reporting or reading 8 extra bytes from the output pipe.
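
One way to check a length mismatch like this is to add up the counts that fread() returns instead of measuring the buffer afterwards. This is a minimal sketch, not Liferea's code; `read_all` is a hypothetical helper name, and the 4096-byte starting capacity is an arbitrary choice:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from Liferea): read a stream to EOF.
 * Returns a NUL-terminated buffer and stores the exact number of
 * bytes read in *len. The caller frees the buffer. */
char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap + 1);        /* +1 leaves room for the NUL */
    if (buf == NULL)
        return NULL;

    for (;;) {
        size_t n = fread(buf + used, 1, cap - used, fp);
        used += n;
        if (n == 0)
            break;                      /* EOF or read error */
        if (used == cap) {              /* buffer full: double it */
            char *tmp;
            cap *= 2;
            tmp = realloc(buf, cap + 1);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
    }
    buf[used] = '\0';                   /* terminate at the true length */
    *len = used;
    return buf;
}
```

If the count stored in `*len` matches the input size but a later strlen() on the same data reports more, the extra bytes are not coming from the pipe.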

@lwindolf I may have found the source of this issue.
In src/net.c, where the body of the HTTP response is received:

job->result->data = g_memdup2 (g_bytes_get_data (body, &job->result->size), g_bytes_get_size (body));

allocates a block of memory that may or may not be null-terminated after the body content.
In fact I would assert that it is almost always not null-terminated.
Everywhere else after this point, the length of the data is determined by calling strlen(), which can run past the end of the body and count stray bytes.
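
To illustrate the idea, here is a minimal sketch of a copy that always appends a terminating NUL. The merged fix is not shown in this thread, so this is not the actual patch; `dup_with_nul` is a hypothetical helper written in plain C rather than with GLib calls:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not Liferea's actual fix: copy `size` bytes and
 * append a terminating NUL so that strlen() stops at the end of the body.
 * The caller frees the returned buffer. */
char *dup_with_nul(const void *data, size_t size)
{
    char *copy = malloc(size + 1);      /* one extra byte for the NUL */
    if (copy == NULL)
        return NULL;
    if (size > 0)
        memcpy(copy, data, size);
    copy[size] = '\0';
    return copy;
}
```

With a copy like this, strlen() on the result matches the stored size as long as the body itself contains no NUL bytes. A body with embedded NUL bytes would still need its length carried explicitly.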

I'll let it run a while before I declare victory, and then submit a PR.

Fix merged and will be included in 1.15.2