suin/php-rss-writer

Trips up some parsers

aleemb opened this issue · 3 comments

I ran into a few issues, possibly related to the use of htmlentities() instead of htmlspecialchars(), as outlined in http://stackoverflow.com/questions/2822774/php-is-htmlentities-sufficient-for-creating-xml-safe-values

I believe the references to htmlentities should be replaced with htmlspecialchars.

Can you post examples?

I did a bit of code gymnastics a while back so I don't remember the exact error condition but it possibly had to do with using an ndash and single-quote in the same string:

htmlentities("–'s");     // the ndash becomes &ndash;, a named HTML entity
htmlspecialchars("–'s"); // the ndash is left as a literal –
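To make the difference concrete, here is a minimal sketch that encodes the same string both ways and then feeds each result to an XML parser. It assumes the SimpleXML extension is available, and `parsesAsXml` is a hypothetical helper written for this illustration, not something from php-rss-writer. Flags and charset are passed explicitly because the defaults differ between PHP versions.

```php
<?php
// Sketch only. Assumes this file is saved as UTF-8 and ext/simplexml is loaded.
$raw = "–'s";

$viaEntities     = htmlentities($raw, ENT_QUOTES, 'UTF-8');     // "&ndash;&#039;s"
$viaSpecialChars = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // "–&#039;s"

// Hypothetical helper: true when the fragment parses as well-formed XML.
function parsesAsXml($fragment)
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $ok = simplexml_load_string('<title>' . $fragment . '</title>') !== false;
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$entitiesOk     = parsesAsXml($viaEntities);     // false: &ndash; is not defined in XML
$specialCharsOk = parsesAsXml($viaSpecialChars); // true: only numeric and literal characters
```

The numeric reference `&#039;` is fine in XML; it is the named entity `&ndash;` that a strict parser rejects, since a feed normally has no DTD declaring it.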

As per pre-defined XML entities:

&lt;    <   less than
&gt;    >   greater than
&amp;   &   ampersand 
&apos;  '   apostrophe
&quot;  "   quotation mark

Nothing else needs to be HTML-encoded.
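A small escaping helper that touches only those five characters could look like the sketch below. The name `xmlEscape` is hypothetical; the `ENT_XML1` flag (available since PHP 5.4) makes `htmlspecialchars` emit `&apos;` for the apostrophe, matching the table above, and `ENT_QUOTES` is needed for the apostrophe to be escaped at all on older PHP versions.

```php
<?php
// Hypothetical helper for illustration; not part of php-rss-writer.
// Escapes only the five characters that have predefined XML entities.
function xmlEscape($text)
{
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// The ndash passes through untouched (file assumed to be saved as UTF-8).
$escaped = xmlEscape("<Tom & \"Jerry's\"> – 2013");
// "&lt;Tom &amp; &quot;Jerry&apos;s&quot;&gt; – 2013"
```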

In my case, an iOS application parsing the XML showed single quotes as &#039; instead of an actual single quote, or something along those lines. This happened because I was escaping the string manually as well as with htmlentities, so the string may well have been double-encoded. Either way, the issue is gone now that I escape the string with htmlspecialchars, but I noticed you are still using htmlentities in your code.
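The double-encoding symptom described here is easy to reproduce. The sketch below escapes a string twice and also shows the fourth parameter of `htmlspecialchars`, `$double_encode`, which leaves existing entities alone when set to false. The variable names are illustrative only.

```php
<?php
$raw = "it's";

// First pass: the apostrophe becomes a numeric character reference.
$once = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');          // "it&#039;s"

// Second pass: the ampersand of that reference is escaped again, so a
// client that decodes once ends up displaying the text "&#039;".
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');        // "it&amp;#039;s"

// With $double_encode = false, already-encoded entities are not re-encoded.
$guarded = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false); // "it&#039;s"
```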

To add to the confusion, I think there is an additional, better practice, which is to write:

// better since this will not result in double encoding but still encodes once
htmlentities(html_entity_decode($foo));

// can possibly result in double encoding if $foo is already encoded
htmlentities($foo);

The first call is idempotent, the second is not.
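The idempotence claim can be checked with a short sketch. The function names `encodeOnce` and `encodeNaive` are hypothetical, and the explicit `ENT_QUOTES` and `'UTF-8'` arguments are an assumption added so that both directions treat quotes the same way on every PHP version.

```php
<?php
// Hypothetical wrappers around the two calls discussed above.
function encodeOnce($text)
{
    // Decode first so that already-encoded input returns to raw text,
    // then encode exactly once.
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    return htmlentities($decoded, ENT_QUOTES, 'UTF-8');
}

function encodeNaive($text)
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

$raw = "Tom & Jerry's";

$safeFirst  = encodeOnce($raw);         // "Tom &amp; Jerry&#039;s"
$safeSecond = encodeOnce($safeFirst);   // unchanged on the second application

$naiveFirst  = encodeNaive($raw);       // "Tom &amp; Jerry&#039;s"
$naiveSecond = encodeNaive($naiveFirst); // "Tom &amp;amp; Jerry&amp;#039;s"
```

One trade-off of the decode-then-encode approach: raw text that is meant to contain a literal entity, such as a tutorial showing `&lt;`, is treated as already encoded and comes back as `&lt;` rather than `&amp;lt;`.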

suin commented

php-rss-writer doesn't use htmlentities, so I don't understand what the issue is. I'm closing this issue, but feel free to reopen it.