mrichards42/xword

Empty italic tag in clues causes rest of clue to be italicized


The following XML in a JPZ clue:

<span>Part of clue<i/>more of the clue</span>

Causes "more of the clue" to be italicized even though it's not actually enclosed in the italics. This also seems to happen with a regular empty tag pair (<i></i>) instead of a self-closing tag, and when the tag contains only whitespace. AFAICT, there has to be at least one non-blank character inside the tag for the clue to be parsed correctly.
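For reference, here is how a generic XML parser reads that snippet. This uses Python's standard xml.etree.ElementTree purely for illustration (it is not what XWord uses). The self-closing <i/> parses as an empty element, and "more of the clue" becomes that element's tail text, i.e. it sits inside the <span> but outside the italics.

```python
import xml.etree.ElementTree as ET

clue = ET.fromstring("<span>Part of clue<i/>more of the clue</span>")
italic = clue.find("i")

# Text before the <i/> belongs to the <span> itself.
print(repr(clue.text))    # prints 'Part of clue'
# The self-closing <i/> has no content of its own...
print(repr(italic.text))  # prints None
# ...and the text after it is the element's "tail": it belongs to <span>, not <i>.
print(repr(italic.tail))  # prints 'more of the clue'
```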

  • I can no longer reproduce this with a regular empty tag, only with a self-closing tag, at least with ipuz (I was testing with JPZ before).
  • It still reproduces if I use wxHtmlWinParser in HtmlClueListBox::CacheItem to parse the HTML; I think that's the component responsible for parsing the HTML before it's rendered.
  • AFAICT, <i/> is invalid HTML, since only certain tags are permitted to be self-closing. It's also not the kind of markup you'd generally expect to see, though I did run into it once (I'm not sure whether it was in the original source data or whether I introduced it when converting from another format to JPZ). Still, the failure mode here, where the tag is effectively never closed and everything after it ends up italicized, doesn't seem great.

Probably not a huge priority in the grand scheme of things, but I guess the next step here would be to try to reproduce this with a smaller sample app and pass the report along to wxWidgets.

EDIT: I originally posted this without escaping the <i/> above, and, funnily enough, the rest of the comment showed up in italics! Maybe this is actually how HTML parsers are supposed to handle this...

Hmm... looking through what I think is the JPZ schema, it seems like clue text is actually XML, not a string of HTML? In which case <i/> would in fact be a legitimate self-closing tag :) It looks like the spec allows <i>, <b>, <span>, <sub>, and <sup> children in clue text.
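If clue text is treated as XML, a parser-side check against the permitted children is straightforward. A minimal sketch, assuming the allowed set is exactly the five tags read from the schema above; disallowed_tags and the <clue> wrapper element are hypothetical names, and the function works on the clue's inner markup as a plain string, without the namespace a full JPZ document declares.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Assumption: inline elements the JPZ schema permits in clue text,
# as read from the schema discussed above.
ALLOWED_CLUE_TAGS = {"i", "b", "span", "sub", "sup"}


def disallowed_tags(clue_xml: str) -> Set[str]:
    """Return any tags in the clue markup that are not in ALLOWED_CLUE_TAGS.

    The markup is wrapped in a hypothetical <clue> element so that text with
    several top-level elements (or bare text) still parses as one document.
    """
    root = ET.fromstring(f"<clue>{clue_xml}</clue>")
    return {
        el.tag
        for el in root.iter()  # iter() includes the root itself, so skip it
        if el is not root and el.tag not in ALLOWED_CLUE_TAGS
    }
```

For example, disallowed_tags("<span>Part of clue<i/>more of the clue</span>") returns an empty set, because a self-closing <i/> is perfectly valid XML.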

So... maybe this needs to be handled in the JPZ parser? We could convert self-closing tags to the equivalent empty tag pair (e.g. <i/> to <i></i>), or perhaps just remove them entirely, since that should render the same way.
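Both options can be sketched with Python's xml.etree.ElementTree, purely to show the transformations; XWord's actual JPZ parser is not Python, and to_explicit_html, drop_empty_elements, and the <clue> wrapper are hypothetical names. Option 1 relies on ElementTree's HTML serializer writing empty non-void elements as an explicit open/close pair. Option 2 removes elements with no text and no children, moving their tail text onto the preceding text so nothing is lost.

```python
import xml.etree.ElementTree as ET


def _parse(clue_xml: str) -> ET.Element:
    # Wrap in a hypothetical <clue> element so multi-element clue text parses.
    return ET.fromstring(f"<clue>{clue_xml}</clue>")


def _inner_html(root: ET.Element) -> str:
    # method="html" writes empty non-void elements as <i></i>, never <i />.
    html = ET.tostring(root, encoding="unicode", method="html")
    return html[len("<clue>"):-len("</clue>")]  # strip the wrapper again


def to_explicit_html(clue_xml: str) -> str:
    """Option 1: rewrite self-closing tags as explicit empty tag pairs."""
    return _inner_html(_parse(clue_xml))


def _drop_empty(parent: ET.Element) -> None:
    for child in list(parent):  # iterate over a snapshot; we mutate parent
        _drop_empty(child)      # clean up nested elements first
        if child.text or len(child):
            continue            # has content (even whitespace): keep it
        tail = child.tail or ""
        idx = list(parent).index(child)
        if idx == 0:
            parent.text = (parent.text or "") + tail
        else:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + tail
        parent.remove(child)


def drop_empty_elements(clue_xml: str) -> str:
    """Option 2: remove empty elements entirely, preserving surrounding text."""
    root = _parse(clue_xml)
    _drop_empty(root)
    return _inner_html(root)
```

Note that neither option touches a whitespace-only element such as <i> </i>, since it has text content.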

I realized that I filed this while investigating the clue mentioned in jpd236/kotwords#24, and that specific clue still reproduces the bug even though its italic tag is non-empty (and thus not self-closing). So there does seem to be more to this than self-closing tags.

Attached a sample JPZ where the clue for 1-Across is:

<span>First across clue with</span><i> </i><span>italicized space</span>

This renders correctly in Crossword Solver, but in XWord the italicized space is omitted and "italicized space" is rendered in italics.
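The expected rendering can be stated precisely by flattening the clue XML into styled text runs while keeping whitespace intact. A minimal sketch that tracks italics only; italic_runs and the <clue> wrapper are hypothetical names, and this is not XWord's rendering path. The key detail is that an element's tail text takes the parent's style, not the element's own.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Run = Tuple[str, bool]  # (text, is_italic)


def italic_runs(clue_xml: str) -> List[Run]:
    """Flatten clue XML into (text, is_italic) runs, preserving whitespace."""
    root = ET.fromstring(f"<clue>{clue_xml}</clue>")  # hypothetical wrapper
    runs: List[Run] = []

    def walk(el: ET.Element, italic: bool) -> None:
        # Text directly inside this element uses this element's style.
        inner = italic or el.tag == "i"
        if el.text:
            runs.append((el.text, inner))
        for child in el:
            walk(child, inner)
            # A child's tail comes after the child closes, so it gets
            # this element's style rather than the child's.
            if child.tail:
                runs.append((child.tail, inner))

    walk(root, False)
    return runs
```

For the 1-Across clue above, this yields a plain run "First across clue with", an italic run containing a single space, and a plain run "italicized space".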

test.zip