ubermichael/isetools

Feature: log warnings for redundant tagging

Closed this issue · 4 comments

Redundantly nested tagging should log a warning. For example, <EM>one <EM>two</EM> three</EM> is redundant since the inner EM doesn't add any information (we don't support multiple "levels" of emphasis). The following tags would be affected:

  • BLL (if no intervening R)
  • C
  • CL
  • CW
  • EM
  • FONT (iff @size is the same)
  • FOREIGN (iff @lang is the same)
  • I
  • J
  • LD
  • LS
  • PN
  • R (if no intervening BLL)
  • RA
  • RT
  • SC
  • SD
  • SIG
  • SP
  • TITLE
  • WORK

Of course, the "crossing" errors described in #4 would also apply to self-nesting.
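For illustration, the rule above could be sketched as a stack-based pass over start and end tags. This is only a rough sketch, not the existing NestingValidator: the class name, the Event type, and the check method are all hypothetical, and it assumes the input is already well formed (crossing tags would be caught separately). It also assumes that "the same @size" and "the same @lang" are compared against the nearest enclosing tag of the same name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class RedundantNestingSketch {

    /** Hypothetical event: a start or end tag plus the one attribute that matters here. */
    public static final class Event {
        final boolean start;
        final String name;
        final String attr; // @size for FONT, @lang for FOREIGN, otherwise null

        private Event(boolean start, String name, String attr) {
            this.start = start;
            this.name = name;
            this.attr = attr;
        }
        public static Event open(String name) { return new Event(true, name, null); }
        public static Event open(String name, String attr) { return new Event(true, name, attr); }
        public static Event close(String name) { return new Event(false, name, null); }
    }

    // Tags from the list above that are redundant whenever they nest inside themselves.
    private static final Set<String> SIMPLE = new HashSet<>(Arrays.asList(
        "C", "CL", "CW", "EM", "I", "J", "LD", "LS", "PN", "RA", "RT",
        "SC", "SD", "SIG", "SP", "TITLE", "WORK"));

    /** Returns one warning message per redundantly nested start tag. */
    public static List<String> check(List<Event> events) {
        List<String> warnings = new ArrayList<>();
        Deque<Event> open = new ArrayDeque<>();
        for (Event e : events) {
            if (!e.start) {
                // Assumption: input is well formed, so the end tag closes the innermost tag.
                if (!open.isEmpty()) {
                    open.pop();
                }
                continue;
            }
            if (isRedundant(e, open)) {
                warnings.add("Redundant nested tag: " + e.name);
            }
            open.push(e);
        }
        return warnings;
    }

    private static boolean isRedundant(Event e, Deque<Event> open) {
        // A Deque used as a stack iterates innermost ancestor first.
        for (Event ancestor : open) {
            switch (e.name) {
                case "BLL":
                case "R":
                    // The nearest BLL or R decides: the same tag is redundant,
                    // the other tag counts as intervening.
                    if (ancestor.name.equals("BLL") || ancestor.name.equals("R")) {
                        return ancestor.name.equals(e.name);
                    }
                    break;
                case "FONT":
                case "FOREIGN":
                    // Redundant only if the nearest enclosing tag has the same attribute value.
                    if (ancestor.name.equals(e.name)) {
                        return Objects.equals(ancestor.attr, e.attr);
                    }
                    break;
                default:
                    if (SIMPLE.contains(e.name) && ancestor.name.equals(e.name)) {
                        return true;
                    }
            }
        }
        return false;
    }
}
```

With this sketch, the EM example from the description yields a single warning for the inner EM, while a BLL nested inside an R that is itself inside a BLL yields none.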

I think that's already done with the nesting validator, although it's not smart enough to handle the exceptions you've noted.

https://github.com/ubermichael/isetools/blob/master/src/main/java/ca/nines/ise/validator/NestingValidator.java#L79

Yes, but the warning message should say that the tagging is redundant (and can be corrected), not that it is incorrect.

Also, if #4 is implemented, then this will be more relevant :)

I'm not convinced that this should be automatically fixable.

For example, this

<SD t="entrance">Enter Bob, Ernie<SD t="exit">Exit Carmel</SD></SD>

is clearly an editor error.

That's true...

Some other examples:

<SD t="entrance">Enter Bob, <SD t="entrance">Ernie</SD></SD>

This one isn't so clear. The editor might be trying to mark the characters separately (especially if @who is used), or might just have redundant tagging.

<SD t="entrance">Enter Bob, <SD t="action">running</SD></SD>

I've seen uses like this one before. It makes perfect sense and shouldn't be an error.

Obviously stage directions in particular will need some more careful thought. I don't think any of the other tags listed in the OP have similar issues though.
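One possible reading of the three stage-direction examples above, sketched in Java. The class, enum, and method names are hypothetical and not part of isetools, and the rule itself is an assumption drawn only from those three examples: an entrance/exit mix is treated as an error, identical @t values as possibly redundant (a warning with no automatic fix), and any other combination as acceptable.

```java
import java.util.Objects;

public class StageDirectionNestingSketch {

    /** Hypothetical outcome of comparing a nested SD with its enclosing SD. */
    public enum Verdict { OK, POSSIBLY_REDUNDANT, ERROR }

    /** Classifies an SD with @t = innerType nested inside an SD with @t = outerType. */
    public static Verdict classify(String outerType, String innerType) {
        // Same type: may be redundant, or may be deliberate per-character markup.
        if (Objects.equals(outerType, innerType)) {
            return Verdict.POSSIBLY_REDUNDANT;
        }
        // Assumption: an exit inside an entrance (or the reverse) is an editor error.
        boolean entranceExit =
            ("entrance".equals(outerType) && "exit".equals(innerType))
            || ("exit".equals(outerType) && "entrance".equals(innerType));
        if (entranceExit) {
            return Verdict.ERROR;
        }
        // Anything else, such as an action inside an entrance, is allowed.
        return Verdict.OK;
    }
}
```

Under these assumptions, only the POSSIBLY_REDUNDANT case would produce the "redundant tagging" warning, and nothing would be fixed automatically.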