phseiff/gender-render

specification proposal: global capitalization system

Opened this issue · 7 comments

As it is now, gender*render does not account for the fact that the capitalization of words depends on their context. For example,

I ate. {they} didn't join me in doing so.

would (for a person with they/them pronouns) become

I ate. they didn't join me in doing so.

On the other hand the "They" in

I ate. {They} didn't join me in doing so.

would not even be recognized as a tag, because it uses the wrong capitalization.
The specification simply doesn't take capitalization into account (yet).

However, the implementation of noun gendering lowercases nouns whose first letter is uppercase before gendering them, and then makes the first letter of the gendered version uppercase again before returning it. This turns actor into actress and Actor into Actress if the person uses female noun gendering. This behavior is not specified by the specification, though it is arguably implied by it: the specification requires correct gendering of valid nouns, and nouns with an uppercase first letter are valid nouns that would not be gendered correctly if they lost their capitalization in the process.
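The noun behavior described above can be sketched roughly as follows. This is only an illustration: `GENDERED_NOUNS` and `gender_noun` are hypothetical names made up for this example, and the real implementation's noun data and function names are different.

```python
# Hypothetical lookup table for illustration only; not the real noun data.
GENDERED_NOUNS = {
    "actor": {"female": "actress", "male": "actor"},
}


def gender_noun(noun: str, gender: str) -> str:
    """Gender a noun while preserving an uppercase first letter."""
    was_capitalized = noun[:1].isupper()
    # Lowercase the first letter so "Actor" and "actor" share one entry.
    key = noun[:1].lower() + noun[1:]
    gendered = GENDERED_NOUNS[key][gender]
    if was_capitalized:
        # Restore the capitalization that was stripped before the lookup.
        gendered = gendered[:1].upper() + gendered[1:]
    return gendered


print(gender_noun("actor", "female"))  # actress
print(gender_noun("Actor", "female"))  # Actress
```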

I feel like (a) the behavior of nouns should not be a poorly documented implementation feature not even explicitly mentioned by the specification, and (b) their behavior should be extended to every type of context value, since every tag can be the first one of a sentence and therefore require capitalization.

My concept for implementing this is as follows:

  • Add an extension spec for it, or add it to the main spec (opinions on this?).
  • Every context value can be written with an uppercase first character, in which case it is converted to lowercase during parsing and the information about its case is stored with the tag. How it is stored should be implementation-specific, but my way of going about this would be to add a new section called capitalization whose value is 0 or 1 (and possibly other numbers if more "capitalization types" were added in later versions of the specification).
  • When rendering a template, the capitalization of a tag would be re-applied to the value the tag is resolved to.
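The three steps above could look roughly like this in code. It is a minimal sketch under strong assumptions: tags are reduced to a bare `{word}` form, `PRONOUNS`, `parse` and `render` are hypothetical names, and the dict layout with a `capitalization` entry is just one way of storing the 0/1 value. None of this is the actual gender*render parser.

```python
import re

# Hypothetical pronoun data for one person; keys are lowercase context values.
PRONOUNS = {"they": "she", "them": "her", "their": "her"}

# Simplified tag syntax for this sketch: a single word in curly braces.
TAG = re.compile(r"\{([A-Za-z]+)\}")


def parse(template: str) -> list:
    """Split a template into literal strings and tag dicts."""
    parts = []
    pos = 0
    for match in TAG.finditer(template):
        parts.append(template[pos:match.start()])
        value = match.group(1)
        parts.append({
            # The context value is lowercased during parsing.
            "context": value[0].lower() + value[1:],
            # 0 = written in lowercase, 1 = first character was uppercase.
            "capitalization": 1 if value[0].isupper() else 0,
        })
        pos = match.end()
    parts.append(template[pos:])
    return parts


def render(parts: list, pronouns: dict) -> str:
    """Resolve every tag and re-apply its stored capitalization."""
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)
            continue
        resolved = pronouns[part["context"]]
        if part["capitalization"] == 1:
            resolved = resolved[0].upper() + resolved[1:]
        out.append(resolved)
    return "".join(out)


template = "I ate. {They} didn't join me in doing so."
print(render(parse(template), PRONOUNS))
# I ate. She didn't join me in doing so.
```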

Suggestions, Comments and Opinions are welcome.

This is the next thing I'll start working on.

I think potential values for the capitalization section could be

| name | example | comment |
|---|---|---|
| lower-case | foobar | these first three options are self-explanatory. |
| capitalized | Foobar | problem: this only allows capitalizing the first word of multi-word gendered nominatives, which is a problem when porting to languages where not only the first letter of every sentence, but also every noun is capitalized |
| all-caps | FOOBAR | |
| studly-caps | FoObAr | |
| alt-studly-caps | fOoBaR | |
| random-studly-caps | randomly either FoObAr or fOoBaR | problem: how should one short-cut this by capitalizing context values with it? |
| random-case | FoOBar | every letter randomly capitalized; this could be the fallback value, but it would risk being triggered pretty often by accident due to a typo |

I think random-studly-caps should be left out for now due to the problem it entails (as noted in the "comment" column).
random-case will not be implemented for now either, but the specification will mention that future versions of the spec might use it as a fallback value under specific circumstances, so the behavior of raising an error if the capitalization doesn't match any of the specified options should not be relied on.

So for now, we'll leave it at specifying and implementing the following:

| name | example | comment |
|---|---|---|
| lower-case | foobar | |
| capitalized | Foobar | problem: this only allows capitalizing the first word of multi-word gendered nominatives, which is a problem when porting to languages where not only the first letter of every sentence, but also every noun is capitalized |
| all-caps | FOOBAR | |
| studly-caps | FoObAr | |
| alt-studly-caps | fOoBaR | |
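To make the five types concrete, here is a small sketch of how they could be applied to and detected from a value. The function names, the detection order (which decides ambiguous cases such as a single uppercase letter) and returning `None` for unmatched input are assumptions of this sketch, not something the specification prescribes. Alternation is counted from the very first character.

```python
CAPITALIZATION_TYPES = (
    "lower-case", "capitalized", "all-caps", "studly-caps", "alt-studly-caps",
)


def apply_capitalization(value: str, cap_type: str) -> str:
    """Return value rewritten in the given capitalization type."""
    value = value.lower()
    if cap_type == "lower-case":
        return value
    if cap_type == "capitalized":
        return value[:1].upper() + value[1:]
    if cap_type == "all-caps":
        return value.upper()
    if cap_type in ("studly-caps", "alt-studly-caps"):
        # studly-caps uppercases even positions, alt-studly-caps odd ones.
        upper_parity = 0 if cap_type == "studly-caps" else 1
        return "".join(
            c.upper() if i % 2 == upper_parity else c
            for i, c in enumerate(value)
        )
    raise ValueError(f"unknown capitalization type: {cap_type}")


def detect_capitalization(value: str):
    """Return the first type that reproduces value exactly, else None."""
    for cap_type in CAPITALIZATION_TYPES:
        if apply_capitalization(value, cap_type) == value:
            return cap_type
    # No match; callers decide what to do, since erroring should not be relied on.
    return None


print(apply_capitalization("foobar", "studly-caps"))      # FoObAr
print(apply_capitalization("foobar", "alt-studly-caps"))  # fOoBaR
print(detect_capitalization("FOOBAR"))                    # all-caps
```

Because several types coincide for very short values ("A" is capitalized, all-caps and studly-caps at once), the order of `CAPITALIZATION_TYPES` acts as a tie-breaker here.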

The whole thing is developed in the specify-global-capitalization-system-branch.

The specification has now been extended to support this (not yet published as a new version), and the implementation already implements it, too; the only thing left to do is testing.

I will probably change this in a later version of the specification to start with the first capitalizable letter rather than the first character, so that "_Foo" is interpreted as capitalized (rather than there being no way to convey this capitalization type for such a value), and ".bloop" is rendered to ".BlOoP" when studly-caps is applied to it, rather than ".bLoOp".
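A rough sketch of what that change could mean in practice. Treating a character as "capitalizable" when its upper and lower forms differ, and counting alternation by position from the first such character, are both assumptions of this sketch; the helper names are hypothetical.

```python
def first_cased_index(value: str) -> int:
    """Index of the first character with distinct upper/lower forms, or -1."""
    for i, c in enumerate(value):
        if c.lower() != c.upper():
            return i
    return -1


def studly_caps(value: str) -> str:
    """Alternate case, starting uppercase at the first capitalizable letter."""
    start = first_cased_index(value)
    if start == -1:
        return value
    return "".join(
        c.upper() if i >= start and (i - start) % 2 == 0 else c.lower()
        for i, c in enumerate(value)
    )


def is_capitalized(value: str) -> bool:
    """True if the first capitalizable letter is uppercase and the rest is lowercase."""
    start = first_cased_index(value)
    if start == -1:
        return False
    rest = value[start + 1:]
    return value[start].isupper() and rest == rest.lower()


print(studly_caps(".bloop"))   # .BlOoP
print(is_capitalized("_Foo"))  # True
```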

Let me know what you think about that if you somehow stumble across this note!