w3c/jlreq

Review Segment Break Transformation Rules (CSS Text Level 3)

kidayasuo opened this issue · 6 comments

There are discussions in CSS WG regarding Segment Break Transformation Rules:

We would like to review the rules to see whether there are any remaining issues or areas that need discussion.

[updated] Updated the data by removing entries that are actually fullwidth versions of a character, and by removing character classes that are inherently non-Japanese (cl-24 through cl-27). This makes the list easier to examine.

List of characters listed in JLReq that are not Space Discarding according to https://drafts.csswg.org/css-text-3/#space-discard-set

NOT_SpaceDiscarding_JLReq_char.txt

xfq commented

There's also w3c/csswg-drafts#5017, which is the new CSS issue for "ambiguous" characters.

The list is very helpful, thank you very much, @kidayasuo! It looks to me that the list is reasonable; i.e., the current set of space-discarding Unicode characters is reasonable from the JLREQ perspective. /cc @fantasai

A basic, but fundamental question. How much can we expect authors, or editor software that folds lines automatically, to cooperate? At one extreme, we could tell CJ authors to fold lines only between two kanji. Then we would not need any rule other than "the segment break transformation rule will not insert a space between two kanji". (Also, these expectations should probably be documented.)
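
To make the "one extreme" above concrete, here is a minimal Python sketch of such a rule. It is an illustration only, not the CSS Text Level 3 algorithm: the names `is_kanji` and `join_folded_lines` are hypothetical, and "kanji" is assumed to mean only the CJK Unified Ideographs block (U+4E00 to U+9FFF), which leaves out the extension blocks and compatibility ideographs.

```python
def is_kanji(ch: str) -> bool:
    """Hypothetical helper: True only for the CJK Unified Ideographs block.

    Assumption: U+4E00..U+9FFF stands in for "kanji"; extension blocks
    and compatibility ideographs are deliberately ignored in this sketch.
    """
    return "\u4e00" <= ch <= "\u9fff"


def join_folded_lines(text: str) -> str:
    """Join hard-wrapped lines, dropping the break only between two kanji."""
    lines = text.split("\n")
    out = lines[0]
    for line in lines[1:]:
        if out and line and is_kanji(out[-1]) and is_kanji(line[0]):
            out += line        # kanji on both sides: discard the segment break
        else:
            out += " " + line  # any other pair: the break becomes a space
    return out
```

Under this rule, a fold between kana and kanji (for example "かな" followed by "漢字" on the next line) still produces a space, so authors and editor software would have to avoid folding there. That is the kind of expectation that would need to be documented.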

A basic, but fundamental question...

I think that is exactly where this is controversial. I'm in favor of making the rules as simple as possible, because no matter what we do, authors must remember all the rules and adapt to them. @r12a seems to have a similar opinion, if I understand his comment correctly. I see some people arguing that more rules can make it smarter. I agree they help in some cases, but authors must remember more.

Thank you. I agree we should make the rule easier to remember, in other words, intuitive. It also needs to be reliable, and in that sense I am not very fond of the language-tagging idea because it is more prone to errors.

One small caution: in general, things that look simpler to humans and are easier to remember do not necessarily match what is simple for rule makers. I think we should strive to devise "smart" rules that feel simpler to people or our users.