whatwg/html-build

hard-coded match strings make this project language dependent

Opened this issue · 4 comments

I'm trying to translate whatwg/html here: https://whatwg-cn.github.io/html/multipage/

To sync this fork (especially the not-yet-translated sections), that repo also uses the build tools (html-build, wattsi). However, I found that the hard-coded match strings (in .pre-process-annotate-attributes.pl, .pre-process-tag-omission.pl, and maybe others) break the build process once the source text is translated, for example:

<dt><span data-x=\"concept-element-attributes\">Content attributes</span>:</dt>

https://github.com/whatwg/html-build/blob/master/.pre-process-annotate-attributes.pl#L18

For now I have also translated these Perl source files locally. Is there a better solution that would make this tool language-independent? Or should I push the zh-Hans version to this project, which may require a localization mechanism to be implemented?

It seems like a worthy goal to make this language-independent. My preferred approach would be changing these parts of the script to search based on language-independent things, e.g. the data-x value alone instead of the data-x value plus the text content. A pull request to do that would be welcome.
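As a rough illustration of the data-x suggestion, here is a minimal sketch of a check that keys only on the data-x value and accepts any text inside the span. The helper name `is_content_attributes_dt` is hypothetical and is not taken from the html-build scripts; the real scripts may structure their matching differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not from html-build): returns 1 if $line is the
# "Content attributes" <dt>, judged only by its data-x value.
# The visible text and the trailing punctuation may be in any language.
sub is_content_attributes_dt {
    my ($line) = @_;
    return $line =~ m{<dt><span data-x="concept-element-attributes">[^<]*</span>[^<]*</dt>}
        ? 1
        : 0;
}

for my $line (
    '<dt><span data-x="concept-element-attributes">Content attributes</span>:</dt>',
    '<dt><span data-x="concept-element-attributes">(translated text)</span>:</dt>',
) {
    print is_content_attributes_dt($line) ? "match\n" : "no match\n";
}
```

Both sample lines match, because the English text is no longer part of the pattern. The `[^<]*` after `</span>` is there so that a translated colon (for example a full-width one) does not break the match.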

Agreed. But this only works if there is enough markup or annotation in html/source in the first place, and if the build does not insert any text of its own (which would be language-specific). For example:

# in file .pre-process-tag-omission.pl:
$$line .= "   <dt><span data-x=\"concept-element-tag-omission\">Tag omission in text/html</span>:</dt>\n";
# and:
pushLine($_, "  <p>Neither tag is omissible.</p>\n");

By the way, I don't think it's a good idea to maintain semantics-related parts of the source in build scripts, such as .pre-process-tag-omission.pl. It's feasible as long as this part does not change frequently. If that's OK, I can help.

Hmm, I see.

I think it's important that we not repeat this boilerplate in the source file. Do you have a proposed solution that allows us to keep the source file clean, but still allows easier translation?

We can keep the source untouched by providing an i18n configuration for the build tools (perhaps selected by a LANG environment variable).
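A minimal sketch of what such a configuration could look like, assuming a per-language message table selected from LANG with English as the fallback. The names `%MESSAGES`, `pick_locale`, and `msg`, the message keys, and the bracketed zh entry are all hypothetical placeholders, not part of html-build:

```perl
use strict;
use warnings;

# Hypothetical message catalog. The keys are made up for this sketch and
# the zh entry is a placeholder, not a real translation.
my %MESSAGES = (
    en => {
        tag_omission_heading => 'Tag omission in text/html',
        neither_omissible    => 'Neither tag is omissible.',
    },
    zh => {
        tag_omission_heading => '[zh-Hans] Tag omission in text/html',
        # neither_omissible is left out on purpose to show the fallback.
    },
);

# Map a LANG value such as "zh_CN.UTF-8" to a catalog key; default to "en".
sub pick_locale {
    my ($lang) = @_;
    my ($code) = ($lang // '') =~ /^([a-z]{2,3})/;
    return (defined $code && exists $MESSAGES{$code}) ? $code : 'en';
}

# Look up a message, falling back to English when it is not translated.
sub msg {
    my ($locale, $key) = @_;
    return $MESSAGES{$locale}{$key} // $MESSAGES{en}{$key};
}

my $locale = pick_locale($ENV{LANG});

# The build script would then emit the boilerplate through msg():
print '   <dt><span data-x="concept-element-tag-omission">'
    . msg($locale, 'tag_omission_heading')
    . "</span>:</dt>\n";
print '  <p>' . msg($locale, 'neither_omissible') . "</p>\n";
```

With this shape the source file stays free of the boilerplate, and a translation only has to supply a table of strings; anything it leaves out falls back to the English text.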