common-voice/cv-sentence-extractor

Documenting the order of execution for language rules

bact opened this issue · 5 comments

bact commented

Currently, the order in which the rules are applied is not obvious. I cannot find any mention of it in the documentation.

It will be very useful for people who develop the rules to know the order.

For example, it would help to know whether replacements is executed before disallowed_words or broken_whitespace.

Apart from replacements, which other rules would it be useful to know the order of? I'd argue that, apart from replacements, it shouldn't matter.

Definitely agree that the documentation for replacements should mention that it runs before any other rules.

bact commented

If it's the case that replacements run before any other rules, I think you are correct - the order probably doesn't matter for the rest.

I was worried about length-related rules (like min_trimmed_length, min_word_count), but that is clear now, given that replacements runs first.
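To illustrate why the order matters for a length-related rule, here is a minimal Python sketch. It is not the extractor's actual implementation: the rule names come from this thread, but the rule values, the function names (`apply_replacements`, `passes_constraints`, `check`), and the matching logic are assumptions made up for the example.

```python
# Hypothetical rule set: key names are taken from this thread,
# the values are invented for illustration only.
RULES = {
    "replacements": [["&", " and "]],
    "min_word_count": 3,
    "disallowed_words": ["etc"],
}


def apply_replacements(sentence, replacements):
    """Action: rewrite the sentence, applying each [old, new] pair in order."""
    for old, new in replacements:
        sentence = sentence.replace(old, new)
    return sentence


def passes_constraints(sentence, rules):
    """Constraints: only accept or reject, never modify the sentence."""
    words = sentence.split()
    if len(words) < rules["min_word_count"]:
        return False
    return not any(word.lower() in rules["disallowed_words"] for word in words)


def check(sentence, rules):
    """Run replacements first, then evaluate constraints on the result."""
    replaced = apply_replacements(sentence, rules["replacements"])
    return passes_constraints(replaced, rules)


sentence = "Cats&dogs play"
print(passes_constraints(sentence, RULES))  # constraints on the raw sentence
print(check(sentence, RULES))               # constraints after replacements
```

With these made-up rules, the raw sentence has only two whitespace-separated words and would fail min_word_count, while the replaced sentence has four and passes. So the result of a length check depends on whether replacements has already run.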

Definitely a good point, added!

bact commented

Yeah, I feel like replacements is actually different from the other rules. It is an action, while the others are constraints.

I definitely shouldn't mark issues as "good first issue" and then do them myself, sorry! :)

feels like replacements is actually different from other rules. It is an action. While others are constraints.

Can't disagree with you there, but moving it out of the rules files makes it way more complicated than it needs to be :)