common-voice/cv-sentence-extractor

Matching opposite characters

MichaelKohler opened this issue · 0 comments

Definition

Currently, with even_symbols we can only make sure that characters appear an even number of times. However, we cannot express, for example, that if there is a „ there also needs to be a “.

Configuration / Rule

We should allow a character map with these, possibly similar to the replacements rule:

matching_symbols = [
  ["„", "“"]
]

If there is a better name than matching_symbols, let's use that one. Also, if anyone can come up with a better data structure, very happy to have it :)

Implementation

In terms of implementation:

  • This could work similarly to replacements
  • Go through all definitions and check, for each defined mapping, that both mapped characters appear the same number of times
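
A minimal sketch of the counting check described above, to make the idea concrete. The function name has_matching_symbols and the [char; 2] pair type are assumptions for illustration only; the real rule would read its pairs from the rules file and may well use strings instead of chars.

```rust
use std::collections::HashMap;

/// Returns true if, for every configured pair, the opening and the closing
/// symbol occur the same number of times in `sentence`.
/// Hypothetical helper: each entry of `matching_symbols` is [open, close].
fn has_matching_symbols(sentence: &str, matching_symbols: &[[char; 2]]) -> bool {
    // Count every character once so that each pair needs only two lookups.
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in sentence.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }

    // A pair is satisfied when both symbols have equal counts (including 0).
    matching_symbols.iter().all(|&[open, close]| {
        counts.get(&open).copied().unwrap_or(0) == counts.get(&close).copied().unwrap_or(0)
    })
}
```

Note that this only compares counts, as proposed in this issue. It does not check ordering, so a sentence where the closing symbol comes before the opening one would still pass.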

How to implement this

  • Make sure the rule is defined and explained in the README
  • Provide an example in the README, similar to replacements
  • In rules.rs, implement the type as array and set its default to an empty vector
  • In checker.rs inside the check function implement the logic
  • Add tests at the bottom of the checker.rs file that cover at least the test cases below, and preferably any other cases you can think of
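
The rules.rs and checker.rs steps above could look roughly like the following sketch. The Rules struct and check function here are trimmed-down, hypothetical stand-ins, not the project's actual definitions: the real struct has more fields and is loaded from the rule files, and the real check function takes care of all the other rules as well. The Vec<Vec<String>> type is an assumption that mirrors the nested array in the proposed config.

```rust
/// Hypothetical, trimmed-down stand-in for the struct in rules.rs.
/// Deriving Default makes `matching_symbols` an empty vector when unset.
#[derive(Debug, Default)]
pub struct Rules {
    /// Each entry is expected to hold exactly two strings: [open, close].
    pub matching_symbols: Vec<Vec<String>>,
}

/// Hypothetical stand-in for the matching_symbols part of `check`.
/// Returns true if the sentence satisfies every configured pair.
pub fn check(rules: &Rules, sentence: &str) -> bool {
    rules.matching_symbols.iter().all(|pair| match pair.as_slice() {
        // Valid entry: both symbols must occur equally often.
        [open, close] => {
            sentence.matches(open.as_str()).count()
                == sentence.matches(close.as_str()).count()
        }
        // Entries without exactly two items are ignored in this sketch.
        _ => true,
    })
}
```

With the default (empty) rule, every sentence passes, so languages that do not define matching_symbols would be unaffected.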

Test cases:

  • This is „a quote“ -> valid
  • This is „a quote -> invalid
  • This is „a quote“ and „another one“ -> valid
  • This is „a quote“ and another one“ -> invalid