Anagram exercise: unicode and case sensitivity
tadas-s opened this issue · 3 comments
Hello,
Not sure if this is the right place to ask.
I'm a little confused about this test case:
static void test_unicode_anagrams(void)
{
    TEST_IGNORE(); // This is an extra credit test. Delete this line to accept the challenge
    // These words don't make sense, they're just greek letters cobbled together.
    char inputs[][MAX_STR_LEN] = {
        "ΒΓΑ",
        "ΒΓΔ",
        "γβα"
    };
    char subject[] = { "ΑΒΓ" };
    candidates = build_candidates(*inputs, sizeof(inputs) / MAX_STR_LEN);
    enum anagram_status expected[] = { IS_ANAGRAM, NOT_ANAGRAM, NOT_ANAGRAM };
    find_anagrams(subject, &candidates);
    assert_correct_anagrams(&candidates, expected);
}
According to the test suite, the third candidate, "γβα", is not an anagram of "ΑΒΓ". But if I uppercase the candidate, it becomes "ΓΒΑ", which is an anagram of the subject. It's also not a case of visually similar characters; try this in your browser console:
> "αβγ".toUpperCase() == "ΑΒΓ"
> true
According to some other tests, the anagram code should ignore letter case.
Am I missing something? Or is a UTF-8 capable solution not expected to be case insensitive?
Cheers,
Tadas
Hmm, that test case was added with the original commit, I think before anybody currently active joined - b9d352f
It's not listed as a test at all in the problem-specifications, so this is homegrown and probably unique to this track: https://github.com/exercism/problem-specifications/blob/main/exercises/anagram/canonical-data.json
That is probably an oversight. In fact, the example code doesn't handle this correctly at all, since it looks at the individual chars and not the Unicode code points. It's looking for anagrams of the individual bytes in the string, which I highly doubt is valid.
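To illustrate the byte-versus-character concern, here is a minimal sketch. The helper name `bytes_are_permutation` is hypothetical and this is not the track's actual example solution; it only assumes the strings are UTF-8 encoded and compares them byte by byte, the way a `char`-based anagram check would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when both strings contain the same
   multiset of bytes. Every char is treated as a separate "letter", which
   is only meaningful for single-byte encodings such as ASCII. */
bool bytes_are_permutation(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    long counts[256] = { 0 };

    /* Count each byte of a, then subtract each byte of b. */
    for (const unsigned char *p = (const unsigned char *)a; *p; ++p)
        counts[*p]++;
    for (const unsigned char *p = (const unsigned char *)b; *p; ++p)
        counts[*p]--;

    /* Any non-zero slot means the byte multisets differ. */
    for (size_t i = 0; i < 256; ++i)
        if (counts[i] != 0)
            return false;
    return true;
}
```

With UTF-8 input, a check like this can report a match for strings that share no characters. For example, "Αв" (bytes CE 91 D0 B2, Greek alpha plus Cyrillic ve) and "βБ" (bytes CE B2 D0 91, Greek beta plus Cyrillic be) contain the same four bytes in a different order, so the helper returns true even though the letters differ.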
I may not have context so I'd like another maintainer to weigh in, but if we can write this example with the std library without too much difficulty I say we try to fix it. Otherwise I'm happy just removing it. I haven't written utf8 compatible C code before, so I'm not sure what facilities exist.
@patricksjackson I agree. If this is do-able without contortions with the standard library then we should fix it, otherwise we should just remove it.
The non-ASCII cases were removed from the specification in exercism/problem-specifications#414 due to exercism/problem-specifications#413.
Additionally, I think this is not possible with the standard library, e.g. tolower() only works on single-char characters.
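The tolower() limitation can be seen in a small sketch. The helper name `lowercase_bytes` is hypothetical, and the behaviour described assumes the default "C" locale and UTF-8 encoded input; it is an illustration, not the exercise's code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: lowercases a string in place, one byte at a time.
   tolower() takes a single character value, so it never sees a complete
   multi-byte UTF-8 sequence. */
void lowercase_bytes(char *s)
{
    assert(s != NULL);
    for (; *s; ++s) {
        /* Cast to unsigned char first: passing a negative char value
           to tolower() is undefined behaviour. */
        *s = (char)tolower((unsigned char)*s);
    }
}
```

In the default "C" locale, calling this on "ABC" yields "abc", but calling it on the UTF-8 bytes of "ΑΒΓ" (CE 91 CE 92 CE 93) leaves them unchanged, because none of those individual bytes is an uppercase letter in that locale.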
To wit, I suggest we likewise remove the tests here.
Will prepare a PR for this now.
Thanks for reporting @tadas-s