Anagram: 'non-ascii' character doesn't match with problem description

Question

Anagram: 'non-ascii' character doesn't match with problem description

bupjae opened this issue 2 years ago · 5 comments

It seems that recent pull request (#2445, #1451) add non-ascii test case.

However, problem description says The target and candidates are words of one or more ASCII alphabetic characters (A-Z and a-z).

I'm not sure whether 'word may be contain non-ascii characters' is enough. For example, should we handle non-ascii punctations, i.e., 、 or 。?

Answer 1 · 2022-09-19T18:25:12.000Z

Good catch! We should add/update the instructions.append.md file to clarify that the Go version of the exercise is a bit different and contains non ASCII letters so people can practice how Go handles those.

See https://exercism.org/docs/building/tracks/practice-exercises#h-file-docs-instructions-append-md for the general documentation on the append file.

Be aware that the file needs to have a h1 header at the top (#) that will be completely removed when the content is put on the website. That means the content from the append appears directly below the default instructions. So usually it is good to put some h2 (##) heading in before the actual text. We often use "Implementation Notes" for example.

Answer 2 · 2022-09-21T19:36:18.000Z

I don't think we need to say anything about punctuations. They don't appear in the test cases and whether people decide to add extra code to remove potential punctuations or treat them as letters therefore does not matter at all for passing the tests.

Answer 3 · 2022-09-21T22:56:05.000Z

One existing test case and newly added test case contains space " " (U+0020). I'm not sure whether these spaces should be ignored or not.

Newly added test case contains punctuations; "，" (U+FF0C) and "," (U+002C). It seems that the test case is suggesting that these characters should be treated like normal character.

Answer 4 · 2022-09-22T15:27:11.000Z

DISCLAIMER: I do not read or write Chinese, so my view may be flawed.

The Anagram exercise instructions and cases_test.go suggests a definition, which is limited to rearranging letters of a word to create a new word.

The new test case in 'anagram_test.go' uses the phrase "你好，世界" meaning "Hello world," and expects a new phrase as the solution.

I propose limiting the nonAsciiTestCases to words and not allowing phrases.

If we agree, we can research some new non-ASCII anagrams.

While searching for non-ASCII anagrams, I found many Exercism GitHub issues in various programming languages discussing the topic. The consensus is that we should not be testing for non-ASCII characters unless it is a core learning outcome of the exercise.

exercism/problem-specifications#428

I am supportive of including non-ASCII aspects where it enhances the learning of golang features or educates on the issues of assuming every character is a byte.

The Append File will still need to be updated accordingly, depending on our direction.

Answer 5 · 2022-09-22T19:37:07.000Z

@bupjae Sorry, I totally misremembered the new test case. I was sure it was only "letters". 🙈 Thanks for it out (again).

@W8CYE Yes, please change to use only "letters".
I am aware of the overall discussion and often times I am on side of keeping the exercise as easy as possible. However, in this case the point is to demonstrate to that you actually do not have to change anything to make it work for non-ASCII characters in most solution approaches because of how strings work in Go. I find this quite nice. To back up this intuition we had here, I had a look at the community solutions and most solutions written before the new test was added still pass with the new test case.

So we are in the range of

unless it is a core learning outcome of the exercise

Random thought: Maybe we should say something along the lines of "the Go version will test your code with non-ASCII characters but that should not bother you too much" in the append to indicate students should not freak out because of the non-ASCII requirement. Not sure how to phrase this though.