purescript-deprecated/purescript-strongcheck

duplicate values in generated data

Closed this issue · 6 comments

So I spied on the generated data in a test and got something like this:

checkThing :: [[String]] -> Result
[]
[]
[[]]
[["𣄽","𣄢𣄣𣄤"],["𣄥𣄦"]]
[]
[["𣄥"]]
[[""]]
[]
[[]]

I'm just noting that [] and [[]] occur frequently, and repeated inputs add no value to the generated tests. It would be really nice if generation worked in such a way that duplicates do not appear.
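One way to get this behaviour is to have the runner remember the inputs it has already tried and skip repeats. Below is a minimal Python sketch of that idea under stated assumptions. It is not StrongCheck's API: `distinct_cases` and the toy generator `nested_strings` are hypothetical names. Values are compared by `repr` so that nested lists, which are unhashable, can still be tracked.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def nested_strings(rng: random.Random) -> List[List[str]]:
    """Hypothetical generator for [[String]]: small sizes, so [] and [[]] come up often."""
    def short_string() -> str:
        return "".join(rng.choice("ab") for _ in range(rng.randint(0, 2)))

    return [[short_string() for _ in range(rng.randint(0, 2))]
            for _ in range(rng.randint(0, 2))]


def distinct_cases(gen: Callable[[random.Random], T], n: int,
                   rng: random.Random, max_tries: int = 10_000) -> List[T]:
    """Draw up to n values from gen, discarding any value already produced.

    Stops after max_tries draws, so a small input space (e.g. Bool)
    yields fewer than n cases instead of looping forever.
    """
    seen = set()
    cases: List[T] = []
    tries = 0
    while len(cases) < n and tries < max_tries:
        tries += 1
        value = gen(rng)
        key = repr(value)  # structural key, works for unhashable values
        if key not in seen:
            seen.add(key)
            cases.append(value)
    return cases


if __name__ == "__main__":
    for case in distinct_cases(nested_strings, 10, random.Random(0)):
        print(case)
```

The cap matters. When the input space is small, skipping repeats means fewer than `n` tests actually run, rather than the runner spinning forever looking for a new value.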

I also don't understand why Chinese characters appear almost exclusively; it would be nice to have more variety in the strings that get created, for example common cases alongside wonky ones. Here is an example that I feel is closer to ideal:

[]
[["𣄽"]]
[["𣄥𣄥","","Hi there Mr. WilS0N"],["☺☺☺☺"]]
[[],[]]
[[]]
[["....","W"],["","ghs slikja werij"],[""]]
[[""]]
[["123.345.7000","++!!","     "],["﷽"]]

It would be nice if the Arbitrary instances for types like String took known problematic characters into account:
https://bug279099.bugzilla.mozilla.org/attachment.cgi?id=173729
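A common way to do this is to mix a fixed list of known-tricky strings into the random stream at some frequency. The Python below is a minimal sketch of that idea. The list `TRICKY` is my own hand-picked sample and is not taken from the linked attachment. The 20% mixing rate and the function names are arbitrary assumptions.

```python
import random

# Hand-picked edge cases (illustrative, not exhaustive).
TRICKY = [
    "",            # empty string
    " ",           # whitespace only
    "\u0000",      # NUL character
    "\u200b",      # zero-width space
    "\ufeff",      # byte order mark
    "\u202e",      # right-to-left override
    "e\u0301",     # 'e' followed by a combining acute accent
    "\U0001F600",  # astral-plane emoji (a surrogate pair in UTF-16)
]


def random_ascii(rng: random.Random, max_len: int = 8) -> str:
    """Plain printable-ASCII string used for the non-tricky draws."""
    return "".join(chr(rng.randint(0x20, 0x7E)) for _ in range(rng.randint(0, max_len)))


def string_with_edge_cases(rng: random.Random, tricky_rate: float = 0.2) -> str:
    """With probability tricky_rate return a known edge case, otherwise a random string."""
    if rng.random() < tricky_rate:
        return rng.choice(TRICKY)
    return random_ascii(rng)
```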

I think it should be possible to split Unicode into meaningful sets and use them so that some test strings are drawn exclusively from one set while others mix several.
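Here is a minimal Python sketch of this idea, under a few assumptions. The class names and code-point ranges in `CHAR_SETS` are illustrative choices, not an official Unicode partition and not StrongCheck's API. Each string is built either entirely from one randomly chosen set or character by character from several. Choosing a set first and then a code point within it keeps small sets such as ASCII from being swamped by large ideograph blocks. Drawing code points uniformly over a wide range would swamp them.

```python
import random
from typing import Dict, List, Tuple

# Illustrative character classes as inclusive code-point ranges.
CHAR_SETS: Dict[str, List[Tuple[int, int]]] = {
    "ascii_alnum": [(0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)],
    "ascii_punct_space": [(0x20, 0x2F), (0x3A, 0x40)],
    "latin1_letters": [(0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0xFF)],
    "cjk": [(0x4E00, 0x9FFF)],
    "misc_symbols": [(0x2600, 0x26FF)],
}
SET_NAMES = sorted(CHAR_SETS)


def char_from(rng: random.Random, ranges: List[Tuple[int, int]]) -> str:
    """Pick one range of the class, then a code point inside it."""
    lo, hi = rng.choice(ranges)
    return chr(rng.randint(lo, hi))


def unicode_string(rng: random.Random, max_len: int = 8,
                   single_set_rate: float = 0.5) -> str:
    length = rng.randint(0, max_len)
    if rng.random() < single_set_rate:
        # Single-set string: every character comes from the same class.
        ranges = CHAR_SETS[rng.choice(SET_NAMES)]
        return "".join(char_from(rng, ranges) for _ in range(length))
    # Mixed string: each character may come from a different class.
    return "".join(char_from(rng, CHAR_SETS[rng.choice(SET_NAMES)])
                   for _ in range(length))
```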

That does seem like an awful lot of empties showing up. I will look into it.

If you want readable text, you can always use one of the newtypes, such as AlphaNum. I chose Unicode for the default Arbitrary String instance because I feel that if you use String, your code should be robust enough to handle all possible strings.

I also agree about using problematic characters; can you open a separate ticket for that? Thanks!

OK, I created issue #6. I understand defaulting String to Unicode; I would just like the generator to have a better understanding of Unicode sets so that it does not fall into Chinese so often. Chinese is a good case, but it shouldn't be the only one that gets hit, and right now that's what I'm seeing.

An example of using the AlphaNum newtype would be good for the docs as well.
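In PureScript the pattern is a newtype around String with its own Arbitrary instance that generates only letters and digits. A property then takes the newtype as its argument and unwraps it. The thread doesn't show StrongCheck's exact constructor or module, so rather than guess, here is the same pattern sketched in Python. `AlphaNum`, `arbitrary`, `check` and `reverse_twice_is_identity` are hypothetical names, and the frozen dataclass stands in for the newtype.

```python
import random
import string
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
ALPHANUM_CHARS = string.ascii_letters + string.digits


@dataclass(frozen=True)
class AlphaNum:
    """Wrapper whose generator only ever produces [A-Za-z0-9]* strings."""
    value: str

    @staticmethod
    def arbitrary(rng: random.Random, max_len: int = 10) -> "AlphaNum":
        length = rng.randint(0, max_len)
        return AlphaNum("".join(rng.choice(ALPHANUM_CHARS) for _ in range(length)))


def check(prop: Callable[[T], bool], gen: Callable[[random.Random], T],
          runs: int = 100, seed: int = 0) -> Optional[T]:
    """Run prop on `runs` generated inputs; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen(rng)
        if not prop(x):
            return x
    return None


# The property receives the wrapper and unwraps it, like matching on a newtype constructor.
def reverse_twice_is_identity(s: AlphaNum) -> bool:
    return s.value[::-1][::-1] == s.value
```

Because the generator lives on the wrapper type, asking for `AlphaNum` instead of a plain string is the only change a property needs to get readable inputs.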

Also, it's not the empties that are the issue for me, it's the duplicates: no duplicate should show up, because duplicate tests add no value.

noDuplicates = nub <$> arbitrary
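`nub <$> arbitrary` maps `nub` over the generator, so each generated array has its repeated elements removed. It does not stop two separate test runs from receiving the same array. Below is a Python sketch of the same composition. The helpers are hypothetical: `nub` keeps the first occurrence of each element in order, `fmap` applies a function to a generator's output, and `small_ints` stands in for `arbitrary`.

```python
import random
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def nub(xs: List[A]) -> List[A]:
    """Remove later duplicates, keeping first occurrences in order.

    Uses == rather than hashing, so it also works on nested lists (O(n^2)).
    """
    out: List[A] = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out


def fmap(f: Callable[[A], B],
         gen: Callable[[random.Random], A]) -> Callable[[random.Random], B]:
    """Apply f to every value gen produces (the `<$>` in `nub <$> arbitrary`)."""
    return lambda rng: f(gen(rng))


def small_ints(rng: random.Random) -> List[int]:
    """Hypothetical stand-in for an arbitrary array of Ints."""
    return [rng.randint(0, 3) for _ in range(rng.randint(0, 8))]


no_duplicates = fmap(nub, small_ints)
```

Within each array the elements are unique. Across calls, however, the same array (for example `[]`) can still come back, since `nub` only looks inside one value.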

Edit: sorry, I see now: you mean duplicate tests.

I see the same Chinese character repeated, which likely means there's too much determinism somewhere (e.g., something is being restarted without carrying over the old seed value). Do you have a small example that reproduces this?
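The failure mode suggested here can be reproduced in miniature. The Python sketch below uses a toy 32-bit linear congruential generator (Numerical Recipes constants) rather than StrongCheck's actual random source, and all function names are hypothetical. `string_threaded` passes each new seed on to the next draw. `string_restarted` mimics the suspected bug by drawing every character from the same starting seed, which yields one character repeated.

```python
from typing import Tuple

MODULUS = 2**32


def lcg_next(seed: int) -> int:
    """One step of a 32-bit LCG (a = 1664525, c = 1013904223)."""
    return (1664525 * seed + 1013904223) % MODULUS


def gen_cjk_char(seed: int) -> Tuple[str, int]:
    """Draw one of the first 100 CJK ideographs; return it with the next seed."""
    nxt = lcg_next(seed)
    return chr(0x4E00 + (nxt >> 16) % 100), nxt


def string_threaded(seed: int, n: int) -> Tuple[str, int]:
    """Correct: each draw uses the seed returned by the previous one."""
    chars = []
    for _ in range(n):
        c, seed = gen_cjk_char(seed)
        chars.append(c)
    return "".join(chars), seed


def string_restarted(seed: int, n: int) -> Tuple[str, int]:
    """Buggy: every draw restarts from the original seed and discards the new one."""
    return "".join(gen_cjk_char(seed)[0] for _ in range(n)), seed
```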

Just check anything with String; for me it has consistently produced only empty strings or Chinese.