commercialhaskell/stackage

Unexpected additional characters in Unicode output with GHC 9.0.1 / nightly

Closed this issue · 6 comments

orome commented

I'm seeing some strange Unicode behavior in my Haskell package when it builds under GHC 9.0.1. I understand that solving this may involve checking for changes in other Haskell packages, but my question here is whether the unexpected output I'm seeing rings any Unicode bells (Haskell or otherwise), so that I can begin to track down the reasons for the unexpected output. Perhaps there's a known issue with a dependency that affects Unicode output? Something with the testing package dependencies?

Where I expect to see, respectively

  • β (or \946) and
  • γ (or \947)

I instead see

  • β?KQHTLXOCBJSPDZRAMEWNIUYGV and
  • γ?EYJVCNIXWPBQMDRTAKZGFUHOS

This output also has some frustrating properties that make it hard to sort out what's going on:

  1. The garbage letters following the Greek character, though always the same on my local machine, are not the same as those I see on builds on other platforms (e.g. on Travis CI Focal I get β?SOVPZJAYQUIRHXLNFTGKDCMB).
  2. What I see and what I get when I paste what I see are different. Typically the leading and trailing garbage characters are truncated, so I assume the ? is actually some special character.
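
One way to find out what the ? really is would be to print the code points of the suspicious string rather than the string itself. The following is a minimal sketch using only Data.Char from base; codePoints and escaped are hypothetical helper names chosen for illustration, not part of crypto-enigma.

```haskell
import Data.Char (ord)

-- | Code points of every character in a string, so that invisible or
-- unprintable characters show up as plain numbers.
-- (Hypothetical helper, not part of crypto-enigma.)
codePoints :: String -> [Int]
codePoints = map ord

-- | The string as 'show' renders it: non-ASCII and control characters
-- are escaped, so a lone β is rendered as "\946" (including the quotes).
-- (Hypothetical helper, not part of crypto-enigma.)
escaped :: String -> String
escaped = show
```

In GHCi, codePoints "β" evaluates to [946]. Applying codePoints to one of the unexpected names should show which character sits between the Greek letter and the trailing capitals.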

Critically, none of this was happening with pre-GHC 9 nightly resolvers.

Do the unexpected patterns of characters following the Greek characters correspond to anything that would help track down the source of my error? Is there something about how GHC 9 or the packages in the latest nightly Stackage resolvers are handling Unicode that could be causing this?


To replicate:

stack update
stack unpack crypto-enigma-0.1.1.6
cd crypto-enigma-0.1.1.6
rm -f stack.yaml && stack init --resolver nightly
stack build --resolver nightly --haddock --test --bench --no-run-benchmarks

If you look at the snapshot diff, did any of your dependencies get upgraded and look suspicious? https://www.stackage.org/diff/nightly-2021-06-14/nightly-2021-06-20 => a dependency may have changed

Can you reproduce this on GHC 8.10 with the same dependencies as in nightly-2021-06-20? => might be a change in GHC

Can you reproduce it with cabal-install? => might be an issue with stack

orome commented

@bergmark It doesn't look like dependencies have changed, and it seems to work on GHC 8.10 with the same dependencies as nightly (I think; still working on it), but it looks like something really weird is going on with the use of a Unicode character as a key.

I have

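-- Imports implied by the use of M.Map and (&&&) below
import Control.Arrow ((&&&))
import qualified Data.Map as M

type Name = String
type Wiring = Mapping       -- Mapping is defined elsewhere in the package
type Turnovers = String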

data Component = Component {
        name :: !Name,              -- ^ The component's 'Name'.
        wiring :: !Wiring,          -- ^ The component's 'Wiring'.
        turnovers :: !Turnovers     -- ^ The component's 'Turnovers'.
}

-- Definitions of rotor Components; people died for this information
rots_ :: M.Map Name Component
rots_ = M.fromList $ (name &&& id) <$> [
        -- rotors
        Component "I"    "EKMFLGDQVZNTOWYHXUSPAIBRCJ" "Q",
        Component "II"   "AJDKSIRUXBLHWTMCQGZNPYFVOE" "E",
        Component "III"  "BDFHJLCPRTXVZNYEIWGAKMUSQO" "V",
        Component "IV"   "ESOVPZJAYQUIRHXLNFTGKDCMWB" "J",
        Component "V"    "VZBRGITYUPSDNHLXAWMJQOFECK" "Z",
        Component "VI"   "JPGVOUMFYQBENHZRDKASXLICTW" "ZM",
        Component "VII"  "NZJHGRCXMYSWBOUFAIVLPEKQDT" "ZM",
        Component "VIII" "FKQHTLXOCBJSPDZRAMEWNIUYGV" "ZM",
        Component "β"    "LEYJVCNIXWPBQMDRTAKZGFUHOS" "",
        Component "γ"    "FSOKANUERHMBTIYCWLQPZXVGJD" ""]

and

rotors :: [Name]
rotors = M.keys rots_

and somehow, only since GHC 9, when the name for a Component is a Greek character, keys does not return just the Greek character but also picks up other text. What that text is varies by context. On my local machine, it is always the wiring for the previous Component in rots_ (which is more than weird enough!), but on Travis CI β appends the wiring for IV and γ appends just an X.

If I had to guess, this suggests that something about how Unicode is actually stored by the compiler is causing M.keys applied to rots_ to pick up something nearby that shouldn't actually be part of the key (or name).
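
A cut-down check along the following lines might help narrow this down. It is only a sketch: Rotor, sample, rotorMap, expectedNames and corruptedKeys are hypothetical names, the two entries are copied from rots_ above, and whether this reduced version triggers the same behavior under GHC 9.0.1 is an assumption that has not been verified.

```haskell
import Data.Char (chr)
import qualified Data.Map as M

-- Hypothetical, cut-down stand-in for the Component type above,
-- keeping the strict String fields.
data Rotor = Rotor
  { rotorName   :: !String
  , rotorWiring :: !String
  }

-- Two entries copied from rots_: one ASCII name and one Greek name.
sample :: [Rotor]
sample =
  [ Rotor "VIII" "FKQHTLXOCBJSPDZRAMEWNIUYGV"
  , Rotor "β"    "LEYJVCNIXWPBQMDRTAKZGFUHOS"
  ]

rotorMap :: M.Map String Rotor
rotorMap = M.fromList [(rotorName r, r) | r <- sample]

-- The Greek name is built from its code point rather than written as a
-- literal, so the comparison does not depend on how literals are stored.
expectedNames :: [String]
expectedNames = ["VIII", [chr 946]]

-- | Keys that do not match any expected name, paired with their length.
-- An empty list means the keys came back intact.
corruptedKeys :: [(String, Int)]
corruptedKeys =
  [ (k, length k) | k <- M.keys rotorMap, k `notElem` expectedNames ]
```

With a compiler that handles the literals correctly, corruptedKeys is empty. A non-empty result would show how many extra characters each key picked up.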

This one really has me stumped and is way above my Haskell skill level. Any help is much appreciated.

Always a good sign when the issue name contains "Mysterious"!

orome commented

Always a good sign when the issue name contains "Mysterious"!

Yeah. That's certainly how it's felt here!

Looks like this will be fixed in GHC 9.0.2; we'll upgrade nightly when it arrives.