jgm/djoths

Double quotation marks don't render properly


MVE:

parseDoc (ParseOptions NoSourcePos) "Hello \"Djot\" World" <&> renderHtml (RenderOptions False)

Output:

Right "<p>Hello \226\128\156Djot\226\128\157 World</p>\n"

Interestingly, if I run djot with cabal run, it works fine.

~/projects/djoths > cabal run
Hello "Djot"
<p>Hello “Djot”</p>

I don't think this is a problem with the input string, as the AST looks fine:

λ: parseDoc (ParseOptions NoSourcePos) "Hello \"Djot\" World"
Right (Doc {docBlocks = Many {unMany = fromList [Node NoPos (Attr []) (Para (Many {unMany = fromList [Node NoPos (Attr []) (Str "Hello "),Node NoPos (Attr []) (Quoted DoubleQuotes (Many {unMany = fromList [Node NoPos (Attr []) (Str "Djot")]})),Node NoPos (Attr []) (Str " World")]}))]}, docFootnotes = NoteMap {unNoteMap = fromList []}, docReferences = ReferenceMap {unReferenceMap = fromList []}, docAutoReferences = ReferenceMap {unReferenceMap = fromList []}, docAutoIdentifiers = fromList []})

(You can see the Quoted DoubleQuotes (...) node in there.)

I'm not clear why one works and the other doesn't, though, given that app/Main.hs seems to do basically the same thing.

jgm commented

Why do you say it isn't working? It is producing a bytestring with the UTF-8 encoding of the curly quotes.
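To check this, the escaped bytes from the GHCi output can be decoded by hand. The sketch below is not from the thread; it assumes the text package is available, decodes \226\128\156 and \226\128\157 as UTF-8, and prints their code points. These come out as U+201C and U+201D, the left and right double quotation marks:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Numeric (showHex)

-- Byte values copied from the GHCi output: \226\128\156 and \226\128\157
openQuoteBytes, closeQuoteBytes :: B.ByteString
openQuoteBytes  = B.pack [226, 128, 156]
closeQuoteBytes = B.pack [226, 128, 157]

-- Decode the bytes as UTF-8, then show each resulting Char as a code point
codePoints :: B.ByteString -> [String]
codePoints = map (\c -> "U+" ++ showHex (fromEnum c) "") . T.unpack . TE.decodeUtf8

main :: IO ()
main = do
  print (codePoints openQuoteBytes)   -- ["U+201c"]
  print (codePoints closeQuoteBytes)  -- ["U+201d"]
```

Each curly quote is a single character but three bytes, which is why it appears as three escapes in the shown ByteString.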

Same for an em-dash:

λ: parseDoc (ParseOptions NoSourcePos) "Hello — world" <&> renderHtml (RenderOptions False)
Right "<p>Hello \DC4 world</p>\n"
~/projects/djoths > echo "Hello — world" | cabal run
<p>Hello — world</p>
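The \DC4 result likely has a different cause from the quotes. Assuming the GHCi session has OverloadedStrings enabled and parseDoc takes a ByteString (neither is stated in the thread), the em-dash literal goes through ByteString's IsString instance, which keeps only the low 8 bits of each Char. U+2014 therefore becomes the single byte 0x14, which Show displays as \DC4. The input piped to cabal run presumably arrives as raw UTF-8 bytes, so it is unaffected. A minimal sketch that does not depend on djot:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- A ByteString literal under OverloadedStrings: IsString truncates each
-- Char to its low byte, so U+2014 (em-dash) becomes 0x14
truncatedDash :: B.ByteString
truncatedDash = "\x2014"

-- Encoding the character explicitly as UTF-8 keeps all three bytes
utf8Dash :: B.ByteString
utf8Dash = TE.encodeUtf8 (T.pack "\x2014")

main :: IO ()
main = do
  print truncatedDash  -- "\DC4"
  print utf8Dash       -- "\226\128\148"
```

If this is the cause, encoding the test input with encodeUtf8 (or reading it from a file as bytes) before calling parseDoc should avoid the truncation.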

Ah, OK. I said "isn't working" because it comes out garbled in my browser, which may be my misunderstanding of how renderHtml is meant to be used.

I suppose the difference is that hPutBuilder, which is what Main.hs uses, automatically handles the escape sequences and converts them to UTF-8, but it's not obvious that one has to do that when using renderHtml.

Anyway, I guess this is really a problem of how I handle the output, so I'll close the issue.
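On the hPutBuilder point: as far as I can tell, no escape sequences are being converted. The Builder already holds UTF-8 bytes; the \226\128\156 in GHCi is just how Show prints bytes above 127, and hPutBuilder writes the same bytes to the handle unchanged. A small sketch contrasting the two, with a hand-built Builder standing in for renderHtml's output so it does not depend on djot:

```haskell
import Data.ByteString.Builder (Builder, hPutBuilder, stringUtf8, toLazyByteString)
import qualified Data.ByteString.Lazy as BL
import System.IO (hFlush, stdout)

-- Stand-in for renderHtml's result (built by hand here, not by djot):
-- stringUtf8 encodes the curly quotes as UTF-8 bytes
html :: Builder
html = stringUtf8 "<p>Hello \x201C\&Djot\x201D World</p>\n"

main :: IO ()
main = do
  -- Show escapes every byte above 127, matching the GHCi output:
  -- "<p>Hello \226\128\156Djot\226\128\157 World</p>\n"
  print (toLazyByteString html)
  -- 26 characters but 30 bytes: each curly quote is 3 bytes in UTF-8
  print (BL.length (toLazyByteString html))
  hFlush stdout
  -- Writes the raw bytes; a UTF-8 terminal displays them as curly quotes
  hPutBuilder stdout html
```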

jgm commented

renderHtml gives you a Builder, which you can convert to a lazy bytestring using toLazyByteString from Data.ByteString.Builder.
If you need a Text or a String, you'd need to call another function to decode this lazy bytestring.
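A minimal sketch of that conversion, assuming the text package: toLazyByteString, then decodeUtf8 from Data.Text.Lazy.Encoding. This is one option among several, and the helper names builderToText and builderToString are hypothetical:

```haskell
import Data.ByteString.Builder (Builder, stringUtf8, toLazyByteString)
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Decode the Builder's bytes as UTF-8 into lazy Text.
-- decodeUtf8 throws on invalid UTF-8; decodeUtf8With lenientDecode
-- (from Data.Text.Encoding.Error) substitutes U+FFFD instead.
builderToText :: Builder -> TL.Text
builderToText = TLE.decodeUtf8 . toLazyByteString

-- If a String is needed, unpack the Text
builderToString :: Builder -> String
builderToString = TL.unpack . builderToText

main :: IO ()
main = do
  let b = stringUtf8 "Hello \x201C\&Djot\x201D"  -- stand-in for renderHtml output
  -- Show now prints one Char per quote ("Hello \8220Djot\8221";
  -- 8220 is U+201C) instead of three escaped bytes
  print (builderToText b)
  print (TL.length (builderToText b))  -- 12 characters
```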

Thanks, it was less a problem of that and more one of handling the Unicode escape sequences. I realised I had misunderstood how this was supposed to work, and it turns out the fix was... incredibly basic: adding a charset to the HTTP headers made the browser render it correctly.
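The fix amounts to declaring the encoding in the Content-Type header. The thread doesn't say which server or framework was used, so the sketch below builds a raw HTTP/1.1 response by hand; htmlResponse is a hypothetical helper, and a real framework would set the same header through its own API:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Builder (Builder, int64Dec, lazyByteString, stringUtf8, toLazyByteString)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: wrap an HTML body (e.g. renderHtml output) in a
-- minimal HTTP/1.1 response whose Content-Type declares UTF-8
htmlResponse :: Builder -> BL.ByteString
htmlResponse body = toLazyByteString $
     "HTTP/1.1 200 OK\r\n"
  <> "Content-Type: text/html; charset=utf-8\r\n"
  <> "Content-Length: " <> int64Dec (BL.length bodyBytes) <> "\r\n"
  <> "\r\n"
  <> lazyByteString bodyBytes
  where
    -- Content-Length counts bytes, not characters
    bodyBytes = toLazyByteString body

main :: IO ()
main = BL.putStr (htmlResponse (stringUtf8 "<p>Hello \x201C\&Djot\x201D World</p>\n"))
```

Without the charset parameter, a browser may fall back to a legacy encoding such as windows-1252, where the three bytes of a left curly quote display as "â€œ". Declaring the encoding in the document itself with a meta charset="utf-8" element is a common alternative.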