jgm/pandoc-types

Single constructor data types and JSON serialization

boisgera opened this issue

Hi everyone!

Until recently, I worked under the assumption that a pandoc-types datatype with a single constructor (say, Format) had its type erased from the JSON representation: instead of {"t": type, "c": content}, the representation was simply content.
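To make the two shapes concrete, here is a minimal Python sketch. The helper names `encode_tagged` and `encode_erased` are hypothetical, and the handling of zero or several fields (no "c" key, or a list of fields) is an assumption about how the encoding generalizes, not something stated in this issue.

```python
from typing import Any


def encode_tagged(constructor: str, fields: list[Any]) -> dict:
    """Tagged form, as used by multi-constructor types such as Inline or Block.

    Assumption: no fields -> no "c" key; one field -> stored bare;
    several fields -> stored as a list.
    """
    if not fields:
        return {"t": constructor}
    return {"t": constructor, "c": fields[0] if len(fields) == 1 else list(fields)}


def encode_erased(fields: list[Any]) -> Any:
    """Erased form: only the contents are written, with no constructor tag."""
    return fields[0] if len(fields) == 1 else list(fields)


# Format "html": the wrapper disappears from the JSON.
print(encode_erased(["html"]))        # html
# RowSpan 1, as reported in this issue: the tag is kept.
print(encode_tagged("RowSpan", [1]))  # {'t': 'RowSpan', 'c': 1}
```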

I think that (maybe with the exception of Meta?) this assumption held until the recent changes to the document model. Now, AFAICT, some single-constructor types have their type erased and some don't. For a moment I thought the difference was that some were declared with the newtype keyword (type erased) and some with the data keyword (type kept), which would make sense if I understand the difference between the two keywords in Haskell correctly. But this second hypothesis doesn't hold either.

Could anyone explain to me whether there is a simple rule, based on the definitions of the pandoc types, that determines whether the type of a value will be erased in its JSON representation?

AFAICT, Format (a newtype) has its type erased in JSON, but RowSpan (also a newtype) has its type serialized. Cell (declared with data) also has its type serialized. Unfortunately, I don't know enough Haskell to pinpoint which parts of the code explain the difference between these cases...
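The three observations above are enough to refute the newtype/data hypothesis mechanically. A small Python sketch, using only the cases reported here; the `observed` table and the `predicts_tagged` helper are hypothetical names introduced for illustration:

```python
# Observations reported in this issue: declaration keyword, and whether the
# constructor tag ("t") appears in the JSON output.
observed = {
    "Format": {"keyword": "newtype", "tagged": False},
    "RowSpan": {"keyword": "newtype", "tagged": True},
    "Cell": {"keyword": "data", "tagged": True},
}


def predicts_tagged(keyword: str) -> bool:
    """Hypothesis under test: `newtype` erases the tag, `data` keeps it."""
    return keyword == "data"


# Types whose observed encoding contradicts the hypothesis.
counterexamples = [
    name for name, obs in observed.items()
    if predicts_tagged(obs["keyword"]) != obs["tagged"]
]
print(counterexamples)  # ['RowSpan']
```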

The context: I have developed a Python library (https://github.com/boisgera/pandoc) that reads the pandoc-types data models (for as many versions of pandoc as possible) and automatically reproduces the equivalent class hierarchy in Python, so that JSON data can be exchanged with the available pandoc executable and a pandoc document can be worked with in Python. It is aimed at people (first and foremost, me 😉) who need to analyze and transform documents through a nice AST and are fluent in Python but not so much in Haskell (or Lua). To keep doing that, I need to be able to infer the JSON serialization rule for each data type automatically from the output of :browse Text.Pandoc.Definition in ghci. This is why a simple, mechanical rule would help!
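A mechanical rule of the kind requested could look like the Python sketch below. It assumes the rule later proposed in this thread (single constructor means no {"t", "c"} wrapper). It also assumes that declarations have already been reduced to one-line Haskell source form; the real `:browse` output is not guaranteed to look like this. `DECLS`, `constructors` and `erased_types` are hypothetical names.

```python
import re

# Hypothetical input: one-line declarations, e.g. a cleaned-up version of
# ghci's `:browse Text.Pandoc.Definition` output (no deriving clauses).
DECLS = """\
newtype Format = Format Text
newtype RowSpan = RowSpan Int
data Cell = Cell Attr Alignment RowSpan ColSpan [Block]
data Alignment = AlignLeft | AlignRight | AlignCenter | AlignDefault
"""

# keyword, type name, optional type variables, then the right-hand side.
DECL_RE = re.compile(r"^(data|newtype)\s+(\w+)[^=]*=\s*(.+)$")


def constructors(rhs: str) -> list[str]:
    """Constructor names on the right-hand side of `=`, split on `|`."""
    return [alt.split()[0] for alt in rhs.split("|")]


def erased_types(decls: str) -> dict[str, bool]:
    """Map each type name to True if it has exactly one constructor, i.e. if
    its JSON would omit the tag under the proposed consistent rule."""
    result = {}
    for line in decls.splitlines():
        m = DECL_RE.match(line.strip())
        if m:
            _keyword, name, rhs = m.groups()
            result[name] = len(constructors(rhs)) == 1
    return result


print(erased_types(DECLS))
# {'Format': True, 'RowSpan': True, 'Cell': True, 'Alignment': False}
```

Note that the keyword (newtype or data) is deliberately ignored: only the constructor count matters under this rule.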

Cheers,

SB

jgm commented

I think we should be as consistent as possible here. Maybe @despresc can comment on whether there was a reason for the different behavior in the case of RowSpan and Cell. If not, we should change this.

No, there was no particular reason that I can remember. I think I just missed that newtype-defined types were serialized without their type information when I was looking at the other instance definitions.

Or other single-constructor data types, for that matter. I think I just copied what Inline and Block did for serialization. That means that TableHead, Caption, and similar types should all be changed too.
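For a client that has to read JSON from both before and after such a change, one option is to rewrite the tagged form into the erased one. This Python sketch assumes constructor names coincide with type names for these types. The `SINGLE_CONSTRUCTORS` set is hypothetical: it is built from the names mentioned in this thread, plus ColSpan by analogy with RowSpan.

```python
from typing import Any

# Hypothetical set of single-constructor names whose tag is dropped under the
# consistent scheme (from this thread, plus ColSpan by analogy with RowSpan).
SINGLE_CONSTRUCTORS = {"RowSpan", "ColSpan", "Cell", "TableHead", "Caption"}


def strip_single_tags(value: Any) -> Any:
    """Recursively rewrite {"t": C, "c": x} into x when C is single-constructor."""
    if isinstance(value, list):
        return [strip_single_tags(v) for v in value]
    if isinstance(value, dict):
        if value.get("t") in SINGLE_CONSTRUCTORS and set(value) <= {"t", "c"}:
            return strip_single_tags(value.get("c"))
        return {k: strip_single_tags(v) for k, v in value.items()}
    return value  # strings, numbers, booleans, None


old_cell = {"t": "Cell", "c": [["", [], []], {"t": "AlignDefault"},
                              {"t": "RowSpan", "c": 1}, {"t": "ColSpan", "c": 1}, []]}
print(strip_single_tags(old_cell))
# [['', [], []], {'t': 'AlignDefault'}, 1, 1, []]
```

Multi-constructor values such as {"t": "AlignDefault"} pass through unchanged, which is the point of the consistent scheme: only the constructor count decides whether a tag is written.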

jgm commented

This will mean another version bump in pandoc-types, but I think it's probably worth making these changes so that we have a consistent JSON serialization scheme.