jgm/citeproc

Add support for semantic markup in bibliographies


For some output formats, it is desirable not only to produce a formatted bibliography but also to use semantic markup that identifies the parts of each entry (e.g., title, author, publisher, …). Pandoc supports this for JATS, but support is lacking for other output formats such as TEI (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-bibl.html) or HTML with schema.org markup (https://schema.org/CreativeWork).

The old pandoc-citeproc processor supported this via raw content elements as an extension to CSL, which allowed citation styles to specify bits of semantic markup to be added to the output. An example for TEI using the old processor can be found here: https://github.com/frederik-elwert/teicite.

It would be desirable either for citeproc to re-implement the CSL extension or for citeproc (or pandoc) to provide an alternative way of adding semantic markup to bibliographies.

tarleb commented

Could it make sense to somehow preserve the names of macros by wrapping the contents in a span with the macro's name? That would make it easy to post-process the output with a filter.
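To illustrate the kind of post-processing this would enable, here is a minimal sketch of a pandoc JSON filter. It assumes, hypothetically, that citeproc wrapped each macro's output in a `Span` whose class is the macro name (citeproc does not do this today). The mapping table and the choice of schema.org `itemprop` values are illustrative assumptions; an enclosing element with `itemscope`/`itemtype` would still be needed for valid microdata.

```python
import json

# Hypothetical mapping from macro names (assumed to appear as Span
# classes) to schema.org property names. Macro names are chosen by
# each style's author, so a table like this would be style-specific.
MACRO_TO_ITEMPROP = {
    "title": "name",
    "author": "author",
    "publisher": "publisher",
}


def annotate(node):
    """Recursively walk a pandoc JSON AST fragment and add an itemprop
    key-value attribute to every Span whose class is a known macro name."""
    if isinstance(node, list):
        return [annotate(n) for n in node]
    if not isinstance(node, dict):
        return node
    if node.get("t") == "Span":
        # A Span's content is [[identifier, classes, key-value pairs], inlines].
        (ident, classes, attrs), inlines = node["c"]
        for cls in classes:
            prop = MACRO_TO_ITEMPROP.get(cls)
            if prop and ["itemprop", prop] not in attrs:
                attrs = attrs + [["itemprop", prop]]
        return {"t": "Span", "c": [[ident, classes, attrs], annotate(inlines)]}
    return {key: annotate(value) for key, value in node.items()}


def run_filter(json_text: str) -> str:
    """Apply annotate() to a whole pandoc JSON document given as text."""
    return json.dumps(annotate(json.loads(json_text)))
```

Used as a JSON filter (pandoc's `--filter` option), a small wrapper script would read the document from stdin, pass it through `run_filter`, and write the result to stdout.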

jgm commented

Can't do this with the current API. At the least, we'd need new methods for the CiteprocOutput class that would create a labeled Span in pandoc output.
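The design point here is that labeling has to be a method of the output abstraction, so that each output format can decide what a label means. The following Python sketch models that idea; it is not citeproc's actual (Haskell) `CiteprocOutput` class, and `from_text`, `add_label`, and `render_title` are hypothetical names. A plain-text output drops the label, while a pandoc-AST output wraps the content in a `Span` carrying the label as a class.

```python
from typing import Any, Dict, List, Protocol


class Output(Protocol):
    """Hypothetical, much-simplified stand-in for an output-format class."""

    def from_text(self, text: str) -> Any: ...

    def add_label(self, label: str, content: Any) -> Any: ...


class PlainText:
    def from_text(self, text: str) -> str:
        return text

    def add_label(self, label: str, content: str) -> str:
        # Plain text has nowhere to store a label, so it is discarded.
        return content


class PandocInlines:
    def from_text(self, text: str) -> List[Dict[str, Any]]:
        # Simplification: real pandoc splits text into Str and Space nodes.
        return [{"t": "Str", "c": text}]

    def add_label(self, label: str, content: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Wrap the content in a Span whose single class is the label.
        return [{"t": "Span", "c": [["", [label], []], content]}]


def render_title(out: Output, title: str) -> Any:
    """Render a title through any output format, labeling it "title"."""
    return out.add_label("title", out.from_text(title))
```

For example, `render_title(PlainText(), "Dune")` yields the bare string `"Dune"`, whereas `render_title(PandocInlines(), "Dune")` yields a single labeled `Span`.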

I’m not sure preserving the macro names would be the best thing to do, since macro names are essentially chosen freely by style authors. Ideally, I guess, the variable names themselves would be preserved for the different output elements, such as names, text, etc.
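Variable names have the advantage of coming from the fixed CSL vocabulary, so one mapping could serve every style. As a sketch, assume (hypothetically) that the processor emitted a flat list of `(variable, text)` pieces, with `None` for punctuation and other unlabeled text. A TEI `<bibl>` could then be built as below. The mapping table and the `to_tei_bibl` function are illustrative assumptions, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical mapping from CSL variable names to TEI elements allowed
# inside <bibl>. Variables not listed here are kept as plain text.
CSL_TO_TEI = {
    "author": "author",
    "title": "title",
    "publisher": "publisher",
    "publisher-place": "pubPlace",
    "issued": "date",
}

Piece = Tuple[Optional[str], str]


def to_tei_bibl(pieces: List[Piece]) -> str:
    """Turn (variable, text) pieces into a serialized TEI <bibl> element.

    Pieces whose variable is None or unmapped (punctuation, terms,
    affixes) become text between the tagged child elements.
    """
    bibl = ET.Element("bibl")
    last = None  # most recently added child element, if any
    for variable, text in pieces:
        tag = CSL_TO_TEI.get(variable) if variable else None
        if tag is None:
            # Untagged text goes before the first child or after the last one.
            if last is None:
                bibl.text = (bibl.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(bibl, tag)
            last.text = text
    return ET.tostring(bibl, encoding="unicode")
```

With pieces for a book entry, the punctuation stays in place between the tagged parts:

```python
pieces = [("author", "Herbert, Frank"), (None, ". "), ("title", "Dune"),
          (None, ". "), ("publisher-place", "Philadelphia"), (None, ": "),
          ("publisher", "Chilton"), (None, ", "), ("issued", "1965"), (None, ".")]
print(to_tei_bibl(pieces))
# <bibl><author>Herbert, Frank</author>. <title>Dune</title>. <pubPlace>Philadelphia</pubPlace>: <publisher>Chilton</publisher>, <date>1965</date>.</bibl>
```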