plk/biber

Proposal: add JSON as an export format

Closed this issue · 12 comments

Currently, biber in tool mode can convert a bib file to formats such as XML or dot, but not JSON.
Vim and Emacs both have native functions that can read and parse a JSON-formatted string, so being able to read a JSON file containing bibliographic information would allow users to build rich completion plugins without an additional tool. For example, when completing a bibliographic key, Vim could show a popup window with additional information such as the title and author of the work being cited; a plugin called vim-pandoc provides such functionality, but it has to use Python to do so.

As an example, a single bibliographic entry could be represented as follows:

{
    "type": "article",
    "key": "Foo2022",
    "title": "Bar Baz",
    "author": "Doe, John"
}

and an entire bib file would be a list of entries:

[
    {
        "type": "article",
        "key": "Foo"
    },
    {
        "type": "article",
        "key": "bar"
    }
]
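
As a rough illustration of how a completion plugin might consume such a file, here is a minimal Python sketch. The function name `completion_items` is hypothetical, and the output keys `word` and `menu` are an assumption borrowed from the shape of Vim completion items; none of this is part of biber.

```python
import json

def completion_items(json_text):
    """Turn a JSON list of entries into simple completion candidates."""
    entries = json.loads(json_text)
    items = []
    for entry in entries:
        # "key" is required; "author" and "title" are optional in this sketch.
        details = [entry[f] for f in ("author", "title") if f in entry]
        items.append({"word": entry["key"], "menu": " | ".join(details)})
    return items

sample = '''[
    {"type": "article", "key": "Foo2022", "title": "Bar Baz", "author": "Doe, John"},
    {"type": "article", "key": "bar"}
]'''

for item in completion_items(sample):
    print(item)
```

An editor plugin would do the equivalent with its own built-in JSON decoder, which is the point of the proposal.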

Thank you very much for your work on biber.

plk commented

Do you have a specific JSON standard for bib data in mind? It's not that difficult to add an output option for JSON and I've considered it before but I was never sure how useful that would be if it's not some format that something expected.

@plk thanks for your reply. I have found this standard.

plk commented

This looks interesting. @moewew - I wonder if we might just drop biblatexml as a format and use this instead as it has independent support?

There is also CSL-JSON, which might be interesting for interoperability with CSL styles: https://github.com/citation-style-language/schema#csl-json-schema. Not sure how different that is from bibjson.

Question is if we can map everything Biber does with .bib files (field annotations and friends) to this JSON format.


biblatexml and bblxml (are they the same thing?) don't appear to be used all that often, so I don't have any reservations about removing them, if that makes things easier. (In a Google search I only found references to the implementation and the documentation and no indication that people actually use them.)

plk commented

I'll have a look at CSL-JSON. BibJSON has no fixed schema so we can basically do anything we want but then again having no fixed schema means it's of more limited use.

plk commented

Have a look at this - I have implemented a bibjson output format using a schema based on the biblatexml schema. See how this looks. It will be possible to output a JSON schema for this as an option when outputting to bibjson. I would probably look at implementing CSL-JSON format instead of or in addition to this, depending on what we decide. CSL-JSON is more restricted as it doesn't allow for some things that biber can do, like arbitrary name parts, annotations, etc.

https://gist.github.com/plk/99cdd0379da2c2b53f110bcb6e2a3430

plk commented

Digging around a bit, it looks like bibjson isn't much used and CSL-JSON might be the best choice. It does mean that arbitrary biber .bib files won't work as its schema is fixed but I'm not sure what use a custom schema solution would be anyway as it won't be interoperable with anything.

@plk Thanks, the sample JSON file looks great to me. My initial idea for this feature was to simply map each key/value pair in the bibtex file to a key/value pair in the JSON file, and the implementation would look like this (in pseudocode):

import json_module
convert_to_json(internal_variable_holding_bibliographic_data)

This way, the user would know which keys they should expect in the JSON output just by reading biblatex's documentation.
As you point out though, this solution would create an additional, noninteroperable "standard".
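
To make the one-to-one mapping concrete, here is a minimal Python sketch. It assumes the entries have already been parsed into dictionaries; the `parsed` structure and the function name `entries_to_json` are hypothetical and say nothing about biber's internals.

```python
import json

def entries_to_json(entries):
    """Serialise parsed entries one-to-one: each field becomes a JSON key."""
    records = []
    for entry in entries:
        record = {"type": entry["type"], "key": entry["key"]}
        for name, value in entry["fields"].items():
            # A field called "type" or "key" would clash with the entry metadata.
            if name in record:
                raise ValueError(f"field {name!r} clashes in entry {entry['key']!r}")
            record[name] = value  # field names are kept unchanged
        records.append(record)
    return json.dumps(records, indent=4, ensure_ascii=False)

# Hypothetical pre-parsed input standing in for the contents of a bib file.
parsed = [
    {"type": "article", "key": "Foo2022",
     "fields": {"title": "Bar Baz", "author": "Doe, John"}},
]
print(entries_to_json(parsed))
```

One wrinkle with the flat layout: biblatex has a `type` field of its own (used by entries such as @thesis), so a real format would need either a different name for the entry type or a nested object for the fields. The sketch simply raises an error in that case.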

Another option might be to adopt the CSL-JSON standard, but the manual should warn the user that the conversion from bibtex to JSON could be "lossy", depending on the input file.
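
A minimal Python sketch of what "lossy" could mean in practice: fields with no known CSL-JSON counterpart are dropped and reported. The two mapping tables are deliberately tiny and purely illustrative, the fallback type is an arbitrary choice, and `to_csl` and `split_name` are hypothetical names, not biber functions.

```python
# Deliberately partial, illustrative mappings; a real converter would need far more.
TYPE_MAP = {"article": "article-journal", "book": "book"}
FIELD_MAP = {"title": "title", "journaltitle": "container-title",
             "publisher": "publisher"}

def split_name(name):
    """Split 'Family, Given' into CSL name parts; keep anything else literal."""
    family, _, given = name.partition(", ")
    return {"family": family, "given": given} if given else {"literal": name}

def to_csl(entry):
    """Return (csl_item, dropped_fields) for one flat entry dict."""
    # Unknown entry types fall back to "article" (arbitrary for this sketch).
    item = {"id": entry["key"], "type": TYPE_MAP.get(entry["type"], "article")}
    dropped = []
    for name, value in entry.items():
        if name in ("key", "type"):
            continue
        if name == "author":
            item["author"] = [split_name(n) for n in value.split(" and ")]
        elif name in FIELD_MAP:
            item[FIELD_MAP[name]] = value
        else:
            dropped.append(name)  # no counterpart in this sketch: information lost
    return item, dropped

item, dropped = to_csl({"type": "article", "key": "Foo2022", "title": "Bar Baz",
                        "author": "Doe, John", "shorthand": "FB"})
print(item)
print("dropped:", dropped)
```

Reporting the dropped fields is one way a tool could make the loss visible instead of silent.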

What do you think?

plk commented

I think CSL-JSON is probably a better choice: any custom JSON won't have a parser for its schema unless someone writes one, which makes the use case extremely niche, and we already have a completely customisable output format in biblatexml, which also writes the accompanying schema. The only real use case I can see for bibjson is as a replacement for biblatexml, but I'm not convinced about that, as you sacrifice some things with JSON compared with XML (attributes vs content is a nice distinction in XML that makes it easier for humans to read, and the speed of JSON parsing isn't really much of an issue in batch bibliography operations). I knocked up a bibjson example just because it's quicker, as there are no schema requirements.

@plk then I'm ok with biber using CSL-JSON as its JSON output format.

plk commented

I've had a look at this and, from experience implementing things like RIS, Endnote and some other JSON output formats in the past (now all deprecated due to lack of use), I'm not sure this is a good idea. The mapping from CSL-JSON to biblatex types and fields is messy and, to make any sense at all, it would have to be modifiable by the user and include contextual, dynamic elements. This is not trivial to implement and maintain. Keeping this sort of thing in sync with changes in CSL-JSON data schemata etc. is too much to track for little gain. This sort of "glue" interface between different data models is basically of limited use in general and only works for generic entries. We do have the "biblatexml" output format, which could be programmatically changed into some sort of JSON output, depending on the use case, but a general transformation from the internal data model is just too messy to be of much use to a wide enough audience, I suspect. I'm not even sure what use a generic JSON output format is, since it would have to be modified for any reasonable use case and you might as well just parse the biblatexml output.
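
For anyone who wants to go the "parse the XML output" route, here is a minimal Python sketch of a generic XML-to-JSON conversion using only the standard library. The sample document is a simplified, hypothetical stand-in and not the real biblatexml schema; `local` and `element_to_dict` are illustrative names.

```python
import json
import xml.etree.ElementTree as ET

def local(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]

def element_to_dict(elem):
    """Recursively convert an element into a plain dict (tag, attrs, text, children)."""
    node = {"tag": local(elem.tag)}
    if elem.attrib:
        node["attrs"] = {local(k): v for k, v in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node

# Hypothetical, simplified input; the real tool output will look different.
sample_xml = """<entries xmlns="http://example.org/ns">
  <entry id="Foo2022" entrytype="article">
    <title>Bar Baz</title>
  </entry>
</entries>"""

print(json.dumps(element_to_dict(ET.fromstring(sample_xml)), indent=2))
```

Keeping attributes and text in separate keys preserves the attribute/content distinction mentioned earlier in the thread, at the cost of a more verbose JSON shape.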

I see. Thanks for taking the time to discuss this feature request.