hellux/jotdown

Add an AST

Opened this issue · 5 comments

hellux commented

It is often useful to work with an AST rather than a sequence of events. We could implement an optional module that provides AST objects that correspond to the AST defined by the djot spec (https://github.com/jgm/djot.js/blob/main/src/ast.ts).

It would be useful to be able to create it from events, and create events from the AST so you can e.g. parse events -> create ast -> modify ast -> create events -> render events.

It could also be useful to read/write the AST from/to a format such as JSON. We may then be able to read and write ASTs identically to the reference implementation, and tests could match against JSON produced by the reference implementation. The serialization/deserialization can be derived automatically with serde, so a downstream client can then use any serde-compatible format.

A quick sketch of what it could look like:

#[cfg(feature = "ast")]
pub mod ast {
    use super::Event;

    use std::collections::HashMap as Map;

    #[cfg(feature = "serde")]
    use serde::{Deserialize, Serialize};

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub struct Doc {
        children: Vec<Block>,
        references: Map<String, Reference>,
        footnotes: Map<String, Footnote>,
    }

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub struct Reference {
        // todo
    }

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub struct Footnote {
        // todo
    }

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub struct Block {
        kind: BlockKind,
        children: Vec<Block>,
    }

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub enum BlockKind {
        Para,
        Heading { level: usize },
        // todo
    }

    pub struct Iter<'a> {
        // todo
        _s: std::marker::PhantomData<&'a ()>,
    }

    impl<'a> Iterator for Iter<'a> {
        type Item = Event<'a>;

        fn next(&mut self) -> Option<Self::Item> {
            todo!()
        }
    }

    #[derive(Debug)]
    pub enum Error {
        EventNotEnded,
        UnexpectedStart,
        BlockInsideLeaf,
    }

    impl<'s> FromIterator<Event<'s>> for Result<Doc, Error> {
        fn from_iter<I: IntoIterator<Item = Event<'s>>>(events: I) -> Self {
            todo!()
        }
    }

    impl<'a> IntoIterator for &'a Doc {
        type Item = Event<'a>;
        type IntoIter = Iter<'a>;

        fn into_iter(self) -> Self::IntoIter {
            todo!()
        }
    }
}
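
As a rough illustration of how the two `todo!()` conversions might work, here is a minimal, self-contained sketch of the events -> AST -> events round trip. The `Kind`, `Event`, and `Node` types and the `to_ast`/`to_events` functions are simplified, hypothetical stand-ins, not jotdown's real API; the idea is only that a stack of open containers is enough to build the tree, and a depth-first walk is enough to flatten it again.

```rust
// Hypothetical, simplified stand-ins; jotdown's real event types are richer.
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Para,
    Heading { level: usize },
}

#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(Kind),
    End(Kind),
    Str(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Node {
    Container { kind: Kind, children: Vec<Node> },
    Str(String),
}

#[derive(Debug, PartialEq)]
enum Error {
    EventNotEnded,
    UnexpectedEnd,
}

/// Build a tree from a flat event stream using a stack of open containers.
fn to_ast(events: impl IntoIterator<Item = Event>) -> Result<Vec<Node>, Error> {
    // Each stack entry is an open container plus the children collected so far.
    let mut stack: Vec<(Kind, Vec<Node>)> = Vec::new();
    let mut top: Vec<Node> = Vec::new();
    for ev in events {
        // A Start only opens a container; everything else yields a finished node.
        let node = match ev {
            Event::Start(kind) => {
                stack.push((kind, Vec::new()));
                continue;
            }
            Event::End(kind) => {
                let (open, children) = stack.pop().ok_or(Error::UnexpectedEnd)?;
                if open != kind {
                    return Err(Error::UnexpectedEnd);
                }
                Node::Container { kind, children }
            }
            Event::Str(s) => Node::Str(s),
        };
        // Attach the node to the innermost open container, or to the top level.
        match stack.last_mut() {
            Some((_, siblings)) => siblings.push(node),
            None => top.push(node),
        }
    }
    if stack.is_empty() {
        Ok(top)
    } else {
        Err(Error::EventNotEnded)
    }
}

/// Flatten a tree back into events with a depth-first walk.
fn to_events(nodes: &[Node], out: &mut Vec<Event>) {
    for node in nodes {
        match node {
            Node::Container { kind, children } => {
                out.push(Event::Start(kind.clone()));
                to_events(children, out);
                out.push(Event::End(kind.clone()));
            }
            Node::Str(s) => out.push(Event::Str(s.clone())),
        }
    }
}
```

Feeding the output of `to_ast` into `to_events` should reproduce the original event sequence. A real implementation would presumably borrow from the source instead of cloning, and would return a lazy iterator like the `Iter` type above rather than filling a `Vec`.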

clientside:

let src = "# heading

para";

let events = jotdown::Parser::new(src);
let ast = events.collect::<Result<jotdown::ast::Doc, _>>().unwrap();
let json = serde_json::to_string(&ast).unwrap();

assert_eq!(
    json,
    r##"
    {
      "tag": "doc",
      "references": {},
      "footnotes": {},
      "children": [
        {
          "tag": "para",
          "children": [
            {
              "tag": "str",
              "text": "para"
            }
          ]
        }
      ]
    }
    "##
);

I was going to suggest basing such an AST on the output of typify for the json-schema generated from typescript definitions in djot.js, but typify doesn't parse it.

Having an internal AST like this, as well as being able to consume and produce it in JSON form, would allow the use of jotdown as a library to write filters as standalone binaries:

djot -t json mydoc.dj | myrustbinary | djot -f json > index.html

hellux commented

> I was going to suggest basing such an AST on the output of typify for the json-schema generated from typescript definitions in djot.js, but typify doesn't parse it.

It would be nice if the AST types could be generated automatically. The only work needed would be to convert between AST and events.

> Having an internal AST like this, as well as being able to consume and produce it in JSON form, would allow the use of jotdown as a library to write filters as standalone binaries:
>
> djot -t json mydoc.dj | myrustbinary | djot -f json > index.html

If one wants to manipulate an AST, I guess jotdown (which is mainly a parser) is not really needed here. Just need some AST types that can be serialized and deserialized.
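
To make that concrete, here is a minimal sketch of what the filtering step in such a binary might look like once the AST has been deserialized. The types mirror the `Block`/`BlockKind` sketch earlier in this issue; the `BlockQuote` variant and the `demote_headings` function are hypothetical additions for illustration. Reading JSON from stdin and writing it back (e.g. with serde_json) is left out so the example needs only the standard library.

```rust
// Hypothetical AST types modelled on the sketch in this issue; not a jotdown API.
#[derive(Debug, Clone, PartialEq)]
pub enum BlockKind {
    Para,
    Heading { level: usize },
    BlockQuote, // assumed variant, only here to show nesting
}

#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub kind: BlockKind,
    pub children: Vec<Block>,
}

/// Example filter: increase the level of every heading by `by`,
/// visiting nested blocks recursively.
pub fn demote_headings(block: &mut Block, by: usize) {
    if let BlockKind::Heading { level } = &mut block.kind {
        *level += by;
    }
    for child in &mut block.children {
        demote_headings(child, by);
    }
}
```

A filter binary would then be roughly: deserialize the document, call something like `demote_headings` on each top-level block, and serialize the result.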

I tried two additional conversion tools:

  1. quicktype, with both TypeScript and JSON Schema input
  2. typester, which isn't intended to be used in production

Neither of them parsed it (or at least completed), so I am wondering whether there's something funky about that AST definition.

> If one wants to manipulate an AST, I guess jotdown (which is mainly a parser) is not really needed here. Just need some AST types that can be serialized and deserialized.

So in a scenario like this, jotdown would just be able to output the same AST as djot.js, and any filtering would be done independently?

I want to implement citation and bibliography processing using djot for a project I'm working on, once John adds support for citations, so I'm just wondering how that might work.

hellux commented

I just posted a linked issue over there.