ropensci-archive/trackmd

So, pandoc does support track changes via HTML tags

Closed this issue · 2 comments

I just found out that pandoc can read track changes in markdown documents that are encoded with HTML tags:

Hello <span class="deletion" author="Zachary Foster"
date="2018-05-21T18:39:00Z">W</span>orld<span class="insertion"
author="Zachary Foster" date="2018-05-21T10:40:00Z">!!!!</span>

pandoc -s word_test.docx -o test.md --track-changes=all
pandoc -s test.md -o repre.docx --track-changes=all

This markdown can be converted into a .docx file, and the .docx file back to R Markdown.
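The round trip above can be scripted. The following is a minimal sketch in Python that only wraps the two pandoc commands shown in this issue; the function names `roundtrip_commands` and `run_roundtrip` are made up for illustration, and it assumes pandoc is installed and on the PATH.

```python
import shutil
import subprocess


def roundtrip_commands(docx_in, md_path, docx_out):
    """Build the two pandoc invocations from this issue as argument lists."""
    # .docx -> markdown, keeping tracked changes as spans
    to_md = ["pandoc", "-s", docx_in, "-o", md_path, "--track-changes=all"]
    # markdown -> .docx, same flags as in the issue
    to_docx = ["pandoc", "-s", md_path, "-o", docx_out, "--track-changes=all"]
    return [to_md, to_docx]


def run_roundtrip(docx_in, md_path, docx_out):
    """Run both conversions in order, stopping at the first failure."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    for cmd in roundtrip_commands(docx_in, md_path, docx_out):
        subprocess.run(cmd, check=True)
```

Calling `run_roundtrip("word_test.docx", "test.md", "repre.docx")` would reproduce the commands above with the same file names.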

So the good news is that the markdown->word->markdown process is possible (kind of)!

The bad news is that we are using Critic Markup syntax instead of the HTML tags used by pandoc. Pandoc also does not seem to preserve comments. Finally, the HTML tags supported by pandoc are too verbose to type by hand, although that is not a problem for our shiny app.

@sctyner, what do you think of adding an option to the shiny app to output the pandoc HTML diffs instead of the Critic Markup diffs, so the user can decide which to use?

Sorry for intruding into this project, but I am very interested in completing the rmd->word->rmd cycle of collaboration and would be happy to help although I am not at the unconference.

The pandoc documentation suggests that --track-changes=all should preserve comments in spans:

Both accept and reject ignore comments. all puts in insertions, deletions, and comments, wrapped in spans with insertion, deletion, comment-start, and comment-end classes, respectively. The author and time of change is included.
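One way to check whether comments actually survive a conversion is to count the span classes in the markdown that pandoc produces. This is a rough sketch, not a pandoc API: `count_change_spans` is a hypothetical helper, and it assumes the spans appear as raw HTML `<span class="...">` tags with `class` as the first attribute, as in the example at the top of this issue. Other pandoc versions or output settings may write spans in a different syntax, which this would miss.

```python
import re
from collections import Counter

# The four classes named in the pandoc documentation quoted above.
CLASS_RE = re.compile(
    r'<span\s+class="(insertion|deletion|comment-start|comment-end)"'
)


def count_change_spans(markdown_text):
    """Count tracked-change and comment spans found as raw HTML tags."""
    return Counter(CLASS_RE.findall(markdown_text))
```

A count of zero for `comment-start` on a document known to contain comments would be consistent with comments being dropped, or with the spans being written in a form this pattern does not cover.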

No doubt it would be possible to convert the HTML tags to Critic Markup. However, the author and date information would be lost.
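As a rough illustration of that conversion, here is a sketch in Python. The function name `spans_to_critic` is hypothetical, and it assumes insertions and deletions map to the Critic Markup `{++ ++}` and `{-- --}` delimiters, that `class` is the first attribute of each span, and that spans are not nested. It shows the loss directly: the `author` and `date` attributes are simply discarded.

```python
import re

# Matches <span class="insertion" ...>text</span> and the deletion equivalent.
# [^>]* skips the remaining attributes (author, date), even across line breaks.
SPAN_RE = re.compile(
    r'<span\s+class="(insertion|deletion)"[^>]*>(.*?)</span>',
    re.DOTALL,
)

# Assumed mapping from pandoc span class to Critic Markup delimiters.
MARKS = {"insertion": ("{++", "++}"), "deletion": ("{--", "--}")}


def spans_to_critic(text):
    """Replace pandoc insertion/deletion spans with Critic Markup."""
    def repl(match):
        opener, closer = MARKS[match.group(1)]
        return f"{opener}{match.group(2)}{closer}"
    return SPAN_RE.sub(repl, text)
```

Applied to the example at the top of this issue, this gives `Hello {--W--}orld{++!!!!++}`.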

Hello @ekatko1, thanks for the offer! We would be happy to have some help. This project is still very immature, and we have limited extra time to devote to it so any help would be appreciated.

The pandoc documentation suggests that --track-changes=all should preserve comments in spans

Yeah, I experimented a bit with that, and it looks like a way forward, although in my tests comments were not preserved. I was thinking of supporting both types of track-changes markup: Critic Markup and what I have been referring to as pandoc markup (I don't know whether pandoc is following some particular convention). We could keep the author and date info if we inserted pandoc markup directly.
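Going in the other direction, inserting pandoc markup directly could look roughly like the sketch below. It is an illustration only: `critic_to_spans` is a hypothetical function, not part of trackmd or pandoc, it handles only Critic Markup additions and deletions, and it assumes one author and one timestamp are supplied for all changes in the text.

```python
import re
from html import escape

# {++added++} or {--deleted--}; only one of the two groups matches each time.
CRITIC_RE = re.compile(r"\{\+\+(.*?)\+\+\}|\{--(.*?)--\}", re.DOTALL)


def critic_to_spans(text, author, date):
    """Rewrite Critic Markup changes as pandoc-style spans with author/date."""
    def repl(match):
        if match.group(1) is not None:
            cls, body = "insertion", match.group(1)
        else:
            cls, body = "deletion", match.group(2)
        # escape() protects quotes in the author name inside the attribute
        return (f'<span class="{cls}" author="{escape(author)}" '
                f'date="{date}">{body}</span>')
    return CRITIC_RE.sub(repl, text)
```

For example, `critic_to_spans("Hello {--W--}orld{++!!!!++}", "Zachary Foster", "2018-05-21T18:39:00Z")` yields spans in the same shape as the example at the top of this issue, with the author and date attached to each change.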