wpoa/JATS-to-Mediawiki

Loose ends on the python script

Klortho opened this issue · 1 comments

The python script seems not to be quite finished. I don't think the tmpdir, infile, or outfile command line options are implemented.

The infile option would be really nice, because it would mean that I could make changes to the XSLT and then reconvert the same article without having to download it again. Or, I could make changes to the input XML to test things, and reconvert it directly.

I'm not sure about tmpdir. It might be better to have the script always download and extract into a directory called articles.

I do not know python very well, but it seems pretty easy to hack, so maybe I could do this as a learning exercise.

The output option is there as a placeholder for running the script as a streaming converter that writes the converted text either to stdout (the default) or to a file. You're right that it is not implemented; for simplicity, the output is currently saved to a .mw.xml file. It doesn't seem like we need to change this, but we could comment the option out or remove it to make that clearer.

The infile option is implemented and it works! However, the script expects DOIs or PMIDs as inputs, so an infile is a list of DOIs or PMIDs. If you want to reconvert the same article again without having to download it, or to make changes to the XML and reconvert directly, instead you should simply call xsltproc as usual:

xsltproc jats-to-mediawiki.xsl $FILENAME.nxml > $FILENAME.mw.xml

On the other hand, we should definitely change the script to check if the file has already been downloaded, and if so, skip downloading it. That should save network time generally.
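A minimal sketch of that check, not the script's actual code: the function name fetch_if_missing, its parameters, and the injectable fetch argument are all hypothetical, and it assumes each article is stored as a single file under the articles directory.

```python
import os
from urllib.request import urlretrieve


def fetch_if_missing(url, filename, articles_dir="articles", fetch=urlretrieve):
    """Download url to articles_dir/filename unless a copy is already there.

    Hypothetical helper for illustration. Returns (path, downloaded), where
    downloaded is False when the existing file was reused.
    """
    os.makedirs(articles_dir, exist_ok=True)
    path = os.path.join(articles_dir, filename)

    # Treat an empty file as a failed earlier download and fetch it again.
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path, False

    fetch(url, path)
    return path, True
```

Passing the downloader in as fetch keeps the check easy to exercise without touching the network. One caveat: an existing file is assumed to be complete, so a partial download would need to be deleted by hand.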

As for tmpdir, you're right: this was an oversight and it causes a bit of a mess. I just fixed it and put everything into an articles directory, as you suggest.