nzhiltsov/mocassin

Reorganize the indexing process workflow


The workflow should run as follows:
1. Take an arXiv id as input.
2. Download the paper metadata and the LaTeX source via the arXiv API.
3. Patch the LaTeX source with additional package entries (pdfsync, framed, etc.).
4. Produce the arXMLiv representation (XML-based) using the 'latexml' tools.
5. Add the XML document to a GATE storage and process it with the GATE machinery.
6. Extract structural elements etc.
7. PDF compilation: run pdflatex to build the main PDF document; get page numbers for structural elements using pdfsync; for each structural element, patch the LaTeX source file with 'shaded' entries and generate the highlighted PDF document (pdflatex).
8. Generate RDF metadata of structural elements and populate the RDF store.
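
The text-level parts of steps 2, 3, 4 and 7 above could be sketched roughly as follows. This is only an illustration, not the project's code: all function names are hypothetical, the URLs are the public arXiv query and e-print endpoints, and the sketch assumes the source is a single .tex file with an ordinary \documentclass line. It only builds URLs, command lines and patched strings; it does not download or compile anything.

```python
import re
from urllib.parse import urlencode

# Public arXiv endpoints (assumed here; the issue only says "arXiv API").
ARXIV_API = "http://export.arxiv.org/api/query"
ARXIV_EPRINT = "https://arxiv.org/e-print/"


def metadata_url(arxiv_id):
    # Step 2: URL of the Atom metadata record for one paper.
    return ARXIV_API + "?" + urlencode({"id_list": arxiv_id})


def source_url(arxiv_id):
    # Step 2: URL of the source bundle for one paper.
    return ARXIV_EPRINT + arxiv_id


def patch_preamble(tex, packages=("pdfsync", "framed")):
    # Step 3: insert one usepackage line per package right after documentclass.
    match = re.search(r"\\documentclass(\[[^\]]*\])?\{[^}]*\}", tex)
    if match is None:
        raise ValueError("no documentclass line found")
    extra = "".join("\n\\usepackage{%s}" % name for name in packages)
    return tex[:match.end()] + extra + tex[match.end():]


def shade_element(tex, start, end):
    # Step 7: wrap tex[start:end] in the 'shaded' environment from 'framed'.
    # Note: 'shaded' expects a colour named shadecolor to be defined in the
    # preamble; this sketch does not add that definition.
    return (tex[:start] + "\\begin{shaded}\n" + tex[start:end]
            + "\n\\end{shaded}" + tex[end:])


def latexml_command(tex_path, xml_path):
    # Step 4: command line for the TeX -> XML conversion.
    return ["latexml", "--destination=" + xml_path, tex_path]


def pdflatex_command(tex_path):
    # Step 7: command line for one PDF compilation pass.
    return ["pdflatex", "-interaction=nonstopmode", tex_path]
```

The command lists are meant to be passed to something like subprocess.run. The character offsets given to shade_element would have to come from the structural elements found in step 6; how those map back to the LaTeX source is not covered here.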

Original issue reported on code.google.com by nikita.z...@gmail.com on 18 Jul 2011 at 1:51

Original comment by nikita.z...@gmail.com on 26 Jul 2011 at 11:31

  • Changed state: Fixed