Reorganize the indexing process workflow
GoogleCodeExporter commented
The indexing workflow should be executed as follows:
1. Take an arXiv id as input.
2. Download the paper metadata and the LaTeX source via the arXiv API.
3. Patch the LaTeX source with additional package entries (pdfsync, framed,
etc.).
4. Produce the arXMLiv representation (XML-based) using the 'latexml' tools.
5. Add the XML document to a GATE datastore and process it with the GATE
machinery.
6. Extract structural elements etc.
7. PDF compilation: run pdflatex to produce the main PDF document; get the page
numbers of the structural elements using pdfsync; then, for each structural
element, patch the LaTeX source file with 'shaded' entries and run pdflatex
again to generate the highlighted PDF document.
8. Generate RDF metadata of structural elements and populate the RDF store.
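
A rough sketch of how steps 2, 3, 4 and 7 could be wired together in Python.
This is only an illustration under assumptions, not the project's actual code:
the function names are made up, the URLs are the public arXiv endpoints (the
issue only says "arXiv API"), and the command lines are returned as lists
rather than executed.

```python
import re

# Assumption: public arXiv endpoints for metadata (Atom feed) and source.
ARXIV_API_URL = "http://export.arxiv.org/api/query?id_list={arxiv_id}"
ARXIV_SOURCE_URL = "https://arxiv.org/e-print/{arxiv_id}"

# Packages named in step 3; whatever "etc." covers is left open here.
EXTRA_PACKAGES = ("pdfsync", "framed")


def metadata_url(arxiv_id):
    """URL of the metadata record for one paper (step 2)."""
    return ARXIV_API_URL.format(arxiv_id=arxiv_id)


def source_url(arxiv_id):
    """URL of the LaTeX source bundle for one paper (step 2)."""
    return ARXIV_SOURCE_URL.format(arxiv_id=arxiv_id)


def patch_preamble(latex_source, packages=EXTRA_PACKAGES):
    """Insert \\usepackage lines right after the \\documentclass line (step 3)."""
    match = re.search(r"\\documentclass(\[[^\]]*\])?\{[^}]*\}[^\n]*\n?",
                      latex_source)
    if match is None:
        raise ValueError("no \\documentclass line found")
    head = latex_source[:match.end()]
    if not head.endswith("\n"):
        head += "\n"
    extra = "".join("\\usepackage{%s}\n" % name for name in packages)
    return head + extra + latex_source[match.end():]


def highlight(latex_source, element_text):
    """Wrap the first occurrence of element_text in 'shaded' (step 7)."""
    if element_text not in latex_source:
        raise ValueError("structural element not found in source")
    wrapped = "\\begin{shaded}\n" + element_text + "\n\\end{shaded}"
    return latex_source.replace(element_text, wrapped, 1)


def build_commands(tex_file, xml_file):
    """Command lines for steps 4 and 7, as argument lists (not executed)."""
    return [
        ["latexml", "--destination=" + xml_file, tex_file],
        ["pdflatex", "-interaction=nonstopmode", tex_file],
    ]
```

The command lists can be passed to subprocess.run one by one. Note that the
framed package expects a colour named shadecolor to be defined before 'shaded'
is used; the sketch does not add that definition, and it does not cover the
GATE or RDF steps.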
Original issue reported on code.google.com by nikita.z...@gmail.com
on 18 Jul 2011 at 1:51
GoogleCodeExporter commented
Original comment by nikita.z...@gmail.com
on 26 Jul 2011 at 11:31
- Changed state: Fixed