Split source XML files to reduce XML-to-SPARQL ETL memory consumption

Question

Opened this issue 5 years ago · 2 comments

Split each source file into a folder of individual record files, using a streaming file splitter.
Refactor the XML-to-SPARQL pipeline to load the record files from these folders one at a time, and pass each one to the RDF conversion XSLT.
Pass the record's type to the conversion XSLT as a parameter (replacing the file type recognition code in the XSLT).
Replace the stylesheet which marks some Piction images as preferred with an equivalent SPARQL update query.
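
A rough sketch of what the streaming splitter in the first step could look like, using Python's standard-library `iterparse`. This is an illustration only, not the project's actual splitter: the function name `split_records`, the record element name passed as `record_tag`, and the numbered output file names are all assumptions, and it assumes each record is a direct child of the source file's root element.

```python
import os
import xml.etree.ElementTree as ET


def split_records(source, out_dir, record_tag):
    """Write each record_tag element directly under the root to its own file.

    `source` is a file name or a binary file object. Returns the number of
    record files written. Names and layout here are hypothetical.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # first start event is the document root
            depth += 1
            continue
        depth -= 1
        if depth != 1:
            continue  # not a direct child of the root (or the root itself)
        if elem.tag == record_tag:
            count += 1
            elem.tail = None  # drop inter-record whitespace
            path = os.path.join(out_dir, f"{count:08d}.xml")
            ET.ElementTree(elem).write(
                path, encoding="utf-8", xml_declaration=True
            )
        # Discard the finished child so the whole source file is never
        # held in memory at once.
        root.clear()
    return count
```

Because each finished record is written out and then cleared from the in-memory tree, memory use depends on the size of the largest single record rather than the size of the source file, which is the point of splitting before the XSLT step.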

Answer 1 · 2019-07-09T06:19:46.000Z

Replace the Piction stylesheet with a SPARQL query first, since that part doesn't depend on the other (XML splitting) changes.

Answer 2 · 2019-07-09T06:20:05.000Z

Waiting for the go-ahead.