Split source XML files to reduce XML-to-SPARQL ETL memory consumption
Conal-Tuohy commented
- Split each source file into a folder of individual record files, using a streaming file splitter.
- Refactor the XML-to-SPARQL pipeline to load record files individually from these folders and pass each one to the RDF conversion XSLT.
- Pass the record's type to the conversion XSLT as a parameter (replacing the file-type recognition code in the XSLT).
- Replace the stylesheet which marks some Piction images as preferred with an equivalent SPARQL Update query.
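
A rough sketch of what the streaming split in the first step could look like, in Python. This is an illustration only, not the project's actual splitter: the function name `split_records`, the default element name `record`, and the numbered output file names are all hypothetical, and it assumes each record is a direct child of the source file's root element. The point is that only one record is held in memory at a time.

```python
import os
import xml.etree.ElementTree as ET


def split_records(source_path, output_dir, record_tag="record"):
    """Stream source_path, writing each record element to its own file.

    Assumes (hypothetically) that records are direct children of the root
    element and are named record_tag. Returns the number of files written.
    """
    os.makedirs(output_dir, exist_ok=True)
    count = 0
    depth = 0
    root = None
    for event, elem in ET.iterparse(source_path, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first element started is the document root
            depth += 1
            continue
        depth -= 1
        # After closing a direct child of the root, depth is back to 1.
        if depth == 1 and elem.tag == record_tag:
            count += 1
            # Hypothetical naming scheme: 000001.xml, 000002.xml, ...
            out_path = os.path.join(output_dir, f"{count:06d}.xml")
            ET.ElementTree(elem).write(
                out_path, encoding="utf-8", xml_declaration=True
            )
            # Detach the finished record so the in-memory tree stays small.
            root.remove(elem)
    return count
```

With one output folder per source file, the folder itself can identify the record type, which the pipeline could then hand to the conversion XSLT as a parameter.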
Conal-Tuohy commented
Replace the Piction stylesheet with a SPARQL query first, since that part doesn't depend on the other (XML splitting) changes.
Conal-Tuohy commented
Waiting for go-ahead