dracor-org/dracor-api

Improve performance when reloading a corpus


cmil commented

Currently, when a corpus is loaded from its Git repository, the TEI documents that already exist in this corpus are deleted one by one to clean up before the files from the repo are loaded. For large corpora this can take a long time and also slow down the database. To speed things up, the code below should be changed to remove the entire data collection at once.

(: remove TEI documents :)
for $tei in collection($data-collection)/tei:TEI
let $resource := tokenize($tei/base-uri(), '/')[last()]
return (
  util:log-system-out("removing " || $resource),
  xmldb:remove($data-collection, $resource)
),
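
A minimal sketch of what removing the collection in one call might look like, not necessarily what the eventual commit does. It assumes that `$data-collection` holds an absolute eXist-db collection path (the value below is a hypothetical placeholder) and that the collection contains nothing that has to survive the reload. Unlike the loop above, which only touches `tei:TEI` documents, this also drops any other resources stored in the collection.

```xquery
xquery version "3.1";

(: Hypothetical path, for illustration only; the real value is computed elsewhere. :)
let $data-collection := "/db/example/corpus/tei"

(: Split the path into parent collection and collection name so it can be recreated. :)
let $segments := tokenize($data-collection, "/")
let $name := $segments[last()]
let $parent := string-join($segments[position() lt last()], "/")

return (
  util:log-system-out("removing collection " || $data-collection),
  (: Remove the whole collection in a single call instead of one call per document. :)
  if (xmldb:collection-available($data-collection))
  then xmldb:remove($data-collection)
  else (),
  (: Recreate the now empty collection so the files from the repo can be stored into it. :)
  xmldb:create-collection($parent, $name)
)
```

The single-argument form of `xmldb:remove` deletes a collection, while the two-argument form used in the loop deletes one resource. Whether this is actually faster depends on how eXist-db updates its indexes when a collection is dropped.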

cmil commented

The solution proposed for this issue has been implemented in 2e6eb07 but did not achieve the performance gain we hoped for. A new approach will be taken in #241.