dbpedia/databus

Automating Backups


  1. backup/replay concept covering all the different types of data (including metadata, collections, other data, and also MOSS metadata posts), then
  2. implement the backup and replay procedure on a test instance,
  3. execute the backup on the public instance.

Super simple approach (can end up with corrupted data files; see the sketch after this list):

  • just copy the volume -> zip
  • and the docker-compose.yml
  • and .env
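
A minimal sketch of that simple path, assuming a single volume directory next to the compose file and a deployment directory holding docker-compose.yml and .env; all paths below are assumptions, not the actual Databus layout. Stopping the stack first reduces the corrupted-file risk the heading warns about:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the 'super simple' backup path."""
import subprocess
import zipfile
from datetime import date
from pathlib import Path

DEPLOY_DIR = Path("/opt/databus")              # assumed: holds docker-compose.yml and .env
VOLUME_DIR = Path("/opt/databus/volumes")      # assumed location of the data volume
BACKUP_ZIP = Path(f"/backups/databus-{date.today().isoformat()}.zip")

def main() -> None:
    # Stop the stack first so nothing writes to the volume while it is copied.
    subprocess.run(["docker", "compose", "stop"], cwd=DEPLOY_DIR, check=True)
    try:
        with zipfile.ZipFile(BACKUP_ZIP, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in VOLUME_DIR.rglob("*"):
                if path.is_file():
                    zf.write(path, path.relative_to(DEPLOY_DIR))
            for extra in ("docker-compose.yml", ".env"):
                zf.write(DEPLOY_DIR / extra, extra)
    finally:
        # Bring the services back up even if the archiving step failed.
        subprocess.run(["docker", "compose", "start"], cwd=DEPLOY_DIR, check=True)

if __name__ == "__main__":
    main()
```

Restarting in the `finally` block keeps downtime bounded even when the zip step fails.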

More complicated (safe against corrupted data; see the sketch after this list):

  • gstore ->
    • safe copying of all the Git repos for the backup dump
    • then implement reinsertion of all the triples from the latest versions of all files into Virtuoso
  • databus ->
    • users -> database dump -> copy the SQLite file
    • collections -> are in gstore -> same logic as for gstore
    • some other data -> the private and public keys of the Databus -> copy these as well
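
A sketch of the two copy steps above, assuming the gstore repos sit under a single directory and the user database is one SQLite file; all paths and the directory layout are assumptions. A mirror clone only reads committed Git objects, and SQLite's online backup API takes a consistent snapshot, so neither step strictly requires stopping the services:

```python
#!/usr/bin/env python3
"""Hypothetical backup of the gstore repos and the Databus user database."""
import sqlite3
import subprocess
from datetime import date
from pathlib import Path

GSTORE_ROOT = Path("/var/lib/databus/gstore")    # assumed: one Git repo per subdirectory
USER_DB = Path("/var/lib/databus/users.sqlite")  # assumed name/location of the user DB
BACKUP_ROOT = Path("/backups") / date.today().isoformat()

def mirror_repos() -> None:
    # `git clone --mirror` copies all refs and objects; objects are immutable,
    # so the copy stays consistent even if gstore keeps committing.
    target_root = BACKUP_ROOT / "gstore"
    target_root.mkdir(parents=True, exist_ok=True)
    for repo in sorted(p for p in GSTORE_ROOT.iterdir() if p.is_dir()):
        subprocess.run(
            ["git", "clone", "--mirror", str(repo), str(target_root / f"{repo.name}.git")],
            check=True,
        )

def dump_user_db() -> None:
    # sqlite3's backup API snapshots a live database without corrupting the copy.
    BACKUP_ROOT.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(USER_DB) as src, sqlite3.connect(BACKUP_ROOT / "users.sqlite") as dst:
        src.backup(dst)

if __name__ == "__main__":
    mirror_repos()
    dump_user_db()
```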
kurzum commented

Super simple approach (can end up with corrupted data files):

  • stop the Docker containers
  • just rsync the volume -> zip
  • and the docker-compose.yml
  • and .env

MOSS data:

  • check integrity?
  • check JSON syntax (see the sketch after this list)

Implement something lightweight for now, then design and implement a proper strategy.
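
A small sketch of that JSON syntax check, assuming the MOSS metadata was dumped to a directory of .json/.jsonld files; the path and extensions are assumptions:

```python
#!/usr/bin/env python3
"""Hypothetical integrity check: every backed-up MOSS document must still parse as JSON."""
import json
import sys
from pathlib import Path

MOSS_BACKUP = Path("/backups/moss")  # assumed location of the MOSS metadata dump

def main() -> int:
    broken = []
    for path in sorted(MOSS_BACKUP.rglob("*")):
        if not path.is_file() or path.suffix not in (".json", ".jsonld"):
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as err:
            broken.append(f"{path}: {err}")
    for line in broken:
        print("corrupt:", line, file=sys.stderr)
    return 1 if broken else 0

if __name__ == "__main__":
    raise SystemExit(main())
```

The non-zero exit code makes it easy to wire into whatever cron job runs the backup.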
More complicated (safe in terms of corrupted data):

  • gstore ->
    • (within gstore as an API call) safe copying of all the Git repos for the backup dump
    • then implement reinsertion of all the triples from the latest versions of all files into Virtuoso (see the sketch after this list)
  • databus ->
    • users -> database dump -> copy the SQLite file
    • collections -> are in gstore -> same logic as for gstore
    • some other data -> the private and public keys of the Databus -> copy these as well

Intersection with MOSS -> user-curated additional metadata
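
One way the reinsertion step could look, as a sketch only: push each latest file version to Virtuoso through the SPARQL 1.1 Graph Store HTTP protocol. The endpoint path, credentials, dump location, content type, and the file-to-graph mapping are all assumptions, not the actual gstore/Databus configuration:

```python
#!/usr/bin/env python3
"""Hypothetical reinsertion of the latest gstore file versions into Virtuoso."""
from pathlib import Path

import requests
from requests.auth import HTTPDigestAuth

ENDPOINT = "http://localhost:8890/sparql-graph-crud-auth"  # assumed Virtuoso graph-store endpoint
AUTH = HTTPDigestAuth("dba", "dba")                        # assumed credentials
DUMP_ROOT = Path("/backups/gstore-latest")                 # assumed: only the latest file versions

def graph_iri(path: Path) -> str:
    # Assumed convention: the path below the dump root names the target graph.
    return f"https://databus.example.org/graph/{path.relative_to(DUMP_ROOT)}"

def reinsert(path: Path) -> None:
    resp = requests.put(
        ENDPOINT,
        params={"graph": graph_iri(path)},
        data=path.read_bytes(),
        headers={"Content-Type": "application/ld+json"},  # assumed: gstore files are JSON-LD
        auth=AUTH,
        timeout=60,
    )
    resp.raise_for_status()

def main() -> None:
    for path in sorted(DUMP_ROOT.rglob("*.jsonld")):
        reinsert(path)

if __name__ == "__main__":
    main()
```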

Rolling Backup (see the sketch below):
Folder 1: Daily Backup, Rolling Window Size 7
Folder 2: Monday Revision, Window Size 4
Folder 3: 1st of Month, Window Size 12
Folder 4: 1st of Year, Window Size X
Total: 7 + 4 + 12 + X = 23 + X backups
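
A sketch of that rotation, assuming each day produces one zip whose filename starts with the ISO date (so plain name sorting is chronological); the folder names, backup root, and keeping the yearly folder unbounded are assumptions:

```python
#!/usr/bin/env python3
"""Hypothetical rotation for the daily/Monday/monthly/yearly backup folders."""
import shutil
from datetime import date
from pathlib import Path

ROOT = Path("/backups")  # assumed backup root; daily zips are named databus-YYYY-MM-DD.zip

# folder name -> (rolling window size, "does today's backup belong here?")
FOLDERS = {
    "daily":   (7,    lambda d: True),
    "monday":  (4,    lambda d: d.weekday() == 0),
    "monthly": (12,   lambda d: d.day == 1),
    "yearly":  (None, lambda d: d.month == 1 and d.day == 1),  # window size X: unbounded here
}

def rotate(today: date, fresh_backup: Path) -> None:
    for name, (window, belongs_here) in FOLDERS.items():
        if not belongs_here(today):
            continue
        folder = ROOT / name
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(fresh_backup, folder / fresh_backup.name)
        if window is not None:
            # ISO-dated filenames sort chronologically, so drop everything
            # older than the newest `window` files.
            for old in sorted(folder.iterdir())[:-window]:
                old.unlink()

if __name__ == "__main__":
    today = date.today()
    rotate(today, ROOT / f"databus-{today.isoformat()}.zip")
```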