ArchiveTeam/terroroftinytown

Discussion: Exporting and Uploading

Closed this issue · 1 comments

@whs, I'm confused about how iaexporter.py works. Is it dumping data directly from the database onto Internet Archive?

I realize it needs to be done in multiple stages so it is fail-safe.

  1. Export rows from database before DATE
  2. Export the project settings into JSON files
  3. Zip up the exported files per project
  4. Delete the rows before DATE so the database doesn't get full
  5. Upload the files
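
The five stages above could be sketched roughly like this (the table and column names such as `results`, `shortcode`, and `url` are assumptions for illustration, not the project's actual schema):

```python
import json
import os
import sqlite3
import zipfile


def export_project(db_path, project, cutoff, out_dir):
    """Hypothetical sketch of stages 1-4; stage 5 (upload) is a separate script."""
    os.makedirs(out_dir, exist_ok=True)
    con = sqlite3.connect(db_path)
    try:
        # 1. Export rows older than the cutoff date.
        rows = con.execute(
            'SELECT shortcode, url FROM results '
            'WHERE project = ? AND datetime < ?',
            (project, cutoff)).fetchall()
        data_path = os.path.join(out_dir, project + '.txt')
        with open(data_path, 'w') as f:
            for shortcode, url in rows:
                f.write('{0}|{1}\n'.format(shortcode, url))

        # 2. Export the project settings as JSON.
        settings_path = os.path.join(out_dir, project + '.json')
        with open(settings_path, 'w') as f:
            json.dump({'name': project, 'exported_before': cutoff}, f)

        # 3. Zip up the exported files for this project.
        zip_path = os.path.join(out_dir, project + '.zip')
        with zipfile.ZipFile(zip_path, 'w') as z:
            z.write(data_path, os.path.basename(data_path))
            z.write(settings_path, os.path.basename(settings_path))

        # 4. Delete the exported rows so the database doesn't fill up.
        con.execute(
            'DELETE FROM results WHERE project = ? AND datetime < ?',
            (project, cutoff))
        con.commit()
    finally:
        con.close()
    return zip_path  # 5. A separate upload script consumes this file.
```

Keeping the zip on disk between stages 4 and 5 means a failed upload can be retried without touching the database again.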

I was thinking that export.py can handle the zipping and deleting. iaexporter.py would be rewritten as a script that uploads the zip files.

Another supervisor script would be written that runs from cron and runs both the export and upload scripts. The supervisor script would write a sentinel file to stop any more runs, and would log all standard output and errors during the process. If the supervisor does not encounter an error after finishing, it deletes the sentinel file.
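
A minimal sketch of that supervisor logic; the sentinel and log file names, and the command list, are placeholders:

```python
import os
import subprocess


def run_supervisor(commands, sentinel='supervisor.lock',
                   log_path='supervisor.log'):
    """Run each command in order, guarded by a sentinel file.

    Returns True if everything ran cleanly, False if a previous run's
    sentinel blocked this one. If a command fails, the exception
    propagates and the sentinel is left in place so cron won't start
    another run until someone investigates.
    """
    if os.path.exists(sentinel):
        # A previous run failed or is still in progress; don't start.
        return False
    open(sentinel, 'w').close()
    with open(log_path, 'a') as log:
        for command in commands:
            # Capture all standard output and errors into the log file.
            subprocess.check_call(command, stdout=log,
                                  stderr=subprocess.STDOUT)
    # No error after finishing: delete the sentinel file.
    os.remove(sentinel)
    return True
```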

Does this sound good? Is it ok for me to rewrite iaexporter.py?

whs commented

IAExporter extends from export. What it does is very similar to normal
exporting but:

  1. When started, all configuration is read from a config file, not the
    command line
  2. It then creates an item on archive.org
  3. The normal export logic is run, but all data is written to a StringIO
    instead of a file
  4. After exporting for that file is done, the file is uploaded to
    archive.org
    (this is done per file)
  5. After all exporting is done, the last datetime of the entries used is
    written to a file to be read in subsequent runs

so you get the exact same structure as normal exporting.
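A rough sketch of that in-memory flow (the `upload` callback and the shape of `entries` are placeholders for illustration, not the real IAExporter interface):

```python
import io


def export_in_memory(entries, upload, state_path='last_export.txt'):
    """Write each export file into a StringIO buffer, upload it
    immediately, and record the last datetime for the next run.
    """
    last_datetime = None
    for filename, lines, entry_datetime in entries:
        # Export logic writes to an in-memory buffer instead of a file.
        buf = io.StringIO()
        for line in lines:
            buf.write(line + '\n')
        # Upload each file as soon as it is finished (per-file).
        upload(filename, buf.getvalue())
        if last_datetime is None or entry_datetime > last_datetime:
            last_datetime = entry_datetime
    # Remember the last datetime for subsequent runs.
    if last_datetime is not None:
        with open(state_path, 'w') as f:
            f.write(last_datetime)
    return last_datetime
```

The weakness Chris is pointing at is visible here: if an upload fails partway through, the rows already read are neither safely on archive.org nor recorded in the state file.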

I think that to achieve what you're describing, the file should be
rewritten from the ground up, as it shouldn't be possible with the
current one, so feel free to go ahead.
