/SwiftBulkUploader

A script that walks an entire directory, records the path of every file in a MySQL database, and then uploads the files. Because every path is recorded, an upload can be cancelled and later resumed. This was necessary for directories with millions of files.

Swift Bulk Upload

These scripts help upload an entire directory to Swift. They are intended for directories holding millions of files and up to several terabytes of data.

Requirements

  • Python 2.7+
  • python-mysqldb
  • Python Swiftclient
  • MySQL database
  • The following environment variables:
      • OS_AUTH_URL
      • OS_USERNAME
      • OS_TENANT_NAME
      • MYSQL_HOST
      • MYSQL_USER
      • MYSQL_PASSWD
      • MYSQL_DB
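
Before starting a long run, it can help to confirm that the environment is complete. The following is a minimal sketch, not part of the scripts themselves: the helper name `missing_env_vars` is hypothetical, and only the variable names come from the list above.

```python
import os

# Variable names are copied from the requirements list; the helper is hypothetical.
REQUIRED_VARS = [
    "OS_AUTH_URL", "OS_USERNAME", "OS_TENANT_NAME",
    "MYSQL_HOST", "MYSQL_USER", "MYSQL_PASSWD", "MYSQL_DB",
]


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling `missing_env_vars()` with no arguments checks the current process environment; an empty list means every variable listed above is set.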

Usage

  1. Index the target directory with prepareupload.py
$ python prepareupload.py PathToDirectory MysqlTableName

This creates a table MysqlTableName and populates it with paths to all files in PathToDirectory. It outputs the following log files:

  • MysqlTableName.prepare.error.log # Logs any file path that could not be written to the database.
  • MysqlTableName.prepare.out # Real-time log of file paths as they are parsed.
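
The indexing idea can be sketched as follows. This is an illustration under stated assumptions, not the code in prepareupload.py: it uses the standard library's `sqlite3` as a stand-in for MySQL so that it runs without a server, the function name `index_directory` is hypothetical, and the two-column schema (`path`, `uploaded`) is assumed. It is written for Python 3.

```python
import os
import sqlite3


def index_directory(conn, root, table):
    """Record the path of every file under root in table; return files seen."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError("unsafe table name: %r" % table)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS %s ("
        "path TEXT PRIMARY KEY, uploaded INTEGER NOT NULL DEFAULT 0)" % table
    )
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # INSERT OR IGNORE (SQLite syntax) makes re-running the index harmless.
            conn.execute(
                "INSERT OR IGNORE INTO %s (path) VALUES (?)" % table,
                (os.path.join(dirpath, name),),
            )
            count += 1
    conn.commit()
    return count
```

Storing an `uploaded` flag next to each path is one simple way to make a later upload resumable: rows already marked as uploaded can be skipped on the next run.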

While prepareupload.py is running, run the following command in a new terminal tab to watch the progress of the parsing:

$ tail -f MysqlTableName.prepare.out

  2. Upload the files indexed in step 1.
$ python bulkupload.py containername MysqlTableName 3

This starts 3 processes that read from MysqlTableName and upload files into the container containername. If the upload is stopped, it can be re-run and will continue without re-uploading files that were already uploaded. For faster uploads, increase 3 to a number of processes that your CPU can handle.

This script outputs the following files:

  • MysqlTableName.upload.out # Real time progress of upload
  • MysqlTableName.error.log # Logs failed uploads
  • MysqlTableName.report.log # Created when the upload completes; contains a summary of results.
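
The resume behaviour described for the upload step can be sketched as follows. This is an illustration, not the code in bulkupload.py: it uses `sqlite3` in place of MySQL and a thread pool in place of separate processes to stay short and runnable, the `uploaded` column is an assumed schema, and `upload` is a hypothetical callable standing in for the actual Swift upload. It is written for Python 3.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def upload_pending(conn, table, upload, workers=3):
    """Upload every path not yet marked as uploaded; return (done, failed)."""
    if not table.isidentifier():
        raise ValueError("unsafe table name: %r" % table)
    pending = [row[0] for row in conn.execute(
        "SELECT path FROM %s WHERE uploaded = 0 ORDER BY path" % table)]

    def attempt(path):
        # Report failure instead of raising so one bad file does not stop the run.
        try:
            upload(path)
            return path, True
        except Exception:
            return path, False

    done, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The database is only touched from the calling thread.
        for path, ok in pool.map(attempt, pending):
            if ok:
                # Committing per file means an interrupted run loses no progress.
                conn.execute(
                    "UPDATE %s SET uploaded = 1 WHERE path = ?" % table, (path,))
                conn.commit()
                done.append(path)
            else:
                failed.append(path)
    return done, failed
```

Because only rows with `uploaded = 0` are selected, calling `upload_pending` again after an interruption retries the failed and unfinished files and skips everything already marked as uploaded.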

To check the progress of the upload, run the following command:

$ tail -f MysqlTableName.upload.out