/drive_backup_scripts

Scripts used for archiving/backing up Katz lab data from files.brandeis.edu

Requirements:
- Install gsutil: https://cloud.google.com/sdk/docs/install#deb
- Install rclone: https://rclone.org/
- Install par2: sudo apt install par2
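
In addition to the tools above, the steps below also call pigz, rename, and
tree; a quick sanity check (the tool list is an assumption, adjust to your
setup) could be:

    # Report any missing tool before starting a backup run.
    for tool in gsutil rclone par2 pigz rename tree; do
        command -v "$tool" > /dev/null || echo "Missing tool: $tool" >&2
    done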

1 - Copy folder from the Katz file share
2 - Rename any subdirectories/files containing spaces or parentheses so they
    use underscores only
    a - Command : find '.' -iname "* *" -exec rename 's/ /_/g' {} \;
    b - Command : find . -iname "*\(*" -exec rename 's#\(#_#g' {} \;
    c - Command : find . -iname "*\)*" -exec rename 's#\)#_#g' {} \;
    d - Compiled into "cleanup_names.sh", run using bash cleanup_names.sh [directory]
    e - *** Check that everything looks good before proceeding!! ***
3 - Move files within the top-level directory into a folder named "unclaimed"
    a - Compiled into "move_unclaimed_files.sh",
            run : bash move_unclaimed_files.sh [directory] 
4 - Archive all subdirectories (pigz -0)
    a - Compiled into "create_archives.sh",
            run : bash create_archives.sh [directory][email_address] 
                    - Script will send email when complete
            * Archives are zip files to allow searchability
            * Archives split into 100G chunks to help with I/O
5 - Create listing of directory contents
    a - As tree (probably for visualization)
    b - As file list (e.g. generated by "find")
    c - Compiled into "gen_file_lists.sh",
            run : bash gen_file_lists.sh [directory]
            (see the sketch after this list)
6 - Create par2 files (see the sketch after this list)
    - To allow recovery in case of corruption
    - Set to 10% parity
7 - Upload to Google Drive File Stream and Google Cloud Storage
            run :
                bash cloud_storage_copy_dir.sh
                bash google_filestream_copy_dir.sh
            (see the upload sketch after this list)
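
Step 2 sketch - "cleanup_names.sh" presumably wraps the three rename commands
above, running them against the given directory instead of '.'; a minimal
version (an assumption, the real script may differ) could be:

    #!/bin/bash
    # Replace spaces and parentheses with underscores under the given directory.
    dir="$1"
    find "$dir" -iname "* *" -exec rename 's/ /_/g' {} \;
    find "$dir" -iname "*\(*" -exec rename 's#\(#_#g' {} \;
    find "$dir" -iname "*\)*" -exec rename 's#\)#_#g' {} \;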
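
Step 3 sketch - a minimal example of what "move_unclaimed_files.sh" might do
(an assumption, the actual script may differ):

    #!/bin/bash
    # Move loose files at the top level of the given directory
    # into a subfolder named "unclaimed".
    dir="$1"
    mkdir -p "$dir/unclaimed"
    find "$dir" -maxdepth 1 -type f -exec mv -t "$dir/unclaimed" {} +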
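
Step 4 sketch - one plausible archiving command for a single subdirectory,
using zip at store level (no compression, in the spirit of pigz -0) with 100G
split volumes; "$subdir" is a placeholder, and the real "create_archives.sh"
may instead pipe tar through pigz and split:

    # Store-only zip, split into 100 GB volumes.
    zip -r -0 -s 100g "${subdir}.zip" "$subdir"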
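
Step 5 sketch - a minimal example of what "gen_file_lists.sh" might generate
(output filenames are placeholders):

    #!/bin/bash
    # Write a tree view and a flat file list for the given directory.
    dir="$1"
    tree "$dir" > "${dir}_tree.txt"
    find "$dir" > "${dir}_file_list.txt"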
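
Step 6 sketch - par2 recovery files at 10% redundancy for one archive chunk
("archive.zip" is a placeholder name):

    # 10% parity data, enough to repair moderate corruption of the chunk.
    par2 create -r10 archive.zip.par2 archive.zip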
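
Step 7 sketch - the upload scripts presumably wrap commands along these lines;
the bucket name and rclone remote are placeholders:

    # Copy to Google Cloud Storage with gsutil (parallel, recursive).
    gsutil -m cp -r [directory] gs://katz-backup-bucket/
    # Copy to Google Drive through an rclone remote.
    rclone copy [directory] gdrive:backups/[directory]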

================================
- "create_archives_pipeline.sh" file compiles steps 1-5 for convenience
            run : bash create_archives_pipeline.sh [directory][email_address]
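
Example end-to-end run (the share path and email address below are
placeholders):

    # Run the bundled pipeline, then create par2 files (step 6) as needed
    # and upload (step 7).
    bash create_archives_pipeline.sh /path/to/share_copy someone@brandeis.edu
    bash cloud_storage_copy_dir.sh
    bash google_filestream_copy_dir.sh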