DIVA-Archive-DropFolder

A Python script for submitting projects to DIVA Archive watch folders

Primary language: Python. License: MIT.

DIVA DMF DropFolder

A Python script for automating the creation of .csv trigger files. These files are used to trigger the DIVA Archive DMF service to archive media objects (folder sets) to LTO tape.

Description

The script is used in conjunction with the DIVA Archive LTO library software.

The DIVA DMF service is configured to monitor a drop folder location for .csv files. The .csv files are text files that act as the trigger for DIVA to begin archiving, and they contain all the information related to the object that needs to be archived.

This script can be run with a cron job or with Windows Task Scheduler so that .csv files are generated automatically within a few minutes after a folder is placed in the drop folder.

The DIVA portion of this workflow is not detailed or covered here.

The script follows a series of steps:

1. Check the queue of folder sets in the archive location.
If the folder count is above the set threshold (default = 10),
pause the script for 5 minutes and then check again. Continue this loop
until the archive queue is below the allowed count.

2. Create a list of new folder sets in the drop folder location(s).
If the list is not empty, begin iterating over the set list.

3. For each folder in the set list, check the size to determine if the directory is still growing.
If the folder size is still growing after 90 seconds, move on to the next directory on the list.
If it is not growing, move on to the next step.

4. Walk the entire directory structure of each folder set, and
check each sub-directory and file name for illegal characters.
Replace or remove any illegal characters that are found.

5. Generate the .csv trigger file for each folder in the list that has passed the preceding steps.

6. Move the .csv and its corresponding folder set into the archive location for DIVA to begin its archiving process.
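
The queue check, growth check, and name cleanup in steps 1, 3, and 4 can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names, the ILLEGAL_CHARS pattern, and the underscore replacement are all assumptions, and the real list of disallowed characters should come from the DIVA documentation or configuration.

```python
import os
import re
import time

# Assumed set of characters to replace; the real list depends on DIVA's rules.
ILLEGAL_CHARS = re.compile(r'[<>:"|?*&$#@!]')


def wait_for_queue(archive_dir, threshold=10, pause_secs=300):
    """Step 1: block while the archive location holds more than `threshold` folders."""
    while True:
        count = sum(1 for entry in os.scandir(archive_dir) if entry.is_dir())
        if count <= threshold:
            return count  # queue is no longer above the threshold
        time.sleep(pause_secs)


def dir_size(path):
    """Total size in bytes of all files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or is unreadable mid-walk; skip it
    return total


def is_still_growing(path, wait_secs=90):
    """Step 3: measure the folder twice; any size change counts as still being written."""
    before = dir_size(path)
    time.sleep(wait_secs)
    return dir_size(path) != before


def clean_name(name, replacement="_"):
    """Replace every assumed-illegal character in a single file or folder name."""
    return ILLEGAL_CHARS.sub(replacement, name)


def sanitize_tree(root):
    """Step 4: rename entries under `root` that contain illegal characters.

    Returns a list of (old_path, new_path) pairs. The `root` folder itself
    is not renamed, and a rename is skipped if the target name already exists.
    """
    renamed = []
    # Walk bottom-up so children are renamed before their parent directories.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            cleaned = clean_name(name)
            if cleaned == name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, cleaned)
            if os.path.exists(new):
                continue  # avoid overwriting an existing entry
            os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

A folder set would only move on to step 5 once `is_still_growing` returns False and `sanitize_tree` has run. Walking bottom-up matters here: renaming a parent directory first would invalidate the paths of everything beneath it.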

.csv example:

	#
	# Object configuration
	#

	priority=50
	objectName=81187_SeriesName_SMLS_GRFX
	categoryName=AXF

	<comments>
	81187_SeriesName_SMLS_GRFX
	</comments>

	#sourceDestinationDIVA_Source_Dest=[source-dest name defined in the DIVA Config Utility]
	#sourceDestinationDIVAPath=\\UNC path to\DIVA\DropFolder\Location\

	<fileList>
	81187_SeriesName_SMLS_GRFX/*
	</fileList>

priority = the priority for the archive job (0 - 100), defaults to 50.
objectName = the name of the folder set.
categoryName = the name of the DIVA tape Category, defined in the DIVA Config Utility.
comments = any comments relevant to the set, script defaults to the folder name.
sourceDestinationDIVA_Source_Dest = the source-dest name defined in the DIVA Config Utility, not used in the .csv.
sourceDestinationDIVAPath = the UNC path defined in the DIVA Config Utility, not used in the .csv.
fileList = the files in the folder set that will be archived, uses an asterisk to include all file and folder paths in the entire directory.

NOTE: DIVA_Source_Dest and DIVAPath are not used in the .csv because both of these values are constant and already defined in the Source-Destinations settings of the DIVA Config Utility.
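
As an illustration of how a trigger file with these fields might be produced, here is a minimal sketch. The function names and the choice to name the file after the folder set are assumptions, not the repository's actual code, and the two commented-out source-destination lines from the example are left out.

```python
from pathlib import Path


def build_trigger_text(object_name, category, priority=50, comments=None):
    """Return the text of a trigger file for one folder set."""
    if not 0 <= priority <= 100:
        raise ValueError("priority must be between 0 and 100")
    if comments is None:
        comments = object_name  # comments default to the folder name
    lines = [
        "#",
        "# Object configuration",
        "#",
        "",
        f"priority={priority}",
        f"objectName={object_name}",
        f"categoryName={category}",
        "",
        "<comments>",
        comments,
        "</comments>",
        "",
        "<fileList>",
        f"{object_name}/*",  # asterisk includes everything in the folder set
        "</fileList>",
    ]
    return "\n".join(lines) + "\n"


def write_trigger_file(folder_set, dest_dir, category, priority=50):
    """Write <folder name>.csv into dest_dir (file naming is an assumption)."""
    object_name = Path(folder_set).name
    csv_path = Path(dest_dir) / f"{object_name}.csv"
    csv_path.write_text(build_trigger_text(object_name, category, priority))
    return csv_path
```

For example, `build_trigger_text("81187_SeriesName_SMLS_GRFX", "AXF")` yields the same active fields as the example above.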

Prerequisites

Files Included

  • main.py
  • config.py
  • dropfolder_check_csv.py
  • logging.yaml
  • check_dir_size.py
  • archive_queue.py

Getting Started

  • Install prerequisites
  • Create a config.yaml document with the format:

            paths:
                  script_root:
                  mac_root_path:
                  win_root_path:
                  drop_folder:
                  csv_dropfolder:
                  archiving:
                  error:
                  duplicates:
                  requires_zip:
                  win_archive:
            DIVA_Source_Dest:
            DIVA_Obj_Category:
            urls:
                  core_data_api:
                  core_manager_api:
            creds:
                  name:
                  password:

  • In the terminal cd to the source directory for the script and enter the command:
              python main.py
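
Before the first run, it can help to confirm that config.yaml has every field filled in. The sketch below checks an already-loaded config dictionary against the key names from the format above. It is not part of the repository: the helper name is hypothetical, and treating every key as required is an assumption.

```python
# Key names follow the config.yaml format shown above.
REQUIRED_SECTIONS = {
    "paths": [
        "script_root", "mac_root_path", "win_root_path", "drop_folder",
        "csv_dropfolder", "archiving", "error", "duplicates",
        "requires_zip", "win_archive",
    ],
    "urls": ["core_data_api", "core_manager_api"],
    "creds": ["name", "password"],
}
REQUIRED_TOP_LEVEL = ["DIVA_Source_Dest", "DIVA_Obj_Category"]


def missing_config_keys(config):
    """Return the names of keys that are absent or empty in `config`."""
    missing = []
    for section, keys in REQUIRED_SECTIONS.items():
        block = config.get(section)
        if not isinstance(block, dict):
            missing.append(section)  # whole section absent or left empty
            continue
        for key in keys:
            if block.get(key) in (None, ""):
                missing.append(f"{section}.{key}")
    for key in REQUIRED_TOP_LEVEL:
        if config.get(key) in (None, ""):
            missing.append(key)
    return missing
```

The dictionary would typically come from a YAML parser, for example `yaml.safe_load` from PyYAML, assuming that is the library the script uses. An empty return value means every expected key has a value.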