/aspace_tools

Python package for interacting with ArchivesSpace

Primary LanguagePythonMIT LicenseMIT

ArchivesSpace Tools

aspace_tools is a Python package for interacting with the ArchivesSpace API and MySQL database.

Requirements

  • Python 3.10+
  • Read/write access to ArchivesSpace API and/or database

Installation

$ cd /path/to/aspace_tools/src
$ pip install .

Tutorial

This package includes functions for creating, reading, updating, and deleting data from ArchivesSpace. It was created for archivists who use the ArchivesSpace API to programmatically modify archival metadata.

aspace_tools can be imported into a Python script or into an interactive Python session. Standalone scripts which can more easily be distributed to end users can be generated using the generate_script.py script.

Getting started: configuration

An empty configuration file entitled as_tools_config.yml is included in the /src directory, so that users may store login credentials and file/directory paths. Entering data into the configuration file is not required. Any data that is missing will be requested by the application when the user attempts to call a function.

ArchivesSpace credentials

Connecting to the ArchivesSpace API requires the ArchivesSpace API URL along with the user's username and password.

Input/output files

Most commonly, the functions in this package will be run against a CSV file that is supplied by the user, which contains the data about each record that is to be modified. The configuration file should include the path to this CSV file, e.g. /Users/username/path/to/file.csv for Mac or C:\Users\username\path\to\file.csv. The required CSV fields for each function are listed in the API documentation for this package. A CSV template can be generated for a given function by running the generate_script.py script.

Backup directories

When records are updated using functions in this package, JSON backups of the data prior to the updates are saved to a backup directory that is defined by the user.

Other configuration settings

There are other configuration settings that can be included in order to take advantage of less-developed parts of this package, including the db_tools.py module which facilitates querying the ArchivesSpace MySQL database. Additional documentation on these modules is forthcoming.

Running aspace_tools within an interactive session

The aspace_tools package can be imported into the user's preferred Python interpreter and run interactively. To import the module, either install it via pip as described above, or navigate to the /src directory, and enter the following:

$ python
Python 3.8.5 (default, Sep  4 2020, 02:22:02)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from aspace_tools import aspace_run, aspace_requests
>>>

Creating an ASpaceConnection object

Authentication is required before sending HTTP requests to the ArchivesSpace API. The aspace_tools package contains a class, ASpaceConnection, which handles authentication and user-defined configuration settings.

To authenticate, enter the following into the interpreter:

>>> aspace_conn = aspace_run.ASpaceConnection()
Login Successful!: https://testarchivesspace.your.domain.edu/api

If credentials are present the in the as_tools_config.yml file, authentication will be attempted automatically. If not, the user will be prompted to enter credentials into the interpreter.

Sending a batch of requests

After the user is authenticated, it is possible to begin sending requests. Requests are sent by calling methods in the aspace_requests.ASpaceRequests class. To access these methods, first instantiate the class with the ArchivesSpace connection object as an argument, and then call one of the available methods to send the request. For example:

>>> client = aspace_requests.ASpaceRequests(aspace_conn)
>>> client.update_date_begin()

If an input CSV path is present in the as_tools_config.yml file, the update will begin immediately. If not, the user will be prompted to enter the path (e.g. /path/to/the/input/file.csv) to the file into the interpreter.

When the method is called, the input CSV file will be loaded, and each row will be passed into the method. If the record already

The method will take the data in the CSV row and form it into a valid JSON record. It will return this JSON record, along with the endpoint where the request will be sent.

Types of requests

aspace_tools supports a variety of CRUD (create, read, update, and delete) requests. There are some variations in how each of these request types is made:

  • Create: These types of requests create new records in ArchivesSpace. The only input for create methods is a CSV file which is used to form new JSON records which are posted to ArchivesSpace
  • Read: These requests retrieve data from ArchivesSpace. The input for read methods is a CSV file containing the URIs of the records to be retrieved.
  • Update: These requests update existing ArchivesSpace data. Input CSV files for update methods must contain the URI of the existing record in addition to the data that is to be updated. The request methods will retrieve the existing record from ArchivesSpace,
  • Delete: These requests delete existing ArchivesSpace data.

Note that if a CSV input path is present in the configuration file and one of these methods is calld, the update that is defined in the method will be applied to all of the records in the input spreadsheet. When th

Consult the API documentation for more information on available methods and their required CSV fields.

When a request is sent

A progress bar will appear in the terminal after the method is called, which will indicate how many records have been processed, the total number of records to be processed, and the overall progress percentage.

If the request is a read, update, or delete request, a JSON backup file will be created for each URI on the input spreadsheet, using the backup directory supplied by the user in the as_tools_config.yml file or, if this value is not present in the configuration file, when prompted to enter the path into the interpreter.

Any errors which are encountered during the process will be printed to the interpreter, as well as to a log file in the /logs directory specified by the user (or, in the absence of this directory, to the user's home directory). After each error, the user will be prompted to retry the update, skip the record, or stop the entire process.

In addition to the error log, two output files will be written to the directory in which the input CSV file is stored. These files will be named [original_filename]_success.csv and [original_filename]_errors.csv. Each row from the original spreadsheet, plus the URI of the created or updated record, will be written to one of these spreadsheets depending on the outcome of the update.

Making multiple requests in an interactive session

To make another request within the same interactive session, using a different set of input data, the user must change the input CSV path in your configuration file. There are two ways to do this:

Method 1: Updating in the Python interpreter

Within the Python interpreter, enter the following to manually update the input CSV file and backup directory:

>>> print(aspace_conn.csvfile, aspace_conn.row_count, aspace_conn.dirpath, aspace_conn.sesh, sep='\n')
/path/to/old/input/file.csv
79
/path/to/old/backup/folder
<requests.sessions.Session object at 0x7f9fa03e5450>
>>> new_file_path = '/path/to/new/input_file.csv'
>>> new_backup_directory = '/path/to/new/directory'
>>> aspace_conn.update_from_input(new_file_path, new_backup_directory)
>>> print(aspace_conn.csvfile, aspace_conn.row_count, aspace_conn.dirpath, aspace_conn.sesh, sep='\n')
/path/to/new/input_file.csv
130
/path/to/new/directory
<requests.sessions.Session object at 0x7f9fa03e5450>
>>> client.cfg = aspace_conn

Method 2: Updating the configuration file

Open the configuration file, usually as_tools_config.yml, and update the required values - often this means updating the input CSV file and the backup directory. Save the file. Then, in your Python interpreter, enter the following:

>>> aspace_conn.update_from_config()
>>> client.cfg = aspace_conn

In both methods, the input CSV file and/or backup directories will be changed, but the ArchivesSpace HTTP session will remain the same. To change the API instance you are working with, you must instantiate a new ASpaceConnection object.

Operating on individual records

This package is designed to take an input CSV file and perform actions on all rows in that CSV file. However, it is possible to run pieces of the code on individual records if desired.

Forming a JSON template

>>> csv_row = {'search_string': 'A string to search'}
>>> endpoint = ASpaceRequests.search_all(csv_row)
>>> print(endpoint)
>>> /search?q=A string to search

Getting a JSON record

>>> csv_row = {'uri': '/repositories/2/archival_objects/294923'}
>>> 

Posting a JSON record

>>> csv_row = {'uri': '/repositories'}
>

Using aspace_tools in your own code

Another way to use this code is to include it within your own Python files. For example, if the aspace_tools package is installed on the end-user's computer, this code could be stored in a .py file which is run from the Terminal or command prompt.

#!/usr/bin/python3

import csv
from aspace_tools import aspace_requests, aspace_run


def prep_data(fp):
	'''Takes some input data and modified it'''
	with open(fp, encoding='utf8') as infile:
		reader = csv.reader(infile)

def write_data(data):
	aspace_conn.update_from_input(new_file_path, new_backup_directory)
	pass

def update_data():
	aspace_conn = ASpaceConnection.from_dict(config_file='/path/to/your/as_tools_config.yml')
	client = ASpaceRequests(aspace_conn)
	client.update_date_begin()


def main():
	fp = "..."
	prep_data(fp)


if __name__ == "__main__":
	main()

Generating scripts via generate_script.py

Not all users will need to install the entire aspace_tools package on their local machines. Sometimes users will want a standalone script for just their particular use case. The generate_script.py module generates these scripts from the aspace_tools package, along with a blank configuration file and a CSV template that can be provided to the end user.

To generate a new script, run the following command:

python -m aspace_tools.generate_script

Follow the on-screen prompts to select an available JSON template, and an output directory for the generated files.