DSpace CSV Archive

Takes a simple CSV spreadsheet and a bunch of files and turns them into the DSpace Simple Archive Format. Unicode characters are supported in metadata. The tool automatically strips Unicode characters out of filenames.

Requirements

Requires Python 3.8 or later.

Some simple rules for the CSV spreadsheet

The first row should be your header, which defines the values you're going to provide.
Only one column is mandatory: 'files'. Files can be organized in any way you want, just provide the proper path relative to the CSV file's location.
Add one column for each metadata element (e.g. dc.title).
The order of the columns does not matter.
Only Dublin Core metadata elements are supported (for now).
Use the fully qualified Dublin Core name for each element (e.g. dc.contributor.author).
To specify a language, leave a space after the element name and then give the language (e.g. dc.title en).
Separate multiple values for an element by double-pipes (||).
If a metadata value contains a comma, wrap it in double quotes, e.g. "Roses are red, violets are blue".
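
The rules above can be sketched in a few lines of Python. This is only an illustration of how a row might be interpreted, not the tool's actual implementation; the function names parse_header and parse_row are hypothetical.

```python
import csv
import io


def parse_header(cell):
    """Split a header cell such as 'dc.title en' into (element, language)."""
    name, _, language = cell.strip().partition(" ")
    return name, (language.strip() or None)


def parse_row(row):
    """Turn one CSV row (a dict) into a list of files and a list of metadata."""
    # Multiple files are separated by double pipes, like any other value.
    files = [f.strip() for f in row["files"].split("||") if f.strip()]
    metadata = []
    for cell, raw in row.items():
        if cell == "files" or not raw:
            continue
        element, language = parse_header(cell)
        for value in raw.split("||"):
            metadata.append((element, language, value.strip()))
    return files, metadata


# A tiny in-memory CSV; the quoted title contains a comma.
sample = io.StringIO(
    "files,dc.title en,dc.contributor.author\n"
    'a.pdf||b.pdf,"Roses are red, violets are blue",author 1||author 2\n'
)
rows = [parse_row(row) for row in csv.DictReader(sample)]
files, metadata = rows[0]
print(files)
print(metadata)
```

Python's csv module handles the quoted comma on its own, so only the double-pipe splitting and the language suffix need custom handling.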

Example CSV structure

files	dc.title en	dc.contributor.author en	dc.subject	dc.type
something1.pdf||something_else1.pdf	title 1	author 1	subject 1	Report
directory/something2.pdf	"title 2, with comma"	author 2a||author 2b	subject 2	Article
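
If you would rather generate the spreadsheet than type it by hand, the example above could be produced with Python's csv module. This is a minimal sketch; the build_csv helper and the layout of the items list are assumptions made for illustration.

```python
import csv
import io

header = ["files", "dc.title en", "dc.contributor.author en", "dc.subject", "dc.type"]

# Each item: (files, title, authors, subject, type)
items = [
    (["something1.pdf", "something_else1.pdf"], "title 1", ["author 1"], "subject 1", "Report"),
    (["directory/something2.pdf"], "title 2, with comma", ["author 2a", "author 2b"], "subject 2", "Article"),
]


def build_csv(items):
    """Return CSV text with multiple values joined by double pipes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for files, title, authors, subject, doc_type in items:
        writer.writerow(["||".join(files), title, "||".join(authors), subject, doc_type])
    return buffer.getvalue()


csv_text = build_csv(items)
print(csv_text)
```

The writer adds the quotes around "title 2, with comma" automatically, so you do not need to quote values yourself.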

Usage

./dspace-csv-archive /path/to/input/file.csv

python3 ./dspace-csv-archive /path/to/input/file.csv

If successful, the script will place the processed files into a directory called output inside the directory you ran the command from.
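
For orientation, the DSpace Simple Archive Format uses one directory per item, each holding a dublin_core.xml file, a contents file listing the item's files, and the files themselves. The sketch below shows roughly what those two text files contain. It is an illustration only, not the tool's code; the function names are hypothetical, and it assumes metadata is given as (element name, language, value) tuples.

```python
import xml.etree.ElementTree as ET


def dublin_core_xml(metadata):
    """Build dublin_core.xml text from (name, language, value) tuples."""
    root = ET.Element("dublin_core")
    for name, language, value in metadata:
        # 'dc.contributor.author' -> element 'contributor', qualifier 'author'
        parts = name.split(".")
        element = parts[1]
        qualifier = parts[2] if len(parts) > 2 else "none"
        attrs = {"element": element, "qualifier": qualifier}
        if language:
            attrs["language"] = language
        ET.SubElement(root, "dcvalue", attrs).text = value
    return ET.tostring(root, encoding="unicode")


def contents_text(files):
    """List one filename per line, as the contents file expects."""
    return "".join(f"{name}\n" for name in files)


xml_text = dublin_core_xml([
    ("dc.title", "en", "title 1"),
    ("dc.contributor.author", None, "author 1"),
])
print(xml_text)
print(contents_text(["something1.pdf", "something_else1.pdf"]))
```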

Note: The tool will overwrite any existing content in the output directory when it is run. If you want to save the results, copy them somewhere safe before you run the tool a second time.

Importing into DSpace

If it is not there already, place the output directory in a location where the dspace user can read and write it. I recommend putting the directory into /home/dspace/imported-data/ and leaving it there so the mapfile can be easily found if it is needed later, e.g. to remove or modify imported data. One way to do this is:

sudo cp -r [directory-name] /home/dspace/imported-data/
sudo chown -R dspace:dspace /home/dspace/imported-data/[directory-name]

Now we are ready to use the import command that comes with DSpace. Be sure to run this command as the dspace user. Something like:

[dspace]/bin/dspace import --add --eperson=[importer's email address] --collection=[collection handle] --source=[directory-name] --mapfile=[directory-name]/mapfile

Before running the import, you can check it by running the same command with the --validate argument added. This tests the import without actually importing anything and reports any issues:

[dspace]/bin/dspace import --add --validate --eperson=[importer's email address] --collection=[collection handle] --source=[directory-name] --mapfile=[directory-name]/mapfile
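
If you run imports often, it can help to assemble the command in a script so the validation run and the real run cannot drift apart. The sketch below only builds and prints the command; the import_command helper is hypothetical, and the install path, email address, handle and source directory are placeholder values you would replace with your own.

```python
import shlex


def import_command(dspace_dir, eperson, collection, source, validate=False):
    """Return the dspace import command as a list of arguments."""
    args = [f"{dspace_dir}/bin/dspace", "import", "--add"]
    if validate:
        # Test run: report problems without importing anything.
        args.append("--validate")
    args += [
        f"--eperson={eperson}",
        f"--collection={collection}",
        f"--source={source}",
        f"--mapfile={source}/mapfile",
    ]
    return args


# Placeholder values for illustration only.
cmd = import_command(
    "/dspace",
    "importer@example.org",
    "123456789/2",
    "/home/dspace/imported-data/output",
    validate=True,
)
print(shlex.join(cmd))
```

To actually run it as the dspace user, you could pass the list to subprocess.run(cmd, check=True) once the validation output looks right.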

Running the import command without the validate argument will add the items in the directory to the specified collection, and document the operations that were completed in the mapfile. If the import didn't work as you planned, you can use the mapfile to reverse the operations.
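
To see what an import did, you can read the mapfile. As far as I know it is plain text with one line per imported item: the item's directory name, a space, then the handle DSpace assigned. The sketch below parses that layout; the read_mapfile name and the sample handles are made up for illustration, so check a real mapfile from your own installation first. The DSpace documentation also describes a --delete option for the import command that takes the mapfile; consult it for your version before reversing an import.

```python
def read_mapfile(text):
    """Map each item directory name to the handle recorded for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed layout: '<item directory> <handle>'
        item_dir, handle = line.split(None, 1)
        mapping[item_dir] = handle
    return mapping


# Sample content with made-up handles.
sample_mapfile = "item_000 123456789/10\nitem_001 123456789/11\n"
imported = read_mapfile(sample_mapfile)
for item_dir, handle in imported.items():
    print(f"{item_dir} -> {handle}")
```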