- Code derives from dspace-csv-archive https://github.com/lib-uoguelph-ca/dspace-csv-archive
- Uses XSLT to transform input XML documents into required XML documents.
- Brings the files and XML documents into the DSpace Simple Archive Format.
- Uses SaxonC-HE for the XSLT transformations
- Installation Instructions: https://pypi.org/project/saxonche/
- Documentation: https://www.saxonica.com/saxon-c/doc11/html/saxonc.html
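The transformation step can be sketched as follows. This is a minimal illustration rather than the code in dspace-xslt-archive.py: the function name `transform` and its parameters are assumptions, and only the `saxonche` calls (`PySaxonProcessor`, `new_xslt30_processor`, `compile_stylesheet`, `transform_to_string`) come from the SaxonC Python API.

```python
from pathlib import Path


def transform(xml_path: Path, xslt_path: Path) -> str:
    """Apply one XSLT stylesheet to one XML file and return the result as text.

    Hypothetical helper for illustration; not taken from the project sources.
    """
    # Imported lazily so this module can be loaded without saxonche installed.
    from saxonche import PySaxonProcessor

    # license=False: no license file is looked up (sufficient for SaxonC-HE).
    with PySaxonProcessor(license=False) as proc:
        xslt = proc.new_xslt30_processor()
        executable = xslt.compile_stylesheet(stylesheet_file=str(xslt_path))
        return executable.transform_to_string(source_file=str(xml_path))
```

A call such as `transform(Path("XML/document_1.xml"), Path("jats_to_dc-psycharchives.xslt"))` would then return the transformed document as a string, which the caller can write to the item directory.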
.
├── dspace-xslt-archive.py
├── dspacearchive.py
├── item.py
├── jats_to_dc-psycharchives.xslt
├── jats_to_zpid-psycharchives.xslt
├── make_simple_archive_format
│   ├── PDF
│   │   ├── document_1.pdf
│   │   ├── document_2.pdf
│   │   └── document_3.pdf
│   └── XML
│       ├── document_1.xml
│       ├── document_2.xml
│       └── document_3.xml
├── dc_schema.xsl
└── zpid_schema.xsl
- Usage: ./dspace-xslt-archive.py make_simple_archive_format
- Put the XSLT files (e.g. Dublin Core and ZPID schema) in the same directory as the *.py files (as in the structure above).
- Set the xsl_files variable in dspace-xslt-archive.py.
- NOTE: Other schemas can be used in DSpace, but make sure the new schema is defined in the DSpace Metadata Schema Registry.
- jats_to_dc-psycharchives.xslt --> XSLT for converting JATS XML to PsychArchives Dublin Core Schema
- jats_to_zpid-psycharchives.xslt --> XSLT for converting JATS XML to PsychArchives ZPID Schema
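How the stylesheets relate to the metadata files of an item can be sketched like this. The shape of the mapping below is an assumption (the real structure of xsl_files in dspace-xslt-archive.py may differ), and `write_metadata` and its `transform` parameter are hypothetical names. The output file names follow the SAF convention that Dublin Core goes into dublin_core.xml and any other schema into metadata_[prefix].xml.

```python
from pathlib import Path
from typing import Callable, Dict, List

# ASSUMPTION: SAF metadata file name -> stylesheet that produces it.
# The actual xsl_files variable in the project may be structured differently.
XSL_FILES: Dict[str, str] = {
    "dublin_core.xml": "jats_to_dc-psycharchives.xslt",
    "metadata_zpid.xml": "jats_to_zpid-psycharchives.xslt",
}


def write_metadata(
    xml_path: Path,
    item_dir: Path,
    transform: Callable[[Path, Path], str],
    xsl_files: Dict[str, str] = XSL_FILES,
) -> List[Path]:
    """Run every configured stylesheet on xml_path and store the results in item_dir.

    transform(xml_path, xslt_path) is any callable that returns the
    transformed document as text, e.g. a thin wrapper around SaxonC-HE.
    """
    item_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for out_name, stylesheet in xsl_files.items():
        target = item_dir / out_name
        target.write_text(transform(xml_path, Path(stylesheet)), encoding="utf-8")
        written.append(target)
    return written
```

Passing the transformer in as a callable keeps the sketch independent of the XSLT engine; adding a further schema would then only need one more entry in the mapping, plus the registry entry mentioned in the note above.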
- Directory make_simple_archive_format contains two subdirectories, PDF-A and XML:
- PDF-A contains all files (bitstreams) for batch import
- XML contains metadata of the files (bitstreams) in XML
- NOTE: The collections file contains the handle of the owning collection. The owning collection is the FIRST dc.type in the PsychArchives XML.
- The name of the directory make_simple_archive_format does not matter, but the names of its subdirectories must be PDF-A and XML.
- The PDF files and their corresponding XML documents must contain the same file_basename (e.g. 32302 in 003-003 32302.pdf and 8454-32302.xml).
- Output directory = 'Input_Directory' + '_saf'; its subdirectories (item directories) are named after the file_basename.
make_simple_archive_format_saf
├── document_1
│   ├── collections
│   ├── contents
│   ├── document_1.pdf
│   ├── dublin_core.xml
│   └── metadata_zpid.xml
├── document_2
│   ├── collections
│   ├── contents
│   ├── document_2.pdf
│   ├── dublin_core.xml
│   └── metadata_zpid.xml
└── document_3
    ├── collections
    ├── contents
    ├── document_3.pdf
    ├── dublin_core.xml
    └── metadata_zpid.xml
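The pairing of PDF and XML files and the resulting item layout can be sketched as below. Several points are assumptions made for illustration: the rule used to extract file_basename (last token of the file name after splitting on spaces and hyphens), the function names, and the `collection_handle` parameter, which stands in for the handle the tool derives from the metadata. Metadata files (dublin_core.xml, metadata_zpid.xml) are left out here. The subdirectory name PDF-A follows the naming rule stated above.

```python
import re
import shutil
from pathlib import Path
from typing import Dict, Tuple


def file_basename(path: Path) -> str:
    """ASSUMPTION: the basename is the last token of the file stem after
    splitting on spaces and hyphens, e.g. '003-003 32302' -> '32302'."""
    return re.split(r"[\s-]+", path.stem)[-1]


def pair_inputs(input_dir: Path) -> Dict[str, Tuple[Path, Path]]:
    """Map each basename to its (pdf, xml) pair; files without a partner are skipped."""
    pdfs = {file_basename(p): p for p in (input_dir / "PDF-A").glob("*.pdf")}
    xmls = {file_basename(p): p for p in (input_dir / "XML").glob("*.xml")}
    return {name: (pdfs[name], xmls[name]) for name in sorted(pdfs.keys() & xmls.keys())}


def write_item(pdf: Path, item_dir: Path, collection_handle: str) -> None:
    """Create one SAF item directory with the bitstream, contents and collections."""
    item_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(pdf, item_dir / pdf.name)
    # contents lists the bitstream file names, one per line.
    (item_dir / "contents").write_text(pdf.name + "\n", encoding="utf-8")
    # collections lists collection handles, one per line; the first is the owner.
    (item_dir / "collections").write_text(collection_handle + "\n", encoding="utf-8")


def build_saf(input_dir: Path, collection_handle: str) -> Path:
    """Write <input_dir>_saf next to input_dir, with one item directory per basename."""
    output_dir = input_dir.with_name(input_dir.name + "_saf")
    for name, (pdf, _xml) in pair_inputs(input_dir).items():
        write_item(pdf, output_dir / name, collection_handle)
    return output_dir
```

For example, `build_saf(Path("make_simple_archive_format"), "123456789/42")` (the handle is a placeholder) would create make_simple_archive_format_saf with one subdirectory per matched basename.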
- DSpace 7 Documentation - Importing and Exporting Items via Simple Archive Format (SAF): https://wiki.lyrasis.org/pages/viewpage.action?pageId=104566653