Some tools and a process that helped us import DSpace institutional repository data into Symplectic Elements, and merge most duplicate titles from Research Master, an academic output tracking application.
You will need to customise plenty in these files, but hopefully they can save you some pain.
- Our implementation
- Assumptions
- Requirements
- The process
- Understand load file specifications and requirements
- Finish your mappings
- Create your load manifest
- Create and populate an initial metadata holding table
- Manipulate some data in the interim metadata table
- Create and populate IR persons table
- Prepare matching names for links table
- Create and populate IR links table
- Import and clean RM metadata
- Identify possible duplicate titles in metadata files - TODO
- Process sorted metadata duplicates - TODO
- Cleanup based on discard flag in duplicates file - TODO
- combine 2 metadata files - TODO
- bring in RM persons CSV file for rejigging columns, then export - TODO
- export IR links file - TODO
- Appendage - notes, alternate paths
The GitHub repository's issue register is:
http://github.com/LincolnUniLTL/symploading/issues
Because this documents a one-off process, we are unlikely to address issues. Please fork this if you want to adapt it. If you think your changes are widely applicable to other environments, not just yours, feed them back via a pull request or whatever.
The project's home is at http://github.com/LincolnUniLTL/symploading and some links in this README are relative to that.