Zenodup

This application was developed to upload the abstracts of the DHd-Conferences to Zenodo. It is integrated in a workflow in order to collect, structure and publish the abstracts and the associated metadata. The use case for the application is on the one hand to create a valid bundle structure and on the other hand to interact with the Zenodo API.

Overview

legacy/: Legacy python scripts which are not integrated in generic workflow
resources/: png resources for README.md
zenodup/: Zenodup application source code
- bundles/: Python package to handle creation of bundle structure
- INPUT/: Default input directory
- OUTPUT/: Default output directory
- support/: Default support directory

Requirements/Installation

Python 3.9 is required to run this application. Download Zenodup repository. If package manager pip is installed, navigate to project folder and run:

pip install -r requirements.txt

Usage

Two different tasks can be executed by running the application's main script zenodup/zenodup.py:

Creating a bundle strucutre for conference's data
Interacting with Zenodo's REST API:
- Upload abstracts
- Publish drafts
- Delete drafts
- Update local metadata files
- Get zenodo metadata of conferences for annual packages

It is also possible to run all actions on the Zenodo Sandbox for testing purposes.

Overview of workflow

Configurations

In zenodup/config.yml the working directories such as desired input or output directory for the conferences' bundle structures are set.

input_base: Input directory for conference in order to create bundle structure. (Default: zenodup/INPUT/)
output_base: Output directory for bundle structure. Bundle structure needs to be in output_base directory for Zenodo API interaction. (Default: zenodup/OUTPUT/)
depositions_dir: Directory to save and load deposition files of conferences (Default: zenodup/support/depositions/)
logging_dir: Directory to save logging files (Default: zenodup/support/logging/)
assignments_dir: Directory for csv files to check final assignments of bundle creation (Default:zenodup/support/assignments/)
packages_dir: Directory for csv files containing Zenodo metadata of all published abstracts (Default:zenodup/support/package/)
update_dir: Directoy for updated metadata files (not part of regular workflow)(Default: zenodup/support/updated_metadata/)

Create Bundles for Upload

Change current working directory to /zenodup/zenodup/. In order to interact with Zenodo's REST API via this application, the conferences have to be restructured in a certain bundle structure. Run script zenodup.py with argument bundle for assigning conference papers to bundles based on metadata file. Put folder with conference files in configured input directory (Default:/support/INPUT/). The conference folder is expected in the following structure:

  CONFERENCE
    ├── metadata.xml                   # Metadata file for conference containing all relevant information of the conference's publications
    ├── xml                            # Folder to xml files of the conference's pubclications
    │   ├── abstract1.xml                    
    │   └── ...                       
    └── pdf                            # Folder to pdf files of the conference's publications (optional)
        ├── abstract1.pdf                   
        └── ...

Remark: You can find an example dataset of a conference under /INPUT/example/.

Workflow to create bundle structure:

To create the bundle structure for a conference, the following arguments need to be taken into account:

name: Name of conference's folder to be restructured
metadata: Name of the conference's metadata file
-sequenced (optional): If parameter is passed, the order of files is assumed to be the same as appearances of metadata tags in metadata file. If not passed, the files will be assigned by name scheme.
-pdf (optional): Name of directory containing conference's pdf files. If neither passed nor name is given the default is 'pdf'.
-xml (optional): Name of directory containing conference's xml files. If not passed, there will be no xml files taken into account for the single abstracts of the conerence. If passed and no name is given the default is 'xml'.

Example usage:

# create bundle structure for conference "example" by name scheme
python zenodup.py bundle example example_metadata.xml -pdf -xml

The bundle structure will be created in the configured output directory (Default:/OUTPUT/[CONFERENCE]).

Logging file with name [CONFERENCE]_bundle.log will be created under configured logging directory (Default: support/logging).

Interact with Zenodo API

Change current working directory to /zenodup/zenodup/. Run script zenodoup.py with argument api to run tasks with Zenodo API. If the bundle structure hasn't been created automatically with this application, this script expects the following bundle structure under configured output_base (Default: "/OUTPUT"):

  CONFERENCE                        # folder containing all bundles to be uploaded as single publications
    ├── abstract1                   # bundle folder for abstract1                
    │   ├── bundle_metadata.json    # abstract's metadata in json format
    │   └── bundle_publications
    │       ├── abstract1.pdf       # abstract's pdf file         
    │       └── abstract1.xml       # abstract's xml file (optional)                   
    └── ...

Remark: This bundle structure is automatically generated by creating the bundle structure with this application.

Workflow to publish abstract's on Zenodo:

For more information about file bundle_metadata.json please see Zenodo REST API Documentation.

To interact with Zenodo API the following arguments need to be taken into account:

action
- upload: Upload of abstracts to Zenodo. In order to upload the abstracts the files must be available in the required bundle structure under the configured output directory.
- publish: Publishes drafts of given conference in Zenodo.
- update: Adds missing notes and references to the drafts' metadata.
  
  CAUTION: This method contains hardcoded elements.
- delete: Deletes drafts of given conference from Zenodo
- get_metadata: Saves the abstracts' metadata for conference's annual package. This method is used to create an csv file containing all final abstracts metadata of conference. In order to add publication category (e.g. poster, panel, ...) to csv file the conferences' files must be available in the required bundle structure under the configured output directory.
  
  CAUTION: This method contains hardcoded elements.
- write_identifiers_for_posters: Writes the abstract's concept doi as related identifier in poster's metadata. This method is used to add the abstract's concept doi to the affiliated poster publication. For this method the posters have to be stored in a subdirectory of the INPUT directory. It is necessary for each conference that the poster directory is called [CONFERENCE NAME]_posters containing the respective metadata file for the posters named [CONFERENCE NAME]_posters.xml (the final path is therefore /INPUT/[CONFERENCE NAME]_posters/[CONFERENCE NAME]_posters.xml).
name: Name of conference's folder with bundle structure
token: Generated access token to use Zenodo API.
-productive (optional): If argument is given, the bundles will be uploaded to productive system. Otherwise they will be uploaded to the zenodo sandbox.

Personal access token needs to be either created for sandbox or productive system:

Register for a Zenodo account if you don’t already have one.
Go to your Applications, to create a new token.
Select the OAuth scopes you need (for the quick start tutorial you need deposit:write and deposit:actions).

Example usage:

# Upload conference's abstracts to sandbox
python zenodup.py api upload [CONFERENCE] [ACCESS_TOKEN]

# Upload conference's abstracts to productive system
python zenodup.py api upload [CONFERENCE] [ACCESS_TOKEN] -productive

Logging file with name [CONFERENCE]_upload.log will be created under support/logging/.

Project related links

Published abstracts can be found under DHd Community on Zenodo
Data for the conferences of 2014-2020 can be accessed here: https://github.com/DHd-Verband

Authors and Acknowledgment

License

MIT

Project status

This application is designed for specific use cases. It should be adjusted in case for generic usage:

integrate tests
if xml files of abstracts are available assign abstracts by title from xml files and not by name scheme
in case of versioning of publications a new workflow to maintain the abstracts deposition ids has to be created
functions update and get_metadata in script zenodup/api.py contain hardcoded elements

cceh/zenodup