EasyDataverse is a Python library for interfacing with Dataverse installations and for generating Python code that matches the metadata block configuration of a given Dataverse installation. In addition, EasyDataverse allows you to export and import datasets to and from various data formats.
- Code generation from Dataverse TSV metadata configurations.
- Export and import of datasets to and from various formats (JSON, YAML, XML, and HDF5).
- Source code publication from a local or GitHub repository to a Dataverse installation.
- Fetching of datasets from any Dataverse installation into an object-oriented structure ready for integration.
Get started with EasyDataverse by running the following command:
```bash
# Using PyPI
python -m pip install easyDataverse
```
Or build from source:
```bash
git clone https://github.com/gdcc/easyDataverse.git
cd easyDataverse
python setup.py install
```
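To verify the installation, you can try importing the package (a quick sanity check, assuming a working Python environment):

```bash
python -c "import easyDataverse"
```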
EasyDataverse allows you to generate code based on the metadata configuration TSV files that are typically found in any Dataverse installation. To do so, use the dedicated command line interface:
```bash
➜ ~ dataverse generate --path ./blocks --out ./my_api --name pyMyAPI
```
For this, you need to specify the following:
- `--path` – Directory where the TSV files are located.
- `--out` – Directory where the generated code will be written.
- `--name` – Name of the resulting API.
Libraries generated by EasyDataverse are an object-oriented implementation of the metadata configuration files. The resulting classes contain all the information needed to facilitate upload and download while providing a simple interface. This reduces the steep learning curve of writing the Dataverse JSON document required by the native Dataverse REST API. The following demonstrates an example workflow:
Metadata configurations, or in this case `metadatablocks`, are found in the module of the same name. These blocks can be imported directly from the API and used in an object-oriented manner. The following demonstrates this using PyDaRUS, an API generated for DaRUS, the Dataverse installation of the University of Stuttgart.
```python
from pyDaRUS import Citation

# Initialize the metadatablock
citation = Citation()
```
Now the `citation` metadata configuration can be filled with information using attribute assignment. Furthermore, objects in the second hierarchy (aka compounds) can be set using dedicated `add_xyz` methods, so there is no need to import the sub-classes.
```python
citation.title = "My Title"
citation.add_author(name="Jan Range", affiliation="SimTech")
```
When all metadata has been assigned, the `Dataset` object is set up. This container-like structure provides all the functionality needed to upload and update datasets on a Dataverse installation. Metadatablocks are added via the `add_metadatablock` instance method.
```python
from pyDaRUS import Dataset

dataset = Dataset()
dataset.add_metadatablock(citation)
```
Optionally, you can add files and directories to the `Dataset` instance; these will be uploaded as well later on. Adding directories also lets you preserve the local structure of your dataset.
```python
dataset.add_file(dv_path=".", local_path="my.file")
dataset.add_directory(dirpath="./my/dir")
```
Finally, you can upload metadata and files using the `upload` method of your `dataset` instance. Here you specify the target Dataverse collection to which the dataset will be added.
```python
dataset.upload(dataverse="myCollection")
```
🚨 Important note
EasyDataverse infers `DATAVERSE_URL` and `DATAVERSE_API_TOKEN` from your environment variables to prevent credentials from being accidentally committed or uploaded. You can set them up as follows:
```bash
export DATAVERSE_URL="https://my.dataverse.installation"
export DATAVERSE_API_TOKEN="your-token-to-access"
```
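If you prefer to set these from within Python, for example in a notebook, the standard `os.environ` approach works as well (a minimal sketch; both values are placeholders):

```python
import os

# Equivalent to the shell exports above; both values are placeholders
os.environ["DATAVERSE_URL"] = "https://my.dataverse.installation"
os.environ["DATAVERSE_API_TOKEN"] = "your-token-to-access"
```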
To download datasets programmatically from a Dataverse installation, EasyDataverse offers two options: the `Dataset` class methods `from_dataverse_doi` and `from_url`, which fetch metadata as well as files from any installation.
```python
from easyDataverse import Dataset

dataset = Dataset.from_url("https://my.dataverse.installation/link/to/dataset")

# or

dataset = Dataset.from_dataverse_doi(
    doi="doi:my_persistent_id",
    dataverse_url="https://my.dataverse.installation"
)
```
If you'd like to fetch the metadata of a `Dataset` without downloading the files, pass `download_files=False` to either method.
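For instance, a metadata-only fetch might look like this (a sketch; the URL is a placeholder):

```python
# Fetch the dataset's metadata, but skip downloading its files
dataset = Dataset.from_url(
    "https://my.dataverse.installation/link/to/dataset",
    download_files=False,
)
```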
EasyDataverse infers the schemas from the installation's REST API and generates the corresponding classes in memory. Thus, you can handle fetched datasets in the same way as with any generated API. For instance, you can edit a fetched dataset and upload it to any other installation.
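A minimal sketch of such a round trip, assuming the target installation's URL and token are provided via the environment variables described above (the source URL, DOI, and edited field are illustrative):

```python
from easyDataverse import Dataset

# Fetch a dataset from one installation (placeholder DOI and URL)
dataset = Dataset.from_dataverse_doi(
    doi="doi:my_persistent_id",
    dataverse_url="https://source.dataverse.installation",
)

# Edit the in-memory metadata (illustrative field access)
dataset.citation.title = "A revised title"

# Upload to a collection on the installation given by DATAVERSE_URL
dataset.upload(dataverse="myCollection")
```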
🚨 Important note
Please note that, due to limitations of Dataverse, the metadatablocks will only contain fields that were used in the dataset. For complete blocks, consider using the API generated for the installation.
You can also use the command line interface to fetch data from a Dataverse installation. For this, you only need to provide the URL to the dataset, using the following command:
```bash
➜ ~ dataverse fetch https://my.dataverse.installation/link/to/dataset
```
EasyDataverse allows you to seamlessly push code from your local or remote repository to a Dataverse installation. This can be used in workflows triggered by events such as a release, to automatically publish your code (see the scripted example after the option list below). To do so, use the dedicated command line interface:
```bash
➜ ~ dataverse push --lang Python --dataverse MyDataverse --lib-name pyDaRUS
```
For this, you need to specify the following:
- `--lang` – Programming language used, which helps the parser infer dependencies.
- `--dataverse` – Target Dataverse collection to which the code will be pushed.
- `--lib-name` – Name of the API used to access the Dataverse installation. This is necessary to match the metadata configuration.
- `--token` – API token used for authorization. Can also be inferred from environment variables.
- `--url` – URL of the Dataverse installation. Can also be inferred from environment variables.
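For example, a release-triggered CI job could script the push as follows (a sketch; the collection and library names are placeholders, and the token should come from your CI secret store):

```bash
# Credentials are inferred from the environment, as described above
export DATAVERSE_URL="https://my.dataverse.installation"
export DATAVERSE_API_TOKEN="$MY_CI_SECRET_TOKEN"

dataverse push --lang Python --dataverse MyDataverse --lib-name pyDaRUS
```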
🚧 Under construction 🚧
- Jan Range (EXC2075 SimTech, University of Stuttgart)
EasyDataverse is free and open-source software licensed under the MIT License.