gdcc/pyDataverse

OAI-PMH integration


Integrate OAI-PMH endpoint and data conversion.

Requirements

  • Mapping of data from OAI-PMH endpoint (DDI XML and/or DC)
  • Import of data
  • Export of data
  • XML schema
  • Validate against schema
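
To make the mapping requirement more concrete, here is a minimal sketch of flattening one `oai_dc` record into a Python dict using only the standard library. The sample record, its identifier, and the function name `dc_record_to_dict` are made up for illustration. None of this is existing pyDataverse API, and the resulting keys would still need to be mapped onto the Dataverse metadata model.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample, shaped like one <record> from a ListRecords response.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>doi:10.1234/EXAMPLE</identifier>
    <datestamp>2020-01-01T00:00:00Z</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example Survey 2020</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:creator>Roe, Richard</dc:creator>
      <dc:subject>Social Sciences</dc:subject>
    </oai_dc:dc>
  </metadata>
</record>
"""


def dc_record_to_dict(record_xml):
    """Flatten one oai_dc record into {"identifier": str, "<dc element>": [values]}."""
    record = ET.fromstring(record_xml)
    result = {"identifier": record.findtext("oai:header/oai:identifier", namespaces=NS)}
    dc_root = record.find("oai:metadata/oai_dc:dc", NS)
    if dc_root is None:  # e.g. deleted records carry no <metadata>
        return result
    dc_prefix = "{" + NS["dc"] + "}"
    for element in dc_root:
        if element.tag.startswith(dc_prefix) and element.text:
            name = element.tag[len(dc_prefix):]
            # Dublin Core elements are repeatable, so collect values in lists.
            result.setdefault(name, []).append(element.text.strip())
    return result


mapped = dc_record_to_dict(SAMPLE_RECORD)
print(mapped)
```

Every value is kept as a list because each Dublin Core element may repeat. A DDI mapping would need a deeper, schema-aware traversal and is not covered by this sketch.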

ACTIONS

0. Pre-Requisites

1. Research

pyoai

from oaipmh.client import Client
from oaipmh.metadata import MetadataRegistry, oai_dc_reader

url = "https://data.aussda.at/oai"

# Register the built-in Dublin Core reader for the 'oai_dc' prefix.
registry = MetadataRegistry()
registry.registerReader('oai_dc', oai_dc_reader)
client = Client(url, registry)

# listRecords() yields one (header, metadata, about) tuple per record.
for record in client.listRecords(metadataPrefix='oai_dc'):
    print(record)

oai-harvest

oai-harvest --set "all_published" --metadataPrefix "oai_ddi" https://data.aussda.at/oai

sickle

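from sickle import Sickle

sickle = Sickle('https://data.aussda.at/oai')
records = sickle.ListRecords(metadataPrefix='oai_ddi')
record = next(records)    # fetch the first record from the iterator
record.header             # header of the record
record.header.identifier  # OAI identifier of the record
record.metadata           # parsed metadata of the record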
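
All three clients wrap the same OAI-PMH 2.0 requests. The sketch below uses only the standard library to show how a `ListRecords` URL is assembled and how the `resumptionToken` used for paging can be read from a response. The helper names and the sample response are hypothetical; the `set` and `metadataPrefix` values are the ones from the oai-harvest call above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, set_spec=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        if metadata_prefix is None:
            raise ValueError("metadata_prefix is required for the first request")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def next_resumption_token(response_xml):
    """Return the token for the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI_NS)
    return token.strip() if token and token.strip() else None


# Made-up, truncated response body for illustration only.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2-token</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

first_url = list_records_url("https://data.aussda.at/oai", "oai_ddi", "all_published")
token = next_resumption_token(SAMPLE_RESPONSE)
next_url = list_records_url("https://data.aussda.at/oai", resumption_token=token)
print(first_url)
print(next_url)
```

An empty or missing `resumptionToken` element marks the last page, which is why the helper returns `None` in that case. In practice pyoai and sickle handle this paging internally, so a hand-rolled loop like this would mainly be useful for debugging an endpoint.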

2. Plan

  • Define requirements

3. Implement

  • Write tests
  • Write code
  • Write and update Docs
  • Write Docstrings
  • Run pytest
  • Run tox
  • Run pylint
  • Run mypy

4. Follow Ups

  • Review
    • Code
    • Tests
    • Docs

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python