gdcc/pyDataverse

pyDataverse changes the JSON of a dataset too much on import

Issue

Dear pyDataverse developers,
I am using pyDataverse to upload datasets to our Dataverse installation.
Our datasets contain custom metadata ("freeKeywordValue").

As recommended in the tutorial, I create the dataset with "ds.from_json(data_json)" (the example is at the end of this message).
However, the upload fails, and when I check with "ds.get()", I see that the custom metadata fields are no longer there.
The server also rejects the input because pyDataverse removes the entry "metadataLanguage": "en", which is required by Dataverse as stated in the Dataverse API guide: https://guides.dataverse.org/en/latest/api/native-api.html#id43.

If I send the same JSON with plain curl, it is accepted and the dataset is created as expected.
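
For comparison, the curl route can be approximated in Python without going through the pyDataverse Dataset model. This is only a minimal sketch: it assumes the native API endpoint `/api/dataverses/{alias}/datasets` and the `X-Dataverse-key` header, and the function name, base URL, collection alias, and token below are placeholders that do not come from this report. The request is built but not sent.

```python
import json
import urllib.request


def build_create_dataset_request(base_url, dataverse_alias, api_token, dataset_json):
    """Build (but do not send) a POST request that creates a dataset from raw JSON.

    The JSON is serialized as-is, so top-level keys such as "metadataLanguage"
    and custom fields are passed through untouched.
    """
    url = f"{base_url.rstrip('/')}/api/dataverses/{dataverse_alias}/datasets"
    body = json.dumps(dataset_json).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Dataverse-key": api_token,
        },
    )


# Placeholder values, for illustration only.
example_json = {"metadataLanguage": "en", "datasetVersion": {"metadataBlocks": {}}}
req = build_create_dataset_request(
    "https://dataverse.example.org/", "my_collection", "my-api-token", example_json
)
print(req.get_method(), req.full_url)

# To actually send it (requires a reachable server and a valid token):
# with urllib.request.urlopen(req) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Sending the unmodified JSON this way should behave like the curl call, which helps confirm that the difference comes from the conversion inside pyDataverse rather than from the server.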

The original json:

{
  "metadataLanguage": "en",
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Youth in Austria 2005",
            "typeClass": "primitive",
            "multiple": false,
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "LastAuthor1, FirstAuthor1",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorName"
                },
                "authorAffiliation": {
                  "value": "AuthorAffiliation1",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorAffiliation"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactEmail",
                  "value": "ContactEmail1@mailinator.com"
                },
                "datasetContactName": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactName",
                  "value": "LastContact1, FirstContact1"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "DescriptionText",
                  "multiple": false,
                  "typeClass": "primitive",
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "dsDescription"
          },
          {
            "value": [
              {
                "freeKeywordValue": {
                  "value": "MyKeyword1",
                  "multiple": false,
                  "typeClass": "primitive",
                  "typeName": "freeKeywordValue"
                }
              },
              {
                "freeKeywordValue": {
                  "value": "MyKeyword2",
                  "multiple": false,
                  "typeClass": "primitive",
                  "typeName": "freeKeywordValue"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "freeKeyword"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeClass": "controlledVocabulary",
            "multiple": true,
            "typeName": "subject"
          }
        ],
        "displayName": "Citation Metadata"
      }
    }
  }
}

The output of ds.get():

{'citation_displayName': 'Citation Metadata', 'title': 'Youth in Austria 2005', 'author': [{'authorName': 'LastAuthor1, FirstAuthor1', 'authorAffiliation': 'AuthorAffiliation1'}], 'datasetContact': [{'datasetContactEmail': 'ContactEmail1@mailinator.com', 'datasetContactName': 'LastContact1, FirstContact1'}], 'dsDescription': [{'dsDescriptionValue': 'DescriptionText'}], 'subject': ['Medicine, Health and Life Sciences']}

The incoming json in the server log:

{
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "typeName": "subject",
            "multiple": true,
            "typeClass": "controlledVocabulary",
            "value": [
              "Medicine, Health and Life Sciences"
            ]
          },
          {
            "typeName": "title",
            "multiple": false,
            "typeClass": "primitive",
            "value": "Youth in Austria 2005"
          },
          {
            "typeName": "author",
            "multiple": true,
            "typeClass": "compound",
            "value": [
              {
                "authorName": {
                  "typeName": "authorName",
                  "typeClass": "primitive",
                  "multiple": false,
                  "value": "LastAuthor1, FirstAuthor1"
                },
                "authorAffiliation": {
                  "typeName": "authorAffiliation",
                  "typeClass": "primitive",
                  "multiple": false,
                  "value": "AuthorAffiliation1"
                }
              }
            ]
          },
          {
            "typeName": "datasetContact",
            "multiple": true,
            "typeClass": "compound",
            "value": [
              {
                "datasetContactEmail": {
                  "typeName": "datasetContactEmail",
                  "typeClass": "primitive",
                  "multiple": false,
                  "value": "ContactEmail1@mailinator.com"
                },
                "datasetContactName": {
                  "typeName": "datasetContactName",
                  "typeClass": "primitive",
                  "multiple": false,
                  "value": "LastContact1, FirstContact1"
                }
              }
            ]
          },
          {
            "typeName": "dsDescription",
            "multiple": true,
            "typeClass": "compound",
            "value": [
              {
                "dsDescriptionValue": {
                  "typeName": "dsDescriptionValue",
                  "typeClass": "primitive",
                  "multiple": false,
                  "value": "DescriptionText"
                }
              }
            ]
          }
        ],
        "displayName": "Citation Metadata"
      }
    }
  }
}
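
The difference between the two JSON documents above can also be checked programmatically. The following is a small sketch using only the standard data structures; the helper names `field_type_names` and `dropped_content` are made up for illustration, and the sample dictionaries are abbreviated stand-ins for the original JSON and the JSON seen in the server log.

```python
def field_type_names(dataset_json):
    """Return the set of top-level field typeNames for each metadata block."""
    blocks = dataset_json.get("datasetVersion", {}).get("metadataBlocks", {})
    return {
        block_name: {field["typeName"] for field in block.get("fields", [])}
        for block_name, block in blocks.items()
    }


def dropped_content(original, sent):
    """Report top-level keys and fields present in `original` but missing from `sent`."""
    original_fields = field_type_names(original)
    sent_fields = field_type_names(sent)
    missing_fields = {}
    for block_name, names in original_fields.items():
        missing = names - sent_fields.get(block_name, set())
        if missing:
            missing_fields[block_name] = sorted(missing)
    return {
        "top_level_keys": sorted(set(original) - set(sent)),
        "fields": missing_fields,
    }


# Abbreviated stand-ins for the two documents in this report.
original = {
    "metadataLanguage": "en",
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {
                "fields": [
                    {"typeName": "title", "value": "Youth in Austria 2005"},
                    {"typeName": "freeKeyword", "value": []},
                ]
            }
        }
    },
}
sent = {
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {
                "fields": [
                    {"typeName": "title", "value": "Youth in Austria 2005"},
                ]
            }
        }
    },
}

report = dropped_content(original, sent)
print(report)
```

Run against the full documents, a check like this would list "metadataLanguage" and "freeKeyword" as the parts that did not reach the server.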