gdcc/pyDataverse

problem with replace_datafile


Bug report

1. Describe your environment

2. Actual behaviour:
I use pyDataverse and it is very nice, but I have a problem when I use api.replace_datafile. Here is the problem:
In all cases (upload or replace) I use this code:
df = DataFile()
df.set({"pid": doi_ds, "filename": chemin_fichier, "categories": ["Data"], "forceReplace": "true"})

I upload a file with this line: resultat_upload = api.upload_datafile(doi_ds, df.filename, df.json())
This works, and Dataverse detects the file type (for example, MS Word for a .docx file).

I then try to replace the file with this code: resultat_remplace = api.replace_datafile(identifier=pid_fichier, filename=chemin_fichier,
json_str=df.json(), is_filepid=True)
I get this error message: b'{"status":"ERROR","message":"The original file (MS Word) and replacement file (Plain Text) are different file types."}'

The same thing happens with other file extensions (for example .xyz) whenever Dataverse is able to assign a file type (for the preview tools, I suppose).
api.replace_datafile does not set the content type, so Dataverse determines it itself, but this fails when replacing a file. From the Dataverse code I understand that I can force the replacement, but "forceReplace": "true" has no effect. With the native API via curl there is no problem.
Can you help me?
Thanks in advance.
I can give more details or run tests if necessary.
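
For comparison, the native API call that works with curl can be sketched in Python. This is a minimal sketch, not pyDataverse's implementation: the helper name build_replace_request is hypothetical, "<FILE_PID>" is a placeholder, and the code only assembles the URL, header and form field for the native API replace endpoint (POST /api/files/{id}/replace, or /api/files/:persistentId/replace with a persistentId query parameter). Nothing is sent over the network.

```python
import json


def build_replace_request(base_url, api_token, identifier, metadata, is_filepid=False):
    """Assemble the pieces of a Dataverse native API file-replace request.

    Hypothetical helper for illustration only; it does not send anything.
    """
    base = base_url.rstrip("/")
    if is_filepid:
        # A file persistent identifier is passed as a query parameter.
        url = f"{base}/api/files/:persistentId/replace"
        params = {"persistentId": identifier}
    else:
        # A numeric database ID goes into the path.
        url = f"{base}/api/files/{identifier}/replace"
        params = {}
    headers = {"X-Dataverse-key": api_token}
    # The metadata travels as a JSON string in the "jsonData" form field,
    # next to the file itself, which goes in the "file" form field.
    form = {"jsonData": json.dumps(metadata)}
    return {"url": url, "params": params, "headers": headers, "form": form}


request = build_replace_request(
    "https://demo.dataverse.org",
    "<API_TOKEN>",
    "<FILE_PID>",  # placeholder, not a real identifier
    {"categories": ["Data"], "forceReplace": True},
    is_filepid=True,
)
print(request["url"])
print(request["form"]["jsonData"])
```

The sketch passes forceReplace as a JSON boolean rather than the string "true". Whether the string form is accepted by the server is not confirmed here.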

3. Expected behaviour:
Just replace the file.

4. Steps to reproduce

I think the code above is clear enough to reproduce the problem.

5. Possible solution
Provide a syntax for forcing the replacement, or ask the Dataverse developers.

6. Check your bug report

  • Check if your language is written in a positive way: my English is bad. Sorry ;))

@albenard, thanks for submitting, and please excuse my delayed response.

I was able to replicate the error on demo.dataverse.org and received the same error message. I will investigate a solution and open a pull request upon finding a resolution.

@albenard, I found and fixed the issue in pull request #173. Could you please try testing the PR branch using your case?

I tested it on demo.dataverse.org without any errors. Here's the code I used. Feel free to let me know if you have any questions.

import json
from pyDataverse.api import NativeApi

# Create an instance of the NativeApi class
api = NativeApi(
    "https://demo.dataverse.org",
    "<API_TOKEN>",
)

request_body = {
    "description": "My description.",
    "categories": ["Data"],
    "forceReplace": False,
}

response = api.replace_datafile(
    identifier=<FILE_ID>,
    filename="test_doc.docx",
    json_str=json.dumps(request_body),
    is_filepid=False,
)

print(response.json())

Hello, I tested the new version (thank you), but it does not work for all extensions. The test passes with a .docx file, but as I said in my post, the same thing happens with other file extensions (for example .xyz). When I try to replace a file with the .xyz extension, the problem is still present.
You can test with a file named test.xyz with this content:
A B C
3.4292 -4.31647 -1.66819
3.4292 -4.31647 -1.65319

Publish this file, then replace A B C with X Y Z, save the file, and try the replace:
Error: The original file (Co-Ordinate Animation) and replacement file (Plain Text) are different file types.
Thanks in advance.
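
The two versions of the test file described above can be generated with a short script. This is only a sketch of the file preparation: the function names are hypothetical, the file is written to a temporary directory, and the upload, publish and replace steps against Dataverse are not included.

```python
import tempfile
from pathlib import Path

# File content taken from the report above.
ORIGINAL = "A B C\n3.4292 -4.31647 -1.66819\n3.4292 -4.31647 -1.65319\n"


def write_original(path):
    """Write the first version of the file (the one to upload and publish)."""
    path.write_text(ORIGINAL, encoding="utf-8")


def make_replacement(path):
    """Change the header line from 'A B C' to 'X Y Z', keeping the data rows."""
    lines = path.read_text(encoding="utf-8").splitlines()
    lines[0] = "X Y Z"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


workdir = Path(tempfile.mkdtemp())
xyz_file = workdir / "test.xyz"

write_original(xyz_file)    # upload and publish this version first
make_replacement(xyz_file)  # then use this version for the replace call
print(xyz_file.read_text(encoding="utf-8"))
```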

@albenard, thank you for helping me test the code! Unfortunately, the mimetypes library did not work as expected. However, I have found a solution that should work with both your test.xyz and a docx file. I have created PR #174, which proposes a migration to httpx, since requests cannot handle the request.

I have also included your example as a test case to make sure everything is working fine. Could you please test it locally to verify if it works?
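
The thread does not show how mimetypes was used in PR #173, so the following is only a sketch of extension-based guessing with Python's standard library, with a hypothetical helper name and a generic fallback type. The result for a given extension depends on the MIME tables available on the machine, and it reflects what Python guesses from the file name, not what Dataverse detects from the file content.

```python
import mimetypes


def guess_content_type(filename, default="application/octet-stream"):
    """Guess a MIME type from the file extension, with a generic fallback."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default


# Output varies by platform, so no expected values are listed here.
for name in ("test_doc.docx", "test.xyz", "notes.txt"):
    print(name, "->", guess_content_type(name))
```

Running this on the client machine shows which type, if any, would be attached to a file based on its name alone.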

I tested this morning with my case, and I confirm that replacing .docx or .xyz files in a dataset now works. Thank you.
I am not comfortable with GitHub, and to test, I cloned the project, used branch 174, and copied certain files (.py …) into a copy of my code/environment. I need to distribute this code to users and cannot ask them to perform this operation; I assume there will be a commit for your work. At what point will a simple ‘import pyDataverse’ be sufficient to benefit from your corrections?

Thank you, @albenard, for testing! I am glad to hear that everything is working as expected.

> At what point will a simple ‘import pyDataverse’ be sufficient to benefit from your corrections?

We will discuss and review the pull request during next week's pyDataverse Working Group meeting. Once the changes are approved and merged, a new version will be released that can be installed via pip install pydataverse.

In the meantime, you can make use of the patched version by installing the package directly from GitHub by running the following command:

pip install git+https://github.com/gdcc/pyDataverse.git@requests-via-httpx

This will install the package and make it available for use as usual via import pyDataverse.