Updating Google Drive file contents yields corrupted file due to multipart
Hello! Big fan of this library. Let's say I have a csv I'd like to update, and the new revision looks like this:
a,b
1,4
2,5
3,6
If I try to update an existing file's contents (such that I can retain revision history, rather than deleting+creating) like this:
req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields=fields)
The file on Google Drive ends up corrupted: it contains the raw multipart body instead of just the CSV.
--3c76f6a5ff7d445f9320bfd7b5bdfaee
Content-Type: application/json
Content-Length: 4
null
--3c76f6a5ff7d445f9320bfd7b5bdfaee
Content-Type: text/csv
a,b
1,4
2,5
3,6
--3c76f6a5ff7d445f9320bfd7b5bdfaee--
But if, after declaring req (and before sending the request), I disable multipart:
req.media_upload.multipart = False
The file updates fine! Is there a way this could be fixed more automatically in the library?
Also - disabling multipart does nothing to fix the issue for pipe_from uploads. You'll get an identically corrupted file regardless.
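To make the size difference concrete, here is a minimal sketch that builds a body shaped like the corrupted content above (a JSON part containing null, then the CSV part) and compares it with the bare CSV. The function name build_multipart_related is hypothetical, and this is a generic multipart/related layout, not necessarily the exact bytes aiogoogle sends.

```python
import uuid
from typing import Tuple


def build_multipart_related(metadata_json: bytes, media: bytes, media_type: str) -> Tuple[bytes, str]:
    """Return (body, boundary) for a generic multipart/related upload body.

    Hypothetical helper for illustration; not part of aiogoogle.
    """
    boundary = uuid.uuid4().hex
    delim = b"--" + boundary.encode("ascii")
    parts = [
        delim,
        b"Content-Type: application/json",
        b"",  # blank line separates part headers from part content
        metadata_json,
        delim,
        b"Content-Type: " + media_type.encode("ascii"),
        b"",
        media,
        delim + b"--",  # closing delimiter
    ]
    return b"\r\n".join(parts), boundary


csv_bytes = b"a,b\n1,4\n2,5\n3,6\n"
body, boundary = build_multipart_related(b"null", csv_bytes, "text/csv")

print(len(csv_bytes))              # 16, the size of a correct upload
print(len(body) > len(csv_bytes))  # True: the envelope adds boundaries and headers
```

If the server stores a body like this verbatim as the file contents, the result is a file with boundaries and part headers wrapped around the CSV, which matches what I'm seeing.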
Hi, thanks for reporting the issue, and I'm glad you're finding the lib useful!
I'll be happy to accept a PR with a fix, thanks
Since everything is dynamically generated, I don't know how to fix this for one specific method.
Also, ideally the fix works for pipe_to as well, but I'm not quite sure what that looks like.
Basically, there's probably a better fix than me trying to patch one specific method the way I'm doing right now in my code.
Can you share a full example for reproduction, please?
Also, please share the expected result so that I can compare it to the corrupted multipart file.
Also I just noticed that you're passing a file object instead of a path as the upload_file argument. I don't think this is correct.
Sure here's a more complete example. I'll let you fill in your own aiogoogle and parent_id. I'm using a file object because that works for both creates and updates, but pipe_from only works for creates. The multipart workaround does not work for pipe_from.
from io import BytesIO

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
data = BytesIO()
df.to_csv(data, index=False)
data.seek(0)

# Create that works (16 byte file)
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.create(upload_file=data.read(), supportsAllDrives=True, fields='id', json={  # type: ignore
        'name': 'test.csv',
        'parents': [parent_id]
    })
    res = await aiogoogle.as_service_account(req)
    new_file_id = res['id']

# Update that doesn't work (229 byte file)
data.seek(0)  # rewind: the previous read() consumed the buffer
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields='id')
    # req.media_upload.multipart = False
    await aiogoogle.as_service_account(req)

# Update that works (16 byte file)
data.seek(0)  # rewind again before the second update
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields='id')
    req.media_upload.multipart = False
    await aiogoogle.as_service_account(req)
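
Until there's a proper fix, the workaround could be wrapped in a small helper so it isn't repeated at every call site. This is only a sketch: force_simple_upload is a hypothetical name, and it relies on nothing beyond the req.media_upload.multipart attribute used above. The demo uses a stand-in object rather than a real aiogoogle request.

```python
from types import SimpleNamespace


def force_simple_upload(req):
    """Disable multipart on a request that carries a media upload, then return it.

    Hypothetical helper; requests without a media upload are returned unchanged.
    """
    media_upload = getattr(req, "media_upload", None)
    if media_upload is not None:
        media_upload.multipart = False
    return req


# Stand-in for demonstration only; a real request would come from
# drive.files.update(...), as in the example above.
fake_req = SimpleNamespace(media_upload=SimpleNamespace(multipart=True))
force_simple_upload(fake_req)
print(fake_req.media_upload.multipart)  # False
```

With a real request this would be used as `await aiogoogle.as_service_account(force_simple_upload(req))`. It does not help with pipe_from uploads, which stay corrupted either way.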