omarryhan/aiogoogle

Updating google drive file contents yields corrupted file due to multipart


Hello! Big fan of this library. Let's say I have a csv I'd like to update, and the new revision looks like this:

a,b
1,4
2,5
3,6

If I try to update an existing file's contents (such that I can retain revision history, rather than deleting+creating) like this:

req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields=fields)

On Google Drive, the stored file ends up containing the raw multipart body instead of the CSV:

--3c76f6a5ff7d445f9320bfd7b5bdfaee
Content-Type: application/json
Content-Length: 4

null
--3c76f6a5ff7d445f9320bfd7b5bdfaee
Content-Type: text/csv

a,b
1,4
2,5
3,6

--3c76f6a5ff7d445f9320bfd7b5bdfaee--
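To check programmatically whether a downloaded file was stored as a multipart body like the one above, something along these lines may help. It is only a rough sketch: `looks_like_multipart_envelope` is a hypothetical helper, not part of aiogoogle, and it only checks for an opening boundary line plus the matching closing boundary.

```python
import re

# Hypothetical helper, not part of aiogoogle.
# A multipart body starts with "--<boundary>" on its own line and ends with
# "--<boundary>--". The character class follows the boundary characters
# permitted by RFC 2046.
_BOUNDARY_LINE = re.compile(rb"^--([0-9A-Za-z'()+_,\-./:=? ]{1,70})\r?\n")


def looks_like_multipart_envelope(content: bytes) -> bool:
    """Return True if `content` appears to be a raw multipart body."""
    match = _BOUNDARY_LINE.match(content)
    if match is None:
        return False
    # The same boundary must reappear as the closing delimiter.
    closing = b"--" + match.group(1) + b"--"
    return closing in content
```

Running this on the bytes downloaded after the update would flag the corrupted case, while the plain 4-line CSV would not match.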

But if after declaring the req (and before requesting) I disable multipart:

req.media_upload.multipart = False

The file updates fine! Is there a way this could be fixed more automatically in the library?
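Until there is a fix in the library, the workaround can be wrapped in a small helper so it does not have to be repeated at every call site. This is a sketch under assumptions: `prefer_simple_upload` is a hypothetical name, and it assumes the request object exposes the metadata body as a `json` attribute (the `null` JSON part in the corrupted output suggests no metadata was passed). Only `media_upload.multipart` is confirmed by the snippet above.

```python
def prefer_simple_upload(req):
    """Disable multipart on `req` when there is no JSON metadata to send.

    Hypothetical helper, not part of aiogoogle. Assumes `req.json` holds the
    metadata body and `req.media_upload.multipart` toggles multipart mode.
    """
    media_upload = getattr(req, "media_upload", None)
    if media_upload is None:
        # Not an upload request; nothing to change.
        return req
    if getattr(req, "json", None) is None:
        # No metadata part, so a multipart envelope adds nothing.
        media_upload.multipart = False
    return req
```

Usage would look like `req = prefer_simple_upload(drive.files.update(...))`. It leaves requests that do carry metadata untouched, and it does nothing for the pipe_from case.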

Also - disabling multipart does nothing to fix the issue for pipe_from uploads. You'll get an identically corrupted file regardless.

Hi, thanks for reporting the issue, and I'm glad you're finding the lib useful!

I'll be happy to accept a PR with a fix, thanks

Since everything is dynamically generated, I don't know how to fix this for one specific method.

Also, ideally the fix works for pipe_from as well, but I'm not quite sure what that looks like.

Basically, there's probably a better fix than me patching one specific method, the way I'm doing right now in my code.

Can you share a full example for reproduction, please?

Please also include the expected result, so that I can compare it with the corrupted multipart file.

Also I just noticed that you're passing a file object instead of a path as the upload_file argument. I don't think this is correct.

Sure, here's a more complete example. I'll let you fill in your own aiogoogle and parent_id. I'm using a file object because that works for both creates and updates, whereas pipe_from only works for creates. The multipart workaround does not work for pipe_from.

from io import BytesIO
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

data = BytesIO()
df.to_csv(data, index=False)
data.seek(0)

# Create that works (16 byte file)
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.create(upload_file=data.read(), supportsAllDrives=True, fields='id', json={ # type: ignore
        'name': 'test.csv',
        'parents': [parent_id]
    })
    res = await aiogoogle.as_service_account(req)

new_file_id = res['id']

# Update that doesn't work (229 byte file)
data.seek(0)  # rewind: the previous data.read() left the buffer at EOF
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields='id')
    # req.media_upload.multipart = False
    await aiogoogle.as_service_account(req)

# Update that works (16 byte file)
data.seek(0)  # rewind again before re-reading the CSV bytes
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields='id')
    req.media_upload.multipart = False  # workaround: send the raw bytes only
    await aiogoogle.as_service_account(req)