Unable to download/export .docx files

Question

Unable to download/export .docx files

Vioshim opened this issue 3 years ago · 5 comments

Description

Hello, good morning. As stated in the title, the library seems to fail when attempting to export files to .docx using Drive v3.

This seems to be related to how the aiohttp session handles the response's content type.

Example

import asyncio
import json

from aiogoogle import Aiogoogle
from aiogoogle.auth.creds import ServiceAccountCreds

# Use a context manager so the key file is closed after loading.
with open("client_service.json") as key_file:
    service_account_key = json.load(key_file)

creds = ServiceAccountCreds(
    scopes=[
        "https://www.googleapis.com/auth/documents",
        "https://www.googleapis.com/auth/drive",
    ],
    **service_account_key,
)

DOCX_FORMAT = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
GOOGLE_FORMAT = "application/vnd.google-apps.document"

URL = "1N2ZEZd1PEKusdIg9aAYw0_ODmoxHoe5GugkTGZWP21Y"


async def test():
    async with Aiogoogle(service_account_creds=creds) as aiogoogle:
        storage = await aiogoogle.discover("drive", "v3")
        res = await aiogoogle.as_service_account(storage.files.get(fileId=URL))
        if res["mimeType"] == DOCX_FORMAT:
            query = storage.files.get(fileId=URL, alt="media")
        elif res["mimeType"] == GOOGLE_FORMAT:
            query = storage.files.export(fileId=URL, mimeType=DOCX_FORMAT, alt="media")
        res = await aiogoogle.as_service_account(query)
        return res


asyncio.run(test())

Expected behaviour

Obtaining the URL's data as .docx (be it by converting or downloading)

Actual result

Traceback (most recent call last):
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiogoogle\sessions\aiohttp_session.py", line 69, in resolve_response
    json = await response.json()
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiohttp\client_reqrep.py", line 1103, in json
    raise ContentTypeError(
aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: application/vnd.openxmlformats-officedocument.wordprocessingml.document', url=URL('https://www.googleapis.com/drive/v3/files/1N2ZEZd1PEKusdIg9aAYw0_ODmoxHoe5GugkTGZWP21Y/export?mimeType=application/vnd.openxmlformats-officedocument.wordprocessingml.document&alt=media')

During handling of the above exception, another exception occurred:
    response = await get_response(request)
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiogoogle\sessions\aiohttp_session.py", line 155, in get_response
    response = await resolve_response(request, response)
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiogoogle\sessions\aiohttp_session.py", line 72, in resolve_response
    data = await response.text()
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiohttp\client_reqrep.py", line 1085, in text
    return self._body.decode(  # type: ignore[no-any-return,union-attr]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 10: invalid start byte

Answer 1 · 2022-05-09T20:17:40.000Z

Hey, I think this error is caused by aiohttp trying to decode the body as UTF-8 text / JSON when the response is not UTF-8 decodable, e.g. .docx or .pdf.
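To illustrate the point above, here is a minimal sketch of the same failure outside the library. The byte string is a made-up sample, not the real file: it starts with the ZIP signature that .docx files carry and places 0xfc at position 10 to mirror the traceback in the question.

```python
from io import BytesIO

# Made-up sample (assumption): a ZIP-style header such as a .docx begins with,
# with the non-UTF-8 byte 0xfc placed at position 10 as in the traceback.
sample = b"PK\x03\x04\x14\x00\x08\x08\x08\x00\xfc\x52"

try:
    sample.decode("utf-8")
    error_position = None
    reason = None
except UnicodeDecodeError as exc:
    # Decoding binary data as text fails at the first invalid byte.
    error_position = exc.start
    reason = exc.reason

# Treating the payload as raw bytes avoids decoding altogether.
buffer = BytesIO(sample)
print(error_position, reason, len(buffer.getvalue()))
```

This is why a bytes-oriented handler is needed for binary downloads: the body has to be written out as bytes rather than parsed as text or JSON.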

As far as I remember, drive.files.get() is a file download endpoint and not an endpoint that will return valid UTF-8 bytes.

You should use either download_file or pipe_to to process the file download body. If it's a multipart response, you'll also get the JSON response, so you can check the MIME type like you're doing in the example you gave.

I hope this helps.

Answer 2 · 2022-05-09T23:52:05.000Z

Thanks for the heads-up, it works. I had to use a small workaround to be able to use io.BytesIO:

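from io import BytesIO
from typing import Optional

from aiogoogle import Aiogoogle

# DOCX_FORMAT and GOOGLE_FORMAT are the constants defined in the question's example.


class FileHandler:
    def __init__(self, fp: Optional[BytesIO] = None) -> None:
        if fp is None:
            fp = BytesIO()
        self.fp = fp

    async def write(self, data: bytes) -> int:
        # Return the byte count so the method matches its int annotation.
        return self.fp.write(data)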

    def __call__(self):
        self.fp.seek(0)
        return self.fp


async def docs_aioreader(document_id: str, aio: Aiogoogle):
    file = FileHandler()
    storage = await aio.discover("drive", "v3")
    info: dict[str, str] = await aio.as_service_account(storage.files.get(fileId=document_id))
    value = info.get("mimeType")
    if value == DOCX_FORMAT:
        query = storage.files.get(
            fileId=document_id,
            pipe_to=file,
            alt="media",
        )
    elif value == GOOGLE_FORMAT:
        query = storage.files.export(
            fileId=document_id,
            pipe_to=file,
            mimeType=DOCX_FORMAT,
            alt="media",
        )
    else:
        raise ValueError(f"{value} format is not supported.")
    await aio.as_service_account(query)
    return file()
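The workaround can be exercised without calling Drive. The sketch below assumes that the library awaits handler.write(chunk) once per chunk, as the async write method above suggests. The helpers make_fake_docx and feed are hypothetical stand-ins for the real download, and the handler is a trimmed copy of the one above.

```python
import asyncio
import zipfile
from io import BytesIO


class FileHandler:
    """Trimmed copy of the workaround: an async write() around a BytesIO."""

    def __init__(self) -> None:
        self.fp = BytesIO()

    async def write(self, data: bytes) -> int:
        return self.fp.write(data)

    def __call__(self) -> BytesIO:
        self.fp.seek(0)
        return self.fp


def make_fake_docx() -> bytes:
    # Hypothetical stand-in for a download: a .docx is a ZIP archive,
    # so a tiny in-memory ZIP is enough for this check.
    buf = BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
    return buf.getvalue()


async def feed(handler: FileHandler, payload: bytes, chunk_size: int = 16) -> None:
    # Assumption: chunks arrive one at a time and each write is awaited.
    for start in range(0, len(payload), chunk_size):
        await handler.write(payload[start:start + chunk_size])


payload = make_fake_docx()
handler = FileHandler()
asyncio.run(feed(handler, payload))

result = handler()
looks_like_docx = zipfile.is_zipfile(result)
names = zipfile.ZipFile(result).namelist()
print(looks_like_docx, names)
```

Checking the returned buffer with zipfile.is_zipfile is a cheap way to confirm that the bytes came through intact before handing them to a .docx parser.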

Answer 3 · 2022-05-10T12:19:12.000Z

Awesome, glad I could help, and thanks for posting your solution!

I just remembered that we have an example usage of pipe_to here: https://github.com/omarryhan/aiogoogle/blob/master/examples/stream_drive_file.py

Do you think the docs/examples can be improved for the file download/upload functionality?

Also, I'm thinking we should probably throw a clearer error when someone attempts to use .export without providing a bytes handler, i.e. either pipe_to or download_file. What do you think?

Answer 4 · 2022-05-10T14:04:05.000Z

Greetings, it is a good idea to add examples that cover exporting/downloading in more detail, as well as to raise that exception.

At the same time, I'd suggest letting the download handler detect whether an object's write method is a coroutine function. That way users can provide objects whose write method isn't async and work with them right off the bat, rather than relying on workarounds or placeholder classes.

Example: if it's a coroutine function, await it; otherwise, run the synchronous write method in an asynchronous way. This can be detected with inspect.iscoroutinefunction.
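A minimal sketch of that suggestion, not the library's implementation: write_chunk and AsyncSink are hypothetical names. It uses inspect.iscoroutinefunction, which inspects the method itself, whereas inspect.iscoroutine only recognizes an already-created coroutine object. Synchronous writes are sent to a worker thread with asyncio.to_thread (Python 3.9+).

```python
import asyncio
import inspect
from io import BytesIO


async def write_chunk(sink, chunk: bytes) -> None:
    """Write a chunk to sink whether sink.write is sync or async."""
    if inspect.iscoroutinefunction(sink.write):
        await sink.write(chunk)
    else:
        # Run the blocking write in a worker thread so the event loop stays free.
        await asyncio.to_thread(sink.write, chunk)


class AsyncSink:
    """Hypothetical handler with an async write(), like the workaround class."""

    def __init__(self) -> None:
        self.buffer = BytesIO()

    async def write(self, data: bytes) -> int:
        return self.buffer.write(data)


async def demo():
    sync_sink = BytesIO()     # plain synchronous write()
    async_sink = AsyncSink()  # coroutine write()
    for chunk in (b"PK", b"\x03\x04"):
        await write_chunk(sync_sink, chunk)
        await write_chunk(async_sink, chunk)
    return sync_sink.getvalue(), async_sink.buffer.getvalue()


sync_bytes, async_bytes = asyncio.run(demo())
print(sync_bytes == async_bytes)
```

With a helper like this, a plain io.BytesIO or an open file object could be passed directly, and the placeholder class would no longer be needed.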

Thanks for the help, looking forward to seeing where this project goes.

Answer 5 · 2022-05-10T19:20:55.000Z

Thanks for the suggestions!

Definitely agree with the export/downloading examples.

The reason I haven't added support for sync data streaming is that the official Google library already handles all the synchronous stuff very well. But I guess it won't hurt to support normal synchronous functions as well as coroutines.

I'll go ahead and close this issue now. Feel free to open a new issue if you face any more problems. Thanks!