Unable to download/export .docx files
Vioshim opened this issue · 5 comments
Description
Hello, good morning. As the title says, the library seems to fail when exporting files to .docx with Drive v3.
This seems to be caused by how the aiohttp session handles the response's content type.
Example
import asyncio
import json
from aiogoogle import Aiogoogle
from aiogoogle.auth.creds import ServiceAccountCreds
service_account_key = json.load(open("client_service.json"))
creds = ServiceAccountCreds(
    scopes=[
        "https://www.googleapis.com/auth/documents",
        "https://www.googleapis.com/auth/drive",
    ],
    **service_account_key,
)
DOCX_FORMAT = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
GOOGLE_FORMAT = "application/vnd.google-apps.document"
URL = "1N2ZEZd1PEKusdIg9aAYw0_ODmoxHoe5GugkTGZWP21Y"
async def test():
    async with Aiogoogle(service_account_creds=creds) as aiogoogle:
        storage = await aiogoogle.discover("drive", "v3")
        res = await aiogoogle.as_service_account(storage.files.get(fileId=URL))
        if res["mimeType"] == DOCX_FORMAT:
            query = storage.files.get(fileId=URL, alt="media")
        elif res["mimeType"] == GOOGLE_FORMAT:
            query = storage.files.export(fileId=URL, mimeType=DOCX_FORMAT, alt="media")
        res = await aiogoogle.as_service_account(query)
        return res
asyncio.run(test())
Expected behaviour
The file's contents are returned as .docx, either by exporting (converting) the Google Doc or by downloading the existing .docx file.
Actual result
Traceback (most recent call last):
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiogoogle\sessions\aiohttp_session.py", line 69, in resolve_response
    json = await response.json()
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiohttp\client_reqrep.py", line 1103, in json
    raise ContentTypeError(
aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: application/vnd.openxmlformats-officedocument.wordprocessingml.document', url=URL('https://www.googleapis.com/drive/v3/files/1N2ZEZd1PEKusdIg9aAYw0_ODmoxHoe5GugkTGZWP21Y/export?mimeType=application/vnd.openxmlformats-officedocument.wordprocessingml.document&alt=media')
During handling of the above exception, another exception occurred:
    response = await get_response(request)
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiogoogle\sessions\aiohttp_session.py", line 155, in get_response
    response = await resolve_response(request, response)
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiogoogle\sessions\aiohttp_session.py", line 72, in resolve_response
    data = await response.text()
  File "C:\Users\Vioshim\V-Bot3\sheltered-crag-49796\.venv\lib\site-packages\aiohttp\client_reqrep.py", line 1085, in text
    return self._body.decode( # type: ignore[no-any-return,union-attr]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 10: invalid start byte
Hey, I think this error is caused by aiohttp trying to decode the body as UTF-8 text / JSON when the response isn't UTF-8 decodable, e.g. a .docx or .pdf file.
As far as I remember, drive.files.get() with alt="media" (like files.export) is a file download endpoint, not an endpoint that returns valid UTF-8 bytes.
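To illustrate why the text fallback in the traceback fails, here is a minimal sketch of content-type-aware body handling. decode_body is a hypothetical helper, not aiogoogle's or aiohttp's code: it parses JSON when the server says JSON, tries UTF-8 otherwise, and keeps the raw bytes when decoding fails, which is what a binary .docx or .pdf body needs.

```python
import json

DOCX_FORMAT = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def decode_body(body: bytes, content_type: str) -> object:
    """Hypothetical helper: interpret a response body without assuming it is text."""
    # Ignore parameters such as "; charset=UTF-8" when comparing the media type.
    if content_type.split(";")[0].strip() == "application/json":
        return json.loads(body)
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # Binary payloads such as .docx (a ZIP archive) or .pdf are not valid
        # UTF-8; the byte 0xfc from the traceback can never appear in UTF-8.
        return body


print(decode_body(b'{"mimeType": "x"}', "application/json"))
print(type(decode_body(b"PK\x03\x04\x14\x00\xfc", DOCX_FORMAT)))
```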
You should use either download_file or pipe_to to handle the body of the file download. If it's a multipart response, you'll also get the JSON response, so you can check the mimeType as you're doing in your example.
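A sketch of what a pipe_to target has to provide, assuming aiogoogle awaits pipe_to.write(chunk) for each raw chunk of the body instead of decoding it. ChunkCollector, fake_body and pipe are hypothetical names; fake_body stands in for the HTTP response stream.

```python
import asyncio
from typing import AsyncIterator, List


class ChunkCollector:
    """Hypothetical pipe_to target: any object exposing `async def write(data: bytes)`."""

    def __init__(self) -> None:
        self.chunks: List[bytes] = []

    async def write(self, data: bytes) -> None:
        self.chunks.append(data)

    def getvalue(self) -> bytes:
        return b"".join(self.chunks)


async def fake_body() -> AsyncIterator[bytes]:
    # Stand-in for a .docx body arriving in chunks (ZIP signature first).
    for chunk in (b"PK\x03\x04", b"\x14\x00\xfc", b"..."):
        yield chunk


async def pipe(body: AsyncIterator[bytes], sink: ChunkCollector) -> None:
    # Assumed behaviour: raw bytes go straight to sink.write(), never to .text()/.json().
    async for chunk in body:
        await sink.write(chunk)


sink = ChunkCollector()
asyncio.run(pipe(fake_body(), sink))
print(sink.getvalue()[:4])  # b'PK\x03\x04'
```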
I hope this helps.
Thanks for the heads-up, it works! (I had to use a small workaround to be able to use io.BytesIO.)
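from io import BytesIO
from typing import Optional
from aiogoogle import Aiogoogle
# DOCX_FORMAT and GOOGLE_FORMAT are the constants defined in the example above.
class FileHandler:
    """Wraps a synchronous BytesIO in the async write() that pipe_to expects."""
    def __init__(self, fp: Optional[BytesIO] = None) -> None:
        if fp is None:
            fp = BytesIO()
        self.fp = fp
    async def write(self, data: bytes) -> int:
        return self.fp.write(data)
    def __call__(self) -> BytesIO:
        # Rewind so the caller reads the downloaded bytes from the start.
        self.fp.seek(0)
        return self.fp
async def docs_aioreader(document_id: str, aio: Aiogoogle) -> BytesIO:
    file = FileHandler()
    storage = await aio.discover("drive", "v3")
    # Without alt="media", files.get returns the file's metadata as JSON.
    info: dict[str, str] = await aio.as_service_account(storage.files.get(fileId=document_id))
    value = info.get("mimeType")
    if value == DOCX_FORMAT:
        # Already a .docx: download the stored bytes as-is.
        query = storage.files.get(
            fileId=document_id,
            pipe_to=file,
            alt="media",
        )
    elif value == GOOGLE_FORMAT:
        # Native Google Doc: convert it to .docx while exporting.
        query = storage.files.export(
            fileId=document_id,
            pipe_to=file,
            mimeType=DOCX_FORMAT,
            alt="media",
        )
    else:
        raise ValueError(f"{value} format is not supported.")
    await aio.as_service_account(query)
    return file()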
Awesome, glad I could help, and thanks for posting your solution!
I just remembered that we have an example usage of pipe_to here: https://github.com/omarryhan/aiogoogle/blob/master/examples/stream_drive_file.py
Do you think the docs/examples can be improved for the file download/upload functionality?
Also, I'm thinking we should probably raise a clearer error when someone uses .export without providing a bytes handler, i.e. either pipe_to or download_file. Wdyt?
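A minimal sketch of such a check. MissingDownloadTargetError and check_media_request are hypothetical names, and calling the check right before a request is sent is an assumption about where it would live; it treats any *.export method, or any call with alt="media", as a media download and refuses to proceed without a bytes handler.

```python
from typing import Any, Mapping, Optional


class MissingDownloadTargetError(ValueError):
    """Hypothetical error for media requests issued without a bytes handler."""


def check_media_request(
    method_id: str,
    params: Mapping[str, Any],
    pipe_to: Optional[Any] = None,
    download_file: Optional[str] = None,
) -> None:
    # files.export always returns file content; other methods do when alt="media".
    is_media = method_id.endswith(".export") or params.get("alt") == "media"
    if is_media and pipe_to is None and download_file is None:
        raise MissingDownloadTargetError(
            f"{method_id} returns raw file bytes; pass pipe_to=... or download_file=..."
        )


# Metadata request: no handler needed, so this passes silently.
check_media_request("drive.files.get", {"fileId": "abc"})
```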
Greetings! It's a good idea to add examples that cover exporting/downloading, as well as that clearer exception.
I'd also suggest letting the download handler detect whether an object's write method is a coroutine function. That way users could pass objects with a synchronous write method (such as io.BytesIO) and have them work right away, without workarounds or placeholder classes.
For example: if write is a coroutine function (detectable with inspect.iscoroutinefunction), await it; otherwise, run the synchronous write in a way that doesn't block the event loop.
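A minimal sketch of that dispatch, using inspect.iscoroutinefunction to test the write method itself and asyncio.to_thread (Python 3.9+) to keep a blocking write off the event loop. write_chunk and AsyncSink are hypothetical names, not part of aiogoogle.

```python
import asyncio
import inspect
from io import BytesIO


async def write_chunk(sink, data: bytes) -> None:
    """Write one chunk to `sink`, whether its write() is async or plain sync."""
    if inspect.iscoroutinefunction(sink.write):
        await sink.write(data)
    else:
        # Run the blocking write in a worker thread so the event loop stays free.
        await asyncio.to_thread(sink.write, data)


class AsyncSink:
    """Hypothetical sink whose write() is a coroutine function."""

    def __init__(self) -> None:
        self.data = b""

    async def write(self, data: bytes) -> None:
        self.data += data


async def demo():
    buf, sink = BytesIO(), AsyncSink()
    for chunk in (b"PK", b"\x03\x04"):
        await write_chunk(buf, chunk)   # sync write, offloaded to a thread
        await write_chunk(sink, chunk)  # async write, awaited directly
    return buf.getvalue(), sink.data


print(asyncio.run(demo()))
```

An alternative is to call write() unconditionally and await the result only when inspect.isawaitable() is true; that also covers objects whose write returns an awaitable without being declared async def, but it runs synchronous writes directly on the event loop thread.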
Thanks for the help, I'm looking forward to this project's progress.
Thanks for the suggestions!
Definitely agree with the export/downloading examples.
The reason I haven't added support for synchronous data streaming is that the official Google library already handles the synchronous case very well. But I guess it wouldn't hurt to support plain synchronous functions as well as coroutines.
I'll go ahead and close this issue now. Feel free to open a new issue if you face any more problems. Thanks!