slank/awsgi

Error with sending excel file with send_file

chattr1n opened this issue · 5 comments

Hello everyone,

I am getting errors from my code below:

output = BytesIO()
writer = pd.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, startrow=0, merge_cells=False, sheet_name="results", index=False)
workbook = writer.book
worksheet = writer.sheets["results"]
worksheet.set_column(1, 20, 20)
writer.save()
output.seek(0)
return send_file(output, attachment_filename="test.xlsx", as_attachment=True)

The error message is

'utf-8' codec can't decode byte 0xb6 in position 14: invalid start byte: UnicodeDecodeError
Traceback (most recent call last):
File "/var/task/handler.py", line 60, in run
return awsgi.response(app, event, context)
File "/var/task/awsgi/__init__.py", line 22, in response
return sr.response(output)
File "/var/task/awsgi/__init__.py", line 41, in response
'body': self.body.getvalue() + ''.join(map(convert_str, output)),
File "/var/task/awsgi/__init__.py", line 9, in convert_str
return s.decode('utf-8') if isinstance(s, bytes) else s
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 14: invalid start byte

After some research, I found that line 9 of __init__.py should be updated

from:
return s.decode('utf-8') if isinstance(s, bytes) else s
to:
return s.decode('utf-8', 'ignore') if isinstance(s, bytes) else s

I tested the change above and it now works correctly. The Excel file is downloaded as expected.

May I make a pull request and submit this change?

Thank you

slank commented

Hi, @chattr1n. Have you tried passing encoding='utf8' to df.to_excel()?

If that doesn't help, let's keep talking. I'm not sure that ignoring decoding errors is the right thing to do here.

Hi, thanks for the reply. You are right that s.decode('utf-8', 'ignore') can also cause issues. It doesn't work every time: the downloaded Excel file was corrupted.

I also tried your suggestion and unfortunately it doesn't work.

For now I have to jump on something else, but I can come back and take a closer look at this in a few weeks.
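For anyone landing here later, the corruption described above can be reproduced without awsgi or pandas. This is only a minimal sketch: `payload` is a made-up stand-in for real .xlsx content (an .xlsx file is a ZIP archive, so it contains bytes that are not valid UTF-8). It shows strict decoding failing, 'ignore' silently dropping bytes, and base64 round-tripping the data unchanged.

```python
import base64

# Hypothetical stand-in for binary file content. Real .xlsx files are ZIP
# archives: they start with b"PK\x03\x04" and contain non-UTF-8 bytes.
payload = b"PK\x03\x04\x14\x00\xb6\xff binary data"

# 1. Strict decoding raises, which is the error in the traceback above.
try:
    payload.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 2. 'ignore' avoids the exception by dropping the invalid bytes,
#    so the bytes that reach the client no longer match the file.
lossy = payload.decode("utf-8", "ignore").encode("utf-8")

# 3. Base64 turns arbitrary bytes into ASCII text and back without loss.
encoded = base64.b64encode(payload).decode("ascii")
restored = base64.b64decode(encoded)

print("strict decode succeeded:", strict_ok)
print("'ignore' preserved the bytes:", lossy == payload)
print("base64 preserved the bytes:", restored == payload)
```

This is why a base64 response is the safer route for binary downloads than relaxing the decode step.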

Newer versions have a base64_content_types option: responses with a matching content type are sent base64-encoded as binary, which avoids this.

Maybe also switch to binary automatically in other cases (e.g., file downloads)?

With #31 (0.2.0), a .use_binary_response() got added to handle the "when to use a base64 response" policy, so this should be relatively easy to implement.
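One possible shape for such a policy, as a rough sketch only: `should_use_base64`, `build_body`, and `TEXT_TYPES` below are hypothetical names for illustration and are not awsgi's actual implementation. The idea is to treat anything served as an attachment, or anything that is not an obviously textual content type, as binary, and to return the `isBase64Encoded` / `body` pair that the API Gateway Lambda proxy response format expects.

```python
import base64

# Hypothetical list of content-type prefixes that are safe to send as text.
TEXT_TYPES = ("text/", "application/json", "application/xml")


def should_use_base64(headers):
    """Decide whether a response body should be sent base64-encoded."""
    content_type = headers.get("Content-Type", "")
    disposition = headers.get("Content-Disposition", "")
    # File downloads are treated as binary regardless of content type.
    if disposition.startswith("attachment"):
        return True
    # Anything that is not clearly textual (including a missing
    # Content-Type) is treated as binary to stay on the safe side.
    return not content_type.startswith(TEXT_TYPES)


def build_body(headers, body):
    """Build the body part of a Lambda proxy response from raw bytes."""
    if should_use_base64(headers):
        return {
            "isBase64Encoded": True,
            "body": base64.b64encode(body).decode("ascii"),
        }
    return {"isBase64Encoded": False, "body": body.decode("utf-8")}


xlsx_headers = {
    "Content-Type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "Content-Disposition": "attachment; filename=test.xlsx",
}
print(build_body(xlsx_headers, b"PK\x03\x04\xb6")["isBase64Encoded"])
print(build_body({"Content-Type": "text/html"}, b"<p>hi</p>"))
```

Defaulting unknown types to base64 trades a slightly larger payload for never hitting the UnicodeDecodeError; an allow-list of binary types would be the opposite trade-off.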

Hello guys, I'm not a Python expert, but I was able to make it work with the following code. In my case, I had trouble downloading a .docx file.

It redefines the method build_body of StartResponse and creates the new function byte_convert_str

import base64
import itertools

import awsgi


def byte_convert_str(s):
    # b64encode returns bytes on Python 3; decode so the body is a str.
    return base64.b64encode(s).decode('ascii')


def new_build_body(self, headers, output):
    # Join the buffered chunks and the WSGI output into one bytes object.
    totalbody = b''.join(itertools.chain(
        self.chunks, output,
    ))

    is_b64 = self.use_binary_response(headers, totalbody)

    if is_b64:
        converted_output = awsgi.convert_b46(totalbody)
    else:
        # Default to '' so a missing Content-Type header does not raise.
        if not headers.get('Content-Type', '').startswith('application/octet-stream'):
            converted_output = awsgi.convert_str(totalbody)
        else:
            # Force base64 for generic binary downloads.
            converted_output = byte_convert_str(totalbody)
            is_b64 = True

    return {
        'isBase64Encoded': is_b64,
        'body': converted_output,
    }


# Replace the library method with the patched version.
awsgi.StartResponse.build_body = new_build_body

EXPLANATION
I added the function byte_convert_str(s):

def byte_convert_str(s):
    return base64.b64encode(s).decode('ascii')

I also added the following code. It checks whether the Content-Type is application/octet-stream (change this as needed); if so, it base64-encodes the output and sets is_b64 to True:

        if not headers.get('Content-Type', '').startswith('application/octet-stream'):
            converted_output = awsgi.convert_str(totalbody)
        else:
            converted_output = byte_convert_str(totalbody)
            is_b64 = True

I hope it helps :)