ImperialCollegeLondon/django-drf-filepond

Fetch endpoint fails with Unsupported UTF-8 sequence length when encoding string

Closed this issue · 10 comments

I'm not sure why, but my request to http://localhost/fp/fetch/?target=https%3A%2F%2Fcdn.cnn.com%2Fcnnnext%2Fdam%2Fassets%2F190410094953-india-waterfalls---athirappalli-waterfalls.jpg fails with the error OverflowError: Unsupported UTF-8 sequence length when encoding string.

Full stack trace:

ERROR Internal Server Error: /fp/fetch/
Traceback (most recent call last):
  File "/var/www/proj_name/venv/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/var/www/proj_name/venv/lib/python3.6/site-packages/django/core/handlers/base.py", line 156, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/var/www/proj_name/venv/lib/python3.6/site-packages/django/core/handlers/base.py", line 154, in _get_response
    response = response.render()
  File "/var/www/proj_name/venv/lib/python3.6/site-packages/django/template/response.py", line 106, in render
    self.content = self.rendered_content
  File "/var/www/proj_name/venv/lib/python3.6/site-packages/rest_framework/response.py", line 72, in rendered_content
    ret = renderer.render(self.data, accepted_media_type, context)
  File "/var/www/proj_name/venv/lib/python3.6/site-packages/drf_ujson/renderers.py", line 24, in render
    ret = ujson.dumps(data, ensure_ascii=self.ensure_ascii)
OverflowError: Unsupported UTF-8 sequence length when encoding string
ERROR "GET /fp/fetch/?target=https%3A%2F%2Fcdn.cnn.com%2Fcnnnext%2Fdam%2Fassets%2F190410094953-india-waterfalls---athirappalli-waterfalls.jpg HTTP/1.1" 500 114731

In this case, data is of type bytes, so ujson.dumps() crashes at this point.
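JSON has no bytes type, so a JSON renderer handed raw file bytes can only reject them or try to turn them into text. A minimal stdlib-only sketch of the first case, using Python's json module as a stand-in for ujson (an assumption: ujson's exact exception and message differ). The function name encode_payload is hypothetical.

```python
import json


def encode_payload(data):
    """Try to JSON-encode a response payload, returning the error text on failure."""
    try:
        return json.dumps(data)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Illustrative bytes: the first four bytes of a typical JPEG file.
raw_image = b"\xff\xd8\xff\xe0"

print(encode_payload(raw_image))         # fails: bytes are not JSON serializable
print(encode_payload({"status": "ok"}))  # succeeds: {"status": "ok"}
```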

Thanks for reporting this. There is clearly a problem here; however, I'm seeing a slightly different (but presumably related) error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
  • Python 3.6.9
  • Django 2.2.6
  • djangorestframework 3.11.0

Your error seems to be coming from the drf_ujson module, which I don't have installed. Can you confirm your dependency versions?

EDIT: Just to confirm: if I install the drf_ujson module and add a DEFAULT_RENDERER_CLASSES setting to my test app's settings.py file, I get an error related to six, which I think is a Django 3 issue. I then downgraded djangorestframework to a pre-Django 3 version (I tried 3.9.4), and I can then replicate the error you're getting.
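For reference, a sketch of the kind of settings.py fragment described above. It assumes drf_ujson exposes its renderer as drf_ujson.renderers.UJSONRenderer (the module path matches the traceback; the class name is an assumption) and keeps DRF's BrowsableAPIRenderer as a second option.

```python
# settings.py (fragment): DRF reads renderer defaults from the REST_FRAMEWORK dict.
REST_FRAMEWORK = {
    "DEFAULT_RENDERER_CLASSES": [
        # Assumed class path for the drf_ujson renderer.
        "drf_ujson.renderers.UJSONRenderer",
        "rest_framework.renderers.BrowsableAPIRenderer",
    ],
}
```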

So I had:
  • Python 3.6.8
  • Django 2.1.10
  • djangorestframework 3.9.4

After updating to the latest DRF and Django, and switching to the new drf_ujson2 fork (drf_ujson2==1.4.1), the issue still persists.
And yes, it looks like drf_ujson2 is the culprit.

I've managed to escape this issue by changing response = Response(buf.getvalue(), status=status.HTTP_200_OK, content_type=content_type) to response = HttpResponse(buf.getvalue(), content_type=content_type) in your library.

I will investigate further...

Switching to JSONRenderer has the same effect: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
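The 0xff in that message is telling. As far as I can tell, DRF's JSON encoder makes a best-effort attempt to decode bytes values as UTF-8 text (an assumption worth checking in rest_framework/utils/encoders.py), and JPEG data begins with the marker 0xFF 0xD8. The byte 0xFF never occurs in valid UTF-8, so that decode fails on the very first byte. This stdlib sketch reproduces the message; best_effort_decode is a hypothetical helper.

```python
def best_effort_decode(data: bytes) -> str:
    """Decode bytes as UTF-8, returning the error text if that fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# JPEG data starts with the SOI marker 0xFF 0xD8; 0xFF is never valid UTF-8.
print(best_effort_decode(b"\xff\xd8\xff\xe0"))
# Plain ASCII text decodes without trouble.
print(best_effort_decode(b"waterfalls"))
```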

I think the solution is either to use HttpResponse() (related), as I did in my first attempt to fix this, or to create custom DRF renderers for each file type.

Does the fetch call work on your setup?

OK, I found a better solution. It turns out that it was enough to create a custom binary renderer:

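from rest_framework.renderers import BaseRenderer


class BinaryFileRenderer(BaseRenderer):
    # Passes raw file bytes straight through without any text encoding.
    media_type = 'application/octet-stream'
    format = None
    charset = None  # binary data, so no charset is added to Content-Type
    render_style = 'binary'

    def render(self, data, media_type=None, renderer_context=None):
        # data is already bytes; return it unchanged
        return data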

and then define renderer_classes = [BinaryFileRenderer] in the view.
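Why does charset = None matter? A simplified, stdlib-only model of how a DRF-style response might turn renderer output into bytes on the wire. This is not DRF's code: my understanding (an assumption to verify against rest_framework/response.py) is that DRF appends "; charset=..." to the Content-Type only when the renderer declares a charset, encodes str output with that charset, and passes bytes output through unchanged. PassThroughRenderer, TextRenderer, content_type_header and body_bytes are all hypothetical names.

```python
from typing import Optional, Union


class PassThroughRenderer:
    """Hypothetical stand-in with the same attributes as BinaryFileRenderer."""

    media_type = "application/octet-stream"
    charset: Optional[str] = None

    def render(self, data, accepted_media_type=None, renderer_context=None):
        # Return the fetched bytes exactly as received.
        return data


class TextRenderer(PassThroughRenderer):
    """Hypothetical text renderer for contrast: it produces str, so it needs a charset."""

    media_type = "text/plain"
    charset = "utf-8"

    def render(self, data, accepted_media_type=None, renderer_context=None):
        return str(data)


def content_type_header(renderer) -> str:
    """Only renderers that declare a charset advertise one in Content-Type."""
    if renderer.charset:
        return f"{renderer.media_type}; charset={renderer.charset}"
    return renderer.media_type


def body_bytes(renderer, data) -> bytes:
    """Turn renderer output into the bytes that would be sent to the client."""
    ret: Union[str, bytes] = renderer.render(data)
    if isinstance(ret, str):
        if not renderer.charset:
            raise ValueError("a renderer returning str must declare a charset")
        return ret.encode(renderer.charset)
    # Bytes output is passed through untouched.
    return ret
```

With the pass-through renderer the image bytes come out identical and the header is plain application/octet-stream; the text renderer, by contrast, turns the bytes into their string representation, which is not the original file.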

This works perfectly for me. I will create a pull request if you wish.

Thanks for investigating. This looks like the ideal way to resolve the issue in the context of DRF responses. However, I wonder whether your previous comment might actually point to a better solution:

I've managed to escape this issue by changing response = Response(buf.getvalue(), status=status.HTTP_200_OK, content_type=content_type) to response = HttpResponse(buf.getvalue(), content_type=content_type) in your library.

Given that all we're trying to do here is pass through the data that is coming from the remote URL, there's actually no need for it to be processed as a DRF response. Indeed, if we take the approach of using DRF renderers, then we need to ensure that all possible data types are accounted for (although I assume that generic text and binary renderers will probably suffice).

As per the discussion here, I think returning a standard Django HttpResponse is possibly a better option. I think that was my original aim in returning the generic Response object, but it looks like this should have been an HttpResponse instead.

Honestly, I have no idea :)
I have both methods working, including the last one:
pasevin@25bfc81

It feels like the django-drf-filepond library should use DRF for all possible responses etc. :)
So far I haven't had any problems with the generic binary solution. It's your call :)

Just in case you decide to go with BinaryFileRenderer, I've submitted a PR.

Thanks. Since you've set this up, it would be nice to include it. However, thinking about whether this is the best approach: what we want is just a complete pass-through of the file data fetched from the remote URL.

I'm not sure whether letting DRF read the data and then re-output it through a renderer could result in any changes to the incoming data. At present, I'm inclined to think that a plain HttpResponse is a more general solution, and one that is less likely to cause issues in the future.
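One way to test the pass-through property, whichever response type is used, is to check that the response body is byte-for-byte identical to the data fetched from the remote URL. A minimal stdlib sketch, assuming both are available as bytes; digest and is_pass_through are hypothetical helpers. Comparing SHA-256 digests rather than only the raw bytes makes it easy to log or store the expected value for a test fixture.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def is_pass_through(upstream: bytes, response_body: bytes) -> bool:
    """True if the response body is identical to the fetched data."""
    return len(upstream) == len(response_body) and digest(upstream) == digest(response_body)


upstream = b"\xff\xd8\xff\xe0" + bytes(range(256))

# An untouched copy passes.
print(is_pass_through(upstream, bytes(upstream)))
# A text round trip (decode as Latin-1, re-encode as UTF-8) mangles every byte >= 0x80.
print(is_pass_through(upstream, upstream.decode("latin-1").encode("utf-8")))
```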

I take your point made above:

It feels like the django-drf-filepond library should use DRF for all possible responses etc. :)

but I do think that in this particular case an HttpResponse is cleaner. I'm just doing some further tests to check this. Apologies, but if you'd like to modify your PR to use the simpler approach you described above, I'm happy to pull that in.

Done ;)