sybrenstuvel/flickrapi

502 error printout breaks console interface

newpro opened this issue · 9 comments

Hey @sybrenstuvel
Thanks so much for the repo! It really saves me a lot of time in computer vision research.

The Flickr server sometimes returns a 502 error, even though it is very rare. My current strategy is to catch the error and do an exponential backoff while waiting for the Flickr server to recover. The strategy works very well. However, in some cases the library prints out the 502 error message payload, which is the 502 webpage, and this breaks the console; most likely special characters causes the program into memory space that should not be accessed. The program still seems to be running and collecting data, but it cannot print further messages to monitor the progress. I attached screenshots of the symptoms for reference.
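For readers unfamiliar with the retry strategy described above, here is a minimal sketch of catching an error and backing off exponentially. It is not the poster's actual code: `TransientServerError`, `call_with_backoff`, and all parameter names are hypothetical, and the exception class stands in for whatever error a 502 response raises in a real application.

```python
import random
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for the exception a 502 response raises."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call func(), retrying on TransientServerError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientServerError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the error
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at
            # max_delay, plus up to 10% jitter so that parallel workers do not
            # all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` argument is injectable only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.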

If I may provide some recommendations to fix the issue, the easy way would be disabling print out the payload, or specifically filter out special characters that cause the console to break.

Thanks again!

Head of the message:
screenshot from 2018-08-10 13-36-34

Tail of the message:

screenshot from 2018-08-10 13-22-20

> special characters causes the program into memory space that should not be accessed

There is no such thing as "special characters". If you're dealing with text, your software should know the encoding it is in and handle that properly. Just assuming it's a single-byte encoding is a bad idea, especially since the Flickr API documentation pretty much screams that everything is UTF-8. Ignoring character encoding will always turn around to bite you.

> If I may provide some recommendations to fix the issue, the easy way would be disabling print out the payload, or specifically filter out special characters that cause the console to break.

AFAIK the library doesn't print() anything. All logging output goes via Python's logging module, which you can configure in your application. You can make it completely quiet, log to automatically rotated logfiles, and more.
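As an illustration of the point about configuring logging in the application, the following sketch silences one library's loggers or sends them to a rotating log file. It assumes the library's loggers are named under `flickrapi` (for example because its modules call `logging.getLogger(__name__)`); that name, the function name `configure_library_logging`, and the size limits are assumptions, not something confirmed in this thread.

```python
import logging
import logging.handlers


def configure_library_logging(logger_name="flickrapi", logfile=None,
                              level=logging.WARNING):
    """Route one library's log records to a rotating file, or silence them."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    # Stop records from reaching the root logger's (console) handlers.
    logger.propagate = False
    for old in list(logger.handlers):
        logger.removeHandler(old)
    if logfile is None:
        # No file given: swallow the records entirely.
        handler = logging.NullHandler()
    else:
        # Rotate at roughly 1 MB and keep three old files (arbitrary values).
        handler = logging.handlers.RotatingFileHandler(
            logfile, maxBytes=1000000, backupCount=3, encoding="utf-8")
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `configure_library_logging()` with no arguments keeps the library quiet on the console; passing `logfile="flickr.log"` keeps the payloads available for later inspection without sending them to the terminal.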

Hey @sybrenstuvel
Thanks for the quick reply! I really appreciate it.

I did some further digging, and I still think there is a logging problem in the repo code, specifically in this line. First, let me say that I did not feed non-UTF text into the API interface. The issue is in the response payload from Flickr. With that in mind, I believe the error lies not within logging, but in "urllib_parse.unquote". Let me explain with a fun experiment:

  • Can "urllib_parse.unquote" deal with non-UTF code? Answer: yes!
  • Can "logging" deal with non-UTF code? Answer: yes!
  • Can "logging" + "unquote" deal with non-UTF code? Answer: NO!!! It will break the console!

Here is the experiment:
screenshot from 2018-08-13 14-59-23

Cheers!

Please don't screenshot your code. Just use Markdown to format it properly. That will allow me to copy-paste whatever you did and try it myself, instead of having to type everything myself.

Your use of the urlparse module indicates you're indeed using Python 2. What is your reason to stick to that ancient version? It's horrible when it comes to character encoding, and as a result I see mistakes even in your latest experiment (you're talking about u'\xc3' and '\xc3' as the same thing; they aren't).
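The distinction between the two literals can be sketched in Python 3 terms, where a Python 2 `u'\xc3'` corresponds to the `str` `'\xc3'` and a Python 2 `'\xc3'` corresponds to the `bytes` object `b'\xc3'`. The variable names are only for illustration.

```python
# In Python 3 the two kinds of literal are distinct types.
text = '\xc3'   # a str holding one code point, U+00C3
raw = b'\xc3'   # a bytes object holding the single byte 0xC3

assert text != raw                            # str and bytes never compare equal
assert text.encode('utf-8') == b'\xc3\x83'    # the character takes two bytes in UTF-8
assert text.encode('latin-1') == raw          # only Latin-1 maps it to the byte 0xC3

# The lone byte 0xC3 is an incomplete UTF-8 sequence, so decoding it fails.
try:
    raw.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
assert decoded_ok is False
```

Treating the two as interchangeable only appears to work when the text is pure ASCII.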

hey @sybrenstuvel

Yeah, you are right. This is an issue related to Python 2. However, my original screenshot was taken while running Python 3.5. I was doing a quick test on my laptop on my way out when I submitted the last post, so the issue is still there; I just did not reproduce the right one.

I dug a bit further and tried to replicate the issue. The problem is about displaying this page. However, when I tried to google the specific HTML code for this page in order to load the webpage again, I failed to find any. And because the server issues are rare, I cannot replicate it by sending requests.

However, I looked into it, and I believe that it breaks when it gets to displaying Korean text. So I downloaded the HTML source of the official Korea Tourism website to get some Korean byte strings. Now we can successfully locate the issue:

import logging
from urllib import parse as urllib_parse
# the following line should freeze your console, or python interface, if not let me know
logging.error(urllib_parse.unquote('\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'))
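For context on what that literal contains under Python 3, here is a small inspection sketch. It assumes Python 3, where an unprefixed `'\xeb...'` literal is a `str` of individual code points rather than a byte string. Whether the control characters it reveals are what upsets a given terminal is not established in this thread.

```python
from urllib.parse import unquote

s = '\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'

# The string has no percent-escapes, so unquote() returns it unchanged.
assert unquote(s) == s

# Code points in U+0080..U+009F are C1 control characters, not printable text.
c1_controls = [hex(ord(ch)) for ch in s if 0x80 <= ord(ch) <= 0x9f]
print(c1_controls)

# Reinterpreting each code point as one byte and decoding those bytes as
# UTF-8 recovers the Korean text that the bytes originally encoded.
korean = s.encode('latin-1').decode('utf-8')
assert korean == '무단수집거부</a></li>\n'
```

In other words, the literal is UTF-8 byte values typed into a text string, so what gets logged is a run of Latin-1 letters and control characters rather than Korean.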

Some info that I hope helps...

  1. On occasion I get the 'bad panda' 502 error... mostly under heavy load. I have logging enabled to file and console and have not noticed this console locking issue you mention. I use both python 2.7 and python 3.6 with unicode.
  2. I've quickly tried your sample code on Windows bash with python 3.4 (will try it on Linux later on) and console seems not to lock.
$ python3.4
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> from urllib import parse as urllib_parse
>>> logging.error(urllib_parse.unquote('\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'))
ERROR:root:무���거�</a></li>

>>> print('still here')
still here
>>>

hope it helps

@oPromessa

I am using Linux 16.04 LTS with Python 3.6. I guess the difference may come down to the current program stack in memory, and to the OS's ability to stop the program from reading from, or streaming out to, invalid memory. The code breaks on my machine; screenshot:
screenshot from 2018-08-14 17-42-26

The issue can be resolved on my system by decoding to UTF-8 before passing the text into unquote, e.g.,

logging.error(urllib_parse.unquote(b'\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'.decode("utf-8", "strict")))

Observe:
image
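A slightly more defensive variant of the decode-first idea, as a sketch only: decode the payload explicitly, then turn any remaining control characters into visible escapes before the text reaches a console handler. The helper name `safe_for_console` is hypothetical, and the choice to keep newlines and tabs is an assumption.

```python
import logging
import unicodedata


def safe_for_console(payload, encoding="utf-8"):
    """Decode a bytes payload and escape control characters before logging."""
    if isinstance(payload, bytes):
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
        payload = payload.decode(encoding, errors="replace")
    out = []
    for ch in payload:
        # Category "Cc" covers both C0 (U+0000..U+001F) and C1 (U+0080..U+009F).
        if unicodedata.category(ch) == "Cc" and ch not in "\n\t":
            out.append("\\x{:02x}".format(ord(ch)))  # show as a visible escape
        else:
            out.append(ch)
    return "".join(out)


body = b'\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'
logging.error("502 payload: %s", safe_for_console(body))
```

With correctly decoded UTF-8 the Korean text passes through untouched; only genuine control characters are rewritten.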

Why are you unquoting a string that clearly isn't URL-encoded at all?

hey @sybrenstuvel

I got confused about that part too. If the error is generated at this line, then using the unquote function there is a mistake. The function should parse a URL string, not request text.
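To make the distinction concrete, here is a short sketch of what `unquote` from the standard library's `urllib.parse` does with URL-encoded input versus ordinary text. The sample strings are made up for illustration.

```python
from urllib.parse import quote, unquote

# quote() percent-encodes the UTF-8 bytes of the text.
encoded = quote('무단수집거부')
print(encoded)

# unquote() reverses that: percent-escapes are decoded as UTF-8 by default.
assert unquote(encoded) == '무단수집거부'
assert unquote('a%20b') == 'a b'

# Text without percent-escapes comes back unchanged, so calling unquote()
# on an HTML payload does nothing useful.
plain = '<a href="#">502 Bad Gateway</a>'
assert unquote(plain) == plain
```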

@newpro just trying to help out. Would you mind going back to the beginning? I have a wild guess that the console/shell might not have the appropriate locale settings and may be getting confused!

  1. Can you share the environment variables on the shell which launches your app? I'm guessing some LANG/Collation related settings may be the cause of the conflict.
  2. Could you link to your code where you set the logging and where you get this situation.
  • Side notes 1)...
    • I was forced on my app launch shell to set things like this to cover my bases.
# I've used this setting to allow support for international characters in
# folders and file names
export LC_ALL=en_US.utf8
export LANG=en_US.utf8
  • Side notes 2)
    • My train of thought is that with incorrect locale you get different outputs...
$ echo $LANG
en_US.UTF-8
$ find . -type d
.
./Test Photo Library/Várias Pics
$ LANG=en_US find . -type d
.
./Test Photo Library/V??rias Pics
$
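The same check can be made from inside Python. The sketch below collects the settings that decide how text is encoded on its way to the console; the function name `describe_text_environment` is hypothetical, and the set of values reported is a guess at what is useful here.

```python
import locale
import os
import sys


def describe_text_environment():
    """Collect the settings that influence how Python encodes console output."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "stderr_encoding": getattr(sys.stderr, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in describe_text_environment().items():
        print("{}: {}".format(key, value))
```

If the reported stdout or stderr encoding is not UTF-8 while the terminal itself expects UTF-8, mismatched output like the `V??rias` example above is a plausible result.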