Unreliable channel_info method - connection pool URL changes
tulpn opened this issue · 0 comments
I have the following problem:
[Wed Jun 22 23:40:24.885910 2016] [wsgi:error] [pid 8753] [22/Jun/2016 23:40:24] DEBUG [syslog:295] Exception occured: __str__ returned non-string (type SysCallError)
[Wed Jun 22 23:40:24.886003 2016] [wsgi:error] [pid 8753] Traceback (most recent call last):
[Wed Jun 22 23:40:24.886700 2016] [wsgi:error] [pid 8753] File "/var/www/vhosts/api-beta.example.com/secure_api2.0/api/api/hosts/../../usercp/pushservice.py", line 290, in _check_if_occupied_inner
[Wed Jun 22 23:40:24.886738 2016] [wsgi:error] [pid 8753] channel_info = self.p.channel_info(channel_name)
[Wed Jun 22 23:40:24.886746 2016] [wsgi:error] [pid 8753] File "/home/user1/envs/secure_api/lib/python2.7/site-packages/pusher/http.py", line 22, in __call__
[Wed Jun 22 23:40:24.886757 2016] [wsgi:error] [pid 8753] return self.pusher.http.send_request(self.make_request(*args, **kwargs))
[Wed Jun 22 23:40:24.886763 2016] [wsgi:error] [pid 8753] File "/home/user1/envs/secure_api/lib/python2.7/site-packages/pusher/requests.py", line 37, in send_request
[Wed Jun 22 23:40:24.886771 2016] [wsgi:error] [pid 8753] **self.options
[Wed Jun 22 23:40:24.886776 2016] [wsgi:error] [pid 8753] File "/home/user1/envs/secure_api/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
[Wed Jun 22 23:40:24.886786 2016] [wsgi:error] [pid 8753] resp = self.send(prep, **send_kwargs)
[Wed Jun 22 23:40:24.886791 2016] [wsgi:error] [pid 8753] File "/home/user1/envs/secure_api/lib/python2.7/site-packages/requests/sessions.py", line 573, in send
[Wed Jun 22 23:40:24.886799 2016] [wsgi:error] [pid 8753] r = adapter.send(request, **kwargs)
[Wed Jun 22 23:40:24.886805 2016] [wsgi:error] [pid 8753] File "/home/user1/envs/secure_api/lib/python2.7/site-packages/requests/adapters.py", line 370, in send
[Wed Jun 22 23:40:24.886824 2016] [wsgi:error] [pid 8753] timeout=timeout
[Wed Jun 22 23:40:24.886830 2016] [wsgi:error] [pid 8753] File "/home/user1/envs/secure_api/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
[Wed Jun 22 23:40:24.886840 2016] [wsgi:error] [pid 8753] body=body, headers=headers)
[Wed Jun 22 23:40:24.886846 2016] [wsgi:error] [pid 8753] File "/home/user1/envs/secure_api/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 344, in _make_request
[Wed Jun 22 23:40:24.886853 2016] [wsgi:error] [pid 8753] self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
[Wed Jun 22 23:40:24.886858 2016] [wsgi:error] [pid 8753] File "/home/user1/envs/secure_api/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 314, in _raise_timeout
[Wed Jun 22 23:40:24.886866 2016] [wsgi:error] [pid 8753] if 'timed out' in str(err) or 'did not complete (read)' in str(err): # Python 2.6
[Wed Jun 22 23:40:24.886879 2016] [wsgi:error] [pid 8753] TypeError: __str__ returned non-string (type SysCallError)
[Wed Jun 22 23:40:24.887028 2016] [wsgi:error] [pid 8753] [22/Jun/2016 23:40:24] DEBUG [syslog:298] Current Count is: 4
[Wed Jun 22 23:40:24.887176 2016] [wsgi:error] [pid 8753] [22/Jun/2016 23:40:24] DEBUG [syslog:306] Giving up - force: True
I am running:
Django 1.8.4,
python 2.7
virtualenv.
The code is executed in a WSGI instance from Apache.
Pip (8.1.2) Info:
pusher==1.3.0
ndg-httpsclient==0.4.1
pyOpenSSL==16.0.0
six==1.10.0
pyasn1==0.1.9
requests==2.10.0
urllib3==1.16
This is the code I am executing. The exception is thrown here, and I catch and handle it myself:
def _check_if_occupied(self, channel_name, count=0):
    """
    Checks if the channel has any occupants, return bool
    :param channel_name:
    :return:
    """
    result = False
    try:
        result = self._check_if_occupied_inner(channel_name, count)
    except Exception:
        pass
    return result

def _check_if_occupied_inner(self, channel_name, count):
    try:
        channel_info = self.p.channel_info(channel_name)
        logger.debug("Checking for Channel: %s" % channel_info)
        if channel_info['occupied']:
            return True
    except Exception, e:
        logger.debug("Exception occured: %s" % e.message)
        traceback.print_exc()
        count += 1
        logger.debug("Current Count is: %s" % count)
        if 0 < count <= 3:
            self.p = None
            time.sleep(5)
            # create a new pusher instance
            self._create_pusher_instance()
            self._check_if_occupied(channel_name, count)
        if count > 3:
            logger.debug("Giving up - force: True")
            return True
    return False
I have a class-based view in my Django REST Framework API. This is an excerpt from a helper class in utils.py.
self.p holds a pusher instance. Before any triggers in other methods, I check whether anyone is actually in the channel I want to trigger on. To do so, I check the user's presence channel (most of the time; it can also be a private- channel).
To check, I use the channel_info method. Unfortunately, I get the traceback posted above. After some investigation, it seems that the host changes in the URL connection pool, and therefore an exception occurs.
Looking into the pusher debug console on the website shows me that no request was received either.
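Reading the traceback above, the TypeError appears to be raised by str(err) inside urllib3's _raise_timeout rather than by the network failure itself, so the underlying error stays hidden. Below is a minimal sketch of that masking effect. LowLevelError, WrappingError, and describe are hypothetical names used only for illustration; they are not part of pusher, requests, or pyOpenSSL.

```python
class LowLevelError(Exception):
    """Hypothetical stand-in for a low-level socket/SSL failure."""


class WrappingError(Exception):
    """Hypothetical error whose __str__ hands back its first argument as-is."""

    def __str__(self):
        # Returns an exception object instead of text, which makes
        # str(err) raise "TypeError: __str__ returned non-string".
        return self.args[0]


def describe(err):
    """Return a printable description even if str(err) itself fails."""
    try:
        return str(err)
    except TypeError:
        # repr() is built from the exception's args and does not
        # go through the broken __str__.
        return repr(err)


err = WrappingError(LowLevelError("connection reset"))
print(describe(err))
```

Logging the exception with repr() (or "%r") in my own handler would at least avoid a second failure while reporting the first one.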
To put a cherry on top of this problem: it does not happen on a regular schedule, but at unpredictable times. I tried spamming the call to see if there is some sort of triggerable race condition, but no. It does not matter whether I call my method 30 times or 3 times. It can work and then suddenly stop working.
I thought that using a recursive method with a timeout would solve the problem (race condition), but no. The exception is thrown on all 3 retries.
Then I tried to dispose of the self.p instance and create a new one once the exception is thrown. I thought this could deal with some caching issue, but again no luck.
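For reference, the retry idea described above can also be written as a loop instead of recursion, which makes it easier to see what is returned on each path. This is only a sketch under assumptions: make_client is a hypothetical factory that returns a fresh object with a channel_info(name) method, and the sleep parameter exists only so that the wait can be replaced in tests.

```python
import logging
import time

logger = logging.getLogger(__name__)


def check_if_occupied(make_client, channel_name, attempts=3, delay=5.0,
                      sleep=time.sleep):
    """Return the channel's occupied flag, or True if every attempt failed.

    make_client is a hypothetical factory; a fresh client is created for
    each attempt, mirroring the "dispose and recreate" approach.
    """
    for attempt in range(1, attempts + 1):
        try:
            info = make_client().channel_info(channel_name)
            return bool(info.get("occupied"))
        except Exception as exc:
            # %r avoids calling a possibly broken __str__ on the error.
            logger.debug("Attempt %d failed: %r", attempt, exc)
            if attempt < attempts:
                sleep(delay)
    # Same fallback as the "Giving up - force: True" branch.
    logger.debug("Giving up - force: True")
    return True
```

Unlike the recursive version, the result of a later successful attempt is returned to the caller directly.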
Update:
I tried using other backends, but they seem to fail. GAE seems way too complicated and overloaded for this issue. At least I did not see an easy way to use it. Tornado has "Future" problems: channel_info doesn't work there because I get a DummyFuture back instead, which I would then need to process further.
aiohttp requires python 3.x - not an option at all.
I have also tried using a cluster and specified "eu" as in the GitHub docs, which, as luck would have it, now shows me the error trace from above ALL the time. Could this mean that the timeout support does not work correctly?