Query succeeded but caused RecursionError
Gedevan-Aleksizde opened this issue · 5 comments
Sometimes .query causes an error like the following, even though the corresponding job status is "success." I cannot find the reason, but it tends to happen with long-running jobs (longer than a few hours).
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "test.py", line 10, in main
    _ = tdcl.query(
  File "/usr/local/lib/python3.8/dist-packages/pytd/client.py", line 245, in query
    res = engine.execute(header + query, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytd/query_engine.py", line 96, in execute
    self.executed = cur.execute(query)
  File "/usr/local/lib/python3.8/dist-packages/tdclient/cursor.py", line 49, in execute
    self._do_execute()
  File "/usr/local/lib/python3.8/dist-packages/tdclient/cursor.py", line 82, in _do_execute
    return self._do_execute()
  File "/usr/local/lib/python3.8/dist-packages/tdclient/cursor.py", line 82, in _do_execute
    return self._do_execute()
  File "/usr/local/lib/python3.8/dist-packages/tdclient/cursor.py", line 82, in _do_execute
    return self._do_execute()
  [Previous line repeated 2954 more times]
  File "/usr/local/lib/python3.8/dist-packages/tdclient/cursor.py", line 64, in _do_execute
    status = self._api.job_status(self._executed)
  File "/usr/local/lib/python3.8/dist-packages/tdclient/job_api.py", line 170, in job_status
    with self.get(create_url("/v3/job/status/{job_id}", job_id=job_id)) as res:
  File "/usr/local/lib/python3.8/dist-packages/tdclient/api.py", line 185, in get
    response = self.send_request(
  File "/usr/local/lib/python3.8/dist-packages/tdclient/api.py", line 499, in send_request
    return self.http.request(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/request.py", line 66, in request
    return self.request_encode_url(method, url, fields=fields,
  File "/usr/local/lib/python3.8/dist-packages/urllib3/request.py", line 89, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/poolmanager.py", line 324, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 597, in urlopen
    httplib_response = self._make_request(conn, method, url,
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 331, in begin
    self.headers = self.msg = parse_headers(self.fp)
  File "/usr/lib/python3.8/http/client.py", line 225, in parse_headers
    return email.parser.Parser(_class=_class).parsestr(hstring)
  File "/usr/lib/python3.8/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.8/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.8/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.8/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.8/email/feedparser.py", line 295, in _parsegen
    if self._cur.get_content_maintype() == 'message':
  File "/usr/lib/python3.8/email/message.py", line 594, in get_content_maintype
    ctype = self.get_content_type()
  File "/usr/lib/python3.8/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.8/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.8/email/_policybase.py", line 316, in header_fetch_parse
    return self._sanitize_header(name, value)
  File "/usr/lib/python3.8/email/_policybase.py", line 287, in _sanitize_header
    if _has_surrogates(value):
  File "/usr/lib/python3.8/email/utils.py", line 57, in _has_surrogates
    s.encode()
RecursionError: maximum recursion depth exceeded while calling a Python object
@Gedevan-Aleksizde Thank you for reporting the issue.
Could you try querying with a larger value of wait_interval?
# 1800sec = 30min, for example
client.query('select symbol, count(1) as cnt from nasdaq group by 1 order by 1', wait_interval=1800)
I'm not sure how long your job actually takes, but any value should work as long as it is reasonably smaller than the running time of the job.
Reason
In pytd.Client#query, what happens behind the scenes is that the client recursively fetches the job status until the job finishes, and the fetch interval is defined by the wait_interval parameter: time.sleep(wait_interval)
Since the default value of wait_interval is 5 sec, a long-running job causes too many recursive calls, as you pointed out, and the code eventually fails unless we explicitly set a larger interval and thereby decrease the number of recursive calls.
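To make the mechanism concrete, here is a minimal sketch of that recursive polling pattern (illustrative only: wait_for_job_recursive and the api object are hypothetical names of mine, and the actual tdclient code differs in detail):

import time

def wait_for_job_recursive(api, job_id, wait_interval=5):
    # Each poll issues one job-status request and, if the job is still
    # running, adds one more stack frame before polling again.
    status = api.job_status(job_id)  # hypothetical status-fetching call
    if status == "success":
        return status
    time.sleep(wait_interval)
    # With wait_interval=5 and a recursion limit of 1000, the stack
    # overflows after roughly 1000 * 5 = 5000 sec (~83 min) of waiting.
    return wait_for_job_recursive(api, job_id, wait_interval)

Raising wait_interval to 1800 stretches the same 1000-frame budget to about 1000 * 1800 sec of waiting, which is why a larger interval avoids the error for long jobs.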
@Gedevan-Aleksizde I briefly tried to research related issues, but couldn't find the root cause from the limited stack trace. It might be a network connection issue.
https://stackoverflow.com/questions/60432826/sudden-error-after-working-well-for-hours-recursion-error-maximum-recursion-dep
As takuti recommended, can you try bumping wait_interval?
Thank you. I confirmed the aforementioned error doesn't occur with a large wait_interval value (I tested 3600). But do I need to estimate the elapsed time of the job and specify the proper value manually (i.e., a small value for a small job)?
@Gedevan-Aleksizde Sorry for the late reply. Good to hear that a larger wait_interval helps.
But do I need to estimate the elapsed time of the job and specify the proper value manually (i.e., a small value for a small job)?
Yes. Unfortunately, this is one of the best approaches users can take to ensure successful execution.
Meanwhile, since the error happens due to excessive recursive calls, manually checking and raising the system's recursion limit may also help if you can come up with a reasonable value.
import sys

# 1000 by default.
# With wait_interval=5, 1000 * 5 = 5000 sec (~83.3 min) is the maximum
# job duration the script can wait for.
sys.getrecursionlimit()

# You can raise the limit:
sys.setrecursionlimit(2000)
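As a rough sizing rule (my own back-of-the-envelope calculation, not from the pytd documentation): the script can wait at most about recursion_limit * wait_interval seconds, so given an expected job duration you can solve for the smallest safe wait_interval:

import math

def min_wait_interval(expected_job_sec, recursion_limit=1000, safety=2.0):
    # Smallest wait_interval (sec) keeping the recursive polling within
    # the recursion limit; the safety factor covers slower-than-expected jobs.
    return math.ceil(expected_job_sec * safety / recursion_limit)

min_wait_interval(4 * 3600)  # job expected to run ~4 hours -> 29 sec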
Thank you. This kind of failure is rare, and it often seems to be caused by somewhat inefficient queries. I will try to tackle it with your solution and by writing queries more carefully.