jingw/pyhdfs

Help me, please. The second run of the function in the script produces an abnormal result

Closed this issue · 2 comments

I am a rookie~~!!

The following code:

import pyhdfs

list_info = [{"tenant": "coco", "hive_path": "/user/open_001_dev", "ftp_path": "/files/prov/001"},
             {"tenant": "lili", "hive_path": "/user/open_002_dev", "ftp_path": "/files/prov/002"}]
result = 0
# randomize_hosts expects a bool; the string "false" is truthy, so pass False
client = pyhdfs.HdfsClient(hosts="10.173.5.18:9000", user_name="hdfs", timeout=10, max_tries=3, randomize_hosts=False)

def hive_content_size():
    global result
    # print the content summary of every entry that has a hive_path
    for item in list_info:
        if "hive_path" in item:
            print(client.get_content_summary(item["hive_path"]))

hive_content_size()

The first iteration of the loop prints its result normally, but the second iteration fails.

The output and the error report are below:

ContentSummary(directoryCount=1258, fileCount=3773, length=141829751002, quota=4000000, spaceConsumed=425489253006, spaceQuota=659706976665600)

Failed to reach to 10.173.5.18:9000 (attempt 3/3)
Traceback (most recent call last):
  File "/usr/local/python/lib/python3.9/site-packages/urllib3-1.26.4-py3.9.egg/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/python/lib/python3.9/site-packages/urllib3-1.26.4-py3.9.egg/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/python/lib/python3.9/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/local/python/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/local/python/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/python/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python/lib/python3.9/site-packages/requests-2.25.1-py3.9.egg/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/python/lib/python3.9/site-packages/urllib3-1.26.4-py3.9.egg/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/python/lib/python3.9/site-packages/urllib3-1.26.4-py3.9.egg/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/python/lib/python3.9/site-packages/urllib3-1.26.4-py3.9.egg/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/python/lib/python3.9/site-packages/urllib3-1.26.4-py3.9.egg/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/python/lib/python3.9/site-packages/urllib3-1.26.4-py3.9.egg/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/python/lib/python3.9/site-packages/urllib3-1.26.4-py3.9.egg/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='10.173.5.18', port=9000): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python/lib/python3.9/site-packages/PyHDFS-0.3.1-py3.9.egg/pyhdfs/__init__.py", line 418, in _request
    response = self._requests_session.request(
  File "/usr/local/python/lib/python3.9/site-packages/requests-2.25.1-py3.9.egg/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/python/lib/python3.9/site-packages/requests-2.25.1-py3.9.egg/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/python/lib/python3.9/site-packages/requests-2.25.1-py3.9.egg/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='10.162.3.171', port=19888): Read timed out. (read timeout=10)
Traceback (most recent call last):
  File "/home/hadoop/shay/monthly_report/test01.py", line 24, in <module>
    print(hive_content_size())
  File "/home/hadoop/shay/monthly_report/test01.py", line 22, in hive_content_size
    print(client.get_content_summary(list_info[item]["hive_path"]))
  File "/usr/local/python/lib/python3.9/site-packages/PyHDFS-0.3.1-py3.9.egg/pyhdfs/__init__.py", line 633, in get_content_summary
  File "/usr/local/python/lib/python3.9/site-packages/PyHDFS-0.3.1-py3.9.egg/pyhdfs/__init__.py", line 450, in _get
  File "/usr/local/python/lib/python3.9/site-packages/PyHDFS-0.3.1-py3.9.egg/pyhdfs/__init__.py", line 442, in _request
pyhdfs.HdfsNoServerException: Could not use any of the given hosts

Any help would be appreciated!

jingw commented

requests.exceptions.ReadTimeout: HTTPConnectionPool(host='10.162.3.171', port=19888): Read timed out. (read timeout=10)

This part of the error means you set the timeout parameter too low. It's taking HDFS more than 10 seconds to reply to your request. Could you try increasing it?
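As a rough sketch of that suggestion, the client from the question could be rebuilt with a larger `timeout`, and each path could be queried separately so that one slow directory does not hide the results of the others. The 60-second value, the helper names `make_client` and `summarize_paths`, and the broad `except` are assumptions made for illustration; they are not taken from the original script or from pyhdfs itself.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Paths copied from the question.
LIST_INFO = [
    {"tenant": "coco", "hive_path": "/user/open_001_dev", "ftp_path": "/files/prov/001"},
    {"tenant": "lili", "hive_path": "/user/open_002_dev", "ftp_path": "/files/prov/002"},
]


def make_client(timeout_s: int = 60):
    """Build a client with a longer timeout than the original 10 s (60 is a guess)."""
    import pyhdfs  # imported here so summarize_paths can be tried without pyhdfs installed

    return pyhdfs.HdfsClient(
        hosts="10.173.5.18:9000",
        user_name="hdfs",
        timeout=timeout_s,
        max_tries=3,
        randomize_hosts=False,
    )


def summarize_paths(client: Any, entries: Iterable[Dict[str, str]]) -> List[Tuple[str, Any]]:
    """Return (path, summary_or_exception) pairs for every entry that has a hive_path."""
    results: List[Tuple[str, Any]] = []
    for entry in entries:
        path = entry.get("hive_path")
        if path is None:
            continue
        try:
            results.append((path, client.get_content_summary(path)))
        except Exception as exc:  # for example pyhdfs.HdfsNoServerException after all retries fail
            results.append((path, exc))
    return results


# Possible usage against a real cluster:
#     for path, outcome in summarize_paths(make_client(), LIST_INFO):
#         print(path, outcome)
```

If the second directory still times out with a larger value, the per-path results at least show which path is the slow one.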

Thank you for your reply.

Your suggestion solved the problem.

( ^_^ )