Segfault When Initiating HDFS Connection
lucashu1 opened this issue · 26 comments
Hi,
I'm getting a segfault when trying to create a connection using the HDFileSystem constructor. The code that I'm running is:
from hdfs3 import HDFileSystem
hdfs = HDFileSystem([HOSTNAME], port=[PORT-NUM])
(with HOSTNAME and PORT-NUM filled in, of course.)
When I run the script through GDB (for debugging purposes), I get:
Starting program: /home/jovyan/.conda/bin/python hdfs-test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
strlen () at ../sysdeps/x86_64/strlen.S:106
106 ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) backtrace
#0 strlen () at ../sysdeps/x86_64/strlen.S:106
#1 0x00007ffff3ff83ee in std::char_traits<char>::length (__s=0x1 <error: Cannot access memory at address 0x1>)
from /home/jovyan/.conda/lib/python3.6/lib-dynload/../../libhdfs3.so
#2 std::string::assign (__s=0x1 <error: Cannot access memory at address 0x1>, this=0x555555c27fa0)
at /feedstock_root/build_artefacts/libhdfs3_1526441785297/work/libhdfs3/src/client/FileSystem.cpp:1131
#3 std::string::operator= (__s=0x1 <error: Cannot access memory at address 0x1>, this=0x555555c27fa0)
at /feedstock_root/build_artefacts/libhdfs3_1526441785297/work/libhdfs3/src/client/FileSystem.cpp:555
#4 Hdfs::FileSystem::FileSystem (this=0x555555c27fa0, conf=..., euser=0x1 <error: Cannot access memory at address 0x1>)
at /feedstock_root/build_artefacts/libhdfs3_1526441785297/work/libhdfs3/src/client/FileSystem.cpp:148
#5 0x00007ffff400f117 in hdfsBuilderConnect (bld=0x555555bf8c40,
effective_user=0x1 <error: Cannot access memory at address 0x1>)
from /home/jovyan/.conda/lib/python3.6/lib-dynload/../../libhdfs3.so
#6 0x00007ffff6607ec0 in ffi_call_unix64 () from /home/jovyan/.conda/lib/python3.6/lib-dynload/../../libffi.so.6
#7 0x00007ffff660787d in ffi_call () from /home/jovyan/.conda/lib/python3.6/lib-dynload/../../libffi.so.6
#8 0x00007ffff681cdee in _ctypes_callproc ()
from /home/jovyan/.conda/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#9 0x00007ffff681d825 in PyCFuncPtr_call ()
from /home/jovyan/.conda/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#10 0x00005555556631bb in _PyObject_FastCallDict ()
#11 0x00005555556f0d3e in call_function ()
#12 0x000055555571519a in _PyEval_EvalFrameDefault ()
#13 0x00005555556ea7db in fast_function ()
#14 0x00005555556f0cc5 in call_function ()
#15 0x000055555571519a in _PyEval_EvalFrameDefault ()
#16 0x00005555556e99a6 in _PyEval_EvalCodeWithName ()
#17 0x00005555556eb108 in _PyFunction_FastCallDict ()
#18 0x000055555566339f in _PyObject_FastCallDict ()
#19 0x0000555555667ff3 in _PyObject_Call_Prepend ()
#20 0x0000555555662dde in PyObject_Call ()
#21 0x00005555556bdf6b in slot_tp_init ()
#22 0x00005555556f0f27 in type_call ()
#23 0x00005555556631bb in _PyObject_FastCallDict ()
#24 0x00005555556eacfa in _PyObject_FastCallKeywords ()
#25 0x00005555556f0d3e in call_function ()
#26 0x0000555555715eb1 in _PyEval_EvalFrameDefault ()
#27 0x00005555556eb529 in PyEval_EvalCodeEx ()
#28 0x00005555556ec2cc in PyEval_EvalCode ()
#29 0x0000555555768af4 in run_mod ()
#30 0x0000555555768ef1 in PyRun_FileExFlags ()
#31 0x00005555557690f4 in PyRun_SimpleFileExFlags ()
#32 0x000055555576cc28 in Py_Main ()
#33 0x000055555563471e in main ()
I installed hdfs3 via conda (conda install -c conda-forge hdfs3).
Here are the package versions installed by conda-forge:
boost-cpp: 1.66.0-1 conda-forge
bzip2: 1.0.6-1 conda-forge
curl: 7.59.0-1 conda-forge
hdfs3: 0.3.0-py36_0 conda-forge
icu: 58.2-0 conda-forge
krb5: 1.14.6-0 conda-forge
libgcrypt: 1.8.2-hfc679d8_1 conda-forge
libgpg-error: 1.31-hf484d3e_0 conda-forge
libgsasl: 1.8.0-2 conda-forge
libhdfs3: 2.3.0-2 conda-forge
libiconv: 1.15-0 conda-forge
libntlm: 1.4-1 conda-forge
libprotobuf: 3.5.2-0 conda-forge
libssh2: 1.8.0-2 conda-forge
libuuid: 1.0.3-1 conda-forge
libxml2: 2.9.8-0 conda-forge
I don't think there's anything wrong with the Hadoop setup, since I'm still able to access it using hadoop fs commands.
Any thoughts on what could be going wrong?
libhdfs3-2.3.0-2 was just released; can you try with libhdfs3-2.3.0-1 and see if this is a new problem?
conda install -c conda-forge libhdfs3=2.3.0=1
@sk1p , ideas?
@martindurant that worked! Ran this in a Jupyter notebook and things are working again: ! conda install -c conda-forge libhdfs3=2.3.0=1 hdfs3 --yes. Hopefully they're able to get that fixed before the issue pops up for more people.
Thanks for testing, @lucashu1
Thanks for tagging me. That's weird: effective_user=0x1 - do you have anything related to effective_user in your configuration file?
@sk1p Nope, not seeing any instances of effective_user in /etc/hadoop, which to my understanding should contain the Hadoop config files.
If I remember correctly, effective_user arises in situations where, for example, you have logged in with a Kerberos principal other than your local username. Are you using Kerberos?
@martindurant did anything change wrt. kerberos library dependency versions in -2?
Since the last release was 9 months ago, a whole load of versions have been updated, but the latest build release was with the older code and newer versions, so that is likely not the problem. I think there must still be bad code from the big merge ContinuumIO/libhdfs3-downstream#3, which is beyond my C++ ability. You'll notice that there were conversations around how to make that right, but it was never used in a conda build until now (the recipe still pointed to the concat branch).
Ok, that may be it. Sadly I couldn't reproduce it yet, I don't have any kerberos stuff set up.
Maybe @bdrosen96 knows more?
Presumably, yes, but getting this right for a variety of systems is not easy.
I got what appears to be the same issue with a manual install, on master.
Here are the complete steps to build latest libhdfs3 master (commit 41fd50d from May 7, PR#12) on fedora 28 and reproduce this (I don't believe an actual kerberos connection is needed, just the standard hdfs-client with auth set to kerberos):
dnf install -y krb5-workstation protobuf-devel gtest-devel gmock-devel libcurl-devel libgsasl-devel libgsasl
git clone https://github.com/ContinuumIO/libhdfs3-downstream
cd libhdfs3-downstream/libhdfs3
mkdir build
cd build
../bootstrap --prefix=/usr/local
make -j5
sudo make install
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
pip3 install --user hdfs3 # 0.3.0
gdb python3
import hdfs3 ; f = hdfs3.HDFileSystem('some.net', port=8020)
Thanks to @martindurant I was able to work around it by building the following libhdfs3 commit:
git checkout 7842951 # august
git cherry-pick 069d774 # gcc-7
(then the same steps as above; it should at least get you to the point where it tries to connect and fails with ERROR Failed to setup RPC connection to "some.net:8020")
I could not reproduce this when I used a debug build. That may mean there is an uninitialized variable involved.
It is probably important to be sure the Python hdfs3 code is using:
hdfsBuilderConnect.argtypes = [ct.POINTER(hdfsBuilder), ct.c_char_p]
I also could not reproduce this with a non-debug build. I noticed the connect call looks like:
fs = _lib.hdfsBuilderConnect(o)
instead of:
fs = _lib.hdfsBuilderConnect(o, ensure_bytes(self.effective_user))
so there might be an incompatibility between the Python lib and the C++ lib.
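The 0x1 value in the backtrace is consistent with exactly that kind of mismatch: when ctypes is not told a function's full signature, any parameter the Python side omits is simply never set, and the C callee reads leftover register contents. A minimal, self-contained sketch of this failure mode, using libc's strlen as a stand-in (this is illustrative only, not hdfs3's actual code):

```python
import ctypes as ct
import ctypes.util

# Load the C library (path resolution is platform-dependent).
libc = ct.CDLL(ctypes.util.find_library("c"))

# Declaring the full signature lets ctypes convert and check arguments.
libc.strlen.argtypes = [ct.c_char_p]
libc.strlen.restype = ct.c_size_t

print(libc.strlen(b"hello"))  # 5

# Without argtypes, ctypes passes whatever you give it and no more.
# If the C side expects a second parameter (like effective_user) that the
# Python wrapper never supplies, the callee sees garbage -- e.g. a bogus
# pointer like 0x1 -- and crashes the first time it calls strlen() on it.
```

This is why keeping the Python wrapper's argtypes declaration and call site in sync with the C++ ABI matters.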
Here is the line from the python side, having the extra argument https://github.com/dask/hdfs3/blob/master/hdfs3/core.py#L147 , but this is not in the released conda/pip package.
That might be why there are issues if people are using the older pip package without those changes.
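For reference, the job of that helper is just to guarantee ctypes receives bytes (or None) for a c_char_p parameter. A hypothetical ensure_bytes along these lines (an illustration, not hdfs3's exact implementation):

```python
def ensure_bytes(s):
    """Return a value suitable for a ctypes c_char_p argument.

    None passes through (ctypes maps it to a NULL pointer),
    bytes are returned unchanged, anything else is encoded.
    """
    if s is None:
        return None
    if isinstance(s, bytes):
        return s
    return str(s).encode("utf-8")

print(ensure_bytes("alice"))  # b'alice'
print(ensure_bytes(None))     # None
```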
I don't think so; the libhdfs3 conda package pulls from my repo, not from ContinuumIO's, for this reason: I could not get a build of libhdfs3 that worked reliably on the various systems, and that's why hdfs3 has remained unreleased since. It would be great, but it is apparently beyond my ability without a considerable investment of time.
Since arrow's libhdfs implementation now does most of the job, there is less incentive to push on here.
Do you get the same error if you use the unreleased version from GitHub instead of PyPI? By older I meant the version that was out of sync with the C++ changes.
conda install -c conda-forge libhdfs3=2.3.0=1 hdfs3 --yes
fails to install the proper version of libcrypto.so.1.0.0; this shows up in ldd $miniconda_home/lib/libhdfs3.so.
I have tried to use libcrypto.1.1, but that leads to a segmentation fault. What a mess!
I had to find the libcrypto.1.0.0 library online and install it (conda install openssl=1.0.2). hdfs3 and libhdfs3 I had installed normally (conda install libhdfs3 hdfs3 -c conda-forge).
I think it's very commendable that Continuum tries to tackle these integration issues; the lack of progress on them is sad.
Sorry, @petacube - the build difficulties here are indeed what led to a stop in the efforts on our part to get everything smooth. Arrow's hdfs does not suffer from them, so long as you have the Java native libraries present (which is normally true on any HDFS edge node). For connecting from outside the cluster, I would recommend my implementation of webhdfs or one of the other Python implementations out there.
@martindurant i see..
So webhdfs requires the HDFS admin to enable/install the webhdfs interface on the HDFS machines, right? Or can it be used against a plain HDFS cluster without any changes?
Right, webhdfs needs to be enabled, and may need its own Kerberos principal, etc. That's fairly common, though. You probably do not need direct access to the data nodes if a proxy has been suitably set up - but there are many complicated options there.
Can it be installed on Ubuntu 18.04? If so, how? Thanks.
@HarshaRagyari, have you tried pip? Generally I would recommend basing your whole Python environment on conda, but I understand there are a range of use cases. A Debian package called python-fsspec exists; that might be available on Ubuntu and might be recent enough for you - I don't know the details on that.