dask/hdfs3

Read fs.default.name settings from core-site.xml

nsemichev opened this issue · 3 comments

I configured HDFS Client on Ubuntu 16.04 and I can successfully run this command:

hdfs --config /etc/hadoop/conf/ dfs -ls /

The --config parameter points the client at /etc/hadoop/conf/, where it picks up core-site.xml.

From core-site.xml

<property>
    <name>fs.default.name</name>
    <value>igfs://igfs@10.200.10.1:10500</value>
  </property>

Is there a way to configure the Python HDFS library using the above section from core-site.xml?
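As far as I know, hdfs3 does not read core-site.xml itself, but the relevant host and port can be pulled out of the file with the standard library and passed to HDFileSystem. A minimal sketch (the fs_default_name helper is hypothetical, and the XML string stands in for reading the real /etc/hadoop/conf/core-site.xml):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def fs_default_name(xml_text):
    """Return the parsed fs.default.name URI from core-site.xml content."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.default.name":
            return urlparse(prop.findtext("value").strip())
    return None

# Demo with the snippet from the question; in practice you would read
# the file, e.g. open("/etc/hadoop/conf/core-site.xml").read()
core_site = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>igfs://igfs@10.200.10.1:10500</value>
  </property>
</configuration>"""

uri = fs_default_name(core_site)
print(uri.scheme, uri.hostname, uri.port)  # igfs 10.200.10.1 10500
# The pieces could then be passed on, e.g.:
#   HDFileSystem(host=uri.hostname, port=uri.port)
```

Note this only extracts the connection details; whether libhdfs3 can actually speak to that endpoint is a separate question.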

>>> from hdfs3 import HDFileSystem
>>> hdfs = HDFileSystem(host='10.200.10.1', port=10500)

When I run this code, I get the following error message:
ConnectionError: Connection Failed: HdfsRpcException: Failed to invoke RPC call "getFsStats" on server "10.200.10.1:10500"

The file-system name suggests you are using IGFS (Apache Ignite) rather than HDFS. As far as I understand, Ignite is a layer over HDFS, and libhdfs3 (the low-level library behind hdfs3) is not expected to be able to use it; do you also have an hdfs-site.xml?

Yes, I am using IGFS, which should be compatible with HDFS. I am not sure why libhdfs3 wouldn't work with IGFS, since core-site.xml contains the parameters needed to use Ignite. I also have an hdfs-site.xml, but I haven't really used it; the only property in the file is dfs.replication.

The error message says to me that libhdfs3 is indeed reaching your IGFS server on the correct port, but is not getting the kind of conversation it can deal with. I cannot find any mention of libhdfs3 being used with Ignite, so my guess is that they are incompatible. You could try asking the libhdfs3 people, but there are several repos around in various states of unmaintenance (hdfs3 depends on the one from Pivotal).
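To separate "server unreachable" from "server speaks a different protocol", a plain TCP connect test can help: if it succeeds but the RPC call still fails, the problem is the protocol, not the network. A small standard-library sketch (the throwaway local listener stands in for 10.200.10.1:10500):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only proves the server is reachable; it says nothing about
    whether it speaks the Hadoop RPC protocol libhdfs3 expects.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener instead of the real endpoint:
srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # port 0 -> OS picks a free port
srv.listen(1)
host, port = srv.getsockname()
print(port_open(host, port))  # True: TCP is reachable
srv.close()
```

In this case the "getFsStats" failure with a reachable port points at a protocol mismatch, which is consistent with libhdfs3 not understanding IGFS.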