Data volumes for persistence and connecting to Hive
darrenhaken opened this issue · 2 comments
I'm new to the Hadoop stack, so forgive me if I'm missing something obvious.
I have two requirements I'm trying to work out with this Docker image:
- how to persist HDFS to a data volume (is HDFS running?)
- how to connect another container running another part of the Hadoop stack, e.g. Hive
Can anyone help?
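For context, this is roughly the setup I'm imagining (untested; hdfs-data, hadoop-net, my-hive-image, and the /hadoop/dfs mount path are placeholders I made up, and the path would need to match whatever this image's hdfs-site.xml actually uses):

# a named volume for HDFS data and a network both containers share
docker volume create hdfs-data
docker network create hadoop-net

# run this image with the volume mounted where HDFS keeps its data
docker run -it --name hadoop --network hadoop-net \
  -v hdfs-data:/hadoop/dfs \
  <this-image> /etc/bootstrap.sh -bash

# a Hive container on the same network could then point its
# fs.defaultFS at hdfs://hadoop:9000 (the port depends on core-site.xml)
docker run -it --name hive --network hadoop-net my-hive-image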
Dear darrenhaken,
we face similar requirements and wonder if you were able to resolve yours.
If so, could you please point us in the right direction, or share the steps you took to configure the Hive container to use the HDFS running inside this Docker image?
Cheers, Patrick
Hi, I have the same problem with using a volume for my HDFS input/output.
I want to create a directory with $HADOOP_PREFIX/bin/hadoop fs -mkdir mytest,
then put files in mytest/input and run something on them like wordcount,
and I want the input and output data to persist after each docker run!
How is this possible?
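Concretely, the loop I want to run each time is roughly this (the examples-jar version is my guess; adjust to whatever ships in the image):

# create the input directory in HDFS and upload local files
$HADOOP_PREFIX/bin/hadoop fs -mkdir -p mytest/input
$HADOOP_PREFIX/bin/hadoop fs -put localfile.txt mytest/input

# run the stock wordcount example over the input
$HADOOP_PREFIX/bin/hadoop jar \
  $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
  wordcount mytest/input mytest/output

# read the result back
$HADOOP_PREFIX/bin/hadoop fs -cat 'mytest/output/part-r-*'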
What I've done so far:
- Added this property to hdfs-site.xml:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/app/hdfs/datanode</value>
  <description>DataNode directory</description>
</property>
- Created a Docker volume named 'myvol'
- Ran the image with -v:

docker run -v myvol:/home/app -it c29b621ba74a /etc/bootstrap.sh -bash
But the /home/app directory only contains the files I created with the vi command and a folder named 'hdfs'; my HDFS input/output data still does not persist.
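One guess (an assumption on my part, not verified against this image): even if the 'hdfs' folder fills up with blk_* block files, that's normal, because HDFS stores data as blocks rather than under the original file names. But the NameNode metadata directory probably also needs to live on the volume; otherwise each docker run reformats the filesystem and the old blocks become unreachable. Something like this in hdfs-site.xml is what I'd try next (untested):

<!-- untested sketch: keep NameNode metadata and DataNode blocks
     together under /home/app so the myvol mount covers both -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/app/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/app/hdfs/datanode</value>
</property>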