A contents manager for Jupyter that uses the Hadoop Distributed File System (HDFS) to store notebooks and files
- We assume you already have a running Hadoop cluster and Jupyter installed
- Set the JAVA_HOME and HADOOP_HOME environment variables
- In some cases you also need to set the CLASSPATH:
export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
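If you are unsure whether these variables are visible to the Python process that will run Jupyter, a quick, purely illustrative check:

```python
import os

# Print the Hadoop-related environment variables; any value shown as
# "<not set>" needs to be exported before starting Jupyter.
for var in ("JAVA_HOME", "HADOOP_HOME", "CLASSPATH"):
    print(var, "=", os.environ.get(var, "<not set>"))
```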
- Install HDFSContentsManager. This will also install dependencies such as Pydoop
pip install hdfscontents
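Before starting Jupyter, you may want to confirm that Pydoop can reach your cluster. The snippet below is only a sketch; it assumes a namenode at localhost:9000 and a user named myuser (the same placeholder values used in the configuration examples further down):

```python
import pydoop.hdfs as hdfs

# Connect to the namenode and list the HDFS root directory.
# Host, port, and user are assumptions; replace them with your cluster's values.
fs = hdfs.hdfs(host='localhost', port=9000, user='myuser')
print(fs.list_directory('/'))
fs.close()
```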
- Configure and run Jupyter Notebook.
You can either use command-line arguments to configure Jupyter to use the HDFSContentsManager class and set the HDFS-related options:
jupyter-notebook --NotebookApp.contents_manager_class='hdfscontents.hdfsmanager.HDFSContentsManager' \
    --NotebookApp.ip='*' \
    --HDFSContentsManager.hdfs_namenode_host='localhost' \
    --HDFSContentsManager.hdfs_namenode_port=9000 \
    --HDFSContentsManager.hdfs_user='myuser' \
    --HDFSContentsManager.root_dir='/user/myuser/'
Alternatively, first run:
jupyter-notebook --generate-config
to generate a default config file (by default ~/.jupyter/jupyter_notebook_config.py). Edit the generated file and add the HDFS-related settings.
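The settings mirror the command-line options above. A minimal sketch of what to add to the generated file (hostname, port, user, and root_dir are placeholders; adjust them for your cluster):

```python
# Use the HDFS-backed contents manager instead of the default one.
c.NotebookApp.contents_manager_class = 'hdfscontents.hdfsmanager.HDFSContentsManager'

# HDFS connection details (placeholders -- adjust for your cluster).
c.HDFSContentsManager.hdfs_namenode_host = 'localhost'
c.HDFSContentsManager.hdfs_namenode_port = 9000
c.HDFSContentsManager.hdfs_user = 'myuser'

# Directory in HDFS that the notebook server treats as its root.
c.HDFSContentsManager.root_dir = '/user/myuser/'
```

Then start the notebook server: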
jupyter-notebook