
An HDFS-backed ContentsManager implementation for IPython

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

HDFS Contents Manager for Jupyter Notebooks

A contents manager for Jupyter that stores notebooks and files in the Hadoop Distributed File System (HDFS).

Getting Started

  1. We assume you already have a running Hadoop cluster and a working Jupyter installation.
  2. Set the JAVA_HOME and HADOOP_HOME environment variables.
  3. In some cases you also need to set the CLASSPATH:
export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
  4. Install the HDFS Contents Manager. This also installs dependencies such as Pydoop:
pip install hdfscontents
  5. Configure and run Jupyter Notebook.
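
Before launching Jupyter, it can help to confirm that the environment variables from steps 2 and 3 are actually set in the shell that will start the server. The following is a minimal sketch using only the Python standard library; the helper names `missing_vars` and `report` are illustrative and are not part of hdfscontents.

```python
import os

# Variables the setup steps ask for; CLASSPATH is only needed in some cases.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME")
OPTIONAL_VARS = ("CLASSPATH",)


def missing_vars(names, environ=None):
    """Return the names that are unset or empty in `environ` (default: os.environ)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def report(environ=None):
    """Return (problems, notes); an empty `problems` list means the required variables are set."""
    problems = [f"{name} is not set" for name in missing_vars(REQUIRED_VARS, environ)]
    notes = [
        f"{name} is not set (only needed in some cases)"
        for name in missing_vars(OPTIONAL_VARS, environ)
    ]
    return problems, notes


# Check the current environment and print anything that looks incomplete.
problems, notes = report()
for line in problems + notes:
    print(line)
```

The check only tests that the variables are non-empty; it does not verify that they point at valid Java or Hadoop installations.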

You can configure Jupyter to use the HDFSContentsManager class and set the HDFS-related options with command-line arguments:

jupyter-notebook --NotebookApp.contents_manager_class='hdfscontents.hdfsmanager.HDFSContentsManager' \
      --NotebookApp.ip='*' \
      --HDFSContentsManager.hdfs_namenode_host='localhost' \
      --HDFSContentsManager.hdfs_namenode_port=9000 \
      --HDFSContentsManager.hdfs_user='myuser' \
      --HDFSContentsManager.root_dir='/user/myuser/'

Alternatively, first run:

jupyter-notebook --generate-config

to generate a default config file. Edit the generated file to add the HDFS-related configuration, then start the notebook server:

jupyter-notebook
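
The HDFS-related settings added to the generated file could look like the sketch below. It mirrors the command-line options shown earlier and reuses their example values (`localhost`, `9000`, `myuser`), which are placeholders to adapt to your cluster. The generated file is typically `~/.jupyter/jupyter_notebook_config.py`, though the location can differ between installations.

```python
# jupyter_notebook_config.py (fragment)
# `get_config()` is supplied by Jupyter when it loads this file; the generated
# file normally already contains this line near the top.
c = get_config()

# Use the HDFS-backed contents manager instead of the local file system.
c.NotebookApp.contents_manager_class = 'hdfscontents.hdfsmanager.HDFSContentsManager'

# Listen on all interfaces, as in the command-line example.
c.NotebookApp.ip = '*'

# HDFS connection settings (example values; adjust for your cluster).
c.HDFSContentsManager.hdfs_namenode_host = 'localhost'
c.HDFSContentsManager.hdfs_namenode_port = 9000
c.HDFSContentsManager.hdfs_user = 'myuser'

# HDFS directory that Jupyter treats as its root.
c.HDFSContentsManager.root_dir = '/user/myuser/'
```

With these lines in place, running `jupyter-notebook` with no extra arguments should pick up the same configuration as the longer command-line form.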