- Introduction
- What You’ll Need
- Install Docker
- Install Docker Compose
- Get the Imhotep Docker Images
- Run Docker Compose
- Use the Tools
- Appendix A: Architecture
- Appendix B: Container Troubleshooting
Imhotep is a large-scale analytics platform built by Indeed. To learn more, look at the Imhotep documentation.
If you want to evaluate Imhotep quickly, you can install all of its components on a single machine using Docker. Appendix A: Architecture, below, describes the components in more detail.
- the ability to install Docker if you don’t already have it
- about 10 GB of free disk space
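The disk-space requirement can be checked up front. The following is a minimal sketch, not part of the imhotep-docker project: the function name `has_free_gb` is hypothetical, and checking the current directory is an assumption (on Linux, Docker commonly stores images under /var/lib/docker, so you may want to check that path instead).

```shell
# Succeeds if the filesystem holding directory $1 has at least $2 GB available.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -P prints POSIX-format output; -k reports sizes in 1024-byte blocks.
    # Field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example: warn before continuing if the current directory is short on space.
if has_free_gb . 10; then
    echo "disk space: ok"
else
    echo "disk space: less than 10 GB free" >&2
fi
```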
If you are running a Linux distribution, you can probably install Docker using the get.docker.com script:
curl -sSL https://get.docker.com/ | sh
On macOS, download and install Docker for Mac.
On Windows, download and install Docker for Windows.
Follow the Install Docker Compose instructions to install Docker Compose for your platform.
One option is to download pre-built images, which saves you the time it takes to build them.
Download the docker-compose.yml file into a new directory:
mkdir imhotep-docker
cd imhotep-docker
wget https://raw.githubusercontent.com/indeedeng/imhotep-docker/master/docker-compose.yml
docker-compose pull
Alternatively, to build the images yourself, start by cloning or downloading the imhotep-docker project.
Option 1. Clone with SSH:
git clone git@github.com:indeedeng/imhotep-docker.git
Option 2. Clone with HTTPS:
git clone https://github.com/indeedeng/imhotep-docker.git
Option 3. Download and expand zip archive:
wget https://github.com/indeedeng/imhotep-docker/archive/master.zip
unzip master.zip
Before building the docker images, you may want to consider using the --squash option to save disk space. If your version of Docker has --squash support (experimental in 1.13), you can set this environment variable to enable the option:
export DOCKER_BUILD_OPTS=--squash
Until we confirm that Imhotep works with Java 8, you'll also need to download a JDK 7 RPM (jdk-7u80-linux-x64.rpm) from Oracle and copy it into the base-java7/ directory, e.g.
cp ~/Downloads/jdk-7u80-linux-x64.rpm imhotep-docker/base-java7/
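Because the build depends on the JDK RPM being in place, a quick preflight check before building may save a failed run. This is only a sketch: the function name `jdk_rpm_present` is hypothetical, and it assumes the checkout directory is named imhotep-docker as in the commands above.

```shell
# Succeeds if the JDK 7 RPM is present in the base-java7/ directory
# of the imhotep-docker checkout located at $1.
jdk_rpm_present() {
    [ -f "$1/base-java7/jdk-7u80-linux-x64.rpm" ]
}

# Example: run from the directory that contains the imhotep-docker checkout.
if jdk_rpm_present imhotep-docker; then
    echo "JDK RPM found"
else
    echo "JDK RPM missing: copy it into imhotep-docker/base-java7/ first" >&2
fi
```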
Run the provided bash script to build and install the Imhotep docker images locally.
cd imhotep-docker
./build-images.sh
This script will run for a while. When it completes, you will have four Imhotep images available:
- local/imhotep-frontend:centos6
- local/imhotep-daemon:centos6
- local/imhotep-cdh5-hdfs:centos6
- local/imhotep-zookeeper:centos6
If you would like to run the Imhotep web tools on a port other than 80, you can either set an environment variable or edit the .env file. For example, to run on 8080 from a bash shell:
export IQL_PORT=8080
Now you are ready to run the four docker containers that make up the full Imhotep stack.
docker-compose up
You will see a lot of log messages while the stack starts up.
The first time you run docker-compose, it will create a docker volume for the HDFS storage. That way, if you restart your containers, your data will still be available. Usually the last messages you see on first run look like this:
hadoop_1 | Started Hadoop secondarynamenode:[ OK ]
hadoop_1 | Setting up typical users
hadoop_1 | Creating /imhotep/ in HDFS
hadoop_1 | Refresh user to groups mapping successful
hadoop_1 | /
hadoop_1 | /imhotep
hadoop_1 | /imhotep/imhotep-build
hadoop_1 | /imhotep/imhotep-build/iupload
hadoop_1 | /imhotep/imhotep-build/iupload/failed
hadoop_1 | /imhotep/imhotep-build/iupload/indexedtsv
hadoop_1 | /imhotep/imhotep-build/iupload/tsvtoindex
hadoop_1 | /imhotep/imhotep-data
hadoop_1 | /imhotep/iql
hadoop_1 | /imhotep/iql/shortlinks
hadoop_1 | /user
hadoop_1 | /user/root
hadoop_1 | /user/shardbuilder
hadoop_1 | /user/tomcat7
These messages indicate HDFS is ready to be used by the Imhotep components.
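If you start the stack from a script, you can look for the end of this output instead of watching the log by hand. The sketch below assumes that /user/tomcat7 is the last line of the first-run setup, as in the sample above, and that the Compose service is named hadoop (inferred from the hadoop_1 log prefix). The function name `hdfs_setup_done` is hypothetical.

```shell
# Reads log text on stdin and succeeds once the last setup line from the
# sample output (/user/tomcat7) has been printed by the hadoop service.
hdfs_setup_done() {
    grep -q '^hadoop_1 *| /user/tomcat7'
}

# Usage (in a second terminal, from the directory holding docker-compose.yml):
#   docker-compose logs --no-color hadoop | hdfs_setup_done && echo "HDFS ready"
```

The `--no-color` flag keeps terminal color codes out of the log text so that the pattern can match from the start of the line.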
Due to a quirk of relative startup times, after the first startup you'll need to restart Tomcat in the frontend container for the short-link feature to work:
frontend_id=$(docker ps | grep imhotep-frontend | cut -d' ' -f1)
docker exec -i "$frontend_id" service tomcat stop
docker exec -i "$frontend_id" service tomcat start
You should now be able to access the web tools for Imhotep:
- IUpload: http://localhost/iupload/
- IQL: http://localhost/iql/
(Be sure to specify the correct port if you changed the default.)
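To avoid mistyping the port, the URLs can be derived from the same IQL_PORT variable used when starting the stack. This is a small sketch; the function name `imhotep_url` is hypothetical, and it assumes the stack runs on localhost as in the URLs above.

```shell
# Prints the URL for an Imhotep web tool ($1 is "iql" or "iupload"),
# using IQL_PORT when it is set and the default port 80 otherwise.
imhotep_url() {
    port=${IQL_PORT:-80}
    if [ "$port" = "80" ]; then
        echo "http://localhost/$1/"
    else
        echo "http://localhost:$port/$1/"
    fi
}

imhotep_url iupload
imhotep_url iql
```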
Now you are ready to upload TSV time-series data (using IUpload, start here) and run queries on your data sets (using IQL, start here).
The ImhotepDaemon is the back-end component responsible for servicing query requests. Adding instances of ImhotepDaemon is the primary way to maintain high performance with large amounts of data and increased load.
This component is implemented in Java and depends on the zookeeper cluster (to coordinate with other components) and the storage layer (HDFS or S3, to pull down data shards for serving).
The IQL webapp presents a web-based user interface for issuing IQL queries. Usage of this tool is described in the Quick Start guide.
This component is implemented in Java and typically runs in the Tomcat7 servlet container behind the Apache web server. It depends on the zookeeper cluster (to find ImhotepDaemon instances) and ImhotepDaemon instances (to service queries).
The IUpload webapp presents a web-based user interface for uploading data in TSV or CSV format into the Imhotep system. Usage of this tool is described in the Quick Start guide.
This component is implemented in Java and typically runs in the Tomcat7 servlet container behind the Apache web server. It depends on the storage layer (HDFS or S3) to place uploaded files. It is optional; TSV/CSV data can be placed directly in the storage layer following conventions described in the Quick Start guide.
The shard builder typically runs as a scheduled cron job and handles converting TSV or CSV files that have been uploaded to the storage layer into data shards for consumption by the ImhotepDaemon instances.
This component is implemented in Java and depends on the storage layer (HDFS or S3, to retrieve uploaded data and store converted data).
The storage layer for Imhotep can be HDFS (Hadoop Distributed File System) or S3 (Amazon Simple Storage Service). S3 is probably preferable if you are running in AWS. Otherwise, you should probably choose HDFS, as we do for this Docker evaluation version of the stack.
Imhotep has been tested with the CDH5 distribution of Hadoop.
The zookeeper cluster is used for coordination among the ImhotepDaemon instances and the IQL webapp frontend.
Imhotep has been tested with Zookeeper 3.4.5 from the CDH 5 distribution (download link).
You can run docker ps to see your running Docker containers. You can access a container by running bash in it interactively (docker exec -it <ID> bash). Here’s an example of connecting to the imhotep-frontend container:
$ docker ps
c91cbbc7722a local/imhotep-cdh5-hdfs:centos6 "/bin/sh -c hdfs-s..." 23 seconds ago Up 3 seconds 8020/tcp imhotepimages_hadoop_1
d9e4c6eabc3f local/imhotep-zookeeper:centos6 "/opt/zookeeper/bi..." 23 seconds ago Up 3 seconds 2181/tcp imhotepimages_zookeeper_1
58a278dcc7a2 local/imhotep-daemon:centos6 "/bin/sh -c /opt/i..." 23 seconds ago Up 4 seconds 12345/tcp imhotepimages_daemon_1
4fdb16fef6ae local/imhotep-frontend:centos6 "/bin/sh -c ./star..." 23 seconds ago Up 8 seconds 0.0.0.0:80->80/tcp imhotepimages_frontend_1
$ docker exec -it 4fdb16fef6ae bash
[root@4fdb16fef6ae imhotepTsvConverter]# ls /opt/imhotepTsvConverter/logs/
shardBuilder-error.log shardBuilder.log
[root@4fdb16fef6ae imhotepTsvConverter]# ls /opt/tomcat7/logs/
catalina.2017-02-27.log iql-error.log localhost_access_log.2017-02-27.txt
catalina.out iql.log manager.2017-02-27.log
host-manager.2017-02-27.log localhost.2017-02-27.log
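Copying container IDs by hand gets tedious, so the lookup can be scripted. The sketch below parses `docker ps` output in the column layout shown above (ID first, image second); the function name `container_id_for` is hypothetical.

```shell
# Reads `docker ps` output on stdin and prints the ID (first column) of the
# first container whose image name (second column) contains the text in $1.
container_id_for() {
    awk -v image="$1" 'index($2, image) { print $1; exit }'
}

# Usage:
#   docker exec -it "$(docker ps | container_id_for imhotep-frontend)" bash
```

Matching on the image column rather than the whole line avoids false hits when the search text also appears in a command or container name.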
The imhotep-frontend container runs IUpload, IQL, and the Shard Builder (TSV converter).
- IQL and IUpload webapp WAR deployment files: /opt/tomcat7/webapps/
- Web application log files: /opt/tomcat7/logs/
- Shard builder cron script: /opt/imhotepTsvConverter/tsvConverter.sh
- Shard builder logs: /opt/imhotepTsvConverter/logs/
- HDFS configuration for Tomcat: /opt/tomcat_shared/core-site.xml
- HDFS configuration for shard builder: /opt/imhotepTsvConverter/conf/core-site.xml
The imhotep-daemon container runs the Imhotep server process (ImhotepDaemon).
- Script that runs the process: /opt/imhotep/imhotep.sh
- Daemon log file: /var/data/imhotep/logs/ImhotepDaemon_log4j.log
- HDFS configuration: /opt/imhotep/core-site.xml
- Various data files: /var/data/
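When a query fails, the daemon log is usually the first place to look, and it can be read without opening an interactive shell. This is a sketch under the assumption that `tail` is available inside the image; the function name `daemon_log_tail` is hypothetical.

```shell
# Prints the last $2 lines (default 50) of the ImhotepDaemon log
# from the container with ID or name $1.
daemon_log_tail() {
    docker exec "$1" tail -n "${2:-50}" /var/data/imhotep/logs/ImhotepDaemon_log4j.log
}

# Usage, with the daemon container ID reported by docker ps:
#   daemon_log_tail 58a278dcc7a2 100
```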
The imhotep-cdh5-hdfs container runs HDFS in single-server mode. You can connect to this container and run hdfs commands to interact with the files there. Example:
$ docker exec -it c91cbbc7722a bash
[root@c91cbbc7722a /]# hdfs dfs -find /
/
/imhotep
/imhotep/imhotep-build
/imhotep/imhotep-build/iupload
/imhotep/imhotep-build/iupload/failed
/imhotep/imhotep-build/iupload/indexedtsv
/imhotep/imhotep-build/iupload/tsvtoindex
/imhotep/imhotep-data
/imhotep/iql
/imhotep/iql/shortlinks
/user
/user/root
/user/shardbuilder
/user/tomcat7
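The same hdfs commands can also be run from the host in one step, which is handy for checking a single directory. The sketch below wraps `hdfs dfs -ls`; the function name `hdfs_ls` is hypothetical, and the reading of the failed directory as the place for uploads that could not be converted is an assumption based on its name.

```shell
# Lists the HDFS path $2 from inside the HDFS container with ID or name $1.
hdfs_ls() {
    docker exec "$1" hdfs dfs -ls "$2"
}

# Usage, with the HDFS container ID reported by docker ps:
#   hdfs_ls c91cbbc7722a /imhotep/imhotep-build/iupload/failed
```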
The imhotep-zookeeper container runs a single zookeeper node. You probably won’t need to connect to it.
This project is governed by the Contributor Covenant v1.4.1.