/ESKonHPC

Repository of scripts for spinning up a personal instance of ElasticSearch and Kibana on HPC infrastructure

Primary language: Python. License: MIT.

ElasticSearch+Kibana on HPC Infrastructure

Repository of scripts for spinning up a personal instance of ElasticSearch and Kibana on HPC infrastructure (i.e., on Slurm)

Setup Workflow

  1. Download and decompress ElasticSearch and Kibana, and create symlinks to support upgrades
    • Run bash download.sh
  2. Update run-es.sbatch so it points to the correct path
  3. Submit run-es.sbatch to Slurm (e.g., sbatch run-es.sbatch) to start ElasticSearch and Kibana.
    • See slurm.log for the port-forwarding details
  4. Create your index with mappings for tweet metadata
    • python createIndex.py http://localhost:<ESPORT> <index_name>
  5. Push tweet data to ElasticSearch
    • python postTwitter.py http://localhost:<ESPORT> <index_name> <tweet_path.json>
  6. Access Kibana and set up your index pattern
    • Create an SSH tunnel to the HPC system, visit http://localhost:<KPORT>, go to Management, and follow the steps to point Kibana at the index (index_name) you created
  7. Happy searching
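
Steps 4 and 5 can be sketched as follows. This is only an illustration of the kind of requests createIndex.py and postTwitter.py presumably send, not the scripts themselves. It assumes the typeless mapping API of ElasticSearch 7 or later (6.x expects a type name in both the mapping and the bulk action line), that each tweet is a dict carrying Twitter's id_str field, and that created_at and coordinates.coordinates are the only fields needing explicit types. The helper names are hypothetical.

```python
import json
import urllib.request


def build_tweet_mapping():
    """Return an index body that types the two tweet fields Kibana relies on."""
    return {
        "mappings": {
            "properties": {
                # Twitter timestamps look like "Wed Oct 10 20:19:24 +0000 2018".
                "created_at": {
                    "type": "date",
                    "format": "EEE MMM dd HH:mm:ss Z yyyy",
                },
                # GeoJSON-style [longitude, latitude] pair.
                "coordinates": {
                    "properties": {"coordinates": {"type": "geo_point"}}
                },
            }
        }
    }


def build_bulk_body(tweets, index_name):
    """Serialize tweets as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for tweet in tweets:
        # Assumption: id_str is present and is used as the document ID.
        action = {"index": {"_index": index_name, "_id": tweet["id_str"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(tweet))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def send(es_url, path, body, method, content_type):
    """Send one request to ElasticSearch and return the decoded JSON reply."""
    request = urllib.request.Request(
        es_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        method=method,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example calls (need a running ElasticSearch instance, so left commented out):
# send(es_url, "/" + index_name, json.dumps(build_tweet_mapping()),
#      "PUT", "application/json")
# send(es_url, "/_bulk", build_bulk_body(tweets, index_name),
#      "POST", "application/x-ndjson")
```

Keeping the payload builders separate from the network call makes it possible to inspect the mapping and the bulk body before anything is sent to the cluster.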

Setting Up Your Index

The second-to-last step above says to go to the Management tab and set up your index pattern. This process tells Kibana which dataset it needs to search and how to interpret the time data in the created_at field in Twitter data (or whatever field in your data has date information).
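
To make the time-field idea concrete, here is a small sketch of how a Twitter created_at value can be parsed in Python. It assumes the classic Twitter timestamp layout (for example "Wed Oct 10 20:19:24 +0000 2018") and an English/C locale for the day and month abbreviations. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Layout of Twitter's created_at strings, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Convert a Twitter created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


def to_iso8601(value):
    """Render a created_at string as an ISO 8601 timestamp in UTC."""
    return parse_created_at(value).astimezone(timezone.utc).isoformat()


print(to_iso8601("Wed Oct 10 20:19:24 +0000 2018"))
# prints 2018-10-10T20:19:24+00:00
```

If the index does not declare a matching date format for this field, ElasticSearch may treat it as plain text, in which case it would not be offered as a time field in Kibana.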

Screenshots of the process follow:

  1. Point your browser to Kibana. You'll first see the landing page.
  2. Click Management and go to Index Patterns. (Screenshot: index pattern)
  3. You'll now be asked to create an index pattern for whatever indices you have in your ElasticSearch setup. Type the index name you created during setup into the text box. (Screenshot: create index pattern)
    • The pattern supports wildcards and the like, so you can have one Kibana pattern for multiple indices (useful when you break down indices by day or month).
  4. One of Kibana's real powers comes in handling time series. To enable this, you need to tell Kibana which field has your time data. If you're using Twitter data, select created_at from the dropdown box. (Screenshot: time field)
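
The wildcard behavior mentioned in step 3 can be illustrated with a short sketch. It uses Python's fnmatch module as a stand-in for Kibana's matching, which is an approximation: only the `*` wildcard is meant to carry over. The index names below are hypothetical monthly indices.

```python
from fnmatch import fnmatchcase


def indices_matching(pattern, index_names):
    """Return the index names that a wildcard pattern would cover."""
    return [name for name in index_names if fnmatchcase(name, pattern)]


# Hypothetical indices split by month, plus one unrelated index.
indices = ["twitter-2019.01", "twitter-2019.02", "logs-2019.01"]

print(indices_matching("twitter-*", indices))
# prints ['twitter-2019.01', 'twitter-2019.02']
```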

Searching in Kibana

Now that you've set up Kibana, you can search your data. Some helpful screenshots for this follow:

  1. You mainly search via the Discover section. When you first go here, though, Kibana may say there's no data. This is a result of Kibana's default timeframe filter (Last 15 Minutes). You'll want to change that to cover whatever date range your data covers. (Screenshot: search landing)
  2. You can select from several pre-configured options or use your own range. I often use Last 5 Years. (Screenshot: timeframe)
  3. Your data should appear now! (Screenshot: data)
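
The timeframe filter corresponds roughly to a range query on the time field. The sketch below builds such a query body using ElasticSearch date math ("now-15m", "now-5y"). It is a simplified approximation of what Kibana sends, not its exact request, and the function name is hypothetical.

```python
import json


def build_time_range_query(time_field, start, end="now"):
    """Build a search body that keeps only documents inside a time window."""
    return {"query": {"range": {time_field: {"gte": start, "lte": end}}}}


# "Last 15 Minutes" (the default) versus "Last 5 Years".
default_window = build_time_range_query("created_at", "now-15m")
wide_window = build_time_range_query("created_at", "now-5y")

print(json.dumps(wide_window))
# prints {"query": {"range": {"created_at": {"gte": "now-5y", "lte": "now"}}}}
```

Historical tweets almost never fall inside the last 15 minutes, which is why the default window shows no results until it is widened.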

Building Maps in Kibana

Kibana has a lot of visualization capability built into it. A particularly nice one is its mapping feature, which works if your data already includes geolocation (i.e., GPS) data.

To build a map, follow these steps:

  1. Go to the Visualize tab, and select Create Visualization.
  2. Select the Coordinate Map option.
  3. Select your index (twitter in my case).
  4. Set up the geohash fields. Set Aggregation to Geohash and field to coordinates.coordinates. Press the run button (the triangle) to confirm and see your viz.
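
The Geohash setting in step 4 corresponds to ElasticSearch's geohash_grid aggregation, which buckets geo_point values into grid cells. Below is a minimal sketch of an equivalent search body. It assumes coordinates.coordinates is mapped as a geo_point. The aggregation name tweet_grid and the precision value are arbitrary choices for illustration.

```python
import json


def build_geohash_aggregation(geo_field, precision=3):
    """Build a search body that counts documents per geohash cell."""
    if not 1 <= precision <= 12:
        raise ValueError("geohash precision must be between 1 and 12")
    return {
        "size": 0,  # return only the aggregation buckets, no individual hits
        "aggs": {
            "tweet_grid": {
                "geohash_grid": {"field": geo_field, "precision": precision}
            }
        },
    }


body = build_geohash_aggregation("coordinates.coordinates")
print(json.dumps(body, indent=2))
```

Higher precision values produce smaller cells and therefore more buckets, so coarse values are a reasonable starting point for a first look at the data.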