Primary language: Python. License: The Unlicense.

Elasticsearch demo

Demo for Elasticsearch v.7.4.0. It might run on 6.x releases with some modification to the settings JSON file; it does not work on Elasticsearch 0.x, 1.x, 2.x, or 5.x.

All documents and qrels are the property of the National Institute of Standards and Technology and have been released for research purposes as part of TREC.

What you need to run this demo

  1. Python 3 (if you're still using Python 2, I strongly encourage you to switch. Check out this page for why I suggest the switch).
  2. A fairly recent JVM version. At the time of writing, "fairly recent" means Oracle JVM v.1.7u55 or above, or OpenJDK v.1.7.55 or above. Check this page for more information about the requirements for Elasticsearch.
  3. Elasticsearch binaries. You can get them from the official website.
  4. Optional: a C compiler. I tried to include pre-compiled versions of trec_eval in the bin folder (macOS, Debian/Ubuntu, and Windows); this program will try to use the appropriate one based on the platform you're running it on. If it fails, please download the source code for trec_eval from the NIST website and compile it yourself (it should be as easy as navigating to the extracted directory and typing make). Then copy the compiled binary to the bin folder.
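
The platform-based choice of a trec_eval binary described in item 4 can be sketched roughly as follows. This is only an illustration, not the code this demo actually uses: the function name pick_trec_eval and the file names in BINARIES are hypothetical, so check the bin folder for the names that are really shipped.

```python
import platform
from pathlib import Path

# Hypothetical file names (assumption); the real names in bin/ may differ.
BINARIES = {
    "Darwin": "trec_eval_macos",
    "Linux": "trec_eval_linux",
    "Windows": "trec_eval.exe",
}


def pick_trec_eval(system=None, bin_dir="bin"):
    """Return the path of the trec_eval binary matching the platform name."""
    # platform.system() returns "Darwin", "Linux", or "Windows" on the
    # three platforms listed above.
    system = system or platform.system()
    if system not in BINARIES:
        raise RuntimeError(
            f"No pre-compiled trec_eval for {system!r}; compile it from source."
        )
    return Path(bin_dir) / BINARIES[system]
```

If the lookup fails, or the chosen binary does not run, compiling trec_eval from source and copying it to the bin folder remains the fallback.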

Usage

  1. Install requirements in requirements.txt; that is, run pip3 install -r requirements.txt.
  2. Download the data files (documents, queries, and qrels) from this page. Unzip them into the root of this project (i.e., where this file is).
  3. Start Elasticsearch. Assuming that you have unzipped Elasticsearch to the folder where this file is located, you can execute ./elasticsearch-6.2.1/bin/elasticsearch if you are on a UNIX system, or ./elasticsearch-6.2.1/bin/elasticsearch.bat if you are on Windows (adjust the directory name to match the version you downloaded). For more information on how to install and run Elasticsearch, please visit this page.
  4. Execute index.py. This will index the collection.
  5. Execute search.py. This will search the collection and evaluate the results.
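
Before running steps 4 and 5, it can help to confirm that the server started in step 3 is actually reachable. The sketch below is not part of this demo; it assumes Elasticsearch is listening on its default address, http://localhost:9200, without authentication, and the helper names parse_version and elasticsearch_version are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def parse_version(body):
    """Extract the version string from the JSON returned by the root endpoint."""
    return json.loads(body)["version"]["number"]


def elasticsearch_version(url="http://localhost:9200", timeout=5):
    """Return the version of the running server, or None if it is unreachable."""
    # Assumption: default host and port, no security layer in front of the node.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_version(response.read())
    except (urllib.error.URLError, OSError):
        return None
```

Calling elasticsearch_version() returns None while the server is still starting up or is not running, so a short retry loop around it is a reasonable guard before executing index.py.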