- Make sure Java is up to date
- Download and unzip Elasticsearch in your preferred directory
- Download and unzip Kibana in your preferred directory
- Make sure Python is installed
- Download and install Anaconda for Jupyter Notebook
- Optional: use pipenv to manage the Python environment
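
A quick way to confirm the command-line prerequisites above is to look them up on your PATH. This is only a minimal sketch: the executable names (`java`, `python`, `jupyter`, `pipenv`) are assumptions and may differ on your system (for example `python3`), and the helper `find_missing` is a hypothetical name used for illustration.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED = ["java", "python", "jupyter"]
OPTIONAL = ["pipenv"]


def find_missing(commands):
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
    for cmd in find_missing(OPTIONAL):
        print(f"Optional tool not found: {cmd}")
```

Elasticsearch and Kibana are not checked here because they are run from their unzipped directories rather than from PATH.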
To start an Elasticsearch server on the local machine, cd to the unzipped Elasticsearch directory and run:
bin/elasticsearch
To use the Kibana interface, cd to the unzipped Kibana directory and run:
bin/kibana
Kibana is then available at localhost:5601. Kibana needs data in Elasticsearch to be useful; if you have none yet, it offers sample data sets that you can load.
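
If you are not sure whether both servers came up, a small port check can help. The sketch below only tests whether something accepts TCP connections on each port; it does not verify that the service behind it is healthy. Port 5601 is the Kibana port mentioned above, while 9200 is an assumption based on Elasticsearch's usual default HTTP port. The helper `is_port_open` is a hypothetical name.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# 5601 comes from this README; 9200 is assumed (Elasticsearch's usual default).
SERVICES = {"Elasticsearch": 9200, "Kibana": 5601}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {state}")
```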
Alternatively, you can index your own PDF document. To do so, make sure Anaconda is installed and run the following command in the poc-reghub directory:
jupyter notebook
The Jupyter Notebook interface is available at localhost:8888 by default.
Navigate to tika.ipynb and run the notebook to index a PDF document pulled from the RegHub Airtable database.
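
The notebook's own code is not reproduced here. As a rough sketch of what indexing extracted text can look like, the example below posts one document to the poc-reghub index through Elasticsearch's REST API. It assumes Elasticsearch accepts unauthenticated HTTP at http://localhost:9200 (newer versions may require HTTPS and credentials) and that the extracted text is stored in a field named `text`. The helpers `build_index_request` and `index_text` are hypothetical names, not taken from tika.ipynb.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: default local address, no authentication
INDEX = "poc-reghub"


def build_index_request(text, index=INDEX, es_url=ES_URL):
    """Build a POST request that adds one document with a `text` field to `index`."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url}/{index}/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_text(text):
    """Send the request and return Elasticsearch's JSON response as a dict."""
    with urllib.request.urlopen(build_index_request(text)) as resp:
        return json.load(resp)


# Example (requires a running Elasticsearch server):
#   result = index_text("text extracted from a PDF")
#   print(result["result"])
# The text itself would come from the PDF, for instance via the tika package's
# parser.from_file(path)["content"]; whether the notebook does this is an assumption.
```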
You can check whether you've successfully added the document to Elasticsearch via Kibana.
Kibana's Dev Tools console lets you query your search index.
If you've successfully indexed a PDF document with tika.ipynb, you can try the following queries:
GET /poc-reghub/_search
{
  "query": {
    "match": {
      "text": "another jurisdiction"
    }
  }
}

GET /poc-reghub/_search
{
  "query": {
    "match": {
      "text": "Japan VASP"
    }
  }
}
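
The same searches can be run outside Kibana. A `match` query combines its terms with OR by default, so "Japan VASP" also returns documents that contain only one of the two words; setting `operator` to `and` requires both. The sketch below builds such a query and sends it to the `_search` endpoint. It assumes Elasticsearch accepts unauthenticated HTTP at http://localhost:9200, and the helpers `build_match_query` and `search` are hypothetical names used for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: default local address, no authentication


def build_match_query(phrase, operator="or"):
    """Return a match query on the `text` field; operator is "or" or "and"."""
    return {"query": {"match": {"text": {"query": phrase, "operator": operator}}}}


def search(phrase, operator="or", index="poc-reghub", es_url=ES_URL):
    """Run the match query against `index` and return the list of hits."""
    body = json.dumps(build_match_query(phrase, operator)).encode("utf-8")
    req = urllib.request.Request(
        url=f"{es_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["hits"]["hits"]


# Example (requires a running server with an indexed document):
#   for hit in search("Japan VASP", operator="and"):
#       print(hit["_id"], hit["_score"])
```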