This is an example of using Jina's neural search framework to search through a selection of individual Wikipedia sentences downloaded from Kaggle. It's based on code generated by `jina hub new --type app`.
To test this example you can run a Docker image with 30,000 pre-indexed sentences:
docker run -p 65481:65481 jinahub/app.app.jina-wikipedia-sentences-30k
You can then query by running:
curl --request POST -d '{"top_k": 10, "mode": "search", "data": ["text:hello world"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:65481/api/search'
pip install -r requirements.txt
- Set up Kaggle
sh ./get_data.sh
export JINA_DATA_PATH='data/input.txt'
python app.py index
To limit the number of documents that get indexed, set MAX_DOCS before running the index command, for example: export MAX_DOCS=500
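As an illustration of how such a cap can be read from the environment, here is a minimal sketch. It is not the code in app.py: the helper name `read_max_docs` and the default of 30,000 (chosen only to mirror the size of the pre-indexed image) are assumptions.

```python
import os


def read_max_docs(environ=os.environ, default=30000):
    """Return the document cap from MAX_DOCS, falling back to a default.

    Hypothetical helper: the name and the default value are assumptions
    made for this sketch and are not taken from app.py.
    """
    raw = environ.get("MAX_DOCS")
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError if MAX_DOCS is not a whole number
    if value <= 0:
        raise ValueError("MAX_DOCS must be a positive integer")
    return value
```

With `export MAX_DOCS=500` in the shell, `read_max_docs()` returns 500. When the variable is unset it returns the default.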
python app.py search
Then:
curl --request POST -d '{"top_k": 10, "mode": "search", "data": ["text:hello world"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:65481/api/search'
Or use Jinabox with endpoint http://127.0.0.1:65481/api/search
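If you would rather query from Python than from curl, the same request can be sketched with only the standard library. The URL, headers, and payload are copied from the curl command above. The function names are made up for this example, and the response is returned as raw JSON because its structure isn't described here.

```python
import json
import urllib.request

SEARCH_URL = "http://0.0.0.0:65481/api/search"  # endpoint from the curl example


def build_search_request(query, top_k=10, url=SEARCH_URL):
    """Build the same POST request that the curl command sends."""
    payload = {"top_k": top_k, "mode": "search", "data": [f"text:{query}"]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, top_k=10, url=SEARCH_URL):
    """Send the query and return the decoded JSON response as-is."""
    request = build_search_request(query, top_k, url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `search("hello world")` only works while the query server is running and listening on port 65481.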
The following steps create a Docker image with pre-indexed data and an open port for REST queries.
- Run all the steps in setup and index first. Don't run anything in the query step!
- If you want to push to Jina Hub, be sure to edit the `LABEL`s in `Dockerfile` to avoid clashing with other images
- Run `docker build -t <your_image_name> .` in the root directory of this repo
- Run it with `docker run -p 65481:65481 <your_image_name>`
- Search using the instructions from Search above
Please use the following name format for your Docker image; otherwise it will be rejected when you try to push it to Jina Hub. Please also see the versioning notes section, which explains my versioning workaround.
jinahub/type.kind.jina-image-name:image-jina_version
For example:
jinahub/app.app.jina-wikipedia-sentences-30k:0.2.3-0.9.5
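To check a name locally before building, something like the sketch below could help. The regular expression is an assumption inferred from the format and example above (lowercase names, three-part version numbers). It is not Jina Hub's actual validation, which may accept or reject other names.

```python
import re

# Assumed pattern derived from the format and example above; Jina Hub's
# real validation rules may differ.
IMAGE_NAME_RE = re.compile(
    r"^jinahub/(?P<type>[a-z]+)\.(?P<kind>[a-z]+)\.(?P<name>jina-[a-z0-9-]+)"
    r":(?P<image_version>\d+\.\d+\.\d+)-(?P<jina_version>\d+\.\d+\.\d+)$"
)


def parse_image_name(image):
    """Split an image name into its parts, or return None if it does not match."""
    match = IMAGE_NAME_RE.match(image)
    return match.groupdict() if match else None
```

For the example name above, `parse_image_name` returns the type, kind, name, image version, and Jina version as separate fields. A name such as `my-image:latest` returns `None`.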
Push to Jina Hub
- Ensure hub is installed with `pip install jina[hub]`
- Run `jina hub login` and paste the code into your browser to authenticate
- Run `jina hub push <your_image_name>`
At the time of writing, `jina hub new...` creates an `encode.yml` with `max_length: 96`. I changed this to `196`, which gives more accurate results (i.e. the query word actually appears in the text of the results).
At the time of writing, the version of Jina in `requirements.txt` doesn't match the `jina_version` label we use in our `docker build ...` command.
Why?
- I built this example with Jina 0.8.2
- Jina Hub expects you to push with the same version you built with (i.e. it would expect me to use `jina[hub]==0.8.2` to push)
- Pushing with Jina Hub wouldn't work for me until 0.9.5. Luckily `jina hub push ...` only cares about the Docker image, not my actual code (I ran `jina[hub]==0.9.5` in a separate virtualenv)
- We're working on updating this example code to 0.9.5 to get around this ugly kluge and delete this note!