Source code for a mini-project. The tutorial is available at: https://j.blaszyk.me/tech-blog/nrtsearch-tutorial-website-search/
Let's use nrtsearch, an open-source search engine built by Yelp, to support text search for any website.
I'm using my blog as an example dataset, but you can apply this approach to index any text content on the internet.
You need to have Docker and python3 installed on your system.
First, start nrtsearch, search-ui, grpcox (a web UI gRPC client) and search-server. Run in a separate terminal window:
make start
This generates the nrtsearch client code, builds the Docker images and starts them with Docker Compose. Check docker-compose.yaml to make sure that all exposed ports are available on your host machine.
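If you are not sure whether a port is taken, a rough check is to try binding it yourself before running `make start`. The sketch below assumes the containers publish on the local machine; `port_is_free` is a hypothetical helper, and the list of ports only covers the two mentioned in this README (add the rest from docker-compose.yaml).

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if we can bind host:port, i.e. no other process holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails (e.g. EADDRINUSE) when the port is already taken.
            return False
    return True


if __name__ == "__main__":
    # 3000 (search UI) and 6969 (grpcox) come from this README; the other
    # ports exposed in docker-compose.yaml should be added here as well.
    for port in (3000, 6969):
        status = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {status}")
```

This is only a heuristic: a port can be free now and taken a moment later, so Docker Compose's own error message remains the final word.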
The crawler fetches data from websites; in this tutorial I'm using my blog as the example dataset. We use the beautifulsoup library to extract website content. Run it with:
make start_crawler
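The extraction step boils down to pulling the page title and the visible text out of the HTML while ignoring scripts and styles. The project does this with beautifulsoup; to keep the sketch dependency-free, it swaps in Python's standard-library `html.parser` instead. The output keys `title` and `content` are hypothetical field names, not necessarily the ones the crawler emits.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text of a page, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._skip = 0          # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip:
            return  # ignore whitespace and script/style bodies
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def extract(html: str) -> dict:
    """Return a document dict with the page title and its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": " ".join(parser.chunks)}


if __name__ == "__main__":
    page = "<title>My post</title><p>Hello <b>nrtsearch</b></p><script>x=1</script>"
    print(extract(page))  # {'title': 'My post', 'content': 'Hello nrtsearch'}
```

Joining chunks with single spaces loses paragraph structure, which is fine for full-text search but not for displaying snippets.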
To ingest the data into nrtsearch, we must first create an index and register its fields. Start the index with:
make start_index
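Registering fields means telling nrtsearch, up front, the name and type of every field a document may contain and how each one should be indexed. The sketch below only builds such a request body as a Python dict; the key names (`indexName`, `field`, `type`, `search`, `store`, `tokenize`, `storeDocValues`) and the `ATOM`/`TEXT` types follow nrtsearch's JSON examples as I recall them and are an assumption, so check them against the nrtsearch documentation. The index and field names are hypothetical.

```python
import json

INDEX_NAME = "website_index"  # hypothetical index name


def register_fields_request(index_name: str) -> dict:
    """Build a registerFields-style request body (option names are assumptions)."""
    return {
        "indexName": index_name,
        "field": [
            # Page URL as a single untokenized value, kept in doc values
            # so it can be returned with results.
            {"name": "url", "type": "ATOM", "storeDocValues": True},
            # Full-text fields: tokenized so individual words are searchable,
            # and stored so the text can be shown in the search UI.
            {"name": "title", "type": "TEXT", "search": True, "store": True, "tokenize": True},
            {"name": "content", "type": "TEXT", "search": True, "store": True, "tokenize": True},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(register_fields_request(INDEX_NAME), indent=2))
```

The design choice worth noting is the split between exact-match fields (the URL) and analyzed text fields (title and body): only the latter are broken into terms for full-text queries.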
We use a Python script that batches the documents, ingests them into the primary node and commits the index. Run it with:
make run_indexer
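The batching logic of such an indexer can be sketched independently of nrtsearch: split the crawled documents into fixed-size batches, send each batch, then commit once at the end. Here `send_batch` and `commit` are hypothetical callables standing in for the real nrtsearch client calls, and the default batch size of 100 is an arbitrary assumption.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(docs: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


def ingest(
    docs: Iterable[dict],
    send_batch: Callable[[list[dict]], None],
    commit: Callable[[], None],
    batch_size: int = 100,
) -> int:
    """Send docs in batches to the primary, then commit once. Returns docs sent."""
    sent = 0
    for batch in batched(docs, batch_size):
        send_batch(batch)  # stand-in for one ingestion request to the primary
        sent += len(batch)
    commit()  # stand-in for the final "commit the index" step
    return sent


if __name__ == "__main__":
    batches: list[list[dict]] = []
    total = ingest(({"url": f"/post/{i}"} for i in range(250)),
                   batches.append, lambda: print("commit"), batch_size=100)
    print(total, [len(b) for b in batches])  # 250 [100, 100, 50]
```

Committing once after all batches, rather than after each one, avoids paying the cost of a commit per batch; the trade-off is that a crash mid-run may lose the uncommitted documents.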
The search UI is available at localhost:3000, where you can explore the indexed data.
A gRPC web client (grpcox) runs at localhost:6969; you can use it to interact with the nrtsearch nodes.