Yelp/nrtsearch

Some questions for clarification

frotsch opened this issue · 3 comments

After some review of the code and a little experimenting with nrtsearch I came up with a few findings and questions I would like to ask and clarify:

  1. Is it correct that there is no communication between replica nodes (like forwarding search requests from clients)?

  2. Is it correct that the client needs to know which index is on which replica and then connect to that replica to execute a search request?

  3. It seems there is no built-in load-balancing mechanism to execute client search requests on the replica with the lowest utilization (or in round-robin fashion), true? Is load balancing something that should be done by the surrounding infrastructure, like Kubernetes? Any plans to leverage the gRPC-LB protocol (https://grpc.io/blog/grpc-load-balancing/)?

  4. It looks like HA (high availability) is not a primary goal (which is fine), since there seems to be no automatic fail-over mechanism (especially for primary nodes). Is this something that should be handled by the surrounding infrastructure, like Kubernetes, or by the client (queuing up index requests until the primary is back)?

  5. It seems possible to issue search requests against primary nodes. Is this just not recommended, or are there plans to forbid it technically?

  6. From what I saw in the code it appears that there is always only one shard per index (shard0). Is it possible (or planned) to have more than one shard per index? If not, that would mean that all limits of Lucene indices also apply to "nrtsearch indices", and an index cannot exceed the resources of a single physical machine (disk, max open files and memory).
    To let an "nrtsearch index" use the resources of multiple physical machines, it would be necessary to split the index into multiple shards (= multiple Lucene indices) and distribute them across different machines. From my current understanding that is not something nrtsearch was developed for, right? But to serve as a replacement candidate for Elasticsearch or Solr, that would be a necessary feature. I would just like to get a better understanding of what goals nrtsearch wants to reach and what use cases it is built to support.

  7. Can you briefly elaborate on what "virtual sharding" is and what problem it solves? Is it related to 6)?

  8. I understand the purpose of primary and replica nodes and how they interact. But there is also a "STANDALONE" mode and I am not sure why. Is this for testing purposes? To me it looks like a single primary node would also be sufficient, given that a client can issue a search request against a primary node as well, see 5)

Thanks for looking into the code and experimenting with nrtsearch. Below is my initial attempt to answer your questions:

  1. Correct, there is no communication between replicas; they are not aware of each other. This is by design. As for forwarding requests like createIndex and registerFields, at Yelp we use our container orchestration code to apply the correct schemas. At node startup each cluster/service loads its index information, which was previously backed up from the primary via the backup command. The actual startIndex command is currently issued by our k8s code that does pod initialization; alternatively we could do it right inside startup, since we already set up the IndexState object at startup time. One missing feature at the moment is live propagation of schema changes to replicas, i.e. as of today we restart replicas to pull the latest state from S3.

  2. Our clusters are identified by "serviceName" and indexes by "resourceName" within that service. This is the namespacing we follow at Yelp to upload our index data per cluster from the primaries. Each replica is started with this "serviceName" specified in its config file, and during bootstrap, as mentioned in 1 above, it downloads all needed index information (resources) from there, while we lazily download the index data itself with a startIndex (restore: true) API call (a sketch of this call follows after this list).

  3. As mentioned in the Design Goals section of our blog post, the nrtsearch Java client library includes client-side round-robin load balancing (the thick-client approach described in https://grpc.io/blog/grpc-load-balancing/; a sketch follows after this list). Alternatively you can rely on a service mesh like Envoy.

  4. Yes, the expectation for the indexing client is that until it receives a successful ack on commit, the data is not durable. Typically our indexing client pulls from Kafka, sends multiple addDocuments requests followed by a commit to nrtsearch, and only then commits offsets back to Kafka (a sketch follows after this list). The nrtsearch blog post talks more about this under "Implementation".

  5. Technically, the primary can also serve search requests, since we do still have the SearchManager object initialized, although we don't use it for that purpose at Yelp. Since the primary is the only node doing segment merges, sending search traffic to it essentially turns it into a read/write node, much like Elasticsearch. Note: we have not really tested search requests on the primary, so we're not sure how well this works in practice. Also, we use EBS for primaries, so the performance isn't the same for us.

  6. Right now we do have only one shard per index. IndexState was initially conceptualized to potentially host multiple shards, however we do not do this in practice. One reason is that there are multiple segments within an index/shard, and we can search those segments in parallel within the shard (unlike Elasticsearch, which searches the multiple segments within one shard sequentially). So the shard can grow up to the disk/RAM limits of the machine and still have decent parallelism for search requests, since it is composed of many segments (a sketch follows after this list).
    Regarding horizontal scaling: most workflows at Yelp are geo-shardable, so we use application-level sharding, e.g. a New York index vs. a San Francisco index. However, some workflows are harder to find application-level sharding for, and for those we use an internal "coordinator" microservice that does a basic scatter-gather. Clients access nrtsearch via this coordinator. Currently this service is not open sourced.

  7. #297 has some context in the PR description. We should probably enrich our docs (http://nrtsearch.readthedocs.io/) to explain some of these concepts better.

  8. Standalone mode is intended for small use cases (low indexing and search traffic) where only a single node is needed, or for anyone who wants to play around with it. There are some differences between standalone and primary modes, e.g. standalone mode won't do any NRT replication.
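
To make answers 1 and 2 more concrete, here is a minimal sketch of the startIndex-with-restore call a replica bootstrap might issue. It assumes the Java stubs generated from nrtsearch's luceneserver.proto (LuceneServerGrpc, StartIndexRequest, RestoreIndex, Mode); the exact message and field names are recalled from memory, so verify them against the proto in your version, and the hosts, ports, and service/resource names are placeholders.

```java
import com.yelp.nrtsearch.server.grpc.LuceneServerGrpc;
import com.yelp.nrtsearch.server.grpc.Mode;
import com.yelp.nrtsearch.server.grpc.RestoreIndex;
import com.yelp.nrtsearch.server.grpc.StartIndexRequest;
import com.yelp.nrtsearch.server.grpc.StartIndexResponse;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class ReplicaBootstrapSketch {
  public static void main(String[] args) {
    // Connect to the replica's application gRPC port (placeholder host/port).
    ManagedChannel channel =
        ManagedChannelBuilder.forAddress("replica-host", 6000).usePlaintext().build();
    LuceneServerGrpc.LuceneServerBlockingStub stub = LuceneServerGrpc.newBlockingStub(channel);

    // Start the index in REPLICA mode and lazily restore the index data that the
    // primary previously backed up for this serviceName/resourceName (e.g. to S3).
    StartIndexRequest request =
        StartIndexRequest.newBuilder()
            .setIndexName("my_index")
            .setMode(Mode.REPLICA)
            .setPrimaryAddress("primary-host")    // primary to pull NRT points from
            .setPort(6001)                        // primary's replication port
            .setRestore(
                RestoreIndex.newBuilder()
                    .setServiceName("my_service")  // cluster namespace
                    .setResourceName("my_index")   // index resource within the service
                    .build())
            .build();
    StartIndexResponse response = stub.startIndex(request);
    System.out.println("startIndex response: " + response);
    channel.shutdown();
  }
}
```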
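
For answer 3, a minimal sketch of the thick-client approach with plain grpc-java: the DNS name resolver returns all replica addresses behind a single name and the channel round-robins requests across them. The target host and port are placeholders.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class ReplicaChannelFactory {
  public static ManagedChannel newRoundRobinChannel() {
    // "dns:///" makes grpc-java resolve all addresses behind the name (e.g. a
    // Kubernetes headless Service) and round-robin requests across the replicas.
    return ManagedChannelBuilder.forTarget("dns:///nrtsearch-replicas.example.com:6000")
        .defaultLoadBalancingPolicy("round_robin")
        .usePlaintext()
        .build();
  }
}
```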
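
For answer 4, a rough sketch of such an indexing loop. The Kafka parts use the standard consumer API; NrtsearchIndexer is a hypothetical wrapper around the addDocuments (client-streaming) and commit RPCs, not an actual nrtsearch class.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IndexingLoop {
  /** Hypothetical wrapper around nrtsearch's addDocuments and commit gRPC calls. */
  public interface NrtsearchIndexer {
    void addDocuments(List<String> docs);
    void commit(); // blocks until nrtsearch acks the commit
  }

  public static void run(KafkaConsumer<String, String> consumer, NrtsearchIndexer indexer) {
    consumer.subscribe(List.of("documents"));
    while (true) {
      ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
      if (records.isEmpty()) {
        continue;
      }
      List<String> docs = new ArrayList<>();
      for (ConsumerRecord<String, String> record : records) {
        docs.add(record.value());
      }
      // Send the batch, then a single commit to nrtsearch.
      indexer.addDocuments(docs);
      indexer.commit();
      // Offsets go back to Kafka only after nrtsearch acked the commit, so an
      // unacknowledged batch is re-consumed after a failure.
      consumer.commitSync();
    }
  }
}
```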
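
For the parallelism point in answer 6: in plain Lucene, per-segment parallelism comes from the IndexSearcher constructor that takes an executor, which searches groups of segments (leaf slices) concurrently. A minimal sketch; the index path, field, and value are placeholders.

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ParallelSegmentSearch {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(8);
    try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
        DirectoryReader reader = DirectoryReader.open(dir)) {
      // With an executor, IndexSearcher searches groups of segments concurrently
      // instead of walking them sequentially within the single shard.
      IndexSearcher searcher = new IndexSearcher(reader, executor);
      TopDocs hits = searcher.search(new TermQuery(new Term("field", "value")), 10);
      System.out.println("total hits: " + hits.totalHits);
    } finally {
      executor.shutdown();
    }
  }
}
```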

Do go through our blog post; it includes a lot of detail about our design.

Thanks @umeshdangat for the good, clarifying answers.

Adding more questions for clarification

  1. Segment replication is a pretty good feature, but going forward, making segment merge-on-refresh the default would help with publishing segments to the replicas.
  2. How does communication between the primary and replicas use gRPC?
  3. Any plans to move from the gRPC gateway to the gRPC-JSON transcoder?

@Jeevananthan-23

  1. I see merge-on-refresh is supported starting with Lucene 8.6, so we would have to wait until we update Lucene in nrtsearch (a sketch follows below).
  2. You can check the code under ReplicationServerImpl.
  3. That is an Envoy filter. You should be able to use it if you're using Envoy.
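
For reference on point 1 above, a minimal sketch of how merge-on-refresh is enabled in plain Lucene 8.6+ via IndexWriterConfig#setMaxFullFlushMergeWaitMillis (check the exact API against your Lucene version). The wait time here is just an example, and the configured merge policy still has to select merges in findFullFlushMerges for anything to actually get merged.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

public class MergeOnRefreshConfig {
  public static IndexWriterConfig newConfig() {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    // Lucene 8.6+: let IndexWriter wait up to this long for merges selected by
    // MergePolicy#findFullFlushMerges when a refresh/commit flushes new segments,
    // so many tiny segments can be merged before they are published to replicas.
    // Note: the merge policy in use must actually implement findFullFlushMerges.
    config.setMaxFullFlushMergeWaitMillis(500);
    return config;
  }
}
```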