Yelp/nrtsearch

Replica doesn't receive new nrt point until a document is indexed to a primary

sarthakn7 opened this issue · 4 comments

The primary currently seems to send a new NRT point to replicas only after it creates new segments. This is a problem when the indexing QPS is very low, because a replica may then go a long time without receiving any segments. One solution is to send a new NRT point to a replica as soon as it is added to the primary.

"The primary seems to currently send new nrt point to replicas only after it creates new segments."

This is not true.

A new fileset is sent to the replica on each newCopyJob, which is invoked at several points:

  • at the start of a replica
  • upon getting a newNRTPoint
  • upon launchPreCopyMerge (the primary's way of telling the replica to fetch the newly merged files so it can copy over the new segment as well).

newNRTPoint is issued in our code either by invoking WriteNRTPointHandler externally or, more importantly, by the PrimaryNode during refreshIfNeeded, which we override from the base Lucene class. This causes the primary to continually send newNRTPoint to replicas.

@umeshdangat yes, I was just going through the code in https://github.com/Yelp/nrtsearch/blob/master/src/main/java/com/yelp/nrtsearch/server/luceneserver/NRTPrimaryNode.java#L408 since I also thought we were already doing this. I still had the issue and can reproduce it.

These are the steps I followed:

  1. Start primary, start index on primary with restore
  2. Start replica without restoring any state
  3. Create index on replica and start it with the primary's address

After this, the replica continues to have 0 segments and the size of the replica index directory stays at 20KB until a document is indexed to the primary, at which point the copying starts.

I think NRT is working as designed; I'm not sure this is a bug. Currently the newNRTPoint code is intended for the following case: when a primary has a flushAndRefresh event, it sends the newNRTPoint to all currently connected replicas.

In the steps you mention above, this is what happens:

  1. The primary comes up and restores its index. It does refresh its own searcher, but there are no replicas connected at this point.
  2. You start a fresh JVM, which is neither a primary nor a replica at this point.
  3. You create the index as a replica on the JVM started in step 2. It now gets registered with the primary. From then on, any change in the primary's index state is communicated to all connected replicas. But if we never index a document on the primary, there is nothing for the primary to refresh. In particular, flushAndRefresh will return false:

```java
@Override
protected IndexSearcher refreshIfNeeded(IndexSearcher referenceToRefresh) throws IOException {
  if (primary.flushAndRefresh()) {
    primary.sendNewNRTPointToReplicas();
    // NOTE: steals a ref from one ReferenceManager to another!
    return SearcherManager.getSearcher(
        searcherFactory,
        primary.mgr.acquire().getIndexReader(),
        referenceToRefresh.getIndexReader());
  } else {
    return null;
  }
}
```
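
To make the sequence above concrete, here is a toy model of it. This is only a sketch: `ToyPrimary`, `ToyReplica`, and the version counters are hypothetical stand-ins, not nrtsearch or Lucene classes. The only behaviour assumed from the real code is that replicas are notified only when a refresh finds something new.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the primary/replica handshake; all names here are hypothetical. */
public class LateReplicaDemo {

  /** Replica that only records the latest index version it was told about. */
  static class ToyReplica {
    long knownVersion = 0; // 0 means "no NRT point received yet"

    void newNrtPoint(long version) {
      knownVersion = version;
    }
  }

  /** Primary that notifies replicas only when a refresh finds new changes. */
  static class ToyPrimary {
    private final List<ToyReplica> replicas = new ArrayList<>();
    private long indexVersion;     // version of the latest indexed change
    private long refreshedVersion; // version visible to the current searcher

    ToyPrimary(long restoredVersion) {
      // Restoring state also refreshes the searcher, so nothing is pending.
      indexVersion = restoredVersion;
      refreshedVersion = restoredVersion;
    }

    void addReplica(ToyReplica replica) {
      replicas.add(replica); // registration alone sends nothing
    }

    void indexDocument() {
      indexVersion++;
    }

    /** Same shape as refreshIfNeeded: notify replicas only if something changed. */
    boolean refreshIfNeeded() {
      if (indexVersion == refreshedVersion) {
        return false; // plays the role of flushAndRefresh() returning false
      }
      refreshedVersion = indexVersion;
      for (ToyReplica replica : replicas) {
        replica.newNrtPoint(refreshedVersion);
      }
      return true;
    }
  }

  public static void main(String[] args) {
    ToyPrimary primary = new ToyPrimary(5); // primary restored with existing data
    ToyReplica replica = new ToyReplica();  // fresh replica, no restored state
    primary.addReplica(replica);

    System.out.println(primary.refreshIfNeeded()); // false: nothing new to refresh
    System.out.println(replica.knownVersion);      // 0: replica still has nothing
    primary.indexDocument();
    System.out.println(primary.refreshIfNeeded()); // true: a change was found
    System.out.println(replica.knownVersion);      // 6: replica finally notified
  }
}
```

In this model the replica stays at version 0 however many times the refresh runs, until the first document is indexed on the primary.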

This case seems pretty contrived to me, in that it assumes a "static index" where no indexing ever happens, as opposed to real-time indexing. Also, if we do have such indexes, we should simply restore state on the replica as well.

Primary-to-replica data transfer is only intended to keep the delta between the two to a minimum during real-time indexing. We should not treat the primary-replica network channel as a way to dump all data from the primary to a replica at startup. That would be far more expensive than downloading from S3 on bootstrap.

We have added initial NRT point sync with #326 and #327.
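
For readers who want a picture of what an initial sync at registration time can look like, here is a minimal sketch. It is not the code from #326 or #327. `SyncingPrimary`, `ReplicaClient`, and the version counter are hypothetical names chosen for illustration, and the sketch assumes the primary simply pushes its current NRT point to a replica as soon as that replica registers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of syncing a replica at registration time. */
public class InitialSyncDemo {

  /** Callback a replica exposes to the primary (hypothetical interface). */
  interface ReplicaClient {
    void newNrtPoint(long version);
  }

  static class SyncingPrimary {
    private final List<ReplicaClient> replicas = new ArrayList<>();
    private final long refreshedVersion; // version visible to the current searcher

    SyncingPrimary(long restoredVersion) {
      this.refreshedVersion = restoredVersion;
    }

    /** Registers the replica and immediately tells it the current NRT point. */
    synchronized void addReplica(ReplicaClient replica) {
      replicas.add(replica);
      if (refreshedVersion > 0) {
        // The primary already has data, so do not wait for the next refresh.
        replica.newNrtPoint(refreshedVersion);
      }
    }
  }

  public static void main(String[] args) {
    long[] received = {0}; // single-element array so the lambda can write to it
    SyncingPrimary primary = new SyncingPrimary(5); // restored with existing data
    primary.addReplica(version -> received[0] = version);
    System.out.println(received[0]); // 5: replica is notified without any indexing
  }
}
```

With this shape, a replica that joins a primary with existing data learns the current version right away, so the low-QPS case from the original report no longer depends on a new document arriving.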