apache/accumulo

Add IT to verify Scan Servers remove their references on shutdown

ddanielr opened this issue · 3 comments

Is your feature request related to a problem? Please describe.
I noticed that scan servers shut down via the ./accumulo-cluster stop-servers command would not clean up their entries in the metadata table.

Describe the solution you'd like
Add an IT to verify that scan servers delete their references during an expected (graceful) shutdown.
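For illustration, the expected behavior can be modeled as a scan server that deletes its own references when it is closed. This is a hypothetical, self-contained sketch: RefStore stands in for the metadata table and FakeScanServer for a scan server. Neither is an Accumulo class, and the real ScanServer shutdown path may work differently.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the metadata table: scan server id -> files it references.
class RefStore {
    private final Map<String, Set<String>> refs = new ConcurrentHashMap<>();

    void add(String serverId, String file) {
        refs.computeIfAbsent(serverId, k -> ConcurrentHashMap.newKeySet()).add(file);
    }

    // Deletes every reference owned by the given server.
    void removeAll(String serverId) {
        refs.remove(serverId);
    }

    boolean hasRefs() {
        return refs.values().stream().anyMatch(files -> !files.isEmpty());
    }
}

// Hypothetical scan server that records refs while scanning and deletes them in close().
class FakeScanServer implements AutoCloseable {
    private final String id = UUID.randomUUID().toString();
    private final RefStore store;

    FakeScanServer(RefStore store) {
        this.store = store;
    }

    void scan(String file) {
        store.add(id, file);
    }

    // A JVM shutdown hook runs on a graceful stop (e.g. SIGTERM) but not on SIGKILL.
    void registerShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::close, "ref-cleanup"));
    }

    @Override
    public void close() {
        store.removeAll(id);
    }
}
```

A cleanup tied to a shutdown hook only helps when the stop is graceful; a process killed with SIGKILL never runs it, which is one possible explanation for refs being left behind.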

An IT already exists for this; see https://github.com/apache/accumulo/blob/2.1/test/src/main/java/org/apache/accumulo/test/ScanServerMetadataEntriesIT.java#L140

That IT only tests removal of scan server refs through expiration.
Looking at the IT config, the scan refs expire after 5 seconds:

    cfg.setProperty(Property.SSERVER_SCAN_REFERENCE_EXPIRATION_TIME, "5s");
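As a simplified model of what that setting controls, the sketch below drops any reference that has not been updated within the expiration time. ScanRef and ExpiringRefs are hypothetical names, and treating expiration as "age since last update" is an assumption, not a description of the ScanServer code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical scan reference: owning server, file, and last time it was updated.
final class ScanRef {
    final String serverId;
    final String file;
    final Instant updated;

    ScanRef(String serverId, String file, Instant updated) {
        this.serverId = serverId;
        this.file = file;
        this.updated = updated;
    }
}

// Keeps refs and removes those older than the configured expiration (assumed semantics).
final class ExpiringRefs {
    private final List<ScanRef> refs = new ArrayList<>();
    private final Duration expiration;

    ExpiringRefs(Duration expiration) {
        this.expiration = expiration;
    }

    void add(ScanRef ref) {
        refs.add(ref);
    }

    // Removes refs whose age at 'now' exceeds the expiration; returns how many were removed.
    int removeExpired(Instant now) {
        int before = refs.size();
        refs.removeIf(r -> Duration.between(r.updated, now).compareTo(expiration) > 0);
        return before - refs.size();
    }

    int size() {
        return refs.size();
    }
}
```

The IT value of "5s" corresponds to Duration.ofSeconds(5) here.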

If that test is modified to shut down the scan server instead of waiting for the refs to expire, it runs for almost 10 minutes, until the manager cleans up the scan server refs:

      // Trigger shutdown of the scan server.
      getCluster().getClusterControl().stop(ServerType.SCAN_SERVER);

      // Close happens asynchronously; if the refs are never removed, the test fails by timeout.
      while (ctx.getAmple().getScanServerFileReferences().findAny().isPresent()) {
        log.info("Scan Server Refs are still present");
        log.info("Refs: {}", ctx.getAmple().getScanServerFileReferences().collect(Collectors.toList()));
        Thread.sleep(1000);
      }
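Rather than relying on the test's overall timeout, the polling loop above could use an explicit deadline so the failure is reported at the check itself. WaitUtil.waitFor below is a hypothetical helper written for this sketch, not an existing Accumulo test utility.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper with an explicit deadline.
final class WaitUtil {
    private WaitUtil() {}

    // Polls 'condition' every 'interval' until it returns true or 'timeout' elapses.
    // Returns true if the condition was met, false on timeout or interruption.
    static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }
}
```

In the IT, the condition would be the negation of the loop's current check, for example () -> ctx.getAmple().getScanServerFileReferences().findAny().isEmpty(), with a timeout well under the manager's 10 minute cleanup interval. That way, refs removed only by the manager's periodic cleanup would still make the test fail.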

This can be verified by reducing the interval at which the manager cleans up references: the test then runs only until that new interval elapses. The manager schedules the cleanup as follows:

    ThreadPools.watchCriticalScheduledTask(context.getScheduledExecutor()
        .scheduleWithFixedDelay(() -> ScanServerMetadataEntries.clean(context), 10, 10, MINUTES));
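The effect of that interval can be reproduced outside Accumulo with the same JDK scheduling call. In this hypothetical model, RefJanitor.clean() removes refs whose owning server is no longer live; that rule is an assumption about what ScanServerMetadataEntries.clean does, not a copy of it. Shortening the delay passed to scheduleWithFixedDelay shortens how long refs from a stopped server linger.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical model of manager-side cleanup of scan server refs.
final class RefJanitor {
    final Set<String> liveServers = ConcurrentHashMap.newKeySet();
    final Set<String> refOwners = ConcurrentHashMap.newKeySet(); // servers that still own refs

    // Assumed rule: refs owned by servers that are not live are deleted.
    void clean() {
        refOwners.removeIf(owner -> !liveServers.contains(owner));
    }

    // Runs clean() after 'delay', then again 'delay' after each run finishes.
    ScheduledFuture<?> schedule(ScheduledExecutorService executor, long delay, TimeUnit unit) {
        return executor.scheduleWithFixedDelay(this::clean, delay, delay, unit);
    }
}
```

With an initial delay of 10 minutes, the first cleanup runs roughly 10 minutes after the manager starts, which is consistent with the almost 10 minute test runtime described above.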

@ddanielr, you are correct; I misread the test. getCluster().getClusterControl().stop(ServerType.SCAN_SERVER) may not be stopping the ScanServer gracefully.