dotnet/orleans

Why might setting the EntryExpiry property lead to duplicate activations in the cluster?

ximi522 opened this issue · 7 comments

I noticed that the comment on the EntryExpiry property of RedisGrainDirectoryOptions says "Setting a value different from null will cause duplicate activations in the cluster." Could you please explain under what circumstances this would happen?

Because if the entry expires while the grain is still active, another silo may try to activate that same grain. Since the directory no longer holds an entry for it (due to TTL expiry), nothing prevents the second activation.

I am running Orleans in a k8s cluster. The grain directory is RedisGrainDirectory, and my IPlacementDirector implementation picks one of the surviving silos by hashing. When I restart one of the silos, the grains on it deactivate correctly and migrate to another silo. However, once the restart completes, calls to those migrated grains sometimes try to activate them on the restarted silo and log a warning, even though the grains are already active on other silos. These warnings keep occurring even after the grains have been deactivated and activated again. I am not sure whether this is because I set EntryExpiry = TimeSpan.FromDays(2).
Here are the warning logs.

09-07 20:06:13.010
TAG.pod_name:silo-planetserver-1__TAG__.pod_ip:172.21.21.65@m:Failed to register grain "[Activation: S172.21.21.65:11111:53077893/mahjongroom/6496b45fc610cd229dcfaf38@98fb33a58942483fa6279436c1647777#Placement=ArchiveDbHashBasedPlacement State=Invalid]" in grain directory@t:2023-09-07T12:06:12.6372506Z

09-07 20:06:15.007
TAG.pod_name:silo-planetserver-1__TAG__.pod_ip:172.21.21.65@m:Failed to register grain "[Activation: S172.21.21.65:11111:53077893/mahjongroom/6496b45fc610cd229dcfaf38@8571d8b428f341188d41ff12ca37ba60#Placement=ArchiveDbHashBasedPlacement State=Invalid]" in grain directory@t:2023-09-07T12:06:14.6398548Z

09-08 14:00:14.011
TAG.pod_name:silo-planetserver-1__TAG__.pod_ip:172.21.21.65@m:Failed to register grain "[Activation: S172.21.21.65:11111:53077893/mahjongroom/6496b45fc610cd229dcfaf38@7e0f0b3b762f491c8a0afa45fbbebaca#Placement=ArchiveDbHashBasedPlacement State=Invalid]" in grain directory@t:2023-09-08T06:00:13.9136491Z

09-08 14:03:57.008
TAG.pod_name:silo-planetserver-1__TAG__.pod_ip:172.21.21.65@m:Failed to register grain "[Activation: S172.21.21.65:11111:53077893/mahjongroom/6496b45fc610cd229dcfaf38@ec921c629b3f4756b0448eb925730f79#Placement=ArchiveDbHashBasedPlacement State=Invalid]" in grain directory@t:2023-09-08T06:03:56.2201857Z

And here are the activation and deactivation logs of the grain.

09-07 15:51:50.010
TAG.pod_name:silo-planetserver-1__TAG__.pod_ip:172.21.21.65id:6496b45fc610cd229dcfaf38@m:mahjongRoomGrain Activate, (roomId: "6496b45fc610cd229dcfaf38", state:Dissmiss)@t:2023-09-07T07:51:49.5599188Z
09-07 16:13:36.012
TAG.pod_name:silo-planetserver-1__TAG__.pod_ip:172.21.21.65id:6496b45fc610cd229dcfaf38@m:mahjongRoomGrain Deactivate, (roomId: "6496b45fc610cd229dcfaf38", state:Dissmiss)@t:2023-09-07T08:13:35.5903591Z

09-07 20:06:11.008
TAG.pod_name:silo-planetserver-2__TAG__.pod_ip:172.21.21.49id:6496b45fc610cd229dcfaf38@m:mahjongRoomGrain Activate, (roomId: "6496b45fc610cd229dcfaf38", state:Dissmiss)@t:2023-09-07T12:06:10.6372987Z
09-07 21:01:32.011
TAG.pod_name:silo-planetserver-2__TAG__.pod_ip:172.21.21.49id:6496b45fc610cd229dcfaf38@m:mahjongRoomGrain Deactivate, (roomId: "6496b45fc610cd229dcfaf38", state:Dissmiss)@t:2023-09-07T13:01:31.3187837Z

09-07 22:48:13.011
TAG.pod_name:silo-planetserver-2__TAG__.pod_ip:172.21.21.49id:6496b45fc610cd229dcfaf38@m:mahjongRoomGrain Activate, (roomId: "6496b45fc610cd229dcfaf38", state:Dissmiss)@t:2023-09-07T14:48:12.0882594Z
09-07 23:10:32.010
TAG.pod_name:silo-planetserver-2__TAG__.pod_ip:172.21.21.49id:6496b45fc610cd229dcfaf38@m:mahjongRoomGrain Deactivate, (roomId: "6496b45fc610cd229dcfaf38", state:Dissmiss)@t:2023-09-07T15:10:31.2649062Z

09-08 09:36:07.013
TAG.pod_name:silo-planetserver-2__TAG__.pod_ip:172.21.21.49id:6496b45fc610cd229dcfaf38@m:mahjongRoomGrain Activate, (roomId: "6496b45fc610cd229dcfaf38", state:Dissmiss)@t:2023-09-08T01:36:06.9451129Z
09-08 09:58:31.012
TAG.pod_name:silo-planetserver-2__TAG__.pod_ip:172.21.21.49id:6496b45fc610cd229dcfaf38@m:mahjongRoomGrain Deactivate, (roomId: "6496b45fc610cd229dcfaf38", state:Dissmiss)@t:2023-09-08T01:58:31.0058551Z

Thanks for contacting us. We believe that the question you've raised has been answered. If you still feel a need to continue the discussion, feel free to reopen the issue and add your comments.

What are you using for cluster membership and are all of your silos a part of the same cluster?

Yes, all of the silos are in the same cluster. Here are the cluster membership options we set:

siloBuilder.Configure<ClusterMembershipOptions>(options =>
{
    options.TableRefreshTimeout = TimeSpan.FromSeconds(60);
    options.DefunctSiloExpiration = TimeSpan.FromMinutes(5);
    options.DefunctSiloCleanupPeriod = TimeSpan.FromMinutes(5);
    options.IAmAliveTablePublishTimeout = TimeSpan.FromSeconds(30);
    options.NumMissedProbesLimit = 3;
    options.NumVotesForDeathDeclaration = 2;
    options.LocalHealthDegradationMonitoringPeriod = TimeSpan.FromSeconds(30);
    options.EnableIndirectProbes = true;
});

We use a Redis cluster as the clustering provider. Here are the Redis options we set:

siloBuilder.AddRedisGrainDirectory(Consts.RedisGrainDirectory, options =>
{
    options.ConfigurationOptions = ConfigurationOptions.Parse(redisConnectionString);
    options.EntryExpiry = TimeSpan.FromDays(2);
});
siloBuilder.UseRedisClustering(redisConnectionString,
    (int)ConfigurationOptions.Parse(redisConnectionString).DefaultDatabase);
siloBuilder.AddRedisGrainStorage(GrainStorageName.Redis,
    options => options.ConnectionString = redisConnectionString);
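
For comparison, here is a minimal sketch of the same grain directory registration with EntryExpiry left unset. It is a config fragment rather than a complete program, and it reuses the names from the snippet above (siloBuilder, Consts.RedisGrainDirectory, redisConnectionString). It assumes that leaving the property unassigned keeps it at null, as the option's comment implies, so that directory entries are not removed by a TTL while a grain is still active. Nothing in this thread confirms that EntryExpiry is the cause of the warnings above, so treat this as a way to rule it out rather than as a confirmed fix.

```csharp
// Sketch only: same registration as above, but without a TTL on directory entries.
siloBuilder.AddRedisGrainDirectory(Consts.RedisGrainDirectory, options =>
{
    options.ConfigurationOptions = ConfigurationOptions.Parse(redisConnectionString);

    // EntryExpiry is deliberately not assigned here.
    // Assumption: the unassigned value is null, meaning entries do not expire,
    // so an entry cannot disappear while its grain is still active.
});
```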