tidwall/tile38

Tile38 unable to write and getting crash

Mukund2900 opened this issue · 4 comments

Describe the bug
Tile 38 logs ->

Jul 30 13:31:36 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:36 [INFO] not live 10.0.11.12:57866
Jul 30 13:31:36 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:36 [DEBU] Closed connection: 10.0.11.12:57866
Jul 30 13:31:36 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:36 [DEBU] pubsub closed
Jul 30 13:31:36 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:36 [DEBU] Opened connection: 10.0.11.12:40394
Jul 30 13:31:36 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:36 [DEBU] Detached connection: 10.0.11.12:40394
Jul 30 13:31:36 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:36 [INFO] live 10.0.11.12:40394
Jul 30 13:31:36 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:36 [DEBU] pubsub open
Jul 30 13:31:37 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:37 [INFO] not live 10.0.11.234:50048
Jul 30 13:31:37 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:37 [DEBU] Closed connection: 10.0.11.234:50048
Jul 30 13:31:37 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:37 [DEBU] pubsub closed
Jul 30 13:31:37 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:37 [DEBU] Opened connection: 10.0.11.234:51954
Jul 30 13:31:37 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:37 [DEBU] Detached connection: 10.0.11.234:51954
Jul 30 13:31:37 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:37 [INFO] live 10.0.11.234:51954
Jul 30 13:31:37 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:37 [DEBU] pubsub open
Jul 30 13:31:51 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:51 [INFO] not live 10.0.11.12:40394
Jul 30 13:31:51 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:51 [DEBU] Closed connection: 10.0.11.12:40394
Jul 30 13:31:51 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:51 [DEBU] pubsub closed
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [DEBU] Opened connection: 10.0.11.12:52584
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [DEBU] Detached connection: 10.0.11.12:52584
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [INFO] live 10.0.11.12:52584
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [DEBU] pubsub open
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [DEBU] pubsub closed
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [INFO] not live 10.0.11.234:51954
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [DEBU] Closed connection: 10.0.11.234:51954
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [DEBU] Opened connection: 10.0.11.234:57034
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [DEBU] Detached connection: 10.0.11.234:57034
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [INFO] live 10.0.11.234:57034
Jul 30 13:31:52 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:31:52 [DEBU] pubsub open
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [INFO] not live 10.0.11.12:52584
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] Closed connection: 10.0.11.12:52584
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] pubsub closed
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] Opened connection: 10.0.11.12:38626
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] Detached connection: 10.0.11.12:38626
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [INFO] live 10.0.11.12:38626
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] pubsub open
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [INFO] not live 10.0.11.234:57034
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] Closed connection: 10.0.11.234:57034
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] pubsub closed
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] Opened connection: 10.0.11.234:58684
Jul 30 13:32:07 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:32:07 [DEBU] Detached connection: 10.0.11.234:58684

To Reproduce
High writes for locations to database

Expected behavior
I expected it to handle all these writes and not give this following error in tile38 client ->


2024-07-30 13:31:11 ERROR [Thread-2748301] LocationUpdateService:139 - Error processing socket data due to -> java.lang.RuntimeException: java.util.concurrent.TimeoutException
2024-07-30 13:31:12 ERROR [Thread-2748301] LocationUpdateService:140 - java.lang.RuntimeException: java.util.concurrent.TimeoutException
	at com.moveinsync.tile38.api.Tile38TemplateImpl.executeBatchedCommands(Tile38TemplateImpl.java:489)
	at com.moveinsync.geofencing.service.LocationUpdateService.handleSocketClusterLocationUpdate(LocationUpdateService.java:115)
	at com.moveinsync.geofencing.service.LocationUpdateService$$FastClassBySpringCGLIB$$49cbc07d.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689)
	at com.moveinsync.geofencing.service.LocationUpdateService$$EnhancerBySpringCGLIB$$121cad48.handleSocketClusterLocationUpdate(<generated>)
	at com.moveinsync.geofencing.config.SocketClusterConfig$3.lambda$call$0(SocketClusterConfig.java:131)
	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1892)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2027)
	at com.moveinsync.tile38.api.Tile38TemplateImpl.executeBatchedCommands(Tile38TemplateImpl.java:483)
	... 8 more

Code in client ->

    @Override
    public List<List<Object>> executeBatchedCommands(@NotNull List<CommandArgs<String, String>> commandArgsList, BatchedCommandType commandType) {
        List<List<Object>> results = new ArrayList<>();
        if (asyncCommands == null) {
            asyncCommands = client.connect().async();
        }
        asyncCommands.setAutoFlushCommands(false);
        List<RedisFuture<List<Object>>> futureResults = new ArrayList<>();
        for (CommandArgs<String, String> commandArgs : commandArgsList) {
            RedisFuture<List<Object>> future = asyncCommands.dispatch(
                    commandType,
                    new ArrayOutput<>(StringCodec.UTF8),
                    commandArgs);
            futureResults.add(future);
        }
        asyncCommands.flushCommands();
        for (RedisFuture<List<Object>> future : futureResults) {
            try {
                results.add(future.get(timeOutDuration, TimeUnit.SECONDS));
            }
            catch (TimeoutException e) {
                logger.info("TimeoutException : Commands executed {} and commandType {} ", commandArgsList, commandType);
                logger.error("TimeoutException", e);
                future.cancel(true);
                throw new RuntimeException(e);
            } catch (Exception e) {
                logger.info("Exception : Commands executed {} and commandType {} ", commandArgsList, commandType);
                logger.error("Exception", e);
                e.printStackTrace();
            }
        }
        asyncCommands.setAutoFlushCommands(true);
        return results;
    }

Operating System (please complete the following information):

  • OS: Linux - ec2 instance
  • Version: latest

Additional context
What should be my next step to fix this.
On restart it starting working as expected again as I got these logs ->

Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:44146
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47416
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47494
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:37856
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47418
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:37858
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47420
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:37860
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47422
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:37862
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47424
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:37864
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39470
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47426
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39472
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47428
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47430
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47432
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39474
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39476
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39478
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47434
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47436
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47438
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39480
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39482
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47440
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39484
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47442
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39486
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47444
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47446
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39488
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39490
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47448
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39492
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47450
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39494
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47452
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39496
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47454
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39498
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39500
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:39502
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47472
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47474
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47456
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47458
Jul 30 13:36:23 TILE38-APD01 tile38-server[129460]: 2024/07/30 13:36:23 [DEBU] Closed connection: 10.0.11.208:47464

Looks like the connection is not closing ?

What do you mean "high writes for location"? Do you happen to be writing lots of points to the same coordinates?

Sometimes yes, but number of location write to tile38 is high (writing lots of points) to db.
Almost 1500-2000 points written to tile38 every second.

How & why will that cause an issue ?

Its been noted before that if you put tons of points at the same coordinates (ex, 0,0) it causes an imbalance in the internal indexing in tile38 that causes performance to degrade dramatically and eventually crash.

Solution is to just not use placeholder coordinates in most cases.

Okay but I am a bit confused,
In my case it happens to be for same entity like
SET fleet truck1 POINT 33.5123 -112.2693 gets executed 30 times in the same time interval.
And for different entities like SET fleet truck1 POINT 30.5123 -110.2693 this gets executed 20 times in that second only.

Am I bound to get this issue even if this is done for different entities, which each has its own coordinate but that gets written multiple times