tarantool/tarantool-operator

Join error

chelsEg opened this issue · 3 comments

Hello.

Sometimes I see errors like this in the operator log:

{
   "level":"error",
   "ts":1612859086.4840052,
   "logger":"controller_cluster",
   "msg":"Join error",
   "Request.Namespace":"dmp-base-stage",
   "Request.Name":"dmp-cluster",
   "error":"Post http://x.x.x.x:8081/admin/api: dial tcp x.x.x.x:8081: connect: connection refused",
   "stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/tarantool/tarantool-operator/pkg/controller/cluster.(*ReconcileCluster).Reconcile\n\t/app/pkg/controller/cluster/cluster_controller.go:328\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.10/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.10/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181127025237-2b1284ed4c93/pkg/util/wait/wait.go:88"
}

My RoleConfig:

RoleConfig:
  - RoleName: storage
    ReplicaCount: 2
    ReplicaSetCount: 1
    DiskSize: 5Gi
    CPUallocation: 0.25
    MemtxMemoryMB: 1024
    RolesToAssign:
      - vshard-router
      - vshard-storage
      - crud-router
      - crud-storage
      - metrics
      - migrator

Operator version: 0.0.8
Tarantool version in the Docker image: 2.6.2

Hello!
Please show the logs of the pod running the new Tarantool instance.

Logs from storage-0-0 for the last hour (it is the leader):

2021-02-09 10:02:55.455 [1] main txn.c:588 W> 3 messages suppressed
2021-02-09 10:02:55.455 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 459499: 0.714 sec
2021-02-09 10:02:55.455 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 459500: 0.610 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465729: 0.991 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465730: 0.987 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465731: 0.914 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465732: 0.836 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465733: 0.835 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465734: 0.791 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465735: 0.789 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465736: 0.787 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465737: 0.744 sec
2021-02-09 10:18:09.606 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 465738: 0.691 sec
2021-02-09 10:18:50.058 [1] main txn.c:588 W> 2 messages suppressed
2021-02-09 10:18:50.058 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 467391: 1.036 sec
2021-02-09 10:18:50.058 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 467392: 0.641 sec
2021-02-09 10:18:50.058 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 467393: 0.589 sec

Logs from storage-0-1 for the last hour:

2021-02-09 10:03:15.593 [1] main txn.c:588 W> 4 messages suppressed
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460783: 0.767 sec
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460785: 0.731 sec
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460787: 0.645 sec
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460789: 0.601 sec
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460791: 0.557 sec
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460793: 0.557 sec
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460795: 0.525 sec
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460797: 0.513 sec
2021-02-09 10:03:15.593 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 460799: 0.513 sec
2021-02-09 10:08:18.747 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 462781: 0.776 sec
2021-02-09 10:08:18.747 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 462783: 0.776 sec
2021-02-09 10:08:18.747 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 462785: 0.722 sec
2021-02-09 10:08:18.747 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 462787: 0.624 sec
2021-02-09 10:08:18.747 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 462789: 0.586 sec
2021-02-09 10:08:18.747 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 462791: 0.542 sec
2021-02-09 10:18:40.923 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 467313: 0.652 sec
2021-02-09 10:23:10.950 [1] main txn.c:588 W> too long WAL write: 1 rows at LSN 467683: 0.527 sec

Timezone: UTC

This was fixed after updating to operator version 0.0.9.

I think issue #77 is the same problem.