nci/gsky

gRPC worker issues


Several issues and bugs have turned up in the gRPC workers since the last major update.

  • The built-in processes do not outperform external processes connected over Unix domain sockets. This has been verified empirically. For WMS/WCS, a request for a 2000x2000 image takes about 13 seconds on average with built-in processes versus 12 seconds with external processes. For WPS, a request over 162 dataset files takes about 35 seconds on average with built-in processes versus 14 seconds with external processes.
  • https://github.com/nci/gsky/blob/master/worker/gdalprocess/drill.go#L275 Surprisingly, GDAL can return negative values for the x and y offsets. We need to reset them to zero.
  • Load balancing for the gRPC workers in tile_grpc.go and drill_grpc.go is apparently done in round-robin fashion. Round robin becomes a problem when each HTTP request makes only a small number of gRPC calls: since every request starts from the first worker, all of the calls go to the first few gRPC workers. To fix this, we need to use a random integer as the starting point (i.e. the index of the first worker) and run round robin from there.
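
The last two fixes above could look roughly like the sketch below. This is only an illustration, not the actual gsky code: `clampOffset`, `randomStart` and `workerOrder` are hypothetical names, and the worker pool is reduced to plain integer indices.

```go
package main

import "math/rand"

// clampOffset resets a negative pixel offset to zero.
// Hypothetical helper; the real fix would go into drill.go.
func clampOffset(off int) int {
	if off < 0 {
		return 0
	}
	return off
}

// randomStart picks a uniformly random index for the first worker.
// nWorkers must be positive, otherwise rand.Intn panics.
func randomStart(nWorkers int) int {
	return rand.Intn(nWorkers)
}

// workerOrder returns the worker index used for each of nCalls gRPC calls,
// doing round robin from the given starting index.
func workerOrder(start, nWorkers, nCalls int) []int {
	order := make([]int, nCalls)
	for i := range order {
		order[i] = (start + i) % nWorkers
	}
	return order
}
```

Each HTTP request would call `randomStart` once and then pick workers with `workerOrder(start, nWorkers, nCalls)`. With 8 workers and 3 calls per request, a start of 6 gives workers 6, 7, 0, so short requests no longer pile up on the first few workers.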