andreaswachs/bachelors-project

[BUG] Request timeouts when under load

Closed · 3 comments

Describe the bug
The client and server can hit request timeouts when many labs are deployed concurrently.

To Reproduce
Steps to reproduce the behavior:

  1. seq 50 | xargs -I{} -P4 dkn schedule -f test_resources/lab.yaml
  2. money

After a little while, you will see that requests begin to time out and that the leader server might crash:

server-example-1  | {"level":"error","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","time":"2023-04-11T10:39:19Z","message":"Failed to get capacity from follower 46.101.157.135:50051"}
server-example-1  | {"level":"info","time":"2023-04-11T10:39:19Z","message":"Scheduling lab on follower 64.226.124.33:50051"}
server-example-1  | panic: runtime error: invalid memory address or nil pointer dereference
server-example-1  | [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x93db8a]
server-example-1  | 
server-example-1  | goroutine 1449 [running]:
server-example-1  | github.com/andreaswachs/bachelors-project/daaukins/server/service.(*Server).ScheduleLab.func1(0xc000078460)
server-example-1  | 	/app/service/service.go:214 +0x1ea
server-example-1  | created by github.com/andreaswachs/bachelors-project/daaukins/server/service.(*Server).ScheduleLab
server-example-1  | 	/app/service/service.go:205 +0x4ee
server-example-1 exited with code 2
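
The trace does not show which value was nil at service.go:214. One possibility, and it is only an assumption, is that the response of a timed-out RPC was used without checking the error first. An unrecovered panic in any goroutine terminates the whole Go program, which would explain the leader exiting. The sketch below shows the defensive pattern with hypothetical stand-ins (`capacity`, `getCapacity`, `freeSlots`); none of these names are taken from the project.

```go
package main

import (
	"errors"
	"fmt"
)

// capacity is a hypothetical stand-in for a follower's capacity response.
type capacity struct{ free int }

// getCapacity is a hypothetical stand-in for the capacity RPC.
// Like a failed RPC call, it returns a nil response together with an error.
func getCapacity(timedOut bool) (*capacity, error) {
	if timedOut {
		return nil, errors.New("context deadline exceeded")
	}
	return &capacity{free: 3}, nil
}

// freeSlots returns an error instead of dereferencing a nil response.
func freeSlots(timedOut bool) (int, error) {
	c, err := getCapacity(timedOut)
	if err != nil {
		return 0, fmt.Errorf("get capacity: %w", err)
	}
	if c == nil {
		return 0, errors.New("get capacity: empty response")
	}
	return c.free, nil
}

func main() {
	for _, timedOut := range []bool{false, true} {
		n, err := freeSlots(timedOut)
		if err != nil {
			// Skip this follower rather than crash the leader.
			fmt.Println("skipping follower:", err)
			continue
		}
		fmt.Println("free slots:", n)
	}
}
```

With a guard like this, a timeout on one follower becomes a skipped candidate rather than a crash of the leader.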

Expected behavior
Requests should probably not carry timeouts at all, given the eventually consistent nature of the server.

Screenshots
(Screenshot: Screenshot_2023-04-11_12-44-02)

Maybe don't include a timeout in the context sent from the client?

PoC antipattern (?): I'll just send the requests without a timeout.

Better solution: the "problematic" RPCs get significantly longer timeouts (60s?).
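
A minimal sketch of what per-RPC timeouts could look like, using only the standard `context` package. The RPC names, the 5-second default, and the choice of which RPC counts as "problematic" are assumptions for illustration; only the 60-second figure comes from the comment above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Hypothetical RPC names; the project's real method names may differ.
const (
	rpcScheduleLab = "ScheduleLab"
	rpcGetCapacity = "GetCapacity"
)

// defaultTimeout is an assumed value, not the project's actual setting.
const defaultTimeout = 5 * time.Second

// slowRPCTimeouts lists the RPCs that get a longer deadline.
var slowRPCTimeouts = map[string]time.Duration{
	rpcScheduleLab: 60 * time.Second,
}

// timeoutFor returns the timeout to use for the given RPC name.
func timeoutFor(rpc string) time.Duration {
	if d, ok := slowRPCTimeouts[rpc]; ok {
		return d
	}
	return defaultTimeout
}

// contextFor derives a context carrying the timeout chosen for the RPC.
// The caller must call the returned cancel function.
func contextFor(parent context.Context, rpc string) (context.Context, context.CancelFunc) {
	return context.WithTimeout(parent, timeoutFor(rpc))
}

func main() {
	ctx, cancel := contextFor(context.Background(), rpcScheduleLab)
	defer cancel()

	deadline, _ := ctx.Deadline()
	fmt.Printf("%s: deadline in about %s\n", rpcScheduleLab, time.Until(deadline).Round(time.Second))
}
```

Compared with passing a context that has no deadline at all, this keeps an upper bound on every request, so a follower that never answers cannot hold a call open forever.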