[BUG] Request timeouts when under load
Closed this issue · 3 comments
andreaswachs commented
Describe the bug
The client and server can hit request timeouts when many labs are deployed at once.
To Reproduce
Steps to reproduce the behavior:
seq 50 | xargs -I{} -P4 dkn schedule -f test_resources/lab.yaml
- money
After a little while, you will see that requests begin to time out and that the leader server might crash:
server-example-1 | {"level":"error","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","time":"2023-04-11T10:39:19Z","message":"Failed to get capacity from follower 46.101.157.135:50051"}
server-example-1 | {"level":"info","time":"2023-04-11T10:39:19Z","message":"Scheduling lab on follower 64.226.124.33:50051"}
server-example-1 | panic: runtime error: invalid memory address or nil pointer dereference
server-example-1 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x93db8a]
server-example-1 |
server-example-1 | goroutine 1449 [running]:
server-example-1 | github.com/andreaswachs/bachelors-project/daaukins/server/service.(*Server).ScheduleLab.func1(0xc000078460)
server-example-1 | /app/service/service.go:214 +0x1ea
server-example-1 | created by github.com/andreaswachs/bachelors-project/daaukins/server/service.(*Server).ScheduleLab
server-example-1 | /app/service/service.go:205 +0x4ee
server-example-1 exited with code 2
Expected behavior
Because the server is eventually consistent, we might not want these requests to carry timeouts at all.
andreaswachs commented
Don't include timeout in context sent from client???
andreaswachs commented
PoC antipattern (?): I'll just give no timeout to the requests.
andreaswachs commented
Better solution: the "problematic" RPCs will get significantly longer timeouts (60s?)