Timeouts calling search API
Closed this issue · 9 comments
This really shouldn't be happening due to colocation, to investigate (some may be art source)
Internal examples:
- /search or message search
- https://discord.com/channels/381294999729340417/896778500013449266/1069445272893132851
- https://discord.com/channels/381294999729340417/896778500013449266/1069427921925714010
- https://discord.com/channels/381294999729340417/896778500013449266/1069382527803994263
- https://discord.com/channels/381294999729340417/896778500013449266/1069299733597790279
- https://discord.com/channels/381294999729340417/896778500013449266/1069299732633100339
- https://discord.com/channels/381294999729340417/896778500013449266/1069299731349655642
- https://discord.com/channels/381294999729340417/896778500013449266/1068944040164331670
- 2023-01-27T21:34:15.013Z Error: Client network socket disconnected before secure TLS connection was established
- /art
- /rush-duel
Example for message 1071682782721495130
Received 2023-02-05T06:44:46.364Z
Timed out 2023-02-05T06:44:48.871Z
timings: {
start: 1675579486366,
socket: 1675579486366,
lookup: 1675579486366,
connect: 1675579486366,
secureConnect: undefined,
upload: 1675579486366,
response: undefined,
end: undefined,
error: 1675579488869,
abort: undefined,
phases: {
wait: 0,
dns: 0,
tcp: 0,
tls: undefined,
request: 0,
firstByte: undefined,
download: undefined,
total: 2503
}
}
Service received request
2023-02-05T06:44:46.462216000Z {"level":30,"time":1675579486420,"pid":1,"hostname":"d11f6ccbdbb0","reqId":"req-81hj"
Many requests before and after get a response in under half a second. No response is ever logged for this one.
Possible workaround for message search: early timeout and then retry a couple times with exponential backoff.
Since the last deployment https://github.com/DawnbrandBots/bastion-bot/actions/runs/4901960311 (about the past week)
docker logs bastion-bot_bot_1 2>&1 | grep 'RequestError: Timeout' > timeouts.log
grep 'error:events:message:search' timeouts.log | wc -l
823 incidents proceeded to full timeout
1046 at least four retries
1391 at least three retries
1972 at least two retries
3441 at least one retry
140 non-message search or Yugipedia timeouts
$ grep -ve 'events:message:search' -e 'command:yugi' timeouts.log | wc -l
140
$ grep -ve 'events:message:search' -e 'command:yugi' timeouts.log | cut -d' ' -f2 | sort | uniq
bot:error:command:art
bot:error:command:deck
bot:error:command:id
bot:error:command:random
bot:error:command:rush
bot:error:command:search
bot:warn:command:rush
40 art, 4 deck, 1 id, 24 random, 17 search, 9 rush (45 autocomplete timeout warnings)
For message search, reverting to old 2.5 second timeout with exponential backoff provided by the library. Also will be adjusting some other commands accordingly.
Quiet week, resolving