Investigate timeouts lag in graph rendering
esune opened this issue · 5 comments
When rendering a dFlow, a significant amount of time is consumed by the request that obtains the graph topology. Investigate the root cause, and determine an appropriate solution.
Additional logging added to von-x
showed that every time there is an "agent hop" (one agent hands off the request to another agent, in order to determine dependencies recursively) there is a significant "pause". The code does not seem to be blocking in any way, so we may need to look at causes in the infrastructure.
Added more logging to von-x
, it confirmed that when a delay happens it is between posting the request to another agent and when the target agent starts processing it.
I reconstructed two scenarios involving three agents, with very different response times: one took 11s to complete, the other 80ms (see attachments).
DNS resolution trace for scenario 1:
2019-01-23 00:17:56,799 INFO [vonx.common.trace]: start request: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:17:56,800 INFO [vonx.common.trace]: connection create start: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:17:56,800 INFO [vonx.common.trace]: dns cache miss: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:17:56,800 INFO [vonx.common.trace]: dns resolution start: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:18:01,808 INFO [vonx.common.trace]: dns resolution end: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:18:01,810 INFO [vonx.common.trace]: connection create end: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
DNS resolution trace for scenario 2:
2019-01-23 00:17:00,011 INFO [vonx.common.trace]: start request: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,011 INFO [vonx.common.trace]: connection create start: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,012 INFO [vonx.common.trace]: dns cache miss: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,012 INFO [vonx.common.trace]: dns resolution start: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,014 INFO [vonx.common.trace]: dns resolution end: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,015 INFO [vonx.common.trace]: connection create end: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}```
Fixed by this commit: PSPC-SPAC-buyandsell/von-x@c8d91af
Configurations in OpenShift have been updated to reflect the change, and the delay has now disappeared.