bcgov/greenlight

Investigate timeouts lag in graph rendering

esune opened this issue · 5 comments

esune commented

When rendering a dFlow, a significant amount of time is consumed by the request that obtains the graph topology. Investigate the root cause, and determine an appropriate solution.

esune commented

Additional logging added to von-x showed that every time there is an "agent hop" (one agent hands off the request to another agent, in order to determine dependencies recursively) there is a significant "pause". The code does not seem to be blocking in any way, so we may need to look at causes in the infrastructure.

esune commented

Added more logging to von-x, it confirmed that when a delay happens it is between posting the request to another agent and when the target agent starts processing it.

I reconstructed two scenarios involving three agents, with very different response times: one took 11s to complete, the other 80ms (see attachments).

esune commented

DNS resolution trace for scenario 1:

2019-01-23 00:17:56,799 INFO [vonx.common.trace]: start request: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:17:56,800 INFO [vonx.common.trace]: connection create start: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:17:56,800 INFO [vonx.common.trace]: dns cache miss: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:17:56,800 INFO [vonx.common.trace]: dns resolution start: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:18:01,808 INFO [vonx.common.trace]: dns resolution end: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}
2019-01-23 00:18:01,810 INFO [vonx.common.trace]: connection create end: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/bcreg'}

DNS resolution trace for scenario 2:

2019-01-23 00:17:00,011 INFO [vonx.common.trace]: start request: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,011 INFO [vonx.common.trace]: connection create start: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,012 INFO [vonx.common.trace]: dns cache miss: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,012 INFO [vonx.common.trace]: dns resolution start: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,014 INFO [vonx.common.trace]: dns resolution end: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}
2019-01-23 00:17:00,015 INFO [vonx.common.trace]: connection create end: {'ctx': 'get dependencies', 'url': 'http://dflow:8080/agri'}```
esune commented

Fixed by this commit: PSPC-SPAC-buyandsell/von-x@c8d91af
Configurations in OpenShift have been updated to reflect the change, and the delay has now disappeared.