Gridmap with large output objects
bjanssen opened this issue · 1 comments
When large output objects are generated by the jobs that are run on a cluster node, JobMonitor
is unable to keep up with the flow of incoming information. The consequence is that JobMonitor
is unable to keep all nodes running at maximum capacity (since the nodes are waiting to get their data sent back to the node where JobMonitor
is running), leading to an aggregated load pattern of the cluster as shown in the figure below:
In a synthetic benchmark, where each job returns a large NumPy array I found that running zdumps
(as implemented in data.py
) on my result takes approximately 1.78 seconds. If I disable bz2 compression this time is reduced to 140ms. This would be a bandaid for my specific situation, but does not solve the problem an sich.
While looking at the GridMap code I was wondering if it would be a lot of work to have a parallel version of JobMonitor. The reason for this is that a "slow result" would not immediately block retrieval of other incoming results.
Apparently it is possible to directly send large NumPy arrays over ZMQ without going through the effort of serializing it through pickle. This might help as well, although I do not directly see how to modify job.py
such that it will know when to expect a "special" NumPy result packet.
I think the only reason we're using bz2 compression is that that was what the original pythongrid (which GridMap is forked from) did it that way.
I hadn't thought about the problem of needing to send back large objects before. Unfortunately, one of the reasons PyZMQ is so easy to use is that it is inherently single-threaded, so there's no way to parallelize JobMonitor by allowing for more threads/processes. What we could do, would be to break things up into multiple messages somehow, so that the the JobMonitor reconstructs the data piecemeal. That way it's never spending all of its time receiving one particular message.
As for adding some functionality so that numpy arrays are treated specially, that's something I'd be open to, but keep in mind that the non-copying stuff talked about in the PyZMQ documentation doesn't apply if you're sending things from one machine to another.