fsherratt/ERL_SmartCities_2019

Multicore investigation

Opened this issue · 8 comments

Look into how to do multicore processing and shared memory in Python

Few options:
Multicore support appears to be built into Python by default: https://www.praetorian.com/blog/multi-core-and-distributed-programming-in-python
https://docs.python.org/dev/library/multiprocessing.html#module-multiprocessing.pool
Data sharing for this sounds rather limited (at least, that's what the multicore pip library claims); see the Pool sketch after the next option.

There is also the multicore pip library:
https://pypi.org/project/multicore/
It seems very immature and appears to be targeted more at databases. It also appears to have problems with large data structures (e.g. the map grid might break it).
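For reference, a minimal sketch of the built-in Pool approach from the first link. `interpolate_cell` and `grid_cells` are made-up names for illustration, not project code:

```python
# Minimal multiprocessing.Pool sketch; function and data names are illustrative only.
from multiprocessing import Pool

def interpolate_cell(cell):
    # stand-in for per-cell work on the map grid
    return cell * 2

if __name__ == "__main__":
    grid_cells = list(range(1000))
    with Pool(processes=4) as pool:
        results = pool.map(interpolate_cell, grid_cells)
    print(results[:5])
```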

I think it's probably worth using the built-in one. The functions are all quite isolated, communicating via interprocess communication, so it should be fine. Need to look at shared memory though.

Further reading at https://medium.com/@urban_institute/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba suggests that the Process class is what we need, sending long-running tasks to each separate core (see the sketch after the quote below).

To quote the top highlight:

Unless you are running a machine with more than 10 processors, the Process code should run faster than the Pool code.
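Along those lines, a rough sketch of one Process per long-running task; the task names are made up and this is just the pattern from the article, not our actual code:

```python
# One long-lived Process per task, letting the OS spread them across cores.
from multiprocessing import Process

def long_running_task(name):
    # placeholder for e.g. the mapping or camera loop
    print(f"{name} running")

if __name__ == "__main__":
    tasks = [Process(target=long_running_task, args=(f"task-{i}",)) for i in range(3)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
```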

As an idea, could we do this with sockets? Set up known ports etc. and have the smart hub check whether there are bindings.
That would allow it to configure itself dynamically and just loop to check for reads?
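If we went the socket route, it could look roughly like this minimal sketch, assuming plain TCP on localhost with a well-known port; none of this is project code:

```python
# Sketch only: the "smart hub" tries a known port and polls for reads.
import socket

KNOWN_PORT = 50007  # assumed well-known port for one service

def hub_poll_once():
    # A refused connection means nothing is bound on that port yet.
    try:
        with socket.create_connection(("127.0.0.1", KNOWN_PORT), timeout=0.5) as conn:
            conn.settimeout(0.5)
            try:
                return conn.recv(1024)  # the loop-to-check-reads would repeat this
            except socket.timeout:
                return b""
    except ConnectionRefusedError:
        return None  # no binding yet, hub can retry later

if __name__ == "__main__":
    print(hub_poll_once())
```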

This gives a nice overview of how to do interprocess communication:
https://docs.python.org/3/library/ipc.html

I've worked out how multiprocessing.Array works with objects and multiple cores now. Think that should be fine (well, it's working fine for me).

I did think it might make sense to just copy the array into the current thread for the interpolation, as that allows us to leave one thread/core purely for pulling data from the camera and updating the array.
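Roughly what I mean, as a sketch with made-up names: one process keeps writing "camera" values into a multiprocessing.Array, and the consumer copies a snapshot locally before interpolating.

```python
# Shared-Array sketch; writer/consumer names are illustrative only.
from multiprocessing import Process, Array
import time

def camera_writer(shared):
    # stands in for the thread/core dedicated to pulling camera data
    for i in range(100):
        shared[i % len(shared)] = float(i)
        time.sleep(0.01)

def interpolator(shared):
    # copy the shared buffer locally before doing any heavy work
    local_copy = shared[:]   # slicing a multiprocessing.Array returns a plain list
    print("snapshot:", local_copy[:5])

if __name__ == "__main__":
    grid = Array('d', 16)    # 16 doubles, zero-initialised, lock included
    writer = Process(target=camera_writer, args=(grid,))
    writer.start()
    time.sleep(0.2)
    interpolator(grid)
    writer.join()
```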

I think a service Manager could be good (https://docs.python.org/3.8/library/multiprocessing.html), although it has more overhead than shared memory. Need to test the effect of that overhead, but it might be good to make a manager for each class so that each type of item that needs to be accessed can be assigned to it. It also allows us to define generic objects (see the sketch after these links):
https://stackoverflow.com/questions/3671666/sharing-a-complex-object-between-python-processes
https://stackoverflow.com/questions/11951750/sharing-object-class-instance-in-python-using-managers
https://docs.python.org/3/library/multiprocessing.html#proxy-objects
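A sketch of the manager-per-class idea using BaseManager proxies, along the lines of those links; PositionStore is an invented example class, not one of ours:

```python
# Registering a custom class with a manager so other processes get proxy objects.
from multiprocessing import Process
from multiprocessing.managers import BaseManager

class PositionStore:
    # invented example class; the real shared objects would be our own
    def __init__(self):
        self._pos = (0.0, 0.0, 0.0)

    def set_position(self, pos):
        self._pos = tuple(pos)

    def get_position(self):
        return self._pos

class ServiceManager(BaseManager):
    pass

# register the class so the manager can hand out proxies for it
ServiceManager.register('PositionStore', PositionStore)

def worker(store):
    # the proxy forwards method calls to the object living in the manager process
    store.set_position((1.0, 2.0, 3.0))

if __name__ == "__main__":
    with ServiceManager() as manager:
        store = manager.PositionStore()
        p = Process(target=worker, args=(store,))
        p.start()
        p.join()
        print(store.get_position())  # reflects the worker's update
```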

This level of overhead isn't going to be an issue.

The fastest aspect will be the 200Hz position data, but nothing else is going to operate anywhere close to this. Mapping will probably be the next fastest at maybe 10Hz.

Probably worth using managers for the surrounding class and just exposing all the items we may want as processes then. Would be a nice way to deal with it, IMO.

So, just checking I understand: each of our main items runs as a process which exposes its public data through a manager, and within that process there can be any threads/objects that are needed by that process?

For example, the pixhawk process has a public interface for getting/setting data such as position. Then within that process the MAVLinkThread code is run to make the serial connection.
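A sketch of that structure under those assumptions; PixhawkService and its internal reader thread are stand-ins, and the real MAVLinkThread/serial code isn't shown here:

```python
# One process owning a manager-exposed object, with a worker thread inside it.
import threading
import time
from multiprocessing import Process
from multiprocessing.managers import BaseManager

class PixhawkService:
    """Public get/set interface exposed through the manager (invented stand-in)."""
    def __init__(self):
        self._position = (0.0, 0.0, 0.0)
        self._lock = threading.Lock()

    def set_position(self, pos):
        with self._lock:
            self._position = tuple(pos)

    def get_position(self):
        with self._lock:
            return self._position

class PixhawkManager(BaseManager):
    pass

PixhawkManager.register('PixhawkService', PixhawkService)

def pixhawk_process(service):
    # Stand-in for the MAVLinkThread serial reader: a thread inside this process
    # updates the shared service while other processes read it via the proxy.
    def reader():
        for i in range(5):
            service.set_position((float(i), 0.0, 0.0))
            time.sleep(0.1)

    t = threading.Thread(target=reader)
    t.start()
    t.join()

if __name__ == "__main__":
    with PixhawkManager() as manager:
        service = manager.PixhawkService()
        p = Process(target=pixhawk_process, args=(service,))
        p.start()
        time.sleep(0.25)
        print("position seen from another process:", service.get_position())
        p.join()
```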