Event Driven Tasks
leebaok opened this issue · 11 comments
Event Driven Tasks: architecture discussion for the OS practice course.
This is my account.
Hi Haonan Li, I am interested in some tools for distributed applications:
- compactor: an actor library for Python, https://github.com/wickman/compactor
- celery: a distributed task queue, http://www.celeryproject.org
I mentioned these two tools in the email.
I want to know the running entity of their actors or tasks: is it a process, a thread, or a coroutine?
In other words, I want to know how they work.
So, could you read up on them and tell me what I want to know?
For compactor, methods are registered in a "Process", and Processes are bound to a "Context".
Each context is a thread and has an [ip, port] pair. Each context creates a tornado HTTPServer listening on ip:port. Once the server receives a request, the context handles it through the callback functions that were added when the process was bound.
A Process is not a process in the operating-system sense; it is more like a container of methods. When a process is bound to a context, it gets a virtual PID of the form name@ip:port. Compactor can send a message addressed to that PID to ip:port, and the receiving context finds the process by name.
In summary: there are machines linked over the network. Each machine runs one or more compactor applications. Each application owns some threads. Each thread has an HTTPServer listening on ip:port that handles requests. A request carries a "PID" to specify the "Process" and a message with the method name, so the server can find which method to call if the "Process" and the method exist.
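To make that dispatch concrete, here is a minimal sketch in plain Python. This is not compactor's actual API; the class and method names are made up to mirror the description: one context per thread, each running an HTTP server that routes a request to a method registered under a process name.

```python
# Hypothetical sketch of the dispatch described above -- NOT compactor's real API.
# A "context" is one thread running an HTTP server; a "process" is just a named
# container of methods; a virtual PID looks like name@ip:port.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Context:
    def __init__(self, ip, port):
        self.ip, self.port = ip, port
        self.processes = {}                      # process name -> {method name: callable}

    def bind(self, name, methods):
        # Register a "process" (a dict of methods) and return its virtual PID.
        self.processes[name] = methods
        return "%s@%s:%d" % (name, self.ip, self.port)

    def start(self):
        ctx = self

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                # Expected path: /<process-name>/<method-name>
                parts = self.path.strip("/").split("/")
                body = self.rfile.read(int(self.headers.get("Content-Length", 0) or 0))
                func = None
                if len(parts) == 2:
                    func = ctx.processes.get(parts[0], {}).get(parts[1])
                if func is None:
                    self.send_response(404)
                    self.end_headers()
                    return
                func(body)                       # dispatch to the registered callback
                self.send_response(200)
                self.end_headers()

        server = HTTPServer((self.ip, self.port), Handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()

# Usage: one context (thread) hosting one "process" with a single method.
context = Context("127.0.0.1", 8080)
pid = context.bind("ping", {"ping": lambda body: print("got", body)})
context.start()
print("virtual PID:", pid)                       # ping@127.0.0.1:8080
```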
Thanks. I got it.
So, in compactor, a context is an operating-system thread, and it contains many compactor processes. A compactor process is like an actor that receives and sends messages. If one process blocks, the whole context is blocked, so compactor processes should only call non-blocking APIs.
Oh, then it is not very powerful; it is more like a toy.
For celery, I have read some of its material. It is more powerful: the running entity can be a thread, a process, or an eventlet. But it is a little more complex.
(picture from http://www.slideshare.net/duydo/celery101?qid=f7535670-60e3-4615-ad4e-8e1ded372abb&v=&b=&from_search=15)
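For a concrete picture (a minimal sketch; the Redis broker URL below is only a placeholder), a celery task is an ordinary decorated function, and the worker's pool option decides whether tasks run in processes, threads, or eventlets:

```python
# tasks.py -- minimal celery sketch; the broker URL is a placeholder.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def add(x, y):
    return x + y

# A client enqueues the task; some worker (possibly on another host) executes it:
#   result = add.delay(2, 3)
```

The execution model is chosen when starting the worker, e.g. `celery -A tasks worker --pool=prefork` for processes or `--pool=eventlet` for green threads.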
Celery distributes tasks, but each task still runs on a single host.
What we want is to distribute distributed tasks, meaning that one task runs across multiple hosts.
So you don't need to learn celery. We will implement our own task manager.
Have you heard of ZeroMQ and coroutines?
Maybe we will use these to implement our distributed task driver.
The good news is that Python 3.5 supports coroutines with async/await. Here is some material:
- https://www.python.org/dev/peps/pep-0492/ PEP-492
- https://www.python.org/dev/peps/pep-3156/ PEP-3156
- https://docs.python.org/3/library/asyncio.html asyncio
I am reading this material. If you have time, you can study it too. You may need to learn about generators, yield, and yield from first.
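For example, here is a tiny async/await sketch (standard library only) where two coroutines interleave on one event loop:

```python
# Two coroutines sharing one event loop (Python 3.5+ async/await syntax).
import asyncio

async def worker(name, delay):
    print(name, "start")
    await asyncio.sleep(delay)     # non-blocking wait: control returns to the event loop
    print(name, "done")

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(worker("a", 1), worker("b", 1)))
loop.close()
```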
I know about coroutines but I haven't heard of ZeroMQ.
I will read this material.
@leebaok
I think that when a process in compactor is blocked, the context can still do other work, but I haven't tested this behavior. The methods are registered in an event loop, and after reading about event loops I found that the event loop handles blocking operations and lets other tasks run.
Yes, if one process is blocked, the other processes in the context can still run. But if one process uses a blocking IO API, the whole context waits for that IO operation and the other processes in the context cannot run.
This is the common trait of coroutines: cooperative scheduling, also called non-preemptive scheduling. So coroutines should always use non-blocking IO APIs. We have to pay more attention to how we program when we use coroutines.
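A small sketch of that pitfall: time.sleep blocks the whole loop, while await asyncio.sleep yields so other coroutines keep running.

```python
# Cooperative scheduling: one blocking call stalls every coroutine on the loop.
import asyncio
import time

async def bad():
    time.sleep(2)             # blocking call: nothing else on this loop runs for 2 seconds
    print("bad done")

async def good():
    await asyncio.sleep(2)    # non-blocking: the loop runs other coroutines meanwhile
    print("good done")

async def ticker():
    for _ in range(4):
        print("tick")
        await asyncio.sleep(0.5)

loop = asyncio.get_event_loop()
# With good() the ticks interleave; swap in bad() and they stall for 2 seconds.
loop.run_until_complete(asyncio.gather(good(), ticker()))
loop.close()
```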
Hello, new task for you.
We want to know the performance of the asyncio framework. So please:
- implement a simple HTTP server in asyncio, with multi-threading/a thread pool, and with multi-processing/a process pool
- use a multi-threaded or multi-process client to test the performance (throughput and latency) of each HTTP server
Hope to see the results soon!
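As a rough starting point for the asyncio variant (only a sketch, not a complete HTTP implementation; the address and port are arbitrary), something like this built on asyncio.start_server would do. The thread-pool and process-pool servers can return the same fixed response so the client measures comparable work.

```python
# Rough asyncio "HTTP" server for benchmarking: read a request, send a fixed reply.
import asyncio

RESPONSE = (b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 13\r\n"
            b"Connection: close\r\n"
            b"\r\n"
            b"Hello, world!")

async def handle(reader, writer):
    await reader.read(1024)        # read (and ignore) the request
    writer.write(RESPONSE)
    await writer.drain()
    writer.close()

loop = asyncio.get_event_loop()
server = loop.run_until_complete(asyncio.start_server(handle, "127.0.0.1", 8000))
try:
    loop.run_forever()
finally:
    server.close()
    loop.close()
```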
I created a new repository for distributed event management: https://github.com/leebaok/DistGears.git
If you have completed the task above, please send a pull request to the DistGears repository.
Thanks.
I made a pull request to DistGears.git yesterday and mentioned some unfinished work in that request.
Now I have finished the last server, the process-pool one, and want to make a new pull request.
Should I wait for the previous pull request to be merged before making the new one?
The result is about 600 requests/s for the process-pool server, and I think it will be roughly n/2 times better than the others when running on an n-core machine.
In summary, the current results are:
- thread, thread-pool, asyncio: about 1100 requests per second
- process-pool: about 600 requests per second
- process forking: about 150 requests per second
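For reference, numbers like these can be collected with a simple multi-threaded client along these lines (the host, port, and request counts are placeholders):

```python
# Simple multi-threaded benchmark client sketch; host, port, and counts are placeholders.
import socket
import time
from concurrent.futures import ThreadPoolExecutor

HOST, PORT = "127.0.0.1", 8000
REQUESTS, WORKERS = 2000, 20
REQUEST = b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"

def one_request(_):
    start = time.time()
    with socket.create_connection((HOST, PORT)) as s:
        s.sendall(REQUEST)
        while s.recv(4096):        # drain the whole response
            pass
    return time.time() - start

t0 = time.time()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    latencies = list(pool.map(one_request, range(REQUESTS)))
elapsed = time.time() - t0

print("throughput: %.1f requests/s" % (REQUESTS / elapsed))
print("mean latency: %.1f ms" % (1000 * sum(latencies) / len(latencies)))
```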