halayli/lthread

select?

Closed this issue · 14 comments

A way to poll on multiple fds at once would be nice.

select is some old piece of crap, is anybody from this era even using it? The api itself is weird, plus this terrible FD_SETSIZE limitation...
Anyway, I've implemented lthread_poll() in my fork, and I got it working with libmysqlclient (actually mariadb, but it's irrelevant) and gsoap with openssl. Test and an example are in stock :-)

Yes, I merely meant a select-like method. lthread_poll() sounds great.

How about merging this function from your fork? And maybe some of your other changes too...

Planning on it, but kinda limited on time now. If the author will agree to merge, virtually everything should be done.

lthread scheduler already uses a poller. Having more than one poller in a scheduler hints that it's the wrong solution to the problem. I'd much rather have libraries like libcurl/mysqlclient provide a way to call lthread_recv/send directly.

In coroutine approach, 1 coroutine listens to 1 fd at a time. If you need to listen to more than one fd, create another coroutine and have coroutines coordinate using sync mechanisms like lthread_cond.

lthread scheduler already uses a poller. Having more than one poller in a scheduler hints that it's the wrong solution to the problem.

I presume you didn't saw the implementation. My lthread_poll() uses the stock lthread poller. No kludges.

I'd much rather have libraries like libcurl/mysqlclient provide a way to call lthread_recv/send directly.

Damn, it would be cool, but I ain't living in an ideal world. Are you seriously impling me to inspire the oracle and any-other-library-I-might-need developers to implement a feature just because it would be convenient to me that way? I think the power should be in a single developer hands if possible (and it is in that case), or there won't be any progress in the foreseeable future. Especially if it doesn't require much effort, like in our case with integrating foreign libraries with lthread. But on the other side, openssl provides a way to substitute socket api (and other) functions, thus requiring zero code changes to be used with lthread - that's cool, but it's more like an exception among the host of existing libraries. By the way, third party libraries like libmysqlclient require poll() obligatory, which original lthread just can't cover even if function substitution was possible by the library api.

If you need to listen to more than one fd, create another coroutine and have coroutines coordinate using sync mechanisms like lthread_cond.

I can see so much complexity over a thing that might be achieved via a much more simple solution, which produces cleaner code, with better readability and less context switches; this complications no longer required in my fork. You just can poll on any number of fds you want, and it feels right I must admit :] IMO, it really depresses a man when he can't do such a simple thing...

I've had a multithreaded server before, and after the switch to lthread the performance highly increased; without lthread_poll() it wouldn't be possible, because any real program uses foreign libraries, and they use poll() ;-) So I'm trying to say here that having poll is a very usefull thing and not a misconception.

It would be cool if you'll review the implementation and tell me what you think of it. If you'll have any questions, ask me directly.

Ah I see what you mean. I thought you implemented a kqueue/epoll poller inside lthread kqueue/epoll poller. I took a quick peek at it and it looks great. I'll review it tomorrow in detail and give you my feedback.

I started a new branch lthread_poller and implemented lthread_poll. It still needs testing which I'll be doing this week.

https://github.com/halayli/lthread/tree/lthread_poller/src

Why don't you just take already working and tested implementation?

For several reasons:

  1. It isn't correct. After you schedule the fds and sleep, you don't cancel the events that didn't trigger. So if they later trigger they lthread will run again. Let's say you have 5 fds scheduled and 1 triggers, you need to cancel the other 4.
  2. It breaches the isolation of socket and scheduling modules. lthread_sched.c doesn't need to know about pollfd struct.
  3. I don't want to introduce a hash module, yet. I am reusing the rbtree module.
  4. very intrusive changes in the scheduler.
  5. While not a reason at all, style doesn't match which makes it difficult to merge.

These are the changes so far to get it implemented: master...lthread_poller

Changes you made is not enough. At least, you don't fill struct pollfd *fds with the results. And there is more - look at features SOCKET_PASSTHROUGH & SOCKET_TIMEO and socket api functions with _posix postfixes, and whole lot of minor fixes all over the code. I've also added a possibility to run multiple lthread schedulers in different threads of the same process. It's all in README anyway.

Implementation is definitely correct, you got it wrong or missed some pieces. I don't use rbtree for poll, so I don't have to call _lthread_desched_event() as you do. Though I got the necessary calls to cancel the events in poller:

        /* _lthread_desched_events() */
        if (fds[i].events & POLLIN) {
            ev = LT_EV_READ;
            poller_ev_clear = _lthread_poller_ev_clear_rd;
        } else if (fds[i].events & POLLOUT) {
            ev = LT_EV_WRITE;
            poller_ev_clear = _lthread_poller_ev_clear_wr;
        } else {
            assert(0);
        }
        /* <skipped> */
        poller_ev_clear(fds[i].fd);

So the point 1 is false. Point 2 is true, but is very easy to fix as the point 5.
Point 4 - let's just look at the code:

diff --git a/src/lthread_sched.c b/src/lthread_sched.c
index 9995b8d..aeb9c65 100644
--- a/src/lthread_sched.c
+++ b/src/lthread_sched.c
@@ -33,6 +33,7 @@
 #include <unistd.h>
 #include <assert.h>
 #include <pthread.h>
+#include <poll.h>
 #include <errno.h>
 #include <inttypes.h>

@@ -146,7 +147,8 @@ _lthread_sched_isdone(struct lthread_sched *sched)
     return (RB_EMPTY(&sched->waiting) &&
         LIST_EMPTY(&sched->busy) &&
         RB_EMPTY(&sched->sleeping) &&
-        TAILQ_EMPTY(&sched->ready));
+        TAILQ_EMPTY(&sched->ready) &&
+        cfuhash_num_entries(sched->waiting_multi) == 0);
 }

 void
@@ -199,9 +201,11 @@ lthread_run(void)
             p = --sched->num_new_events;

             fd = _lthread_poller_ev_get_fd(&sched->eventlist[p]);
+            if (INVALID_SOCKET(fd)) /* skip already processed sockets by _lthread_desched_events() */
+                continue;

             /*
-             * We got signaled via trigger to wakeup from polling & rusume file io.
+             * We got signaled via trigger to wakeup from polling & resume file io.
              * Those lthreads will get handled in step 4.
              */
             if (fd == sched->eventfd) {
@@ -213,6 +217,16 @@ lthread_run(void)
             if (is_eof)
                 errno = ECONNRESET;

+            /*
+             * per-lthread multiple events handling: all fds belonging to signle lthread
+             * will be handled at 1 invocation of _lthread_desched_events()
+             */
+            lt = _lthread_desched_events(fd);
+            if (lt) {
+                _lthread_resume(lt);
+                continue;
+            }
+
             lt_read = _lthread_desched_event(fd, LT_EV_READ);

Twelve lines of code added, including comments... I can call it anything but intrusive.
Summarizing all the above - I can see no conscious reasons to not accept the changes, except 2 & 5 which is easy to fix.

Any update on this?

I can only recommend you using my branch, it is almost up to date with upstream. The code is stable and fast, works in production 24/7. The original author's version is still incomplete inside the devel branch.
P.S. ... plus my repo got the #29 fixed, which the upstream author refuse to spot.

Implemented and pushed the changes to master. (http://lthread.readthedocs.org/en/latest/socket.html#lthread-poll)