Pool2 acquisition failure causes resource to hang around
Closed this issue · 20 comments
The call to acquire can timeout, according to the acquireTimeout value. When Pool2 times out the action and tries again, there is nothing to stop the prior acquire attempt. This can lead to resource leakage. The only solution I can think of is to remove the acquireTimeout completely and handle the timeout myself.
Example of the issue:
let pool = new Pool<tls.ClearTextStream>({
acquire: (cb: (err: any, res?: tls.ClearTextStream) => void) => {
let stream = tls.connect(opts);
stream.once('secureConnect', () => {
cb.removeAllListeners();
cb(null, stream);
});
stream.once('error', (err: Error) => {
cb(err);
});
},
acquireTimeout: 2 * 1000,
min: 2,
max: 10
....
});
If the SSL handshake does not complete within 2 seconds (but completes later), this stream is left open.
Is the only solution here to disable the acquireTimeout and handle the timeout myself within the acquire function?
Yikes, that's a pretty bad oversight on my part. Given you're dealing with streams, it's probably fine to return the stream as acquired immediately, so that there is something to be passed along to the dispose function which can clean it up properly. Streams can accept writes before they are open, generally.
I think we can take a cue from FRP style programming here and allow you to return a function from the acquire callback, which function will handle disposing of/destroying the resource as cleanly as possible. Would that suit you? Essentially, you would, at the end of your acquire function, do something like this:
acquire: (cb: (err: any, res?: tls.ClearTextStream) => void) => {
let stream = tls.connect(opts);
stream.once('secureConnect', () => {
cb.removeAllListeners();
cb(null, stream);
});
stream.once('error', (err: Error) => {
cb(err);
});
return () => stream.destroy();
}
I could then call it on timeouts, and maybe later down the line this can be used as a more general dispose approach, eliminating the separate disposer property. Does that meet your need?
That would work great.
Thanks.
On Thu, Nov 10, 2016, 19:44 Kris Reeves notifications@github.com wrote:
Yikes, that's a pretty bad oversight on my part. Given you're dealing with
streams, it's probably fine to return the stream as acquired immediately,
so that there is something to be passed along to the dispose function which
can clean it up properly. Streams can accept writes before they are open,
generally.I think we can take a cue from FRP style programming here and allow you to
return a function from the acquire callback, which function will handle
disposing of/destroying the resource as cleanly as possible. Would that
suit you? Essentially, you would, at the end of your acquire function, do
something like this:acquire: (cb: (err: any, res?: tls.ClearTextStream) => void) => { let stream = tls.connect(opts); stream.once('secureConnect', () => { cb.removeAllListeners(); cb(null, stream); }); stream.once('error', (err: Error) => { cb(err); }); return () => stream.destroy(); }
I could then call it on timeouts, and maybe later down the line this can
be used as a more general dispose approach, eliminating the separate
disposer property. Does that meet your need?—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#30 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGfltoFTyFdmOstZ_df1FwYTLQS_4Q4Zks5q87ptgaJpZM4KvKiU
.
I just published a version 1.4, give it a try and let me know. Sorry it took so long, I had just left on vacation!
...you know, it occurs to me that I got this all wrong. This was intentional, if maybe unclear, behavior. The idea was that even if THIS acquire times out, the resource will still be in the pool (as long as it doesn't fail) to serve the next request(s). I unpublished it real quick, but you can still test from master branch if you need. I need to dig a little deeper though.
Is this something you just noticed and thought was a problem, or was it the cause of an acutal bug / problem in your code? As long as your acquire callback does the correct thing, pool2 should "gain" the resource even if it never serves it to the acquire request that spawned it.
You know... That's kind of what I thought it should do anyways.
My max is "10". What I was seeing was 20+ open connections, using "netstat
-o" to see that they were fully open/established, from the process to the
remote service. This was happening when the connection and SSL handshake
was taking longer than the 2 second acquireTimeout (it was taking around 15
seconds).
I've started looking through pool's code to understand why the 20 resources
were allocated when I said the max is 10. I haven't found anything yet. All
my best guesses just showed that you had already coded for that
possibility. If nothing turns up, I'll go back to adding more debugging
output to my application.
Thanks
On Mon, Nov 14, 2016 at 12:49 PM Kris Reeves notifications@github.com
wrote:
...you know, it occurs to me that I got this all wrong. This was
intentional, if maybe unclear, behavior. The idea was that even if THIS
acquire times out, the resource will still be in the pool (as long as it
doesn't fail) to serve the next request(s). I unpublished it real quick,
but you can still test from master branch if you need. I need to dig a
little deeper though.Is this something you just noticed and thought was a problem, or was it
the cause of an acutal bug / problem in your code? As long as your acquire
callback does the correct thing, pool2 should "gain" the resource even if
it never serves it to the acquire request that spawned it.—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#30 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGfltrpJXLfy09_tw3WvXIsM7kxjWXjpks5q-J8mgaJpZM4KvKiU
.
Okay, that definitely sounds like a bug. Can you reproduce it? Does it go away if you use the master branch as we thought?
Just got a chance to read over the code with this in mind. I had to clarify the various timeouts, but yes -- acquire timeout actually decrements the number of "pending" resources, so this is definitely a problem, and could certainly cause the pool to exceed the maximum. It's uncertain to me yet if it means it will actually leak resources, but as long as the acquire function concludes (either errors or succeeds) I don't think so.
(Also, I apparently didn't re-consider this case at some point, since the code explicitly attempts to remove resources that arrive after a request timed out...)
I can't seem to cause it to leak resources; timed out resources do decrement the internal counter of "pending" resources, but they also dispose of the resource so acquired. As best I can remember, this was a solution-of-least-change approach, rather than the optimal one I described above. I wrote a test to explicitly time out on acquire before the resource was returned, and the result, as long as the resource request completes (late), is that it is subsequently removed/cleaned up.
I think it's maybe time for a rewrite, probably with a promise-based interface now that promises are in core...
Pool 3? I'd be happy to collaborate.
On Wed, Nov 16, 2016, 10:26 Kris Reeves notifications@github.com wrote:
I think it's maybe time for a rewrite, probably with a promise-based
interface now that promises are in core...—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#30 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGfltulW5FUv58gDB9p-fzZqvaEKg-ikks5q-yCogaJpZM4KvKiU
.
Probably closer to pool2 2.0 :P I'll give some thought to what I'd want it to look like. The callbacks here definitely got more tangled than I originally hoped, and certainly more complex than the original version. I think promises would do a lot, but there are some other things I'd like to make into cleaner abstractions if I can. Meanwhile, if you're able to reproduce the "more resources than you should have" problem, I'd certainly like to try and fix it.
Well... Since I moved my timeout into the acquire callback instead of
having pool2 handle it, I do not have the issue.
On Wed, Nov 16, 2016, 12:11 Kris Reeves notifications@github.com wrote:
Probably closer to pool2 2.0 :P I'll give some thought to what I'd want it
to look like. The callbacks here definitely got more tangled than I
originally hoped, and certainly more complex than the original version. I
think promises would do a lot, but there are some other things I'd like to
make into cleaner abstractions if I can. Meanwhile, if you're able to
reproduce the "more resources than you should have" problem, I'd certainly
like to try and fix it.—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#30 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGflttwhLKdwb6qW62hJqZq2ZRkc4es4ks5q-zlegaJpZM4KvKiU
.
Might I ask you to share some code / put it back the way it was, so that I can run down the problem and fix it?
Sure. I'll write up a sample and send it over.
On Fri, Nov 18, 2016, 12:10 Kris Reeves notifications@github.com wrote:
Might I ask you to share some code / put it back the way it was, so that I
can run down the problem and fix it?—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#30 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGfltt9uh6L24bDeWTgQChILjLODwFXEks5q_dwSgaJpZM4KvKiU
.
Awesome, thanks!
Attached is a sample using nothing more than a timeout to make sure the
acquisition completes after the acquire timeout. No streams, just a simple
object.
Note that the issue appears to be worse than I initially reported. Not sure
what's going on, but you'll see what I mean when you run the program.
On Fri, Nov 18, 2016 at 2:13 PM Kris Reeves notifications@github.com
wrote:
Awesome, thanks!
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#30 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGfltnOrBZzqX_E5dF7rYU5x2JvjKpttks5q_fjQgaJpZM4KvKiU
.
Forgot that attachments don't work here. So, here is the code:
const Pool = require('pool2');
let nextId = 0;
let pool = new Pool({
acquire: (cb) => {
// simulate taking 2 seconds to get a resource
setTimeout(() => {
let id = nextId++;
console.log('[acquiring]',id);
cb(null, {id: nextId++});
}, 2000);
},
acquireTimeout: 1000,
dispose: (res, cb) => {
console.log('[disposing]',res.id);
cb();
},
disposeTimeout: 1000,
requestTimeout: 5000,
min: 2,
max: 10,
syncInterval: 60000,
backoff: {},
bailAfter: Infinity
});
const acquireCb = (err, res) => {
if (err) {
console.log('acquire error:',err);
} else {
console.log('acquired ',res.id);
pool.release(res);
}
};
for (let i = 0; i < 20; i ++) {
pool.acquire(acquireCb);
}
console.log('Finished queuing acquires...');
On Mon, Nov 21, 2016 at 4:38 PM Lucas Lacroix raijinsetsu@gmail.com wrote:
Attached is a sample using nothing more than a timeout to make sure the
acquisition completes after the acquire timeout. No streams, just a simple
object.Note that the issue appears to be worse than I initially reported. Not
sure what's going on, but you'll see what I mean when you run the program.On Fri, Nov 18, 2016 at 2:13 PM Kris Reeves notifications@github.com
wrote:Awesome, thanks!
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#30 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGfltnOrBZzqX_E5dF7rYU5x2JvjKpttks5q_fjQgaJpZM4KvKiU
.
Fantastic, thanks. Somehow I thought it was a little trickier than that... will investigate.
Apologies for taking so long to dig into this. From your code, I notice three things:
- You double-increment the ID in the acquire callback, so it's going to go up by two at a time
- You set a pool minimum that can never be satisfied because every attempt times out, so it gets stuck in a loop trying to acquire the minimum number of resources
- When the requests time out, new resource acquisitions start, though the previously in-flight resources are still open (they get destroyed when they come in) -- this appears to be the source of the "exceeds maximum" behavior. I appear to have poorly conceptualized this behavior.
I think you chose exactly the correct solution: if what you want is to destroy the resource itself, e.g. terminate a TCP connection or whatever, after an amount of time, that behavior should be in the library you're binding; for example, with an http request, you might set an timeout on the socket and close the socket if it is exceeded. This timeout should trigger an error callback to pool2 itself, thus causing the resource acquisition to fail.
It seems that possibly acquireTimeout is an ill-advised feature. The behavior as written is as intended, but the consequences are not clear. The solution I'm choosing for this issue is to update the documentation to help make it clearer: acquireTimeout is when pool2 will give up on waiting for the resource to arrive (it will be discarded if it arrives late or errors), but in-flight requests will not be capped to the max in this case. You will be guaranteed that only the max number of resources are available to your application at any given time.
Thank you for bringing this to my attention! Please reopen if you determine this is not an adequate solution!
Sounds right.