noxdafox/pebble

pebble process pool kills main process [on linux only]

ScientiaEtVeritas opened this issue · 9 comments

I'm trying to run a function in a process pool with timing out after 5 seconds (in an async environment).

    with pebble.ProcessPool(1) as pool:
        future = pool.schedule(func, args=[foo], timeout=5)
        async_future = loop.run_in_executor(None, future.result, 5)
        result = await asyncio.wait_for(async_future, 5)

On Windows this works like a charm. But on Linux (I have tried Debian and Ubuntu) the main process (the whole application) gets killed, regardless of whether the function takes more or less than 5 seconds. There is no output or error message when the process exits. Is this a bug, or am I missing something?

  • Pebble Version: 4.5.3
  • Ubuntu 18.04.2 LTS
  • Debian GNU/Linux 9.12

Hello,

I never tried pebble together with asyncio. I will give it a spin as soon as I have some spare time and come back to you.

Thanks for reporting this!

I could not reproduce your issue. Please provide a minimal reproducible example.

I tried the following code.

import time
import asyncio
from concurrent.futures import TimeoutError

import pebble


def func():
    time.sleep(10)
    return 10


async def coro_func():
    loop = asyncio.get_running_loop()

    with pebble.ProcessPool(1) as pool:
        future = pool.schedule(func, timeout=5)
        async_future = loop.run_in_executor(None, future.result, 5)

        return await asyncio.wait_for(async_future, 5)


def main():
    try:
        asyncio.run(coro_func())
    except TimeoutError:
        print("Timeout error when running `func`")

    print("Main process alive")


if __name__ == '__main__':
    main()

And it works as expected:

$ python3 issue_66.py 
Timeout error when running `func`
Main process alive

This is on Debian with both Python 3.7 and 3.8.

@noxdafox Thank you for looking into it! Your example also works fine for me.

Then I'll probably need to add a bit of context. I'm using it in conjunction with discord.py. I'm not sure in which way it changes the circumstances. If it's of any help, I can provide an MVE with it.

main.py

from discord.ext import commands

if __name__ == "__main__":
    bot = commands.Bot('!?')
    bot.load_extension("example")
    bot.run(TOKEN)

example.py

import asyncio
import pebble
import time
from discord.ext import commands
from concurrent.futures import TimeoutError

def func():
    time.sleep(10)
    return 10

class ExampleCog(commands.Cog):

    @commands.command()
    async def cmd(self, ctx, *, content):
        loop = asyncio.get_running_loop()

        try:
            with pebble.ProcessPool(1) as pool:
                future = pool.schedule(func, timeout=5)
                async_future = loop.run_in_executor(None, future.result, 5)
                return await asyncio.wait_for(async_future, 5)
        except TimeoutError:
            print("Timeout error when running `cmd`")

def setup(bot):
    bot.add_cog(ExampleCog(bot))

The Python version I'm using is 3.7.9.

The invocation of the cmd command ends the main process on Linux.

Pebble raises concurrent.futures.TimeoutError and not TimeoutError. This might be the cause for your problem as you are capturing TimeoutError which is different from concurrent.futures.TimeoutError.

In [3]: import concurrent.futures

In [4]: concurrent.futures.TimeoutError is TimeoutError
Out[4]: False
In [6]: try:
   ...:     raise concurrent.futures.TimeoutError("Timeout")
   ...: except TimeoutError:
   ...:     print("Error handled")
   ...:     
---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
<ipython-input-6-0eb2c827a138> in <module>()
      1 try:
----> 2     raise concurrent.futures.TimeoutError("Timeout")
      3 except TimeoutError:
      4     print("Error handled")
      5 

TimeoutError: Timeout
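To illustrate the distinction, here is a minimal sketch that catches the correct exception class. Note that on Python versions before 3.11, `concurrent.futures.TimeoutError` is a class separate from the builtin `TimeoutError`; since 3.11 it is an alias of the builtin, so the `is` check in the session above returns `True` there. Catching `concurrent.futures.TimeoutError` directly works on every version:

```python
import concurrent.futures

# Catching concurrent.futures.TimeoutError by its own name works on
# every Python version, whether or not it aliases the builtin TimeoutError.
try:
    raise concurrent.futures.TimeoutError("Timeout")
except concurrent.futures.TimeoutError:
    print("Error handled")
```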

> Pebble raises concurrent.futures.TimeoutError and not TimeoutError. This might be the cause for your problem as you are capturing TimeoutError which is different from concurrent.futures.TimeoutError.

Oops, sorry! That's a careless mistake in the MVE. In my non-MVE code I have except (concurrent.futures.TimeoutError, multiprocessing.context.TimeoutError).

Even when catching it, the main process still exits. But I also see it print Timeout error when running `cmd` before doing so, confirming that the exception is successfully caught. Moreover, there is an additional layer of exception handling: discord.py captures any exception raised in a command so that the process is not affected by it.

Note that when time.sleep is shorter than 5 seconds (i.e., below the timeout value), no error is raised and no message is printed, yet the process is still killed afterwards. So the crash seems independent of any timeouts or TimeoutErrors.

@noxdafox Hello, I have the same issue. This is a minimal working example:

  1. pip install blacksheep uvicorn pebble (blacksheep is an ASGI framework)
  2. Run this code
import asyncio
import time

from blacksheep.messages import Response
from blacksheep.server import Application
from pebble import ProcessPool

app = Application()


def function():
    time.sleep(5)


@app.router.get('/')
async def home():
    with ProcessPool(max_workers=1) as pool:
        future = pool.schedule(function, timeout=3)
        try:
            await asyncio.wrap_future(future)
        except asyncio.exceptions.TimeoutError:
            print('Caught')

    return Response(204)


if __name__ == '__main__':
    import uvicorn

    uvicorn.run(app, port=8080)
  3. Go to http://127.0.0.1:8080/ in your browser
  4. See this in the log
INFO:     Started server process [101591]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
Caught
INFO:     Shutting down
INFO:     127.0.0.1:40686 - "GET / HTTP/1.1" 204 No Content
INFO:     Finished server process [101591]
INFO:     ASGI 'lifespan' protocol appears unsupported.

Process finished with exit code 0

I also want to give a short update: I found a solution, or at least a workaround, for my case.

My presumption is that pebble propagates signals to the parent process for some reason, while discord.py implements custom signal handling in its bot.run() method. I guess it's similar in your case @yakimka, with uvicorn.run().

So my approach is to bypass discord.py's signal handling by logging in and connecting manually:

async def main():
  bot = ...
  try:
    await bot.login(token)
    await bot.connect()
  finally:
    await bot.close()

asyncio.run(main())

I did some research and came to the conclusion that this is not a problem with Pebble.
This is known behavior of asyncio when os.fork() is used. link1 link2.
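The inheritance at the heart of this can be demonstrated directly with the multiprocessing "fork" start method. This sketch is illustrative, not from the thread, and is Unix-only (the "fork" method does not exist on Windows): a forked worker sees whatever signal dispositions the parent, e.g. bot.run() or uvicorn.run(), installed.

```python
import multiprocessing
import signal

def handler(signum, frame):
    print("custom handler")

# Install a custom SIGTERM handler in the parent process,
# much like an asyncio framework would.
signal.signal(signal.SIGTERM, handler)

def report():
    # A forked child inherits the parent's signal dispositions.
    print(signal.getsignal(signal.SIGTERM) is handler)

if __name__ == '__main__':
    ctx = multiprocessing.get_context('fork')
    child = ctx.Process(target=report)
    child.start()
    child.join()
```

Running it prints True from the child, showing the handler survived the fork.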

Therefore we can't fully "fix" this, but we have some workarounds:

In Python 3.7, you'll have two fixes available in ProcessPoolExecutor (*):

  • either pass an initializer function that resets signal configuration to a sane default state
  • or pass a "forkserver" multiprocessing context that will avoid inheritance issues in the process pool workers
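The second option can be sketched with the standard library's ProcessPoolExecutor, which the quoted fixes refer to. (Pebble's ProcessPool also takes a context argument in recent versions, but treat that as an assumption and check its documentation.) With the "forkserver" start method, workers are spawned from a clean template process and do not inherit signal handlers installed later by the parent:

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == '__main__':
    # 'forkserver' workers start from a clean template process,
    # avoiding inheritance of the parent's asyncio signal handlers.
    ctx = multiprocessing.get_context('forkserver')
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        print(pool.submit(square, 3).result())
```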

My example can be fixed with:

import asyncio
import signal
import time

from blacksheep.messages import Response
from blacksheep.server import Application
from pebble import ProcessPool

app = Application()

# restore SIGTERM handling in new process
original_sigterm_handler = signal.getsignal(signal.SIGTERM)
def init():
    signal.signal(signal.SIGTERM, original_sigterm_handler)

    # Or we can just ignore signal
    # signal.signal(signal.SIGTERM, signal.SIG_IGN)


def function():
    time.sleep(5)


@app.router.get('/')
async def home():
    with ProcessPool(
            max_workers=1,
            # pass initializer
            initializer=init
    ) as pool:
        future = pool.schedule(function, timeout=3)
        try:
            await asyncio.wrap_future(future)
        except asyncio.exceptions.TimeoutError:
            print('Caught')

    return Response(204)


if __name__ == '__main__':
    import uvicorn

    uvicorn.run(app, port=8080)

So this issue can be closed.

Thanks @yakimka for the investigation and sorry for the late reply. Closing this issue.