HoloArchivists/fc2-live-dl

Error handling in FC2LiveStream.get_meta

15532th opened this issue · 3 comments

I got an exception in my instance of autofc2 with following stacktrace:

2023-03-18 16:24:24 [fc2 [redacted]] Fetching stream info
Task exception was never retrieved
future: <Task finished name='Task-2733449' coro=<AutoFC2.handle_channel() done, defined at [redacted]/lib/python3.11/site-packages/fc2_live_dl/autofc2.py:90>
 exception=KeyError('data')>
Traceback (most recent call last):
  File "[redacted]/lib/python3.11/site-packages/fc2_live_dl/autofc2.py", line 93, in handle_channel
    await fc2.download(channel_id)
  File "[redacted]/lib/python3.11/site-packages/fc2_live_dl/FC2LiveDL.py", line 97, in download
    await live.wait_for_online(self.params["wait_poll_interval"])
  File "[redacted]/lib/python3.11/site-packages/fc2_live_dl/fc2.py", line 209, in wait_for_online
    while not await self.is_online():
              ^^^^^^^^^^^^^^^^^^^^^^
  File "[redacted]/lib/python3.11/site-packages/fc2_live_dl/fc2.py", line 215, in is_online
    meta = await self.get_meta(refetch=refetch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[redacted]/lib/python3.11/site-packages/fc2_live_dl/fc2.py", line 277, in get_meta
    data["data"]["channel_data"]["title"]
    ~~~~^^^^^^^^
KeyError: 'data'
2023-03-30 06:50:59 [fc2 [redacted]] Fetching stream info

Since there is "Fetching stream info" line after stacktrace, I assume handle_channel task got finished after unhandled exception and was restarted by AutoFC2._main, so whole thing would probably continue working fine after that. But as get_meta makes network request, parses and processes response, it would make sense to handle cases where something of that goes wrong.

The response that caused exception must have been a valid json, since it was successfully decoded, so it might be some kind of 50x error, making it related to #10. I tried to trigger different network-related errors by blocking internet access and pointing api url to different places, and result is generally the same: unhandled exception in get_meta causing restart of handle_channel task.

From my understanding, it would make sense to check for non-200 status, catch KeyError (valid json without expected fields), aiohttp.client_exceptions.ContentTypeError (not a valid json), asyncio.TimeoutError (in case of network timeout) or any other exception and raise custom exception with appropriate error text, which could then be caught in FC2LiveDL.download .

FC2LiveStream.wait_for_online seems like a good place to handle exponential backoff from #10, but which of possible errors should cause retry and which should be raised immediately? Also, what should be lower and upper limit on delay before retrying? Since there is wait_poll_interval in wait_for_online context, it would make sense to start with it, but should upper bound involve hardcoded value of 5-10 minutes, be multiple of wait_poll_interval, or both?

Hi, thanks for the detailed report. Sorry it took a while for me to respond, somehow this issue got buried.

Yeah I think we can add some retry and exponential backoff at FC2LiveStream.wait_for_online. For simplicity's sake maybe we can just catch all exceptions apart from KeyboardInterrupt. The exact number for the backoff doesn't matter too much but I usually go from 1 second, doubling until some hardcoded limit. I think 5 minutes is a good number.

Let me know if you'd like to make a PR for this.

Yes, I would make a PR, unless you would find doing it yourself faster than discussing all the details.

In the context of FC2LiveStream.wait_for_online, there is not enough information to provide a meaningful error message, and using a generic one while hiding the stacktrace would make debugging abnormalities substantially harder. I suggest wrapping either every step of get_meta in try-except clause that would print a warning with appropriate details and then re-raise/raise custom exception, which could then be caught in wait_for_online and other places using get_meta.

Ended up doing it in wait_for_online, since it's indeed simpler. Hopefully the way warning about exception is printed will still provide enough information.
Additionally get_meta now calls raise_for_status to get non-200 status code error reported as such.
Perhaps #10 can also be closed now?