RedHatInsights/insights-upload

Do not use bare excepts

Glutexo opened this issue · 4 comments

As an ancient proverb says:

Exceptions are not Pokémon: you don’t have to catch ’em all.

Actually, PEP 8 actually warns about this. The reason is that in most cases you don’t want to catch every possible error, but just a rather specific subset of them.

This actually happened in the upload method that calls a storage driver’s _write_function. When this function raises an exception, it is caught by a generic except Exception and logged not causing the program to crash. The problem is that there is a bug in the localdisk driver and an exception is not raised because the write fails, but because of a bug in the code. In this case ValueError is not something we want to catch.

Please, don’t use bare excepts (except: and except Exception:) and catch a specific type instead. If necessary, create a new error class derived from Exception, convert the specific error to this one as close to the place when it rises as possible and then catch this type instead of catching ‘em all.

Each storage driver will probably raise different exceptions. We probably need the driver itself to catch what it considers is an "expected error" and re-raise a custom Exception like... StorageError

But I would also say, we probably want to catch (and log) more exceptions here than not, because we don't want the service to die when a storage.write hits an error.

I like the idea of rethrowing as StorageError. 👍

As for catching and logging other exceptions, I am not convinced yet. What is a running service good for if the driver is broken and can’t handle the upload properly, possibly leaving the app/storage in an inconsistent state. If we want to prevent the app from dying, keeping the web-server running, I have a feeling that this should be more generic, possibly even in the main function and not just a one-shot except in one method in the UploadHandler. An unrecoverable exception (probably an implementation bug) can arise everywhere and not just in a storage driver.

Yeah my main thought behind that statement was: If we're relying on an external 3rd party system to push uploads, we have to expect that it could break intermittently or be flaky, so IMHO I wouldn't want to "panic" on every single connection timeout, socket error, cert error, http error, etc. that a driver could run into.

You raise a good point about leaving things in a consistently non-working state. Personally I think tracking metrics and throwing alerts when failing uploads hits a certain threshold could be a good approach for this. That could allow you to automate processes like "hey, I see s3 storage write is failing a lot, let's automatically re-deploy the service to switch over to azure"

I've also thought it'd be nice if a client could know for sure whether it's upload was successfully processed or not. So in other words, we could return a 202 when the file is pushed with a reference to a "status URL" which could be polled to track whether the upload to the back-end was successful and the messages were pushed onto the message queue, etc. etc. (but that's for another discussion I guess... :P)

This can be actually solved by correct implementation of the driver:

Every place that can possibly fail in a non-destructive way can catch the respective error type that we don’t want to panic on and rethrow a recoverable StorageError. This is logged and the program continues. If something really unexpected happens, like ValueError because of a bad return value of our routine or a third-party library, it wouldn't be caught and 💥. And that‘s a right thing to do from my point of view.

I‘d also believe that most of the non-panic common errors (probably mostly IO) are already catchable from one place in the driver by a single except with only a little generic ancestor. Something like IOError, HTTPRequestError or whatever. But that‘s only an assumption. Still we’d probably don’t want to panic on errors ≥ 500 (server errors), but want to on 403 (bad auth), so we‘d still need to have some logic in the except block.

Here’s a little dummy example:

import s3_lib
class S3:
    def upload(payload):
        try:
            result = s3_lib.upload(payload)
        except s3_lib.CommunicationError as e:
            # Connection lost, service unavailable…
            raise StorageError("Upload to S3 failed.") from e
        return result["uri"]  # May fail with IndexError, if there’s a bug in in s3_lib.upload or our assumption of return value is wrong.

try:
    s3 = S3()
    s3.upload("my data")
except StorageError as e:
    logger.exception(e)
    # Show must go on
# Any other error not caught.

As for the better response codes depending on upload success, that’s definitely a good idea. Create a ticket for that? 👍