ethersphere/gateway-proxy

Autoextend crash and stuck stamps

Cafe137 opened this issue · 4 comments

A top-level try-catch is missing, which crashes the app if the proxy is started before Bee or if the network connection is unstable.

On error, the stamp is removed from extendingStamps in only one branch of the catch clause:

if (e && e.responseBody && e.requestOptions && JSON.parse(e.responseBody).code === 429) {
    const errorStamp = e.requestOptions.path.split('/')[2]
    const errorStampIndex = this.extendingStamps.indexOf(errorStamp)
    this.extendingStamps.splice(errorStampIndex, 1)
    logger.warn(`postage stamp warning ${errorStamp}`)
} else {
    logger.error('failed to topup postage stamp', e)
}

I think the first branch hides some information: looking at the logs alone, I wouldn't be able to tell what the warning means. For simplicity, I'd merge the two branches and do something like:

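const errorStampIndex = this.extendingStamps.indexOf(stamp.batchID)
// Guard against -1: splice(-1, 1) would remove the last element instead
if (errorStampIndex !== -1) {
    this.extendingStamps.splice(errorStampIndex, 1)
}
logger.error('failed to topup postage stamp', e)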

The other issue here is that in the else block the stamp is not removed from the extendingStamps array, so I think it will never be updated again.
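To illustrate the stuck-stamp problem, here is a minimal sketch of releasing the stamp on every outcome. It is not the actual gateway-proxy code: `StampTracker`, `extend` and the injected `topUp` callback are hypothetical names, and it assumes the stamp should also be released after a successful top-up.

```typescript
// Hypothetical stand-in for the call that tops up a stamp through Bee.
type TopUp = (batchID: string) => Promise<void>

class StampTracker {
  // Batch IDs that currently have a top-up in flight.
  extendingStamps: string[] = []

  constructor(private readonly topUp: TopUp) {}

  // Resolves to true on success and false on failure; never rejects.
  async extend(batchID: string): Promise<boolean> {
    this.extendingStamps.push(batchID)
    try {
      await this.topUp(batchID)
      return true
    } catch (e) {
      // One log line for every kind of failure, including the stamp ID.
      console.error('failed to topup postage stamp', batchID, e)
      return false
    } finally {
      // Runs on success and on any error, so the stamp cannot get stuck.
      const index = this.extendingStamps.indexOf(batchID)
      if (index !== -1) {
        this.extendingStamps.splice(index, 1)
      }
    }
  }
}
```

Putting the cleanup in `finally` rather than in one branch of the `catch` means a new error type cannot reintroduce the stuck-stamp behaviour.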

The other problem is that the refreshStampsExtends function needs error handling; otherwise a connectivity issue would crash the whole app. I suggest simply wrapping it in a try-catch and logging any error: logger.error('failed to refresh postage stamp', error)
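A rough sketch of that wrapper, under the assumption that the refresh is an async function called periodically. `safeRefresh` and its `onError` callback are hypothetical names, not part of gateway-proxy:

```typescript
// Hypothetical shape of the periodic refresh call.
type Refresh = () => Promise<void>

// Runs one refresh and reports failures instead of letting them propagate.
// Resolves to true if the refresh succeeded, false if it threw or rejected.
async function safeRefresh(
  refresh: Refresh,
  onError: (error: unknown) => void,
): Promise<boolean> {
  try {
    await refresh()
    return true
  } catch (error) {
    // A connectivity issue ends up here; the caller keeps running
    // and can simply try again on the next tick.
    onError(error)
    return false
  }
}
```

It could then be scheduled with something like `setInterval(() => void safeRefresh(() => manager.refreshStampsExtends(), e => logger.error('failed to refresh postage stamp', e)), intervalMs)`, where `manager` and `intervalMs` are placeholders.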

And a minor wording issue in the logging: 'successfully postage stamp extended' would sound much better as 'successfully extended postage stamp'

As a simple test for most systems in gateway-proxy (content reupload, stamp management), I suggest two scenarios: starting gateway-proxy without Bee running, and stopping Bee while gateway-proxy is still running. In both cases, wait for errors to pop up, then start Bee again to see whether the error handling works and gateway-proxy is robust enough to recover.

The health and readiness checks make sure the proxy is not serving requests during this time, but hard crashes on network issues are a bit nasty 🙂