diamondio/better-queue

๐Ÿ› Large data in the store caused the app to crash


I'm using better-queue in production with a PostgreSQL store.

We had an issue in the service running the queue, and a significant amount of data piled up in the store (2,312,233 tasks).

This caused the application to crash about one minute after startup. The crash was completely silent: no unhandledRejection, no uncaughtException, and none of the signal traps fired either (SIGTERM, SIGINT, SIGUSR2). Machine resources were not exhausted; CPU sat around 40% and memory consumption was very low.
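
For context, the process-level traps were registered roughly like this (a minimal sketch, not the exact production code; the logger calls are placeholders):

// Process-level traps that stayed completely silent during the crash.
process.on('unhandledRejection', (reason) => {
  logger.error('Unhandled rejection', reason);
});
process.on('uncaughtException', (err) => {
  logger.error('Uncaught exception', err);
});
['SIGTERM', 'SIGINT', 'SIGUSR2'].forEach((signal) => {
  process.on(signal as NodeJS.Signals, () => {
    logger.info(`Received ${signal}, shutting down`);
    process.exit(0);
  });
});
process.on('exit', (code) => {
  logger.info(`Process exiting with code ${code}`);
});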

I have logging set up at the beginning of the queue process and on error events, as you can see in the following code:

const queueOptions = {
  batchSize: 250000,
  batchDelay: 3600000,
  concurrent: 1,
  maxRetries: Infinity,
  autoResume: false, // I tested with and without the `autoResume`
  retryDelay: 3600000 + 10,
  afterProcessDelay: 3600000,
  precondition: async (cb: any) => {
    try {
      const lock = await cacheInstance.getValue(LOCK_KEY);
      if (lock) {
        logger.info('Precondition failed, resources still locked');
        cb(null, false);
      } else {
        cb(null, true);
      }
    } catch (err) {
      logger.warn('Couldn\'t check the queue precondition', err);
      cb(err);
    }
  },
  preconditionRetryTimeout: 3600000,
};
if (config.env === 'production') {
  // @ts-ignore
  queueOptions.store = {
    type: 'sql',
    dialect: 'postgres',
    host: process.env.DATABASE_HOST,
    port: process.env.DATABASE_PORT,
    username: process.env.DATABASE_USERNAME,
    password: process.env.DATABASE_PASSWORD,
    dbname: process.env.DATABASE_NAME,
    tableName: 'my_queue_store', // The table will be created automatically.
  };
}
const myQueue = new Queue(async (payload: any, cb: any) => {
  try {
    const lock = await cacheInstance.lock(LOCK_KEY, 3600000);
    // await doTheProcessing() and release the lock.
    cb(null, 'queue_processed');
  } catch (err) {
    // Release the lock.
    cb(err);
  }
}, queueOptions);

// Queue logs
myQueue.on('batch_failed', (error: string) => {
  logger.warn(`Failed to process the queue`, error);
});
myQueue.on('batch_finish', () => {
  logger.info(`Processed the queue batch`);
});

Also, I have the following logging when I push data to the queue:

myQueue.push(payload)
        .on('finish', (result) => {
          logger.verbose(`Pushed an event to the queue`, result);
        })
        .on('failed', (err) => {
          logger.warn(`Failed to push an event to queue`, err);
        });

The absence of logs made the issue very hard to track down. I only discovered the cause when I disabled the SQL store and fell back to the default in-memory store, at which point the app stopped crashing.

My only solution, for now, was to back up the my_queue_store table and truncate it.
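
The workaround amounted to SQL along these lines, shown here as a sketch with the pg client (the backup table name is just an example, not necessarily what I ran):

import { Client } from 'pg';

// Copy the queue table aside, then empty it so the service can start again.
async function backupAndTruncateStore(): Promise<void> {
  const client = new Client({
    host: process.env.DATABASE_HOST,
    port: Number(process.env.DATABASE_PORT),
    user: process.env.DATABASE_USERNAME,
    password: process.env.DATABASE_PASSWORD,
    database: process.env.DATABASE_NAME,
  });
  await client.connect();
  try {
    await client.query('CREATE TABLE my_queue_store_backup AS TABLE my_queue_store');
    await client.query('TRUNCATE TABLE my_queue_store');
  } finally {
    await client.end();
  }
}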

My tech stack is the following:
OS: 64bit Amazon Linux 2/5.2.1 running in EBS
Node version: Node.js 12
better-queue: 3.8.2
better-queue-sql: 1.0.3

How can this be avoided?
How can logging be improved to surface a similar situation?

Thank you 💗