lazywithclass/winston-cloudwatch

Sequence Logs

Opened this issue · 10 comments

The plugin throws an error on a sequence of logs. Running the commands below in sequence, only one entry appears in CloudWatch.
winston.error('exception details');
winston.info('information details',null);
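
For context, the transport is attached roughly like this (group and stream names here are placeholders, not the actual values):

var winston = require('winston');
var WinstonCloudWatch = require('winston-cloudwatch');

// Placeholder configuration; the real group/stream names differ.
winston.add(WinstonCloudWatch, {
  logGroupName: 'my-group',
  logStreamName: 'my-stream',
  awsRegion: 'us-east-1'
});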

Exception details:
{ [OperationAbortedException: A conflicting operation is currently in progress against this resource. Please try again.]
message: 'A conflicting operation is currently in progress against this resource. Please try again.',
code: 'OperationAbortedException',
time: Tue Jan 06 2015 10:24:55 GMT-0500 (Eastern Standard Time),
statusCode: 400,
retryable: false,
retryDelay: 30 } 'OperationAbortedException: A conflicting operation is currently in progress against this resource. Please try again.'

Thank you for reporting this. I am looking into it.

I've seen that this happens because the two operations are issued one right after the other; it seems the AWS API does not allow that.

I will work on a solution later today.
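
For what it's worth, one way around this is to serialize the uploads so that no two CloudWatch calls are ever in flight at the same time. A minimal sketch of that idea (illustrative only, not necessarily what the library does; assumes the aws-sdk v2 CloudWatchLogs client):

var AWS = require('aws-sdk');
var logs = new AWS.CloudWatchLogs({ region: 'us-east-1' });

var queue = Promise.resolve();
var sequenceToken; // CloudWatch expects the token returned by the previous call

function send(logGroupName, logStreamName, message) {
  // Chain every upload onto the previous one so they never run concurrently.
  queue = queue
    .then(function () {
      return logs.putLogEvents({
        logGroupName: logGroupName,
        logStreamName: logStreamName,
        sequenceToken: sequenceToken,
        logEvents: [{ message: message, timestamp: Date.now() }]
      }).promise();
    })
    .then(function (res) {
      sequenceToken = res.nextSequenceToken;
    }, function (err) {
      console.error('upload failed', err); // keep the chain alive on failure
    });
  return queue;
}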

I've written a quick fix to that issue. You should update to 0.1.2 to get it.

I still have to make sure that the issue is not showing anymore even with lots of logs, I will test it with a higher volume tomorrow.

Is the issue fixed for you?

I'm closing this as the upcoming release will change a lot; it will most likely fix this issue, since that's the whole point of it.

I'm getting a similar error with the new 2.0.0 version:

failed to set retention policy for stage to 7 days due to OperationAbortedException: A conflicting operation is currently in progress against this resource. Please try again.
    at Request.extractError (/usr/src/app/node_modules/aws-sdk/lib/protocol/json.js:48:27)
    at Request.callListeners (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/request.js:683:14)
    at Request.transition (/usr/src/app/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/usr/src/app/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /usr/src/app/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:685:12)
    at Request.callListeners (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/request.js:683:14)
    at Request.transition (/usr/src/app/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/usr/src/app/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /usr/src/app/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:685:12)
    at Request.callListeners (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at callNextListener (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:95:12)
    at IncomingMessage.onEnd (/usr/src/app/node_modules/aws-sdk/lib/event_listeners.js:294:13)
    at emitNone (events.js:111:20)
    at IncomingMessage.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1056:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickDomainCallback (internal/process/next_tick.js:218:9)

The log in CloudWatch looks like this:

00:00:00 { "level": "info", "message": "..." }
00:00:00 { "level": "info", "message": "..." }
00:00:02 { "level": "error", "message": "failed to set retention policy ..." }

So apparently the setup is not yet finished, but logs already get sent?

I thought that was fixed by the work we did 3 years ago. Do you have some code I can use to reproduce the issue? I will tackle this once I'm done with the testing.

The error originates in putRetentionPolicy.

I'm using a single log group for multiple processes (each process has its own log stream). When deploying, all the processes are set up with winston-cloudwatch.
Each process then calls ensureGroupPresent on the same group at about the same time, and that function always calls putRetentionPolicy, also at about the same time. Some of those calls then fail on AWS' end with OperationAbortedException: A conflicting operation is currently in progress against this resource. Please try again.

On this line you log that error with console.error. In my environment, I overwrite console.error with logger.error; that's how the error message also lands in CloudWatch.
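
The override is essentially this, where logger is my winston instance:

// Redirect console.error into winston, so anything the library prints
// with console.error ends up back in CloudWatch.
console.error = logger.error.bind(logger);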

It's not a critical error, but it should probably be handled rather than logged to the console. One solution would be to simply try again, as AWS suggests in the error message. On the other hand, the retentionPolicy should probably not be written on every call to ensureGroupPresent. Maybe you can check in ensureGroupPresent whether the retentionPolicy is already correct? I believe AWS' describeLogGroups should contain that information. Then you would only set the retentionPolicy when the current value differs.
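
Something along these lines (a sketch, assuming the aws-sdk v2 client; ensureRetention is a hypothetical helper name):

// Only write the retention policy when the current value differs.
function ensureRetention(logs, logGroupName, retentionInDays, cb) {
  logs.describeLogGroups({ logGroupNamePrefix: logGroupName }, function (err, data) {
    if (err) return cb(err);
    var group = (data.logGroups || []).filter(function (g) {
      return g.logGroupName === logGroupName;
    })[0];
    if (group && group.retentionInDays === retentionInDays) {
      return cb(); // already correct, skip the conflicting write
    }
    logs.putRetentionPolicy({ logGroupName: logGroupName, retentionInDays: retentionInDays }, cb);
  });
}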

Yes, as you said, I think the actions are:

  • retry on error (see the sketch below)
  • don't always write the policy on ensureGroupPresent

I will tackle this after the upgrade lands on npm. Thanks for the details!
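
A rough sketch of the retry part (retryOnConflict is a hypothetical helper; the error code is the one from the reports above):

// Back off and retry when AWS reports a conflicting operation in progress.
function retryOnConflict(operation, attempts, cb) {
  operation(function (err, data) {
    if (err && err.code === 'OperationAbortedException' && attempts > 1) {
      // AWS explicitly asks to try again, so wait briefly and do so.
      return setTimeout(function () {
        retryOnConflict(operation, attempts - 1, cb);
      }, 500);
    }
    cb(err, data);
  });
}

// e.g. wrapping the putRetentionPolicy call:
retryOnConflict(function (done) {
  logs.putRetentionPolicy({ logGroupName: 'all_logs', retentionInDays: 7 }, done);
}, 3, function (err) {
  if (err) console.error('giving up:', err);
});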

If you have a single log group, the solution is to set ensureLogGroup: false when creating the WinstonCloudWatch transport. This makes the library skip setting the retention policy.

For example:

const WinstonCloudWatch = require('winston-cloudwatch');

new WinstonCloudWatch({
    awsRegion: 'us-east-1',
    logGroupName: 'all_logs',
    logStreamName: streamName, // whatever stream this process writes to
    ensureLogGroup: false,
});
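
Note that, if I read the code correctly, ensureLogGroup: false also skips creating the log group, so the group has to exist already (for example, created as part of the deployment).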

It would be nice if ensureLogGroup were documented.