aws-solutions/video-on-demand-on-aws-foundation

SNS notification missing InputFile and InputDetails – sometimes (race condition)


cm-dk commented

Describe the bug
Sometimes, the SNS notification is missing details on the job input (InputFile and InputDetails) – e.g.:

```json
{
  "Id": "foo",
  "InputDetails": {},
  "Outputs": { ... }
}
```

To Reproduce
Upload video files repeatedly. Because of the nature of the race condition (see below), very short video files should trigger it easily.

Expected behavior
Notifications always contain input data.
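To make "always contain input data" checkable, for example in an end-to-end test that uploads short files in a loop, a consumer could validate each notification. The function below is a hypothetical sketch, not part of the solution. The field names InputFile and InputDetails come from this report; treating an empty InputDetails object as missing data is an assumption.

```javascript
// Hypothetical consumer-side check (not part of the solution): true only if
// the notification carries a non-empty InputFile and a non-empty InputDetails.
function hasInputData(notification) {
  const details = notification.InputDetails;
  return (
    typeof notification.InputFile === "string" &&
    notification.InputFile.length > 0 &&
    details !== null &&
    typeof details === "object" &&
    Object.keys(details).length > 0
  );
}

// The truncated payload from this report fails the check.
console.log(hasInputData({ Id: "foo", InputDetails: {}, Outputs: {} })); // false
```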

Please complete the following information about the solution:

  • Version: 1.3.0
  • Region: euc1
  • Was the solution modified from the version published on this repository? Yes (cf. #29)
  • If the answer to the previous question was yes, are the changes available on GitHub? No
  • Have you checked your service quotas for the services this solution uses? n/a
  • Were there any errors in the CloudWatch Logs? No

Screenshots
n/a

Additional context
The whole logic around jobs-manifest.json (JM) seems to invite several kinds of race conditions. The most relevant one for this issue:

  • when an event with status INPUT_INFORMATION fires, the handler:
    • reads the JM
    • appends data
    • writes the JM
  • when an event with status COMPLETE fires, the handler:
    • reads the JM
    • uses information from the JM for the input details
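The read-append-write sequence above can be reproduced without AWS. The sketch below replaces jobs-manifest.json with a hypothetical in-memory store whose reads and writes each take a fixed delay (a stand-in for S3 latency); the handler names and the example input URI are illustrative, not the solution's code. When both events start together, COMPLETE reads the manifest before INPUT_INFORMATION has written it back.

```javascript
// Hypothetical stand-in for S3: every read/write of jobs-manifest.json
// takes `latencyMs` to complete.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function makeStore(latencyMs) {
  let body = JSON.stringify({ Jobs: [] });
  return {
    async read() {
      await sleep(latencyMs);
      return JSON.parse(body); // snapshot of the manifest at read time
    },
    async write(manifest) {
      await sleep(latencyMs);
      body = JSON.stringify(manifest); // overwrites whatever is stored
    },
  };
}

// INPUT_INFORMATION: read JM, append data, write JM.
async function onInputInformation(store, jobId, inputFile, inputDetails) {
  const manifest = await store.read();
  manifest.Jobs.push({ Id: jobId, InputFile: inputFile, InputDetails: inputDetails });
  await store.write(manifest);
}

// COMPLETE: read JM and take the input data from it; with no entry, fall back
// to an empty InputDetails and no InputFile, like the "no entry found" branch.
async function onComplete(store, jobId) {
  const manifest = await store.read();
  const entry = manifest.Jobs.find((job) => job.Id === jobId);
  return entry ? { ...entry } : { Id: jobId, InputDetails: {} };
}

async function main() {
  const store = makeStore(10);
  // Both events arrive almost together, as they can for a very short job.
  const [, notification] = await Promise.all([
    onInputInformation(store, "foo", "s3://example-bucket/in.mp4", { width: 1920 }),
    onComplete(store, "foo"),
  ]);
  console.log(notification); // { Id: 'foo', InputDetails: {} }
}

main();
```

Reading the manifest again once both handlers have finished does return the entry, which is why the problem only shows up for events that arrive close together.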

If both events fire in quick succession (more likely for short, fast jobs), the COMPLETE handler may read the JM before the input data has been written to it. It then takes the "no entry found" branch, which sets InputDetails to an empty object and does not set InputFile at all:

```javascript
const index = manifest.Jobs.findIndex(job => job.Id === jobDetails.Id);
if (index === -1) {
  console.log(`no entry found for jobId: ${jobDetails.Id}, creating new entry`);
  jobDetails.InputDetails = {};
  manifest.Jobs.push(jobDetails);
  results = jobDetails;
} else {
  // ... (remainder of the handler not quoted)
}
```

The missing data could be filled in from a mediaconvert:GetJob call, which is made anyway.
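A minimal sketch of that fallback, assuming the `Job` object returned by MediaConvert GetJob is already available in the handler. `fillInputFromJob` is a hypothetical helper; it reads the first entry of `Job.Settings.Inputs` (an assumption that suits single-input jobs) and copies its `FileInput` into InputFile only when InputFile is missing. It does not try to rebuild InputDetails, since this sketch does not assume GetJob returns the same data as the INPUT_INFORMATION event.

```javascript
// Hypothetical helper, not part of the solution: patch a notification built
// by the "no entry found" branch using the `Job` object of a GetJob response.
function fillInputFromJob(jobDetails, job) {
  // Assumption: single-input jobs, so the first input's FileInput is the source.
  const fileInput = job?.Settings?.Inputs?.[0]?.FileInput;
  if (jobDetails.InputFile || !fileInput) {
    return jobDetails; // nothing to fill in, or nothing to fill it with
  }
  return { ...jobDetails, InputFile: fileInput };
}

const job = {
  Id: "foo",
  Settings: { Inputs: [{ FileInput: "s3://example-bucket/in.mp4" }] },
};
console.log(fillInputFromJob({ Id: "foo", InputDetails: {} }, job).InputFile);
// prints s3://example-bucket/in.mp4
```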

Suspected additional issue
I have not tested this, but from reading the code it seems highly likely that concurrent processing of jobs leads to missing entries in the JM: data written by one Lambda may be overwritten by a second Lambda that read the JM before that write.
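The suspected lost update can be illustrated the same way, together with one common mitigation: optimistic concurrency, where a write succeeds only if the manifest has not changed since it was read, and the writer otherwise re-reads and retries. The versioned in-memory store and all function names below are hypothetical. In a real deployment the version check would have to be enforced by the storage layer itself (for example, a DynamoDB conditional write); this sketch does not claim the solution uses, or should use, any particular one.

```javascript
// Hypothetical versioned stand-in for jobs-manifest.json. `write` with an
// expectedVersion succeeds only if nobody has written since that read.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function makeVersionedStore(latencyMs) {
  let body = JSON.stringify({ Jobs: [] });
  let version = 0;
  return {
    async read() {
      await sleep(latencyMs);
      return { manifest: JSON.parse(body), version };
    },
    async write(manifest, expectedVersion) {
      await sleep(latencyMs);
      // Check and update run without an await in between, so they are atomic here.
      if (expectedVersion !== undefined && expectedVersion !== version) return false;
      body = JSON.stringify(manifest);
      version += 1;
      return true;
    },
  };
}

// Naive read-append-write, as described in the issue: no version check.
async function appendNaive(store, entry) {
  const { manifest } = await store.read();
  manifest.Jobs.push(entry);
  await store.write(manifest);
}

// Same update, but retried from a fresh read when another writer got there first.
async function appendWithRetry(store, entry, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { manifest, version } = await store.read();
    manifest.Jobs.push(entry);
    if (await store.write(manifest, version)) return;
  }
  throw new Error(`gave up appending ${entry.Id} after ${maxAttempts} attempts`);
}

// Two jobs update the manifest concurrently; count the surviving entries.
async function countAfterTwoConcurrentAppends(append) {
  const store = makeVersionedStore(10);
  await Promise.all([append(store, { Id: "job-1" }), append(store, { Id: "job-2" })]);
  const { manifest } = await store.read();
  return manifest.Jobs.length;
}

async function main() {
  console.log(await countAfterTwoConcurrentAppends(appendNaive)); // 1 (one entry lost)
  console.log(await countAfterTwoConcurrentAppends(appendWithRetry)); // 2
}

main();
```

Retrying only fixes the lost update; it does not by itself fix the ordering problem between INPUT_INFORMATION and COMPLETE shown earlier.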

Thanks for your feedback. We have added this request to the backlog for this solution.