aws-solutions/distributed-load-testing-on-aws

UI failing to parse results for additional regions

Closed this issue · 5 comments

Describe the bug
Started from scratch and deployed the solution to the "us-west-2" region.
After running a test in another region created using the "Regional Deployment CloudFormation Template URL", the console UI says the test failed to parse results.

To Reproduce

  1. Deploy the Distributed Load Testing to "us-west-2" using template URL https://s3.amazonaws.com/solutions-reference/distributed-load-testing-on-aws/latest/distributed-load-testing-on-aws.template.
  2. Use the Regional Deployment CloudFormation Template URL to deploy the solution to a different region (ap-southeast-2): https://s3.us-west-2.amazonaws.com/{stack-name}-dlttestrunnerstoragedltscenariosbuc-1m7mtd4m3szgj/regional-template/distributed-load-testing-on-aws-regional.template
  3. Run a test in the second region.

Expected behavior

User should see test results and metrics for the completed test on the console UI.

Please complete the following information about the solution:

  • Version: 3.2.3
  • Region: us-west-2, ap-southeast-2, eu-west-1
  • Was the solution modified from the version published on this repository? No
  • If the answer to the previous question was yes, are the changes available on GitHub?
  • Have you checked your service quotas for the services this solution uses? No
  • Were there any errors in the CloudWatch Logs? Yes, see screenshots below.

Screenshots

First ran a test in the region the solution was initially deployed to (us-west-2), and I was able to view the results as expected.
image

When I ran a test in a different region (ap-southeast-2 or eu-west-1), the test failed to show the results.
image

In the CloudWatch log group /aws/lambda/{stack-name}-DLTLambdaFunctionResultsParserFF5CC9-OQEageflKRLm, I can see error logs:
image

Additional context

I tried deleting and deploying the entire CloudFormation stack twice, but the behaviour is the same.
I used version 3.2.0 heavily before upgrading the stack.
During the test run, I can see the real-time results for all regions under test, so I know traffic is being sent and metrics are being collected as expected.

I will look into this and get back to you.

+1, I was just battling this exact same thing today. Redeployed the stack a few times.
Screen Shot 2023-10-17 at 4 16 48 PM

In CloudWatch Log groups > /aws/lambda/jmeter-DLTLambdaFunctionResultsParserFF5CC920-boYJfqWD45Oa 2023/10/17/[$LATEST]74cec19ae6b145ea86ed75:

{
    "errorType": "ResourceNotFoundException",
    "errorMessage": "The specified metric filter does not exist.",
    "code": "ResourceNotFoundException",
    "message": "The specified metric filter does not exist.",
    "time": "2023-10-17T22:49:47.149Z",
    "requestId": "57101b7a-6684-48a5-8bbf-a4239379f779",
    "statusCode": 400,
    "retryable": false,
    "retryDelay": 80.2606305148347,
    "stack": [
        "ResourceNotFoundException: The specified metric filter does not exist.",
        "    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:61:27)",
        "    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)",
        "    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)",
        "    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)",
        "    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)",
        "    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12)",
        "    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18)"
    ]
}

The ResultsParser must not be getting the --region properly?

@shawnt18 @jasonhaven this will be addressed in our next release. I will update you as soon as the release is out.

Thanks @kamyarz-aws, I forgot to reply here with the temporary workaround I ended up with. I noticed that some of the CloudWatch metrics weren't getting created in some regions and hence couldn't be deleted. It seemed like the CloudWatch SDK calls (https://github.com/aws-solutions/distributed-load-testing-on-aws/blob/main/source/results-parser/lib/parser/index.js#L484 and https://github.com/aws-solutions/distributed-load-testing-on-aws/blob/main/source/task-runner/index.js#L185) were not actually being awaited, so I ended up just slowing the functions down with await new Promise(r => setTimeout(r, 3000));

Wrapping the commands in their own promises didn't seem to work either, which I thought was weird:

metricPromises.push(
  new Promise((resolve, reject) => {
    return cloudwatchLogs.putMetricFilter(metricFilterParams, (err, data) => {
      if (err) return reject(err);
      return resolve(data);
    });
  })
);
...
await Promise.all(metricPromises);

So I'm curious what the fix was. Thanks!

The release is out; the issue should be resolved.
@jasonhaven as for the fix, you actually narrowed it down to the right place, but the fix is simpler. There is a function called createDashboard that was being called synchronously even though it was defined as asynchronous.
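
For anyone hitting a similar symptom, here is a minimal sketch of that class of bug, not the solution's actual code. Only the name createDashboard comes from this thread; its body, the created set, and the helpers runWithoutAwait, runWithAwait, and demo are hypothetical stand-ins that simulate a slow AWS call with a timer. It shows that calling an async function without await lets the caller carry on before the work has finished, which would also explain why adding an artificial delay appeared to help.

```javascript
// Hypothetical stand-in for resources that exist on the AWS side.
const created = new Set();

// Simulated async call: the real createDashboard talks to CloudWatch;
// this one just waits 50 ms and then records the resource as created.
async function createDashboard(name) {
  await new Promise((resolve) => setTimeout(resolve, 50));
  created.add(name);
}

// Buggy pattern: the returned promise is dropped, so the caller
// continues while the creation is still pending.
async function runWithoutAwait(name) {
  createDashboard(name); // missing await
  return created.has(name);
}

// Fixed pattern: the caller waits until the creation has completed.
async function runWithAwait(name) {
  await createDashboard(name);
  return created.has(name);
}

// Runs both variants and reports whether the resource existed
// at the moment each caller moved on.
async function demo() {
  const before = await runWithoutAwait("dashboard-a");
  const after = await runWithAwait("dashboard-b");
  return { before, after };
}
```

Calling demo().then(console.log) should report before as false and after as true. In a Lambda handler, work that is still pending when the handler returns is not guaranteed to complete, so later steps that expect the resource (for example, deleting it) can fail with a not-found error.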