Partially automated test evaluation
Goal
In order to ease the evaluation of load test results, users of XLT should be able to run a quick, configurable analysis that is performed automatically as part of report generation.
Idea
Auto-evaluation of a load test result is based on rules which run data tests (rule checks) on the generated load test report XML, using XPath to select items and evaluate conditions.
For a quick rating, rules can define a number of achievable points that are summed up at the end to get a final score whose value determines the rating to apply. A rule passes when all of its checks succeed. Rules that do not pass contribute no points to the final score; rules that pass contribute all of their points (all-or-nothing).
To model more complex criteria and to allow re-use of rules (and thus avoid repetitive definitions), rules are assigned to groups, and a rule can be assigned to multiple groups. Rules must be assigned to at least one (enabled) group to become effective. Otherwise, they are ignored (declared but unused).
Groups are evaluated in the order of definition, and evaluating a group means evaluating each of the rules assigned to it, in order of assignment/use. Rules that do not pass won't stop evaluation of the remaining rules.
The number of achievable points as well as the computed number of achieved points depend on the group's mode:

Mode | Achievable Points | Achieved Points |
---|---|---|
firstPassed | Max. of all rules' achievable points | Points of the first rule that passes |
lastPassed | Max. of all rules' achievable points | Points of the last rule that passes |
allPassed | Sum of all rules' achievable points | Sum of points of all rules that pass |
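For illustration, a hedged sketch (the group ID and point values are made up; the rule IDs follow the examples further below). Assume the three rules are worth 5, 3 and 1 points and only the last two pass:

{
    "id": "homepageResponseTime", // hypothetical group
    "mode": "firstPassed",
    "rules": ["homepageResponseTimeA", "homepageResponseTimeB", "homepageResponseTimeC"]
}

With "firstPassed", the achievable points are max(5, 3, 1) = 5 and the achieved points are 3 (the points of the first passing rule). With "lastPassed", the achieved points would be 1 instead. With "allPassed", the achievable points are 5 + 3 + 1 = 9 and the achieved points are 3 + 1 = 4.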
Depending on the overall percentage of points achieved, a final rating is chosen from the list of ratings specified by the user and applied to the test. The list of ratings is processed in the order specified, and the first applicable rating defines the final rating. A rating is applicable when its value is greater than or equal to the overall percentage of achieved points.
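As a sketch of how that selection plays out (the rating IDs and values are made up), assume 62% of all achievable points were achieved and the following ordered list of ratings:

"ratings": [
    { "id": "poor", "value": 50.0 },  // 50 >= 62 is false -> not applicable
    { "id": "okay", "value": 75.0 },  // 75 >= 62 is true -> first applicable, becomes the final rating
    { "id": "great", "value": 100.0 } // no longer considered
]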
Last but not least, a load test can be marked as failed based on a rule or group (hard fail by rule/group) or a rating that is configured accordingly. If neither a rule/group nor a rating is configured to mark the load test as failed, it is seen as a success/pass regardless of the final score.
Note: A rule in state "ERROR" (because one of its checks became erroneous, e.g. its selector matches no item or more than one item) won't mark the test as failed, even when it is configured to do so.
Error Handling
Unless the JSON is broken or fails validation, we try to run what we can (rules). If any rule breaks (its XPath or condition is wrong), we turn the entire outcome to ERROR and don't try to come up with a result, as that would be guessing. We still attempt to run all rules, so the user gets debugging feedback and can fix all problems at once.
Implementation Details
- configuration has to be provided by the user
  - file in JSON format (see attached schema and example config)
  - path to the configuration file given as value of property `com.xceptance.xlt.scorecard.config` in the test suite → relative to the test suite's `config` directory
  - malformed configuration should be logged as an error and treated as such in the evaluation's result (also shown in report)
- presence of the configuration file triggers evaluation
  - part of load test report generation (last step)
  - failures/errors while attempting to perform the evaluation must not abort report generation
- result of evaluation
  - written to file `scorecard.xml` in the report's target directory as well-formed XML (may be used as input for other tools)
  - transformed to HTML and integrated into the load test report as a separate page (→ Scorecard), if and only if evaluation took place
Configuration File
The evaluation configuration file must be given in JSON format and conform to the schema linked above.
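Putting the sections below together, the top-level shape of the file would presumably look as follows (a sketch only; the key names are taken from the examples in this issue):

{
    "version": 2,    // required format version
    "selectors": [], // optional: reusable XPath expressions
    "rules": [],     // optional: data tests referenced by groups
    "groups": [],    // required: at least one enabled group with at least one enabled rule
    "ratings": []    // optional: ordered list of ratings, may be empty
}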
Version
The document contains a version flag holding a plain number (nothing fancy!) that simply states which format version the file is supposed to conform to. For the moment, this is only used to reject a file with the wrong number. In the future, we might be able to support multiple versions, but that is future work.
{
"comment": "I can state nice things here to give the file some meaning because JSON does not allow me to do that in any other way",
"version": 2
}
Selector
A selector defines a reusable expression to query the test result document.
{
"selectors" : [
{
"id": "agentsWithCPULarger50p", // the ID of the selector - required
"expression": "count(//agents//totalCpuUsage/mean[number() > 50])", // the XPath expression of the selector - required
"comment" : "I can write a note to myself here" // some comment; won't be evaluated - optional
},
{
"id": "homepageRuntimeP95",
"expression": "//requests/request[name[text() = '^Homepage.*']]/percentiles/p95",
"comment" : "I can write a note to myself here"
},
{
"id": "homepageRuntimeP99",
"expression": "//requests/request[name[text() = '^Homepage.*']]/percentiles/p99",
"comment" : "I can write a note to myself here"
}
]
}
Rules
Rules is an optional array of objects as follows:
{
"id": "homepageResponseTimeC", // the ID for reference by the group and listing in the report - required and unique
"name": "Homepage C", // the (more or less descriptive) name of the rule - optional
"comment": "Lorem ipsum ...", // some comment; won't be evaluated - optional
"enabled": true, // whether the rule should be evaluated at all (helps to add and remove something easily) - optional, default = true
"description" : "Homepage C rating", // the text to display in the report, kinda help text like
"failsTest": true, // when true, can make the load test be marked as failed indepenent of a rating fail - optional, default = false
"negateResult": false, // when 'true' will fail when all checks do pass and succeed when at least check fails - optional, default = false
"testFailTrigger": "NOTPASSED", // rule evaluation status used as trigger when to mark test as failed (has no effect unless "failsTest" is "true") - optional, default = NOTPASSED
// the checks to run
// all rule checks have to pass in order to have the rule evaluate to passed/success - optional
"checks" : [
{
"selector": "//requests/request[name[text() = 'Homepage']]/percentiles/p95",
"condition": "<= 1000",
"displayValue" : true, // whether the item matching the selector should be displayed in the report - optional, default = true
"enabled": true // helps to disable a check easily, if desired - optional, default = true
},
{
"selectorId": "homepageRuntimeP99",
"condition": "<= 3000",
"displayValue" : true,
},
],
// messages to display in the report when the rule passes/fails - optional, no default value, allowed properties: "success" and "fail"
"messages": {
"success" : "C", // use this message in the report when passed - optional, no default value
"fail" : "", // otherwise print this text - optional, no default value
},
"points" : 5 // the number of points to achieve - optional, default = 0, no upper limit
}
Rule checks can have either a `selector` or a `selectorId`, not both; specifying both is an ERROR. Rules that do not contain any enabled check will always pass, unless `negateResult` is `true`, which causes the rule to never pass.
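For illustration, a hedged sketch of both corner cases (the rule IDs are made up; the `selectorId` refers to the selector example above):

{
    "id": "neverPasses",  // hypothetical rule: no enabled checks would normally mean "always pass" ...
    "negateResult": true, // ... but negation turns that into "never passes"
    "checks": []
},
{
    "id": "brokenCheck", // hypothetical rule that evaluates to ERROR
    "checks": [
        {
            // ERROR: "selector" and "selectorId" must not be combined
            "selector": "//requests/request[name[text() = 'Homepage']]/percentiles/p95",
            "selectorId": "homepageRuntimeP95",
            "condition": "<= 1000"
        }
    ]
}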
Groups
Groups is a non-empty array of objects as follows:
{
"id": "group1", // the ID of the group for reference and listing in the report - required and unique
"name": "Homepage", // the (more or less descriptive) name of the group - optional
"comment": "Lorem ipsum ...", // some comment; won't be evaluated - optional
"enabled": true, // helps to disable the entire group - optional, default = true
"failsTest": false, // when true will mark the test as failed when the group failed - optional, default = false
"mode": "allPassed", // the group's mode; used to define the source of points as well as the outcome of the group state; must be one of "firstPassed", "lastPassed" or "allPassed - optional, default = 'firstPassed'
"description": "Just any text that explains the purpose", // the text to display in the report, kinda help text like - optional, default empty
// the list of rules assigned to this group and to evaluate in that order - required, at least one rule per group
"rules": ["homepageResponseTimeA", "homepageResponseTimeB", "homepageResponseTimeC"],
}
At least one enabled group having at least one enabled rule assigned must be specified as an item in `groups`.
Ratings
Ratings is an optional and potentially empty array of objects as follows:
{
"id": "poor", // the ID of the rating for reference and listing in the report (used as fallback when no name is given) - required and unique
"name": "Poor", // the name of the rating for reference and listing in the report - optional
"enabled": true, // helps to disable the rating for testing purposes - optional, default = true
"comment": "Lorem ipsum ...", // some comment; won't be evaluated - optional
"description": "Load test performed poorly", // the text to display in the report, kinda help text like - optional, default empty
"value": 50.0, // the upper limit for the rating - required, must be greater than or equal to 0.0 and lower than or equal to100.0
"failsTest" : true // whether to mark the load test as failed when applied - optional, default = false
}