Test cases for SAGE

Question

Test cases for SAGE

Closed this issue a year ago · 0 comments

Currently, SAGE does not have any tests. Below are some ideas for potential test cases.

Regression tests

Keep the "ground truth" version of the attack graphs (the dot files). Before every merge to main, run the new implementation and compare the AGs to the "ground truth". An example using the scripts from my Research Project repository:

In this test file, AGs are compared with the "ground truth" based on the nodes and edges present (diff-ags.sh) and node stats (stats-nodes-ags.sh), and the episode traces passed to FlexFringe are also compared with the "ground truth".

This would be very useful for PRs related to refactoring and changing minor parts of the code that do not affect the resulting AGs.

The disadvantage, however, is that the "ground truth" files will have to be updated if the AGs change. On the other hand, the graphs can be compared to the latest SAGE version on the main branch.

Sinks in FlexFringe vs sinks in AGs

As mentioned in #14, sinks in AGs can be compared with the sinks in the json files generated by FlexFringe. Here no "ground truth" is necessary as FlexFringe output (the S-PDFA) is taken as the "ground truth".

# All found sinks in 2017 are indeed sinks (after all fixes)
[jegor@arch SAGE-fork]$ sinks_after_2017=$(find after-2017AGs/ -type f -name '*.dot' | xargs grep -F -l "dotted" | xargs gvpr 'N [ $.style == "dotted" || $.style == "filled,dotted" || $.style == "dotted,filled" ] { print(gsub(gsub($.name, "\r"), "\n", " | ")); }' | sort -u | uniq -i)
[jegor@arch SAGE-fork]$ all_sinks_2017=$(jq '.nodes[] | select(.issink==1) | .id' before-2017.txt.ff.finalsinks.json | sort)
[jegor@arch SAGE-fork]$ echo -e "$sinks_after_2017" | wc -l
141
[jegor@arch SAGE-fork]$ comm -12 <(echo -e "$sinks_after_2017" | sed 's/^.*ID: \([0-9-]\+\)$/\1/' | sort) <(echo -e "$all_sinks_2017") | wc -l
141

# All found sinks in 2018 are indeed sinks (after all fixes)
[jegor@arch SAGE-fork]$ sinks_after_2018=$(find after-2018AGs/ -type f -name '*.dot' | xargs grep -F -l "dotted" | xargs gvpr 'N [ $.style == "dotted" || $.style == "filled,dotted" || $.style == "dotted,filled" ] { print(gsub(gsub($.name, "\r"), "\n", " | ")); }' | sort -u | uniq -i)
[jegor@arch SAGE-fork]$ all_sinks_2018=$(jq '.nodes[] | select(.issink==1) | .id' before-2018.txt.ff.finalsinks.json | sort)
[jegor@arch SAGE-fork]$ echo -e "$sinks_after_2018" | wc -l
104
[jegor@arch SAGE-fork]$ comm -12 <(echo -e "$sinks_after_2018" | sed 's/^.*ID: \([0-9-]\+\)$/\1/' | sort) <(echo -e "$all_sinks_2018") | wc -l
104

# All non-sinks with IDs in 2017 are indeed non-sinks (after all fixes)
[jegor@arch SAGE-fork]$ non_sinks_with_ids_after_2017=$(find after-2017AGs/ -type f -name '*.dot' | xargs gvpr 'N [ $.style != "dotted" && $.style != "filled,dotted" && $.style != "dotted,filled" ] { print(gsub(gsub($.name, "\r"), "\n", " | ")); }' | sort -u | uniq -i | grep 'ID: ')
[jegor@arch SAGE-fork]$ echo -e "$non_sinks_with_ids_after_2017" | wc -l
28
[jegor@arch SAGE-fork]$ all_non_sinks_after_2017=$(jq '.nodes[] | select(.issink==0) | .id' after-2017.txt.ff.final.json | sort -u)
[jegor@arch SAGE-fork]$ comm -12 <(echo -e "$non_sinks_with_ids_after_2017" | sed 's/^.*ID: \([0-9-]\+\)$/\1/' | sort -u) <(echo -e "$all_non_sinks_after_2017") | wc -l
28

# All non-sinks with IDs in 2018 are indeed non-sinks (after all fixes)
[jegor@arch SAGE-fork]$ non_sinks_with_ids_after_2018=$(find after-2018AGs/ -type f -name '*.dot' | xargs gvpr 'N [ $.style != "dotted" && $.style != "filled,dotted" && $.style != "dotted,filled" ] { print(gsub(gsub($.name, "\r"), "\n", " | ")); }' | sort -u | uniq -i | grep 'ID: ')
[jegor@arch SAGE-fork]$ echo -e "$non_sinks_with_ids_after_2018" | wc -l
16
[jegor@arch SAGE-fork]$ all_non_sinks_after_2018=$(jq '.nodes[] | select(.issink==0) | .id' after-2018.txt.ff.final.json | sort -u)
[jegor@arch SAGE-fork]$ comm -12 <(echo -e "$non_sinks_with_ids_after_2018" | sed 's/^.*ID: \([0-9-]\+\)$/\1/' | sort -u) <(echo -e "$all_non_sinks_after_2018") | wc -l
16

Episode generation

_get_episodes method has test cases for episode creation that are commented out. A better solution would be to move these commented tests into a separate test file

Cutting episode sequences

Cutting episode sequences into episode subsequences could also potentially be tested.

Episode sequences vs state sequences

Depending on whether implementation allows this or not, episode sequences could be compared with state sequences for consistency.