rabobank-cdc/DeTTECT

Question: are data quality parameters considered for data source scoring?

rahulgkrishna opened this issue · 4 comments

Hi,

I was wondering if the data quality parameters are considered for data source scoring. Please correct me if I'm doing something wrong. Let me explain with a scenario, in case that is helpful.

For https://attack.mitre.org/techniques/T1564/010/ my data source layer shows 100%, and hence the visibility layer shows a score of 4. The technique is associated with the data source "Process Creation" and is applicable to the "Windows" platform only.

[screenshots: the data source layer showing 100% and the visibility layer showing a score of 4]

Now my data source YAML has the data quality described as below for "Process Creation". This is because I don't have visibility on all Windows devices and all types of Windows events.
[screenshot: data quality scores for "Process Creation" in the data source YAML]

If I generate the Excel output of the data source YAML, I can see that the data quality score for this data source is 3 out of 5.
So I do see a discrepancy here: while I don't have complete visibility on that data source, 100% (or a score of 4) doesn't make sense to me.

Hi @rahulgkrishna

The rough visibility overview (where you see the score of 100%) is based on the number of data sources you have per technique. T1564.010 has 1 data source in ATT&CK, and you have that data source in your YAML, so you score 1 out of 1, which results in 100%. Concerning the data quality scores: the rough overview only checks that all 5 data quality dimensions have a score of 1 or higher. Next, when generating a visibility overview where you get a score of 4: this is based on the percentage. Because you have 100%, it gets a score of 4. Generated visibility scores are derived from these percentage brackets (see the sketch after the list):

0-49%: score 1
50-74%: score 2
75-99%: score 3
100%: score 4
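
To make the mapping concrete, here is a minimal sketch in Python of the two checks described above. This is illustrative only, not the actual DeTT&CT implementation, and the data quality dimension names are assumptions based on the layout of the data source YAML:

```python
def rough_visibility_score(ds_present: int, ds_total: int) -> int:
    """Map per-technique data source coverage to a rough visibility score.

    Illustrative sketch of the percentage brackets listed above;
    not the actual DeTT&CT code.
    """
    percentage = 100 * ds_present / ds_total
    if percentage == 100:
        return 4
    if percentage >= 75:
        return 3
    if percentage >= 50:
        return 2
    return 1


def counts_as_present(dq_scores: dict) -> bool:
    """A data source only counts toward coverage when all 5 data quality
    dimensions score 1 or higher, as described above.

    The dimension names here are assumptions, not verified field names.
    """
    dimensions = ['device_completeness', 'data_field_completeness',
                  'timeliness', 'consistency', 'retention']
    return all(dq_scores.get(dim, 0) >= 1 for dim in dimensions)


# T1564.010 has 1 data source in ATT&CK ("Process Creation"), and it is
# present in the YAML, so coverage is 1 out of 1 = 100% -> score 4.
print(rough_visibility_score(ds_present=1, ds_total=1))  # 4
```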

The idea of scoring visibility is that you can manually adjust the scores based on experience, in-depth knowledge of your infrastructure, logs, etc.

The reason for not taking the exact data quality scores into account is that lower scores don't always mean worse visibility. For example, you can have good visibility despite bad consistency, timeliness or retention. Reflecting that in a single score is not easy and can be confusing. That's why we have chosen to base the rough visibility score only on the number of data sources you have per technique. That's also the reason we say "rough" visibility overview. Making it more exact can be done via the visibility scores, which can be adjusted manually.

Hi @rubinatorz,

Thanks for the details. That makes sense. But if I do an automatic update of the visibility scores from the data source YAML, will the manual adjustments get overridden?

Hi @rahulgkrishna

No, the score will not be overridden when auto-updating the visibility scores from data sources. There's a flag in the score logbook of the visibility section in the techniques YAML file named "auto_generated". This attribute is True when the score is auto-generated and not manually adjusted. When you adjust the score manually via the DeTT&CT Editor, this flag becomes False. When auto-updating visibility scores from data sources, this flag is taken into account. The CLI does, however, give you the possibility to choose what you want to do while auto-updating visibility scores: you can decide per technique whether or not to override the score. A rough sketch of this decision follows.
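
A minimal sketch of that decision, assuming a simplified techniques YAML entry with the score_logbook and auto_generated fields mentioned above (not the actual DeTT&CT code, and not a complete techniques file):

```python
import yaml  # requires PyYAML

# Simplified, illustrative techniques YAML entry (an assumption for this
# sketch, not a verbatim DeTT&CT file).
TECHNIQUE_YAML = """
technique_id: T1564.010
visibility:
  applicable_to: ['all']
  score_logbook:
    - date: 2023-05-01
      score: 4
      auto_generated: true   # becomes false after a manual adjustment
"""

technique = yaml.safe_load(TECHNIQUE_YAML)
latest_entry = technique['visibility']['score_logbook'][-1]

# Auto-generated scores can safely be overwritten; manually adjusted
# scores are only overridden if you confirm it per technique in the CLI.
if latest_entry['auto_generated']:
    print('auto-generated score -> updated silently')
else:
    print('manually adjusted score -> CLI asks before overriding')
```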

Got it. Thanks @rubinatorz.