ossf/package-analysis

Define a global entropy measurement for strings and literals

maxfisher-g opened this issue · 1 comment

Entropy calculations currently use per-file character frequency counts to define the expected probability of each character. It would be better to measure character frequencies once, over a large dataset of source files, and then use that same frequency table to analyse all packages, so that entropy values are comparable across files and packages.

It will be easier to measure character frequencies once we have static analysis data in BigQuery.
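
A rough sketch of the idea, in Python for brevity. This is not the project's existing code: the function names `char_probabilities` and `mean_surprisal`, the `floor` parameter for characters missing from the table, and the use of average bits per character as the score are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Mapping


def char_probabilities(texts: Iterable[str]) -> Dict[str, float]:
    """Estimate character probabilities from a collection of texts.

    Passing a single file gives per-file frequencies; passing a large
    corpus gives one global table that can be reused for every package.
    """
    counts: Counter = Counter()
    for text in texts:
        counts.update(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def mean_surprisal(s: str, probs: Mapping[str, float], floor: float = 1e-6) -> float:
    """Average number of bits per character of s under the given table.

    Characters absent from the table are assigned the (hypothetical)
    probability `floor` so that the logarithm stays defined.
    """
    if not s:
        return 0.0
    return -sum(math.log2(probs.get(ch, floor)) for ch in s) / len(s)


# Hypothetical tiny "corpus" standing in for a large dataset of source files.
corpus = ["var x = 1;", "function f() { return x; }"]
global_probs = char_probabilities(corpus)

literal = "QmFzZTY0IGJsb2I="
per_file_score = mean_surprisal(literal, char_probabilities([literal]))
global_score = mean_surprisal(literal, global_probs)
```

With a per-file table, a file consisting mostly of an encoded blob makes its own characters look "expected". Scoring against a single shared table avoids that, because the literal is compared with what typical source code looks like.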