GaloisInc/cclyzerpp

Unify FactGenerator/Datalog relation names

langston-barrett opened this issue · 2 comments

As of #40, the Fact Generator and Datalog code share a list of file/relation names. The Fact Generator refers to relations as group::rel, e.g., variable::name, which corresponds to the Datalog relation variable_name. While the correspondence between variable::name and variable_name is clear, it doesn't hold in general: for example, variable::id corresponds to just variable. Thus, predicates.inc has lines like:

PREDICATE(global_var, unmangl_name, global_variable_has_unmangled_name)

where the first two entries describe the C++ (Fact Generator) name, and the third entry describes the filename/Datalog relation name. We should try to derive the latter from the former for the sake of consistency. This will involve changing a ton of Datalog code to use new relation names.
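To make the "derive the latter from the former" idea concrete, here is a minimal sketch. It assumes predicates.inc is consumed as an X-macro list, and it uses an inline stand-in list instead of the real file. The rows for variable::name and variable::id are reconstructed from the description above, and EXAMPLE_PREDICATES, Predicate, derive_datalog_name and mismatches are hypothetical names, not part of the codebase. The derivation rule (group, underscore, rel) is only one candidate.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for predicates.inc, limited to names quoted in this issue.
#define EXAMPLE_PREDICATES(X)                                       \
  X(variable, name, variable_name)                                  \
  X(variable, id, variable)                                         \
  X(global_var, unmangl_name, global_variable_has_unmangled_name)

struct Predicate {
  std::string group;    // C++ (Fact Generator) group
  std::string rel;      // C++ (Fact Generator) relation within the group
  std::string datalog;  // filename / Datalog relation name
};

// Expand the X-macro list into a table of rows.
inline std::vector<Predicate> all_predicates() {
#define AS_ROW(group, rel, datalog) Predicate{#group, #rel, #datalog},
  return {EXAMPLE_PREDICATES(AS_ROW)};
#undef AS_ROW
}

// Candidate derivation rule (an assumption): join group and rel with '_'.
inline std::string derive_datalog_name(const std::string &group,
                                       const std::string &rel) {
  assert(!group.empty() && !rel.empty());
  return group + "_" + rel;
}

// Rows whose current Datalog name differs from the derived one.
inline std::vector<Predicate> mismatches() {
  std::vector<Predicate> out;
  for (const auto &p : all_predicates()) {
    if (derive_datalog_name(p.group, p.rel) != p.datalog) out.push_back(p);
  }
  return out;
}
```

Running mismatches() over the real list would give a rough inventory of the relations that need renaming on one side or the other. Once that inventory is empty, the third macro argument becomes redundant and could be dropped.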

There is some urgency to this task: It's important to do this early in the git history if we're to do it at all. The git history of this project is fairly empty at this point, but someday will encode important choices about how the analysis was constructed.

For the sake of posterity, here are the major areas where the two naming schemes disagree, and how I'm approaching them in #45:

  • Abbreviations, e.g., global_var vs. global_variable: I'm picking something clear but brief and documenting it in the dev docs
  • "infixes" such as _has_ and _in_: These are mostly redundant, so I'll remove them
  • id: The FactGenerator has e.g. variable::id which corresponds to just variable in the Datalog. Not yet sure what to do about this.
  • Extra word separators: The FactGenerator uses extract_element to refer to extractelement, and similarly for other multi-word opcodes. Prefer the opcodes as they appear in LLVM, i.e., as one word.
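Two of the rules above (dropping the _has_/_in_ infixes and joining multi-word opcodes) are mechanical enough to sketch as string rewrites. The helpers below are hypothetical and are not taken from #45. They assume underscore-separated names and treat "has" and "in" as infixes only when they appear as interior words. This could misfire on a name where such a word is meaningful, so any real rename would need a manual review.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split an underscore-separated name into its words.
inline std::vector<std::string> split_words(const std::string &name) {
  std::vector<std::string> words;
  std::stringstream ss(name);
  std::string word;
  while (std::getline(ss, word, '_')) words.push_back(word);
  return words;
}

// Drop the interior words "has" and "in" (hypothetical rule for the infixes).
inline std::string strip_infixes(const std::string &name) {
  const std::vector<std::string> words = split_words(name);
  std::string out;
  bool first = true;
  for (std::size_t i = 0; i < words.size(); ++i) {
    const bool interior = i > 0 && i + 1 < words.size();
    if (interior && (words[i] == "has" || words[i] == "in")) continue;
    if (!first) out += '_';
    out += words[i];
    first = false;
  }
  return out;
}

// Join a multi-word opcode into one word, as LLVM spells it.
inline std::string join_opcode_words(const std::string &opcode) {
  std::string out;
  for (char c : opcode) {
    if (c != '_') out += c;
  }
  return out;
}
```

For instance, strip_infixes applied to global_variable_has_unmangled_name yields global_variable_unmangled_name, and join_opcode_words applied to extract_element yields extractelement. The abbreviation and id questions are naming decisions rather than rewrites, so they are left out of the sketch.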