aboutcode-org/deltacode

Simplify Factors reporting

mjherzog opened this issue · 6 comments

The current approach of listing multiple Factors in one column makes it difficult to sort and filter file-level data in the primary DeltaCode reporting tool - i.e. a spreadsheet. It would be better to reset to three separate fields for:

  1. File "status": Added, Modified, Moved, Removed, or Unmodified (current definitions)
  2. Copyright - File contains one or more copyright notices OR none
  3. License - File contains one or more license notices/text OR none

With these three as separate fields, a user can more easily filter for the relevant combinations of file information. This does not deal with the actual detection of a changed copyright or license notice. Rather, the idea is to show whether each file has copyright or license information. This assumes that we deprecate License Category reporting for now (see #106).
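A minimal sketch of what the three separate columns could look like in the CSV output, assuming hypothetical field names `status`, `has_copyright` and `has_license` (not DeltaCode's confirmed output schema). It shows why one value per column makes spreadsheet-style filtering trivial:

```python
import csv
import io
from dataclasses import dataclass, asdict


@dataclass
class FileRow:
    """One spreadsheet row: a path plus three independent, filterable fields (hypothetical)."""
    path: str
    status: str          # e.g. "added", "modified", "moved", "removed", "unmodified"
    has_copyright: bool  # True if one or more copyright notices were detected
    has_license: bool    # True if one or more license notices/texts were detected


def to_csv(rows):
    """Serialize rows so each field lands in its own column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["path", "status", "has_copyright", "has_license"])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


rows = [
    FileRow("src/a.c", "modified", True, True),
    FileRow("src/b.c", "added", False, True),
    FileRow("README", "unmodified", False, False),
]

# With separate columns, "added files that carry a license" is a simple filter.
added_with_license = [r.path for r in rows if r.status == "added" and r.has_license]
print(added_with_license)
print(to_csv(rows))
```

The same filter in a spreadsheet is just two column filters, instead of a text search inside a combined Factors cell.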

@MaJuRG Will our new status, has_license and has_copyright attributes and related methods entirely replace the existing methods/scoring for license and copyright changes (i.e., no more use of license info added etc. and no related change to a Delta's score)?

Or might we want to retain that information for use in a future issue -- e.g., a CLI option the user can invoke to include this info in the JSON/CSV output -- storing it for now in some new Delta attributes like license_change and copyright_change?

@johnmhoran We can add two additional flags. Feel free to open a ticket specifically for that if you wish (no need if you want to include it in your current work).

We will need to think about ways to record the license key (and maybe copyright holders), but for now we can simply record the fact that there is a license change as a flag.

Excellent. Thanks, @MaJuRG . I'll include it in my current work rather than opening a new ticket. (I'm tackling issues #107, #109 and #110 together under the 107 rubric.)

If I understand your reply correctly, we won't be storing the current strings (license info added and the others), but rather will treat the new attributes license_change and copyright_change as booleans, i.e., no change vs. 1+ changes, much like we're treating the new attributes has_license and has_copyright. Does that accurately capture our approach?

Yes, exactly.

Thanks.
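A minimal sketch of the agreed approach, where license_change and copyright_change are plain booleans rather than descriptive strings. The `ScannedFile` type, its `licenses`/`copyrights` fields, and this `Delta` shape are hypothetical stand-ins, not DeltaCode's actual classes; a "change" is assumed to mean the set of detected keys/statements differs between the old and new file:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ScannedFile:
    path: str
    licenses: FrozenSet[str] = frozenset()    # hypothetical: detected license keys
    copyrights: FrozenSet[str] = frozenset()  # hypothetical: detected copyright statements


@dataclass
class Delta:
    old: Optional[ScannedFile]  # None when the file was added
    new: Optional[ScannedFile]  # None when the file was removed

    @property
    def license_change(self) -> bool:
        # Boolean only: "no change" vs. "one or more changes"; no strings stored.
        old = self.old.licenses if self.old else frozenset()
        new = self.new.licenses if self.new else frozenset()
        return old != new

    @property
    def copyright_change(self) -> bool:
        old = self.old.copyrights if self.old else frozenset()
        new = self.new.copyrights if self.new else frozenset()
        return old != new


d = Delta(
    old=ScannedFile("a.c", licenses=frozenset({"mit"})),
    new=ScannedFile("a.c", licenses=frozenset({"mit", "apache-2.0"}),
                    copyrights=frozenset({"Copyright (c) Example"})),
)
print(d.license_change, d.copyright_change)
```

Keeping the underlying sets on the scanned files (rather than on the Delta) leaves room to report the actual license keys later without changing the flags.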

For the file status, we could follow git's file statuses:

  • unmodified
  • modified
  • added
  • deleted
  • renamed
  • copied
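A minimal sketch of assigning these git-style statuses, assuming each scan is reduced to a hypothetical mapping of path to content hash (e.g. a SHA1). It only detects exact renames and copies (identical content at a new path); this is a simplification for illustration, not git's implementation:

```python
from enum import Enum
from typing import Dict


class FileStatus(Enum):
    UNMODIFIED = "unmodified"
    MODIFIED = "modified"
    ADDED = "added"
    DELETED = "deleted"
    RENAMED = "renamed"
    COPIED = "copied"


def classify(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, FileStatus]:
    """Assign a git-style status to each path.

    `old` and `new` map path -> content hash (hypothetical input shape).
    Only exact renames/copies (identical hash at a new path) are detected.
    """
    old_by_hash: Dict[str, list] = {}
    for path, digest in old.items():
        old_by_hash.setdefault(digest, []).append(path)

    statuses: Dict[str, FileStatus] = {}
    consumed = set()  # old paths already explained by a rename
    for path in sorted(new):
        digest = new[path]
        if path in old:
            same = old[path] == digest
            statuses[path] = FileStatus.UNMODIFIED if same else FileStatus.MODIFIED
            continue
        sources = sorted(old_by_hash.get(digest, []))
        # A source path that vanished from `new` means the content moved: a rename.
        gone = [s for s in sources if s not in new and s not in consumed]
        if gone:
            consumed.add(gone[0])
            statuses[path] = FileStatus.RENAMED
        elif sources:
            # The source still exists (or was already renamed): a copy.
            statuses[path] = FileStatus.COPIED
        else:
            statuses[path] = FileStatus.ADDED

    # Old paths not present in `new` and not explained by a rename were deleted.
    for path in old:
        if path not in new and path not in consumed:
            statuses[path] = FileStatus.DELETED
    return statuses


old = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h3", "d.txt": "h4"}
new = {"a.txt": "h1", "b.txt": "h2x", "c2.txt": "h3", "a_copy.txt": "h1", "e.txt": "h5"}
for path, status in sorted(classify(old, new).items()):
    print(path, status.value)
```

Renamed files are reported under their new path only; the old path (`c.txt` here) is absorbed by the rename rather than also being listed as deleted.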

We could also use the algorithm git uses to detect files that were moved, renamed, or copied.
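Git also pairs files whose content is merely similar, not identical, using a similarity threshold (50% by default for `git diff -M`). A rough sketch of that idea follows. It swaps in Python's `difflib.SequenceMatcher.ratio()` as the similarity metric, which is an assumption for illustration: git computes its similarity index differently. The `detect_renames` function and its inputs are hypothetical:

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    # Stand-in metric: difflib's ratio, NOT git's own similarity index.
    return SequenceMatcher(None, a, b).ratio()


def detect_renames(deleted: Dict[str, str], added: Dict[str, str],
                   threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Pair deleted and added files whose content similarity meets `threshold`.

    `deleted` and `added` map path -> file content. Returns (old_path, new_path, score)
    tuples, pairing greedily from the highest score down so each file is used at most once.
    """
    candidates = []
    for old_path, old_text in deleted.items():
        for new_path, new_text in added.items():
            score = similarity(old_text, new_text)
            if score >= threshold:
                candidates.append((score, old_path, new_path))
    # Best matches first; ties broken by path for deterministic output.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_old, used_new, pairs = set(), set(), []
    for score, old_path, new_path in candidates:
        if old_path in used_old or new_path in used_new:
            continue
        used_old.add(old_path)
        used_new.add(new_path)
        pairs.append((old_path, new_path, score))
    return pairs


deleted = {"src/util.py": "def add(a, b):\n    return a + b\n"}
added = {
    "lib/util.py": "def add(a, b):\n    return a + b  # sum\n",
    "docs/notes.txt": "unrelated text",
}
print(detect_renames(deleted, added))
```

Files left unpaired would keep their added/deleted status. Comparing every deleted file with every added file is quadratic, which is one reason git runs exact (hash-based) matching first and limits inexact rename detection on large change sets.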