nexB/deltacode

RFC: DeltaCode next! and roadmap

pombredanne opened this issue · 2 comments

So here is the outline of the discussion @arnav-mandal1234 and I had to revive and update DeltaCode!

  • We need to update DeltaCode and the scancode-fingerprint plugin at https://github.com/nexB/scancode-plugins/tree/main/misc/scancode-fingerprint to the latest standards:

    • For deltacode: adopt skeleton, support Python 3.7+ and the latest ScanCode-toolkit version, and ensure we have tests that work on all supported OSes and Python versions. After this we should have a stable, working codebase. We will also need to update the licensing to the latest SCTK standards (plain Apache), see #182
    • Also update https://github.com/nexB/scancode-plugins/tree/main/misc/scancode-fingerprint to the latest supported Python versions and make the tests pass (maybe we should add CI there too?)
    • Clean up issues and branches, merge @Pratikrocks's pending PR #176 :) and clean up the leftover GSoC issues: GSOC Project/issues to be done during the GSOC Time frame (at last)
    • Ensure we have consistent docs: #188
  • Then we would like to merge DeltaCode into the core ScanCode-toolkit git repo, preserving the commit history, and update it to become CLI options in ScanCode-toolkit. Keeping the commit history preserves the record of changes as well as authorship. Once done, we can selectively move issues to ScanCode-toolkit and archive this repo. #181

  • We will need to add support for comparing packages and focus the delta capabilities on package scans (rather than mostly on files)

  • Finally, I would like to see DeltaCode integrated into purldb as a library to support two use cases:

    • Extend package curations: given a package v1 with a reviewed license/origin and a new v2 of the same package, are the differences in package metadata, codebase summaries and file-level delta small enough that we can carry forward the review of v1 to v2? Or should v2 be reviewed again?

    • Cluster packages to focus curations: given a series of package versions v1 to v10, which clusters of versions have essentially similar package metadata, codebase summaries and file-level data? And given these clusters, which are the key versions to review to validate a whole cluster at once?
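
A rough sketch of the two purldb use cases above, in Python. It assumes, purely for illustration, that each package version is reduced to a set of file content hashes and that similarity is measured with a Jaccard index. The function names, the input shape and the 0.9 threshold are all hypothetical and are not part of DeltaCode or purldb; a real delta would also weigh package metadata and codebase summaries.

```python
from typing import Dict, List, Set

# Hypothetical input shape: one set of file content hashes per package version.
FileHashes = Set[str]


def jaccard(a: FileHashes, b: FileHashes) -> float:
    """Return the Jaccard similarity of two hash sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def can_carry_forward(v1: FileHashes, v2: FileHashes, threshold: float = 0.9) -> bool:
    """Use case 1: is v2 close enough to the reviewed v1 to reuse its review?"""
    return jaccard(v1, v2) >= threshold


def cluster_versions(
    versions: Dict[str, FileHashes], threshold: float = 0.9
) -> List[List[str]]:
    """Use case 2: greedily group versions, in the order given.

    A version joins the current cluster when it is similar enough to that
    cluster's first version, which acts as the key version to review.
    Otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, hashes in versions.items():
        if clusters and jaccard(versions[clusters[-1][0]], hashes) >= threshold:
            clusters[-1].append(name)
        else:
            clusters.append([name])
    return clusters
```

With this shape, the first entry of each returned cluster would be the version to review, and `can_carry_forward` would decide whether a newly published version can inherit an existing review. Comparing against the first version of a cluster rather than the previous version is a deliberate choice here: it avoids slow drift where each step is small but the last version no longer resembles the reviewed one.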

@pombredanne @arnav-mandal1234 I made some edits above at #183 (comment), looks great otherwise! 🚀