aboutcode-org/deltacode

Enhance deltacode matching

chinyeungli opened this issue · 0 comments

Given two inputs (either CSV or JSON) produced by ScanCode scans, the tool should compare the two inputs based on their path values and return pathmatch and pathscore information.

The current behavior is to compare the "full" path. A better approach would be to compare paths segment by segment.
Input A is the path for which we want to find matches.
For instance,
Input A:
/tmp/project/a/b/c/d.java
Input B:
/project/test/a/b/c/d.java

deltacode may conclude that the two paths above do not match.
Instead, deltacode should return a pathscore of 4 (because the two paths share 4 consecutive matching segments, counting from the end, i.e. from right to left) and let the user decide whether this is a real match.
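A minimal sketch of the proposed segment-based scoring, assuming paths use `/` as the separator and that the leading root `/` is not counted as a segment. The function name `pathscore` is hypothetical and is not part of deltacode's existing code.

```python
from pathlib import PurePosixPath


def pathscore(path_a: str, path_b: str) -> int:
    """Count consecutive matching segments, comparing from the right.

    Hypothetical helper illustrating the proposal; not deltacode's API.
    """
    # PurePosixPath.parts splits on "/" (collapsing repeated slashes);
    # the root "/" is dropped so it never contributes to the score.
    segs_a = [p for p in PurePosixPath(path_a).parts if p != "/"]
    segs_b = [p for p in PurePosixPath(path_b).parts if p != "/"]

    score = 0
    # Walk both segment lists from the file name back toward the root,
    # stopping at the first segment that differs.
    for seg_a, seg_b in zip(reversed(segs_a), reversed(segs_b)):
        if seg_a != seg_b:
            break
        score += 1
    return score


if __name__ == "__main__":
    a = "/tmp/project/a/b/c/d.java"
    b = "/project/test/a/b/c/d.java"
    print(a == b)            # full-path comparison: False
    print(pathscore(a, b))   # segment comparison: 4
```

Comparing from the right means the file name is checked first, so paths that differ only in their leading directories (such as `/tmp` versus nothing) still receive a meaningful score.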

In addition, it should automatically filter the results: only the candidate with the highest pathscore is kept as a match, and if several candidates tie for the highest pathscore, all of them are kept.
For instance, if Inputs B, C, and D are all compared against Input A:
Input C:
/project/c/d.java
Input D:
/tmp/test/a/b/c/d.java

Input C will automatically be ignored because it has a pathscore of only 2.
Inputs B and D should both be kept because they both have a pathscore of 4.
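The filtering rule can be sketched as follows. The helper names `pathscore` and `best_matches` are hypothetical, and treating a best score of 0 as "no match" is an assumption not stated in this issue.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def pathscore(path_a: str, path_b: str) -> int:
    """Hypothetical helper: consecutive matching segments from the right."""
    segs_a = [p for p in PurePosixPath(path_a).parts if p != "/"]
    segs_b = [p for p in PurePosixPath(path_b).parts if p != "/"]
    score = 0
    for seg_a, seg_b in zip(reversed(segs_a), reversed(segs_b)):
        if seg_a != seg_b:
            break
        score += 1
    return score


def best_matches(target: str, candidates: List[str]) -> List[Tuple[str, int]]:
    """Keep only the candidate(s) with the highest pathscore; ties are all kept."""
    scored = [(c, pathscore(target, c)) for c in candidates]
    if not scored:
        return []
    best = max(score for _, score in scored)
    # Assumption: a best score of 0 means nothing matched at all.
    if best == 0:
        return []
    # Preserve the original candidate order among the tied winners.
    return [(c, s) for c, s in scored if s == best]


if __name__ == "__main__":
    input_a = "/tmp/project/a/b/c/d.java"
    candidates = [
        "/project/test/a/b/c/d.java",  # Input B
        "/project/c/d.java",           # Input C
        "/tmp/test/a/b/c/d.java",      # Input D
    ]
    print(best_matches(input_a, candidates))
```

With Inputs B, C, and D as candidates, this keeps B and D (both scoring 4) and drops C (scoring 2). Returning the score alongside each kept path leaves the final judgment about whether it is a real match to the user, as the issue proposes.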