MG1933080 Yi-Jiang Yang
Github Repo: https://github.com/SwimilTylers/pa-irse-codesim
This is a project assignment for NJUCS Introduction of Software Engineering Research (2019). Our goal is to implement a code similarity detector. This is a Go project, so you can use the Go tools for compilation and unit tests.
Our main techniques are K-grams and Winnowing. We depend on the external GitHub repo golang-set. You can fetch it with go get github.com/deckarep/golang-set, or you can let an IDE such as JetBrains GoLand configure it automatically (open src/feature/winnow.go, follow GoLand's hint, and begin downloading).
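To make the K-grams and Winnowing idea concrete, here is a minimal sketch. It is not the code in src/feature/winnow.go: the function names kgramHashes and winnow are hypothetical, it uses plain Go maps instead of golang-set, and it assumes the textbook form of winnowing (hash every k-gram with a Karp-Rabin style polynomial hash, then keep the minimum hash of each window of w hashes, taking the rightmost on ties).

```go
package main

import "fmt"

// kgramHashes returns a Karp-Rabin style polynomial hash for every k-gram
// (k consecutive bytes) of text. Arithmetic wraps modulo 2^64.
// Hypothetical helper for illustration only.
func kgramHashes(text string, k int, base uint64) []uint64 {
	if k <= 0 || len(text) < k {
		return nil
	}
	// high = base^(k-1), the weight of the leading byte of a k-gram.
	high := uint64(1)
	for i := 1; i < k; i++ {
		high *= base
	}
	hashes := make([]uint64, 0, len(text)-k+1)
	var h uint64
	for i := 0; i < k; i++ {
		h = h*base + uint64(text[i])
	}
	hashes = append(hashes, h)
	for i := k; i < len(text); i++ {
		// Roll the hash: drop the leading byte, shift, append the new byte.
		h = (h-uint64(text[i-k])*high)*base + uint64(text[i])
		hashes = append(hashes, h)
	}
	return hashes
}

// winnow slides a window of w hashes over the sequence and selects the
// minimum of each window (the rightmost one on ties). It returns the set
// of selected hash values, which serves as the document fingerprint.
func winnow(hashes []uint64, w int) map[uint64]struct{} {
	selected := make(map[uint64]struct{})
	if w <= 0 || len(hashes) == 0 {
		return selected
	}
	if len(hashes) < w {
		w = len(hashes) // short input: treat everything as a single window
	}
	for start := 0; start+w <= len(hashes); start++ {
		minPos := start
		for j := start + 1; j < start+w; j++ {
			if hashes[j] <= hashes[minPos] { // "<=" keeps the rightmost minimum
				minPos = j
			}
		}
		selected[hashes[minPos]] = struct{}{}
	}
	return selected
}

func main() {
	// k=5, w=4 and base=3 mirror the defaults documented for -k, -w and -b.
	hashes := kgramHashes("int main(){return 0;}", 5, 3)
	fingerprint := winnow(hashes, 4)
	fmt.Println(len(fingerprint), "fingerprints selected from", len(hashes), "k-gram hashes")
}
```

Two source files can then be compared by comparing their fingerprint sets, since identical k-grams always produce identical hashes.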
Run codesim -h for usage information.
Usage: codesim [options] code1 code2.
Options can be:
  -b uint
        Base of Karp-Rabin String Matching. (default 3)
  -ft string
        Feature Type. Your choice can be "winnow" or "multi-winnow". (default "multi-winnow")
  -k int
        Kgrams Parameter. (default 5)
  -mm string
        Choose measurement. Your choice can be "max" or "mean". (default "max")
  -pm string
        Choose text preprocess mode. Your choice can be "func-raw", "func-no-comment" or "func-squeeze". (default "func-squeeze")
  -sm string
        Choose similarity. Your choice can be "smc", "overlap" or "jaccard". (default "jaccard")
  -v    Show progress.
  -w int
        Winnow size. Default to 4. (default 4)
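As a rough illustration of the "jaccard" and "overlap" choices for -sm, the sketch below computes both on two fingerprint sets using their standard definitions. It is not taken from src/measurement: the type set and the helpers newSet, intersectionSize, jaccard and overlap are hypothetical names. The "smc" option is left out because the simple matching coefficient also counts features absent from both sets, which requires a universe of features that this README does not define.

```go
package main

import "fmt"

// set is a hypothetical stand-in for a fingerprint set.
type set map[uint64]struct{}

// newSet builds a set from the given hash values.
func newSet(values ...uint64) set {
	s := make(set, len(values))
	for _, v := range values {
		s[v] = struct{}{}
	}
	return s
}

// intersectionSize counts the elements present in both sets.
func intersectionSize(a, b set) int {
	if len(a) > len(b) {
		a, b = b, a // iterate over the smaller set
	}
	n := 0
	for v := range a {
		if _, ok := b[v]; ok {
			n++
		}
	}
	return n
}

// jaccard returns |A ∩ B| / |A ∪ B|, or 0 when both sets are empty.
func jaccard(a, b set) float64 {
	inter := intersectionSize(a, b)
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// overlap returns |A ∩ B| / min(|A|, |B|), or 0 when either set is empty.
func overlap(a, b set) float64 {
	smaller := len(a)
	if len(b) < smaller {
		smaller = len(b)
	}
	if smaller == 0 {
		return 0
	}
	return float64(intersectionSize(a, b)) / float64(smaller)
}

func main() {
	a := newSet(17, 8, 39)
	b := newSet(17, 39, 42, 50)
	fmt.Printf("jaccard=%.3f overlap=%.3f\n", jaccard(a, b), overlap(a, b))
}
```

The overlap coefficient is more forgiving when one file is much shorter than the other, because it normalizes by the smaller set rather than by the union.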
We use Go 1.12 to build the project. Before compiling, you MUST install the external GitHub repo golang-set by running go get github.com/deckarep/golang-set. Alternatively, if you use JetBrains GoLand, follow these steps:
- Open src/feature/winnow.go in JetBrains GoLand.
- Go to File/Settings and set GOPATH (or press double Shift and search for GOPATH).
- Look at the top of the file and find the imports.
- Click on line 5 (mapset "github.com/deckarep/golang-set") and press Alt+Enter.
- Choose 'Download ...' and wait for it to complete.
- If it succeeds, you will see src/github.com and pkg. You can now compile the project.
To run:
go run src/main.go code1 code2
src
├── feature
│ ├── feature.go
│ ├── feature_test.go
│ ├── multiwinnow.go
│ └── winnow.go
├── fingerprint
│ └── fingerprint.go
├── main.go
├── measurement
│ ├── measurement.go
│ ├── measurement_test.go
│ └── utils.go
├── parser
│ ├── parser.go
│ └── parser_test.go
├── preprocess
│ ├── getutils.go
│ ├── preprocess.go
│ └── preprocess_test.go
└── syscmd
  ├── clangdump.go
  ├── llvm-dump.sh
  ├── syscmd_test.go
  └── utils.go