Hey, I'm coming again. How do you index repositories?
liuchintao opened this issue · 11 comments
My project uses Lucene62.
Today I wanted to test my project's indexing feature.
It worked well at first, but when I tried to reformulate raw bug reports, I ran into some problems. :-(
After I ran the following command, I did not get what I expected.
java -jar blizzard-runner.jar -repo zookeeper -task reformulateQuery -bugIDFile 4Test/bugID.list -queryFile repo-relate/zookeeper/query
The bug report's content is as follows:
Bug 1055 - check for duplicate ACLs in addACL() and create()
actual result:
[zk: (CONNECTED) 0] create /test2 'test2' digest:test:test:cdrwa,digest:test:test:cdrwa
Created /test2
[zk: (CONNECTED) 1] getAcl /test2
'digest,'test:test
: cdrwa
'digest,'test:test
: cdrwa
[zk: (CONNECTED) 2]
but getAcl should only have a single entry.
The path to the raw bug report is BR_Raw/zookeeper/1055.txt,
but all I get in repo-relate/zookeeper/query is 1055.
Please help me!
Can you please try with the extension and relative path of the files?
-bugIDFile ./sample-input/sample-bugs.txt -queryFile ./sample-input/sample-query.txt
Also post the detailed output you are getting.
I tried both a relative path and an absolute path for the files, but it still does not work.
java -jar blizzard-runner.jar -repo zookeeper -task reformulateQuery -bugIDFile ./4Test/bugID.list -queryFile ./repo-relate/zookeeper/querya
java -jar blizzard-runner.jar -repo zookeeper -task reformulateQuery -bugIDFile /srv/test/4Test/bugID.list -queryFile /srv/test/repo-relate/zookeeper/queryb
The output of the command is as follows:
Query reformulation may take a few minutes. Please wait...
Done: 1055
Query Reformulation completed successfully :-)
Time elapsed:2 seconds
2461 total milliseconds.
2471 total milliseconds spent to extract query.
But the content of querya and queryb is still just 1055.
Here are my code snippets.
Do you mean that the reformulated query is only 1055?
That means the tool is failing to access the raw bug report.
In that case, there will be no reformulated query.
I can imagine another case.
Anyway, I will try to reproduce the issue and let you know.
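One quick way to rule out the access problem described above is to check, for every ID in the bug ID file, that a readable raw report exists. The sketch below is only a diagnostic under stated assumptions: it assumes the raw reports are laid out as `<rawDir>/<bugID>.txt` (as in `BR_Raw/zookeeper/1055.txt`) and that the tool resolves that folder relative to the directory it is run from, which this thread does not confirm. The class and method names are hypothetical and are not part of blizzard-runner.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class RawReportCheck {

    /**
     * Returns the bug IDs listed in bugIdFile for which no readable
     * regular file named <id>.txt exists under rawDir.
     * (Hypothetical helper; the directory layout is an assumption.)
     */
    static List<String> findUnreadable(Path bugIdFile, Path rawDir) throws IOException {
        List<String> unreadable = new ArrayList<>();
        for (String line : Files.readAllLines(bugIdFile)) {
            String id = line.trim();
            if (id.isEmpty()) {
                continue; // skip blank lines in the ID list
            }
            Path report = rawDir.resolve(id + ".txt");
            if (!Files.isRegularFile(report) || !Files.isReadable(report)) {
                unreadable.add(id);
            }
        }
        return unreadable;
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the paths mentioned in this thread.
        Path bugIdFile = Paths.get(args.length > 0 ? args[0] : "4Test/bugID.list");
        Path rawDir = Paths.get(args.length > 1 ? args[1] : "BR_Raw/zookeeper");

        System.out.println("Looking for reports under: " + rawDir.toAbsolutePath());
        List<String> unreadable = findUnreadable(bugIdFile, rawDir);
        if (unreadable.isEmpty()) {
            System.out.println("All listed raw bug reports are readable.");
        } else {
            System.out.println("Missing or unreadable reports for IDs: " + unreadable);
        }
    }
}
```

Running it from the same directory as the `java -jar blizzard-runner.jar ...` command shows the absolute path that a relative `BR_Raw` folder resolves to, which makes a wrong working directory easy to spot.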
Thanks a lot :-)
Yes, the reformulated query is only the bug ID 1055.
But I ran the tool as the root user, and the raw bug report is owned by the root user and group.
root 12774 116 2.6 2468112 49560 pts/0 Sl 20:00 0:01 java -jar blizzard-runner.jar -repo zookeeper -task reformulateQuery -bugIDFile ./4Test/bugID.list -queryFile ./repo-relate/zookeeper/queryaa
-rw-r--r-- 1 root root 333 Dec 19 20:02 BR_Raw/zookeeper/1055.txt
I think there might be a problem with the way I indexed the repository.
I indexed ZooKeeper's Java files directly, but I noticed that the Java files in your GitHub repository are renamed to sequential numbers.
Part of the log from my indexing run is below.
adding: 4jtTest/zookeeper/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/MergedLogSource.java
adding: 4jtTest/zookeeper/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/LogSource.java
adding: 4jtTest/zookeeper/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/LogSkipList.java
However, I found that in your result file the original class name is replaced by a sequential number.
171138 F:/MyWorks/Thesis Works/Crowdsource_Knowledge_Base/M4CPBugs/experiment/corpus/norm-class/ecf/1209.java 0
Yup, the corpus and index need to be formatted in that way. Were you able to fix it? You might also try other bug reports that contain both regular text and structured elements.
Can you please upload your source code folder for zookeeper? I think the bug report is categorized as BR_NL, which needs the index, the code, and the mapping file. Alternatively, you can try other bug reports from BR_ST or BR_PE.
Thanks, you are right, bug report #1055 is categorized as BR_NL.
I will try other bug reports. :-)
Here are my indexing code, https://gitlab.com/snippets/1791284, and my zk test repository, https://gitlab.com/ljt2016/zktest.
Maybe I understand what you mean: there may be something wrong with my indexing step.
Do you mean that after indexing ZooKeeper's source files and recording their Lucene serial numbers in the mapping file, I should move all the source files somewhere and rename them to their Lucene serial numbers?
Or should I first rename each source file to an ordinal number, record the mapping in the mapping file, and then index the renamed source files?
The creation of mapping file and Lucene indexing were actually done simultaneously.
I just read the project files (.java) recursively from the home folder of a subject system, and added their serial numbers sequentially.
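For readers trying to reproduce that layout, here is a minimal sketch of the numbering step only. It is not the tool's actual indexing code: the class name, the `serial:path` mapping format, the starting number 1, and the sorted traversal order are all assumptions made for illustration. The Lucene call is left as a comment so the example stays self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class SequentialCorpus {

    /**
     * Walks srcRoot recursively, copies every .java file into corpusDir
     * as <serial>.java, and returns the mapping serial -> original path.
     * Assumption: corpusDir lies outside srcRoot, so copies from an
     * earlier run are not picked up again.
     */
    static Map<Integer, Path> build(Path srcRoot, Path corpusDir) throws IOException {
        Files.createDirectories(corpusDir);

        List<Path> sources;
        try (Stream<Path> walk = Files.walk(srcRoot)) {
            sources = walk.filter(Files::isRegularFile)
                          .filter(p -> p.toString().endsWith(".java"))
                          .sorted() // fixed order so reruns assign the same numbers
                          .collect(Collectors.toList());
        }

        Map<Integer, Path> mapping = new LinkedHashMap<>();
        int serial = 1; // assumed starting number
        for (Path src : sources) {
            Path target = corpusDir.resolve(serial + ".java");
            Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
            // A Lucene document for `target` would be added to the index here,
            // so that the mapping and the index are built in the same pass.
            mapping.put(serial, src);
            serial++;
        }
        return mapping;
    }

    /** Writes one "serial:originalPath" line per file (hypothetical format). */
    static void writeMapping(Map<Integer, Path> mapping, Path mappingFile) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, Path> e : mapping.entrySet()) {
            lines.add(e.getKey() + ":" + e.getValue());
        }
        Files.write(mappingFile, lines, StandardCharsets.UTF_8);
    }
}
```

Doing the copy, the index update, and the mapping entry inside one loop iteration keeps the three in step, which is the point of creating the mapping file and the index simultaneously.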