Hey, I'm back again. How do you index repositories?

Question

liuchintao opened this issue 6 years ago · 11 comments

My project uses Lucene 6.2.

Today I wanted to test my project's indexing feature.

It worked well at first, but when I tried to reformulate raw bug reports, I ran into some problems. :-(

After I ran the following command, I did not get what I expected.

java -jar blizzard-runner.jar -repo zookeeper -task reformulateQuery -bugIDFile 4Test/bugID.list -queryFile repo-relate/zookeeper/query

The bug report's content is as follows:

Bug 1055 - check for duplicate ACLs in addACL() and create()
actual result:

[zk: (CONNECTED) 0] create /test2 'test2' digest:test:test:cdrwa,digest:test:test:cdrwa
Created /test2
[zk: (CONNECTED) 1] getAcl /test2
'digest,'test:test
: cdrwa
'digest,'test:test
: cdrwa
[zk: (CONNECTED) 2]

but getAcl should only have a single entry.

The path to the raw bug report is: BR_Raw/zookeeper/1055.txt

But all I get in repo-relate/zookeeper/query is 1055.

Please help me!

Answer 1 · 2018-12-19T08:07:59.000Z

Can you please try with the extension and relative path of the files?

-bugIDFile ./sample-input/sample-bugs.txt -queryFile ./sample-input/sample-query.txt

Also post the detailed output you are getting.

Answer 2 · 2018-12-19T08:17:59.000Z

I tried both an absolute path and a relative path for the files, but it still does not work.

java -jar blizzard-runner.jar -repo zookeeper -task reformulateQuery -bugIDFile ./4Test/bugID.list -queryFile ./repo-relate/zookeeper/querya

java -jar blizzard-runner.jar -repo zookeeper -task reformulateQuery -bugIDFile /srv/test/4Test/bugID.list -queryFile /srv/test/repo-relate/zookeeper/queryb

The output of the command is as follows:

Query reformulation may take a few minutes. Please wait...
Done: 1055
Query Reformulation completed successfully :-)
Time elapsed:2 seconds
2461 total milliseconds.
2471 total milliseconds spent to extract query.

But the content of querya and queryb is still just 1055.

Answer 3 · 2018-12-19T08:32:08.000Z

Here is my code snippet:

https://gitlab.com/snippets/1791284

Answer 4 · 2018-12-19T09:18:58.000Z

Do you mean that the reformulated query is only 1055?
That means the tool is failing to access the raw bug report.
In that case, there will be no reformulated query.
I can imagine another case.
Anyway, I will try to reproduce the issue and let you know.
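
To rule out the file-access case before digging further, a quick check along these lines may help. This is only a sketch, not part of BLIZZARD: the function name find_missing_reports is made up, and it assumes the bug ID file holds one ID per line and that raw reports live at BR_Raw/<repo>/<id>.txt relative to the directory the tool is run from (the layout mentioned in the question). The tool itself may resolve paths differently.

```python
from pathlib import Path


def find_missing_reports(bug_id_file, raw_dir):
    """Return the bug IDs whose raw report <raw_dir>/<id>.txt is missing or empty.

    Assumes one bug ID per line in bug_id_file (an assumption, not a
    documented BLIZZARD format).
    """
    missing = []
    for line in Path(bug_id_file).read_text().splitlines():
        bug_id = line.strip()
        if not bug_id:
            continue  # skip blank lines
        report = Path(raw_dir) / f"{bug_id}.txt"
        # A missing or zero-byte report gives the tool nothing to reformulate.
        if not report.is_file() or report.stat().st_size == 0:
            missing.append(bug_id)
    return missing
```

For the setup in this thread, the call would be find_missing_reports("4Test/bugID.list", "BR_Raw/zookeeper"). An empty list means every listed report exists and is non-empty, so the cause has to be somewhere else.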

Answer 5 · 2018-12-19T12:11:54.000Z

Thanks a lot :-)

Yes, the reformulated query is only the bug ID 1055.

But I ran the tool as the root user, and the raw bug report belongs to the root user and group.

root     12774  116  2.6 2468112 49560 pts/0   Sl   20:00   0:01 java -jar blizzard-runner.jar -repo zookeeper -task reformulateQuery -bugIDFile ./4Test/bugID.list -queryFile ./repo-relate/zookeeper/queryaa

-rw-r--r-- 1 root root 333 Dec 19 20:02 BR_Raw/zookeeper/1055.txt

Answer 6 · 2018-12-19T13:13:23.000Z

I think there might be a problem with the steps I used to index the repository.

I indexed ZooKeeper's Java files directly, but I found that the Java files in your GitHub repository were renamed to sequential numbers.

Part of the log from my indexing run is below.

adding: 4jtTest/zookeeper/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/MergedLogSource.java
adding: 4jtTest/zookeeper/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/LogSource.java
adding: 4jtTest/zookeeper/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/LogSkipList.java

However, I found that the original class name in your result file was replaced by a sequential number.

171138	F:/MyWorks/Thesis Works/Crowdsource_Knowledge_Base/M4CPBugs/experiment/corpus/norm-class/ecf/1209.java	0

Answer 7 · 2019-01-01T23:50:43.000Z

Yup, the corpus and index need to be formatted in that way. Were you able to fix it? You might also try other bug reports that contain both regular text and structured elements.

Answer 8 · 2019-01-02T20:28:58.000Z

Can you please upload your source code folder for zookeeper? I think the bug report is categorized as BR_NL, which needs the index, the code, and the mapping file. Alternatively, you can try other bug reports from BR_ST or BR_PE.

Answer 9 · 2019-01-03T02:31:52.000Z

Thanks, you are right, bug report #1055 is categorized as BR_NL.

I will try other bug reports. :-)

Here is my indexing code: https://gitlab.com/snippets/1791284 and my zk test repository: https://gitlab.com/ljt2016/zktest.

Maybe I understand what you mean: there may be something wrong with my indexing step.

Do you mean that after indexing zookeeper's source files and recording their Lucene serial numbers in the mapping file, I should move all the source files somewhere and rename them to their Lucene serial numbers?

Or should I first rename each source file to an ordinal number, record the mapping in the mapping file, and then index all the renamed source files?

Answer 10 · 2019-01-27T20:45:27.000Z

The creation of the mapping file and the Lucene indexing were actually done simultaneously.
I just read the project files (.java) recursively from the home folder of a subject system, and added their serial numbers sequentially.
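
Roughly, the numbering step looks like the sketch below. It is an illustration only, not the original code: the function name build_numbered_corpus, the "serial:path" mapping line format, numbering from 1, and the sorted traversal order are all assumptions. The Lucene indexing itself is left out; it would happen per numbered file inside the same loop.

```python
import shutil
from pathlib import Path


def build_numbered_corpus(source_root, corpus_dir, mapping_file):
    """Copy every .java file under source_root to corpus_dir/<serial>.java
    and write one "serial:original/relative/path" line per file.

    Returns the list of (serial, relative_path) pairs.
    """
    source_root = Path(source_root)
    corpus_dir = Path(corpus_dir)
    corpus_dir.mkdir(parents=True, exist_ok=True)

    # Collect the files first, so copies are never picked up by the walk.
    # Sorting keeps the numbering stable between runs (an assumption; the
    # original traversal order is not stated in this thread).
    java_files = sorted(p for p in source_root.rglob("*.java") if p.is_file())

    mapping = []
    for serial, src in enumerate(java_files, start=1):
        shutil.copyfile(src, corpus_dir / f"{serial}.java")
        # The Lucene document for <serial>.java would be added here.
        mapping.append((serial, src.relative_to(source_root).as_posix()))

    with open(mapping_file, "w", encoding="utf-8") as out:
        for serial, rel_path in mapping:
            out.write(f"{serial}:{rel_path}\n")
    return mapping
```

Keep corpus_dir outside source_root, otherwise a second run would also number the copies made by the first run.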

Answer 11 · 2019-02-18T19:15:26.000Z

I see that I used Lucene 6.2.0, and the granularity was at the .java file level. Hope this helps.

On Sat, Feb 16, 2019 at 6:41 PM SurfGitHub wrote:

hi, @masud-technope, may I ask which version of Lucene did you use for BLIZZARD? I tried Lucene-6.2.0 for tomcat70 (the dataset you provided), but the generated index files of Lucene is different from the ones you provided. Thank you so much for sparing your precious time for replying my question!
