mast-group/api-mining

Questions about your paper

zhuzhujulie opened this issue · 3 comments

Dear author,
Could you tell me how you collected the projects from GitHub, including the libraries' client classes and the libraries' example classes?
Also, in "WildcardNamespaceCollector.java", what does the parameter "corpusFolder" represent?

I also found that when I run "APICallExtractor" on the intermediate data you provide, the resulting API sequences look like MAPO's sequences: they contain parameters, and the branch structures are not merged into one sequence.

Hi,
In answer to your questions:

  • We used the GitHub Java corpus (see http://groups.inf.ed.ac.uk/cup/javaGithub/), a collection of Java project source files downloaded from GitHub. The exact procedure is described in detail in the dataset section on p. 8 of our paper.
  • We used APICallExtractor.java to extract all the API call sequences from the source files collected above into an .arff file. You then have to run MAPO/UPMiner/PAM on this .arff file to mine the actual API call patterns from those API call sequences (see the README).
  • The parameter corpusFolder in WildcardNamespaceCollector.java is the folder containing the GitHub Java corpus described above.
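
To make the role of a corpusFolder argument concrete, here is a minimal sketch of how a folder of downloaded project sources might be enumerated. It assumes the corpus has been unpacked as an ordinary directory tree of .java files. The class name CorpusWalker and the method listJavaFiles are hypothetical names used for illustration only; they are not part of api-mining, and this is not how WildcardNamespaceCollector.java is actually implemented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of api-mining: lists the Java source files
// found under a corpus folder.
class CorpusWalker {

    // Recursively collects every regular file ending in ".java" below
    // corpusFolder, sorted by path so the result is deterministic.
    static List<Path> listJavaFiles(Path corpusFolder) throws IOException {
        try (Stream<Path> paths = Files.walk(corpusFolder)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the folder holding the unpacked corpus.
        Path corpusFolder = Paths.get(args[0]);
        List<Path> sources = listJavaFiles(corpusFolder);
        System.out.println("Found " + sources.size() + " Java files");
    }
}
```

Run it with the path to the unpacked corpus as its only argument, for example `java CorpusWalker /path/to/corpus`, where the path is a placeholder for wherever you stored the download.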

Hope that helps. Please let me know if you have any more questions.