icse-2020

Meta-repository for our submission "Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code". It contains links to the related repositories and artifacts, the code for building the artifact-demonstration Docker image, and the poster.


DOIs of the Artifacts

| Artifact | DOI |
| --- | --- |
| Java corpus | https://doi.org/10.7488/ds/1690 |
| C corpus | https://doi.org/10.5281/zenodo.3628775 |
| Python corpus | https://doi.org/10.5281/zenodo.3628784 |
| Java, pre-processed | https://doi.org/10.5281/zenodo.3628665 |
| C, pre-processed | https://doi.org/10.5281/zenodo.3628638 |
| Python, pre-processed | https://doi.org/10.5281/zenodo.3628636 |
| Trained models | https://doi.org/10.5281/zenodo.3628628 |
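Most of the artifacts above are hosted on Zenodo (the Java corpus is on Edinburgh DataShare and is not covered here). A small sketch of how the Zenodo DOIs can be mapped to record API URLs for scripted downloads; the `10.5281/zenodo.<id>` → `https://zenodo.org/api/records/<id>` mapping follows Zenodo's usual scheme but is an assumption, so verify it before relying on it:

```python
def zenodo_record_url(doi: str) -> str:
    """Map a Zenodo DOI such as '10.5281/zenodo.3628775' to its record API URL.

    Assumption: Zenodo-minted DOIs use the prefix '10.5281/zenodo.' followed
    by the numeric record id, which also identifies the record in the REST API.
    """
    prefix = "10.5281/zenodo."
    if not doi.startswith(prefix):
        raise ValueError(f"not a Zenodo DOI: {doi}")
    record_id = doi[len(prefix):]
    return f"https://zenodo.org/api/records/{record_id}"

# Example: the C corpus artifact from the table above.
print(zenodo_record_url("10.5281/zenodo.3628775"))
```

The returned URL serves JSON metadata for the record, including a file listing with direct download links.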

Code used to run experiments

Codeprep library (for vocabulary study): https://github.com/giganticode/codeprep

Open-vocabulary Neural LM: https://github.com/mast-group/OpenVocabCodeNLM

Paper

If you use the artifacts, please cite the paper:

@article{karampatsis2020big,
 title={Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code},
 author={Karampatsis, Rafael-Michael and Babii, Hlib and Robbes, Romain and Sutton, Charles and Janes, Andrea},
 journal={arXiv preprint arXiv:2003.07914},
 year={2020}
}