CS454 Naturalness of Code Hands-on

This repository contains the hands-on materials as well as pointers to the datasets.

Dataset

  • Python: we use the Python dataset from CodeTrans. Download python-train_clean.tsv from this Dropbox folder; the file is in the CodeSearchNet_clean directory.
  • English: the notebook has the code to download the NLTK dataset.

Paper

The paper "On the Naturalness of Software" by Hindle et al. is available from here.
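
The paper measures how "natural" a corpus is by the cross-entropy of an n-gram model on held-out text: lower values mean more predictable text. As a rough illustration of that measure, here is a minimal bigram sketch with add-one smoothing. It is not the notebook's code; the function names, the `<s>`/`</s>` padding tokens, the smoothing choice, and the toy token lists are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical helper names; a sketch, not the notebook's implementation.

def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def cross_entropy(model, sentences):
    """Average negative log2 probability per token, with add-one smoothing."""
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # one extra slot shared by all unseen tokens
    total_log_prob = 0.0
    n_tokens = 0
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
            total_log_prob += math.log2(p)
            n_tokens += 1
    return -total_log_prob / n_tokens

# Toy data: a repetitive, code-like training corpus (assumed pre-tokenized).
train = [["for", "i", "in", "range", "(", "n", ")", ":"]] * 3
model = train_bigram(train)

seen = cross_entropy(model, [["for", "i", "in", "range", "(", "n", ")", ":"]])
unseen = cross_entropy(model, [["while", "True", ":"]])
print(f"seen: {seen:.3f} bits/token, unseen: {unseen:.3f} bits/token")
```

To apply the same idea to the datasets above, one would tokenize the Python and English corpora, train on one split, and compare cross-entropy on a held-out split; the tokenization and smoothing used there may well differ from this sketch.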