Pinned Repositories
bohr
Big Old Heuristic Repository
codeprep
A toolkit for pre-processing large source code corpora
codex_vs_hackerrank
Code Synthesis Evaluation of Codex using Python problem statements from HackerRank
icse-2020
Meta-repo for our submission "Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code" (contains links to repos and artifacts, code for building the artifact demonstration Docker image, and the poster)
jemma
JEMMA: An Extensible Java dataset for Many ML4Code Applications
langmodels
Applying machine learning to large source code corpora
lm-powered
Bringing neural language models to your IDE
probes
Probing pre-trained source code models
run_bug_run
The RunBugRun dataset of executable bugs
small-datasets-ml-resources
Resources for paper "Making the most of small Software Engineering datasets with modern machine learning"
giganticode's Repositories
giganticode/rafael-experiments
giganticode/StackOBERTflow-comments-small-v1
giganticode/lm-context-experiments
giganticode/lm-context-analysis
Code for the ACL 2018 paper "Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context"
giganticode/rose6icse
giganticode/OpenVocabCodeNLM
Contains the code for our paper: Maybe Deep Neural Networks are the Best Choice for Modeling Source Code (https://arxiv.org/abs/1903.05734). This is the first open-vocabulary language model for code that uses the byte-pair encoding (BPE) algorithm to learn a segmentation of code tokens into subword units.
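The BPE segmentation described above can be illustrated with a minimal sketch. It assumes the classic merge-learning loop: each token starts as a sequence of characters plus an end-of-token marker (`</w>`, chosen here for illustration), and the most frequent adjacent symbol pair is merged repeatedly. The function names `learn_bpe` and `segment` are hypothetical, and this is not the OpenVocabCodeNLM implementation.

```python
from collections import Counter


def _merge_pair(symbols, a, b):
    """Greedily merge every left-to-right occurrence of the pair (a, b)."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(tokens, num_merges):
    """Learn up to `num_merges` merge operations from a list of code tokens.

    Hypothetical helper: each token becomes a tuple of characters plus an
    assumed end-of-token marker "</w>"; counts are weighted by token frequency.
    """
    vocab = Counter(tuple(tok) + ("</w>",) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every token is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        # Rewrite every token with the chosen pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(_merge_pair(list(symbols), *best))] += freq
        vocab = new_vocab
    return merges


def segment(token, merges):
    """Split a (possibly unseen) token into subword units by replaying merges in order."""
    symbols = list(token) + ["</w>"]
    for a, b in merges:
        symbols = _merge_pair(symbols, a, b)
    return symbols


if __name__ == "__main__":
    merges = learn_bpe(["getName", "getValue", "setName"], 20)
    print(segment("setValue", merges))
```

Because `segment` only ever concatenates adjacent symbols, any token whose characters were seen during training, such as the unseen identifier `setValue` above, is still represented as a sequence of learned subword units rather than as a single out-of-vocabulary symbol.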