giganticode

Pinned Repositories

bohr
Big Old Heuristic Repository
Language: Python
codeprep
A toolkit for pre-processing large source code corpora
Language: Python
codex_vs_hackerrank
Code Synthesis Evaluation of Codex using Python problem statements from HackerRank
Language: Python
icse-2020
Meta-repo for our submission "Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code" (contains links to repos and artifacts, code for building the artifact demonstration Docker image, and the poster)
Language: Java
jemma
JEMMA: An Extensible Java dataset for Many ML4Code Applications
Language: Python
langmodels
Applying machine learning to large source code corpora
Language: Python
lm-powered
Bringing neural language models to your IDE
Language: JavaScript
probes
Probing pre-trained source code models
Language: Python
run_bug_run
The RunBugRun dataset of executable bugs
Language: Ruby
small-datasets-ml-resources
Resources for paper "Making the most of small Software Engineering datasets with modern machine learning"
Language: Python

giganticode's Repositories

giganticode/rafael-experiments
Language: Jupyter Notebook
giganticode/StackOBERTflow-comments-small-v1
giganticode/lm-context-experiments
Language: Jupyter Notebook
giganticode/lm-context-analysis
Code for the ACL 2018 paper "Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context"
Language: Python
giganticode/rose6icse
Language: Java
giganticode/OpenVocabCodeNLM
Contains the code for our paper "Maybe Deep Neural Networks are the Best Choice for Modeling Source Code" (https://arxiv.org/abs/1903.05734). This is the first open-vocabulary language model for code that uses the byte-pair encoding (BPE) algorithm to learn a segmentation of code tokens into subword units.
Language: Python
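To illustrate what "learning a segmentation of code tokens into subword units" with BPE means, here is a minimal sketch of the classic BPE merge-learning loop: start from single characters, repeatedly merge the most frequent adjacent pair of symbols (weighted by token frequency), and then apply the learned merges in order to split a new token. This is not the OpenVocabCodeNLM implementation; the function names `learn_bpe`, `merge_pair`, and `segment` and the toy identifier counts are hypothetical, chosen only for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(token_counts, num_merges):
    """Learn up to `num_merges` BPE merge operations from a {token: frequency} map."""
    # Every token starts out as a sequence of single characters.
    vocab = {tuple(tok): freq for tok, freq in token_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each token occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every token is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(token, merges):
    """Split a (possibly unseen) token into subword units using learned merges, in order."""
    symbols = tuple(token)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus of identifier frequencies.
merges = learn_bpe({"getName": 5, "getValue": 3, "setName": 2}, num_merges=2)
print(merges)                    # [('e', 't'), ('g', 'et')]
print(segment("getId", merges))  # ['get', 'I', 'd']
```

Because an identifier never seen during training (here `getId`) can still be expressed as a sequence of learned subwords plus single characters, the model's vocabulary stays closed while the set of representable code tokens is open.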