mast-group/OpenVocabCodeNLM
Contains the code for our ICSE 2020 paper, "Big Code != Big Vocabulary: Open-Vocabulary Language Models for Source Code", and for its earlier pre-print, "Maybe Deep Neural Networks are the Best Choice for Modeling Source Code" (https://arxiv.org/abs/1903.05734). This is the first open-vocabulary language model for code that uses the byte-pair encoding (BPE) algorithm to learn a segmentation of code tokens into subword units.
Python · Apache-2.0
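To make the BPE idea in the description concrete, here is a minimal, self-contained sketch of the standard BPE merge-learning loop (in the style of Sennrich et al.) applied to a few code identifiers. It is not the repository's own implementation: the helper names (`learn_bpe`, `segment`, etc.), the `</w>` end-of-token marker, and the toy token counts are illustrative assumptions. The loop repeatedly merges the most frequent adjacent symbol pair, so frequent subwords such as `etName` become single units while rare or unseen identifiers still decompose into smaller known pieces, which is what keeps the vocabulary open.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by token frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every whole-symbol occurrence of `pair` with its concatenation."""
    merged = "".join(pair)
    # Lookarounds ensure we only match complete symbols, not fragments of them.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(token_counts, num_merges):
    """Learn an ordered list of merge operations from {token: count}."""
    # Hypothetical convention: each token becomes space-separated characters
    # plus an end-of-token marker "</w>".
    vocab = {" ".join(tok) + " </w>": count for tok, count in token_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(token, merges):
    """Split a (possibly unseen) token into subword units by replaying merges in order."""
    symbols = list(token) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    merges = learn_bpe({"getName": 5, "setName": 5, "getValue": 3}, num_merges=10)
    print(segment("getName", merges))   # → ['getName</w>']
    print(segment("setValue", merges))  # unseen token, split into known subwords
```

Because `segment` only ever concatenates existing symbols, any token (including one never seen during training) can be represented, in the worst case as individual characters. The number of merges is the knob that trades vocabulary size against sequence length.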
Stargazers
- bakszero (@LinkedIn)
- bzz (@jetbrains, @apache)
- carlos-gemmell (Spain)
- cfwin
- chubbymaggie
- coder-chenzhi (Hangzhou, China)
- famasoon (Tokyo, Japan)
- fanyang01
- fendaq
- fly51fly (PRIS)
- forest520
- Fraser-Greenlee (Stealth)
- Galaxy-wbh (ByteDance)
- hlibbabii (@giganticode)
- iandanforth (Menlo Park, CA)
- KazuhiraDZ
- kostiskosmidis (Thessaloniki, Greece)
- kuangyl0212
- lifa123 (China)
- lusinga (Beijing, China)
- matejbalog (DeepMind)
- MengYan1989
- MonsterStorm (Storm)
- mpatsis (University of Edinburgh)
- ocean1 (Exception Level 0)
- panozzaj (@woven-teams)
- qikahh (Peking University)
- renardbebe (Tsinghua University)
- renatahodovan (SZTE)
- s1530129650 (Xian Jiaotong University)
- shoaibahmed (University of Cambridge)
- singhranjodh (India)
- stjordanis (Greece)
- wanyao1992 (Huazhong University of Science and Technology)
- xuanhan863 (Los Angeles, USA)
- yizhidou