Guidance for the UROP program

This guidance provides high-level goals and plans for the UROP (Undergraduate Research Opportunities Program) project.

Goals

  • Having fun and following your interest
  • Becoming an expert in an area
  • Strong implementation skills + theoretical background
  • Innovative industrial application ideas (optional)

Plans

Practice -> Implementation -> Innovation/Application

  1. Learn and practice deep learning basics (e.g., CNN, RNN, attention).

  2. Choose a baseline that you are interested in and try to implement it.

  3. Think about new ideas to improve the model you are implementing [Optional]
    or
    Find some interesting applications (especially in the code mining domain). [Optional]
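
As a warm-up for step 1, the sketch below implements scaled dot-product attention (one of the basics listed above) in plain Python, without any deep learning framework. It is only a minimal illustration: the function names `softmax` and `attention` and the toy inputs are assumptions made for this example and are not part of the UROP material.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query:  list of d floats
    keys:   n lists of d floats
    values: n lists of floats (all the same length)
    Returns (context_vector, attention_weights).
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # Context vector: weighted sum of the value vectors.
    context = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return context, weights


if __name__ == "__main__":
    # Toy inputs (hypothetical): the query matches the first key best.
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    context, weights = attention([1.0, 0.0], keys, values)
    print("weights:", weights)
    print("context:", context)
```

Re-implementing the same function in the framework you plan to use for your baseline, and checking that both versions agree on small inputs, is one way to practice before moving on to step 2.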

Assignment

  • Zhao Zixuan: #16
  • Gao Tong and Fang Haoyang: #10
  • Vikram Sambamurthy: #4
  • Hou Kaijun: #17

Baselines

Here we provide some popular baselines in our domain. They cover QA systems and deep language models, as well as their applications in the code mining (deep coding) area. Each paper has links to the pdf/data/code. Please choose one you would like to implement. During implementation, please read related papers and keep track of the state-of-the-art techniques and results on the same topic.
If you have any new ideas, thoughts, or problems, please discuss them with Xiaodong Gu.

NOTE: If you find an interesting paper that is not on the list and would like to implement it, please propose it and discuss it with Xiaodong Gu.

  1. [Code Completion] Code completion with statistical language models (PLDI 2014)

  2. [Code Completion] Toward deep learning software repositories (MSR 2015)

    • Paper: pdf
    • Code: TBD
  3. [Dialogue] Dual Encoder LSTM (SigDial 2015)

  4. [Dialogue] Smart Reply: Automated Response Suggestion for Email (KDD 2016)

  5. [Bug Localization] Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code (IJCAI 2016)

    • Paper: pdf
    • Code: TBD
  6. [Code Clone Detection] Deep Learning Code Fragments for Code Clone Detection (ASE 2016)

    • Paper: pdf
    • Code: TBD
  7. [Code Summarization] Summarizing Source Code using a Neural Attention Model (ACL 2016)

  8. [Dialogue] Deep Reinforcement Learning for Dialogue Generation (EMNLP 2016)

    • Paper: arxiv
    • Data:
    • Code:
  9. [Dialogue] A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues (AAAI 2017)

  10. [Dialogue] Latent Variable Dialogue Models and their Diversity (EACL 2017)

  11. [Dialogue] Generating Long and Diverse Responses with Neural Conversation Models (ICLR 2017)

  12. [Dialogue] Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models (ICLR 2017)

  13. [Code Completion] Learning Python Code Suggestion with a Sparse Pointer Network (submitted to ICLR 2017)

  14. [Commit Summarization] A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes (ACL 2017)

  15. [Bug Fix] DeepFix: Fixing Common C Language Errors by Deep Learning (AAAI 2017)

  16. [Overrun Detection] End-to-End Prediction of Buffer Overruns from Raw Source Code via Neural Memory Networks (IJCAI 2017)

  17. [GAN Text] Adversarial Feature Matching for Text Generation (ICML 2017)

  18. [Dialogue] Latent Intention Dialogue Models (ICML 2017)

Datasets

QA - Retrieval

No | Title                  | Paper | Blogs                          | Code
1  | Ubuntu Dialogue Corpus | arxiv | Tutorial: chatbot-retrieval    | ubottu, corpus collector, chatbot-retrieval
2  | OpenSubtitle           | arxiv | Tutorial, opensubtitle website | -
3  | Twitter Corpus         | pdf   | -                              | corpus collector

QA - Encoder-Decoder

QA - Comprehension

No | Title                       | Paper | Blogs | Code
1  | SQuAD (Stanford QA Dataset) | -     | -     | -