
Source code understanding via Machine Learning techniques

Awesome Source Code Analysis Via Machine Learning Techniques

A list of resources on source code analysis applications that use Machine Learning techniques (e.g., Deep Learning, PCA, SVM, Bayesian and probabilistic models, reinforcement learning, etc.)

Maintainers - Peter Teoh

Contributing

Please feel free to submit pull requests, email Peter Teoh (htmldeveloper@gmail.com) or join our chat to add links.

Join the chat at https://gitter.im/tthtlc/awesome-source-analysis


Table of Contents

Machine-Learning-Guided Selectively Unsound Static Analysis http://www.seas.upenn.edu/~kheo/home/paper/icse17-heohyi.pdf

A Survey of Machine Learning for Big Code and Naturalness https://arxiv.org/pdf/1709.06182

Ariadne: Analysis for Machine Learning Programs https://arxiv.org/pdf/1805.04058

The use of machine learning with signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT https://arxiv.org/abs/1010.2511

VulDeePecker: A Deep Learning-Based System for Vulnerability Detection https://arxiv.org/pdf/1801.01681

code2vec: Learning Distributed Representations of Code https://arxiv.org/pdf/1803.09473

Automated software vulnerability detection with machine learning https://arxiv.org/abs/1803.04497

Automatic feature learning for vulnerability prediction https://arxiv.org/pdf/1708.02368

Neural Turing Machines https://arxiv.org/pdf/1410.5401.pdf

DeepCoder: Learning to Write Programs https://arxiv.org/abs/1611.01989

Recent Advances in Neural Program Synthesis https://arxiv.org/pdf/1802.02353

Neural-Guided Deductive Search for Real-Time Program Synthesis https://arxiv.org/pdf/1804.01186

RobustFill: Neural Program Learning under Noisy I/O https://arxiv.org/pdf/1703.07469

On End-to-End Program Generation from User Intention by Deep Neural Networks https://arxiv.org/pdf/1510.07211

Neural Program Search: Solving Programming Tasks from Description https://arxiv.org/pdf/1802.04335

A Syntactic Neural Model for General-Purpose Code Generation https://arxiv.org/pdf/1704.01696

Building Machines That Learn and Think Like People https://arxiv.org/pdf/1604.00289

Differentiable Programs with Neural Libraries https://arxiv.org/pdf/1611.02109

Summary-TerpreT: A Probabilistic Programming Language for Program Induction https://arxiv.org/pdf/1612.00817

Auto-Documentation for Software Development https://arxiv.org/pdf/1701.08485

BOOK: Storing Algorithm-Invariant Episodes for Deep Reinforcement Learning https://arxiv.org/pdf/1709.01308

Boda-RTC: Productive Generation of Portable, Efficient Code ... https://arxiv.org/pdf/1606.00094

Making Neural Programming Architectures Generalize via Recursion https://arxiv.org/pdf/1704.06611

Differentiable Functional Program Interpreters https://arxiv.org/pdf/1611.01988

Utilizing Static Analysis and Code Generation to Accelerate Neural Networks https://arxiv.org/pdf/1206.6466

Deep Probabilistic Programming Languages: A Qualitative Study https://arxiv.org/pdf/1804.06458

BinPro: A Tool for Binary Source Code Provenance https://arxiv.org/pdf/1711.00830

A Survey on Compiler Autotuning using Machine Learning https://arxiv.org/pdf/1801.04405

Estimating defectiveness of source code: A predictive model using GitHub content https://arxiv.org/pdf/1803.07764

EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models https://arxiv.org/pdf/1804.04637



DLPaper2Code: Auto-generation of Code from Deep Learning Research Paper https://arxiv.org/pdf/1711.03543

Inferring Generative Model Structure with Static Analysis https://arxiv.org/pdf/1709.02477

Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities https://arxiv.org/pdf/1707.04742

DeepAPT: Nation-State APT Attribution Using End-to-End Deep Neural Networks https://arxiv.org/pdf/1711.09666

Automatic Structure Discovery for Large Source Code https://arxiv.org/pdf/1202.3335

Comment Generation for Source Code: Survey https://arxiv.org/pdf/1802.02971

Towards Reverse-Engineering Black-Box Neural Networks https://arxiv.org/abs/1711.01768

Database Reverse Engineering based on Association Rule Mining https://arxiv.org/pdf/1004.3272.pdf

Automated detection and classification of cryptographic algorithms in binary programs through machine learning https://arxiv.org/pdf/1503.01186

Automatically Generating Commit Messages from Diffs using Neural Machine Translation https://arxiv.org/pdf/1708.09492

When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries https://arxiv.org/pdf/1512.08546

Code smells https://arxiv.org/pdf/1802.06063

Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains https://arxiv.org/pdf/1703.07909

pix2code: Generating Code from a Graphical User Interface Screenshot https://arxiv.org/pdf/1705.07962

Deep Learning in Software Engineering https://arxiv.org/pdf/1805.04825

Predicting Software Defects Through SVM: An Empirical Approach https://arxiv.org/pdf/1803.03220

A Survey of Reverse Engineering and Program Comprehension https://arxiv.org/pdf/cs/0503068

OWASP Top 10 - 2017 https://www.owasp.org/images/7/72/OWASP_Top_10-2017_%28en%29.pdf.pdf

https://arxiv.org/pdf/1709.07101.pdf

https://arxiv.org/pdf/1805.05206.pdf

https://arxiv.org/pdf/1807.09160.pdf

https://arxiv.org/pdf/1806.07336.pdf

Or just search arxiv.org directly (expect some irrelevant papers among the results): recent arxiv.org search
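Such a search can also be scripted. The sketch below is only an illustration, assuming the public arXiv API endpoint `http://export.arxiv.org/api/query` and its `search_query`, `start`, `max_results`, `sortBy` and `sortOrder` parameters. The helper names `build_query_url` and `fetch_titles` and the example search terms are hypothetical, and the exact query behind the "recent arxiv.org search" link is not known.

```python
# Hypothetical helpers (not part of this list): query the arXiv API for recent papers.
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"   # assumed public arXiv API endpoint
ATOM = "{http://www.w3.org/2005/Atom}"            # results come back as an Atom feed


def build_query_url(terms, max_results=20):
    """Build a query URL matching every term in the abstract, newest submissions first."""
    # "abs:" restricts matching to the abstract field; terms are ANDed together.
    query = " AND ".join(f'abs:"{t}"' for t in terms)
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def fetch_titles(url):
    """Download the Atom feed (needs network access) and return (title, abs-page URL) pairs."""
    with urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        # Titles in the feed may contain line breaks; collapse all whitespace runs.
        title = " ".join(entry.findtext(f"{ATOM}title", "").split())
        results.append((title, entry.findtext(f"{ATOM}id", "")))
    return results
```

For example, `fetch_titles(build_query_url(["source code", "machine learning"]))` would return title/URL pairs for the newest matching submissions. Matching on abstracts only narrows the results, but as noted above some unrelated papers will still slip through.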

LLVM-based vulnerability search

As an extension, see also:

https://ml4code.github.io/

(This site is an offshoot of the survey paper https://arxiv.org/abs/1709.06182.)