Awesome-Binary-Similarity

An awesome & curated list of binary code similarity papers

Title Venue Year Paper Slide Video GitHub
FASER: Binary Code Similarity Search through the use of Intermediate Representations CAMLIS 2023 link link link
VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity 2023 link
kTrans: Knowledge-Aware Transformer for Binary Code Embedding 2023 link link
Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis ISSTA 2023 link link
Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge TOSEM 2023 link link
sem2vec: Semantics-aware Assembly Tracelet Embedding TOSEM 2023 link link
1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis TOSEM 2023 link
Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures AsiaCCS 2023 link
VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search NDSS 2023 link link
A Game-Based Framework to Compare Program Classifiers and Evaders CGO 2023 link link link link
BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features MDPI 2023 link
A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features CSUR 2022 link
Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning ACSAC 2022 link link link
Improving cross-platform binary analysis using representation learning via graph alignment ISSTA 2022 link link link
jTrans: Jump-Aware Transformer for Binary Code Similarity ISSTA 2022 link link link
COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks DIMVA 2022 link
A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware ISSTA 2022 link link link
How Machine Learning Is Solving the Binary Function Similarity Problem USENIX 2022 link link link
Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking TSE 2022 link link
Program Representations for Predictive Compilation: State of Affairs in the Early 20's COLA 2022 link link link
Improving binary diffing speed and accuracy using community detection and locality-sensitive hashing: an empirical study JCVHT 2022 link
PalmTree: Learning an Assembly Language Model for Instruction Embedding CCS 2021 link link link
Binary code similarity detection ASE 2021 link
Binary diffing as a network alignment problem via belief propagation ASE 2021 link
Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection IEEE DSN 2021 link link
BinDeep: A deep learning approach to binary code similarity detection ESWA 2021 link
EnBinDiff: Identifying Data-Only Patches for Binaries TDSC 2021 link
BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences TSE 2021 link link
Codee: A Tensor Embedding Scheme for Binary Code Search TSE 2021 link link
Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned TSE (revision) 2021 link link
How could Neural Networks understand Programs? ICML 2021 link link
Multi-threshold token-based code clone detection SANER 2021 link
FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings IEEE Euro S&P 2021 link link link
TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity 2020 link link
Similarity of Binaries Across Optimization Levels and Obfuscation ESORICS 2020 link link
Open-source tools and benchmarks for code-clone detection: past, present, and future trends 2020 link
Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence 2020
LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code 2020 link
Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree SANER 2020 link
What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning 2020 link
Clone Detection on Large Scala Codebases 2020 link
CloneCompass: Visualizations for Code Clone Analysis 2020 link
DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing NDSS 2020 link link link
VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets EuroS&P 2020 link
Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection AAAI 2020 link
Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture NDSS 2020 link link
Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis NDSS Workshop on Binary Analysis Research (BAR) 2019 link link
Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization IEEE S&P 2019 link link link
Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things MDPI 2019 link
A Survey of Binary Code Similarity CSUR 2019 link
Research Progress on Code Clone Detection (代码克隆检测研究进展) Journal of Software (软件学报) 2019 link
A Systematic Review on Code Clone Detection 2019 link
A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis NDSS 2019 link link
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs NDSS 2019 link link link model
SAFE: Self-Attentive Function Embeddings for Binary Similarity 2019 link link link
Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection SANER 2019 link
Cross-Platform Binary Code Association Analysis Based on Deep Learning (基于深度学习的跨平台二进制代码关联分析) 2019 link
CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph 2019 link
Function matching between binary executables: efficient algorithms and features JCVHT 2019 link
BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis ICSME 2018 link
αDiff: Cross-Version Binary Code Similarity Detection with DNN ASE 2018 link dataset
Binary Similarity Detection Using Machine Learning PLDI 2018 link
CCAligner: A Token Based Large-Gap Clone Detector ICSE 2018 link
Oreo: Detection of Clones in the Twilight Zone FSE 2018 link
VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary ASE 2018 link link
VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation 2018 link
FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware 2018 link
BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices 2018 link
A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries 2018 link
Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis 2018 link link
BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering ASIA CCS 2018 link
A Deep Learning Approach to Program Similarity MASES 2018 link
Recurrent Neural Network for Code Clone Detection SEIM 2018 link
The Adverse Effects of Code Duplication in Machine Learning Models of Code 2018 link link
Benchmarks for software clone detection: A ten-year retrospective SANER 2018 link
Binary Code Clone Detection across Architectures and Compiling Configurations ICPC 2017 link
Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection ACM CCS 2017 link link
BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection ASIA CCS 2017 link
BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape DIMVA 2017 link
Compiler-agnostic function detection in binaries IEEE EuroS&P 2017 link link
BinSign: Fingerprinting binary functions to support automated analysis of code executables 2017 link
Similarity of binaries through re-optimization PLDI 2017 link link
Transferring code-clone detection and analysis to practice ICSE-SEIP 2017 link
Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping IEEE S&P 2017 link
Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code IJCAI 2017 link
Extracting Conditional Formulas for Cross-Platform Bug Search ASIA CCS 2017 link
SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills ICSE 2017 link
CCLearner: A Deep Learning-Based Clone Detection Approach 2017 link link
BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking USENIX 2017 link link link
In-memory Fuzzing for Binary Code Similarity Analysis ASE 2017 link
DéjàVu: a map of code duplicates on GitHub OOPSLA 2017 link
Some from Here, Some from There: Cross-project Code Reuse in GitHub MSR 2017 link
CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph 2017 link
Identifying Functionally Similar Code in Complex Codebases ICPC 2016 link link
Scalable graph-based bug search for firmware images (Genius) ACM CCS 2016 link link link
Cross-Architecture Binary Semantics Understanding via Similar Code Comparison IEEE SANER 2016 link
discovRE: Efficient cross-architecture identification of bugs in binary code NDSS 2016 link
BinGo: Cross-architecture cross-OS Binary Search FSE 2016 link
Kam1n0: Mapreduce-based assembly clone search for reverse engineering KDD 2016 link link
Statistical similarity of binaries PLDI 2016 link link link
Deep learning code fragments for code clone detection ASE 2016 link
A Survey of Software Clone Detection Techniques 2016 link
SourcererCC: Scaling Code Clone Detection to Big Code ICSE 2016 link
Binary executable file similarity calculation using function matching 2016 link
Matching Similar Functions in Different Versions of a Malware 2016 link
BinDNN: Resilient Function Matching Using Deep Learning 2016 link
VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis ACSAC 2016 link link
BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench 2016 link link
Cross-architecture bug search in binary executables IEEE S&P 2015 link
Library functions identification in binary code by using graph isomorphism testings 2015 link
Evaluating clone detection tools with BigCloneBench 2015 link link
Memoized semantics-based binary diffing with application to malware lineage inference 2015 link
Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code 2015 link link
BYTEWEIGHT: Learning to Recognize Functions in Binary Code USENIX 2014 link link link
Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection FSE 2014 link
Binclone: Detecting code clones in malware SERE 2014 link link
Detecting fine-grained similarity in binaries 2014 link
Leveraging semantic signatures for bug search in binary programs ACSAC 2014 link
How Accurate Is Coarse-grained Clone Detection?: Comparision with Fine-grained Detectors 2014 link
Tracelet-based code search in executables PLDI 2014 link
Control Flow-Based Malware Variant Detection 2014 link
Hashing for Similarity Search: A Survey 2014 link
Achieving accuracy and scalability simultaneously in detecting application clones on android markets ICSE 2014 link
Identifying Shared Software Components to Support Malware Forensics 2014 link
Evaluating Modern Clone Detection Tools 2014 link
Rendezvous: a search engine for binary code MSR 2013 link
Binslayer: accurate comparison of binary executables PPREW 2013 link link
Software clone detection: A systematic review 2013 link
How to extract differences from similar programs? A cohesion metric approach 2013 link
Software clone detection and refactoring 2013 link
An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code 2013 link
A hybrid-token and textual based approach to find similar code segments 2013 link
Gapped code clone detection with lightweight source code analysis 2013 link
MutantX-S: Scalable Malware Clustering Based on Static Features USENIX 2013 link link
Binjuice: Fast Location of Similar Code Fragments Using Semantic Juice PPREW 2013 link
Towards Automatic Software Lineage Inference USENIX 2013 link link
AnDarwin: Scalable Detection of Semantically Similar Android Applications 2013 link
Expose: Discovering potential binary code re-use 2013 link
Function Matching-based Binary level Software Similarity Calculation RACS 2013 link
FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors RAID 2013 link
A study of repetitiveness of code changes in software evolution ASE 2013 link
ibinhunt: Binary hunting with interprocedural control flow 2012 link link
ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions USENIX 2012 link
Boreas: an accurate and scalable token-based approach to code clone detection ASE 2012 link
Folding Repeated Instructions for Improving Token-Based Code Clone Detection 2012 link
A metrics-based data mining approach for software clone detection 2012 link
Comparison of Clone Detection Techniques 2012
Malware Classification Method via Binary Content Comparison RACS 2012 link
Binary function clustering using semantic hashes ICMLA 2012 link
Value-based program characterization and its application to software plagiarism detection 2011 link
CMCD: Count Matrix Based Code Clone Detection 2011 link
Incremental code clone detection: A pdg-based approach 2011 link
Anywhere, Any-Time Binary Instrumentation 2011 link
Code reuse in open source software development: Quantitative evidence, drivers, and impediments 2010
Index-based code clone detection: incremental, distributed, scalable 2010
Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics 2010
A hybrid approach (syntactic and textual) to clone detection 2010
Evaluating code clone genealogies at release level: An empirical study 2010
A survey of Binary similarity and distance measures 2010
Idea: Opcode-Sequence-Based Malware Detection 2010
Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces USENIX 2010
Data fingerprinting with similarity digests 2010
Automatic mining of functionally equivalent code fragments via random testing 2009
A mutation/injection-based automatic framework for evaluating code clone detection tools 2009
Problematic code clones identification using multiple detection results 2009
Incremental clone detection 2009
Scalable and incremental clone detection for evolving software 2009
Large-scale Malware Indexing Using Function-call Graphs 2009
Scalable, Behavior-Based Malware Clustering 2009
peHash: A Novel Approach to Fast Malware Clustering USENIX 2009
Detecting Code Clones in Binary Executables 2009
Binhunt: Automatically finding semantic differences in binary programs 2008
Scalable detection of semantic clones 2008
Deckard: Scalable and accurate tree-based detection of code clones 2007
Large-scale code reuse in open source software 2007
A survey on software clone detection research 2007 link
A study of consistent and inconsistent changes to code clones 2007
Comparison and evaluation of clone detection tools 2007
Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions 2007
A Static Birthmark of Binary Executables Based on API Call Structure 2007
CP-Miner: Finding copy-paste and related bugs in large-scale software code 2006
Survey of research on software clones 2006 link
"Cloning considered harmful" considered harmful: patterns of cloning in software 2006 link
GPLAG: detection of software plagiarism by program dependence graph analysis 2006
Detecting Self-mutating Malware Using Control-flow Graph Matching 2006
Identifying Almost Identical Files Using Context Triggered Piecewise Hashing 2006
Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience IEEE S&P 2006
Graph-based comparison of executable objects 2005
SDD: high performance code clone detection system for large scale source code 2005 link
Polygraph: Automatically generating signatures for polymorphic worms 2005
K-gram Based Software Birthmarks 2005
Insights into System-Wide Code Duplication IEEE 2004 link
Clone detection in source code by frequent itemset techniques 2004
Evaluating clone detection techniques from a refactoring perspective 2004
Structural comparison of executable objects 2004
Code compaction of matching single-entry multiple-exit regions 2003 link
CloSpan: Mining closed sequential patterns in large datasets 2003
Ccfinder: a multilinguistic token-based code clone detection system for large scale source code 2002
Identifying similar code with program dependence graphs 2001
Using slicing to identify duplication in source code 2001
BMAT – A Binary Matching Tool for Stale Profile Propagation 2000
A language independent approach for detecting duplicated code 1999
Compressing Differences of Executable Code 1999
Similarity search in high dimensions via hashing 1999
Clone detection using abstract syntax trees 1998
Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics 1996
Pattern matching for clone and concept detection 1996
On finding duplication and near-duplication in large software systems 1995 link
Detecting code similarity using patterns 1995
A Cross-platform Binary Diff 1995