Understanding the AI-powered Binary Code Similarity Analysis

In this paper, we perform a systematic evaluation of state-of-the-art AI-powered binary code similarity detection (BinSD) approaches on both general binary diffing and two representative downstream applications. Drawing on the findings and implications of our study, we shed light on several key real-world research questions in this problem domain. Specifically, we find that, because binaries change significantly across architectures and optimization levels, the BinSD problem has not yet been well addressed. Moreover, the use of some embedding neural networks and evaluation methodologies is questionable and needs further improvement. Based on comprehensive experimental results and in-depth analysis, we provide several promising future directions for advancing BinSD. We hope that the release of our datasets, benchmarks, and implementation will facilitate the development of BinSD.