This is the official implementation of Asteria-Pro, a binary code similarity detection tool and an enhanced version of Asteria. The paper has been accepted by ACM Transactions on Software Engineering and Methodology (TOSEM).
@article{Yang2023AsteriaProED,
title={Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge},
author={Shouguo Yang and Chaopeng Dong and Yang Xiao and Yiran Cheng and Zhiqiang Shi and Zhi Li and Limin Sun},
journal={ACM Transactions on Software Engineering and Methodology},
year={2023}
}
It is designed and optimized for efficient and accurate vulnerability detection (a.k.a. bug search).
The workflow of Asteria-Pro is depicted as follows:
Asteria-Pro consists of three main parts: DK-based prefiltration, DL-based similarity calculation, and DK-based re-ranking, where DK stands for Domain Knowledge and DL stands for Deep Learning. The DK-based prefiltration eliminates a significant portion of dissimilar functions. The DL-based similarity calculation module encodes ASTs into representation vectors using the Tree-LSTM model and computes a similarity score between the target function and each remaining function. The DK-based re-ranking module reorders the candidate homologous functions produced by the previous module using lightweight structural features (i.e., function call relationships).
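The three stages can be illustrated with a toy sketch. This is not the Asteria-Pro implementation: the Jaccard callee overlap, the cosine similarity (standing in for the Tree-LSTM based scoring), the `min_overlap` and `weight` parameters, and all function and field names are assumptions chosen only to show the flow of prefiltering, scoring, and re-ranking.

```python
from math import sqrt

def callee_overlap(a, b):
    """Jaccard overlap between two collections of callee names (toy DK feature)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cosine(u, v):
    """Cosine similarity; a stand-in for the learned similarity model."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, candidates, min_overlap=0.1, weight=0.5, top_k=3):
    # Stage 1 (prefiltration): drop candidates that share too few callees.
    kept = [c for c in candidates
            if callee_overlap(query["callees"], c["callees"]) >= min_overlap]
    # Stage 2 (similarity calculation): score the survivors, keep the top_k.
    scored = sorted(((c, cosine(query["vec"], c["vec"])) for c in kept),
                    key=lambda pair: pair[1], reverse=True)[:top_k]
    # Stage 3 (re-ranking): blend the model score with the structural feature.
    reranked = [(c["name"],
                 weight * s + (1 - weight) * callee_overlap(query["callees"], c["callees"]))
                for c, s in scored]
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: vectors and callee lists are made up for illustration.
query = {"callees": ["memcpy", "strlen", "free"], "vec": [1.0, 0.0]}
candidates = [
    {"name": "func_a", "callees": ["memcpy", "strlen", "free"], "vec": [0.9, 0.1]},
    {"name": "func_b", "callees": ["memcpy"], "vec": [1.0, 0.0]},
    {"name": "func_c", "callees": ["printf"], "vec": [1.0, 0.0]},
]
result = search(query, candidates)
```

In this toy data, `func_c` is removed by the prefilter, `func_b` has the highest vector similarity, and `func_a` ends up first after re-ranking because its callees match the query exactly.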
- IDA Pro 7.5+: Our IDA Python scripts are mainly developed for Linux but can be easily adapted to Windows.
- Make sure IDA Python is switched to Python 3.
- Extra Python packages must be installed into IDA Python:
pip install cptools tqdm networkx==2.1.0 cxxfilt --target="/path/to/IDA Python/DIR/"
- conda: virtual environment build
- create a new python environment with conda
conda create --name Asteria-pro python=3.8
- activate it
conda activate Asteria-pro
- Install PyTorch with CUDA (optional): CUDA enables faster encoding.
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
If CUDA is not available, install PyTorch for CPU with
pip install torch
- Install other packages
pip install -r requirements.txt
Before you start, open settings.py and replace IDA_PATH and IDA64_PATH with your own IDA paths.
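As a rough illustration, the two settings might look like the fragment below. The placeholder paths, the assumption that both values point at IDA's text-mode executables (`idat` and `idat64`), and the helper `missing_ida_paths` are hypothetical and not part of the repository.

```python
import os

# Hypothetical placeholder values; replace them with your own IDA locations.
IDA_PATH = "/path/to/ida/idat"
IDA64_PATH = "/path/to/ida/idat64"

def missing_ida_paths(paths):
    """Return the setting names whose value is not an existing executable file."""
    return [name for name, path in paths.items()
            if not (os.path.isfile(path) and os.access(path, os.X_OK))]

missing = missing_ida_paths({"IDA_PATH": IDA_PATH, "IDA64_PATH": IDA64_PATH})
if missing:
    print("Please fix these settings:", ", ".join(missing))
```

Running a check like this before the first analysis can catch a wrong path earlier than a failed IDA invocation would.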
python asteria_pro.py --help
usage: asteria_pro.py [-h] vul_func vul_bin target_bin
Asteria-Pro
positional arguments:
vul_func vulnerable function name
vul_bin binary contains vulnerable function
target_bin path to target binary
optional arguments:
-h, --help show this help message and exit
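The help text above has the shape that Python's `argparse` prints for a parser with three positional arguments. The following is a minimal sketch of such a parser, assuming that layout; it is not copied from asteria_pro.py, and `build_parser` is a hypothetical helper name.

```python
import argparse

def build_parser():
    """Build a parser whose usage line matches the help text shown above."""
    parser = argparse.ArgumentParser(prog="asteria_pro.py", description="Asteria-Pro")
    parser.add_argument("vul_func", help="vulnerable function name")
    parser.add_argument("vul_bin", help="binary contains vulnerable function")
    parser.add_argument("target_bin", help="path to target binary")
    return parser

# Positional values are passed in order, without a leading "--name".
args = build_parser().parse_args([
    "ASN1_verify",
    "sample_bins/vul_bin/openssl-1.0.1j",
    "sample_bins/target_bin/libcrypto.so.1.0.0",
])
print(args.vul_func, args.vul_bin, args.target_bin)
```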
asteria_pro.py performs the following jobs:
- Extract the call graph and the import/export tables of the two binaries
- Filter candidate functions using the callee list
- Generate ASTs for the vulnerable binary (only for the vulnerable function) and the target binary (only for candidate functions)
- Encode the ASTs with the model used in Asteria
- Re-rank the results of the previous step and output the final result
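The extraction step runs IDA in headless mode; the sample log in this README shows the command form (`TVHEADLESS=1 idat -L... -c -A -S"script args" binary`). Below is a sketch of assembling such a command as an argument list. The helper names `build_ida_command` and `headless_env` and the placeholder IDA path are assumptions for illustration, not the repository's code.

```python
import os

def build_ida_command(ida_path, script, script_args, binary, log_path):
    """Assemble argv for a headless IDA run, mirroring the logged command form."""
    script_call = " ".join([script, *script_args])
    return [
        ida_path,
        f"-L{log_path}",     # write IDA's log messages to this file
        "-c",                # start from a new database for the input file
        "-A",                # autonomous mode: no interactive dialogs
        f"-S{script_call}",  # run this script with its arguments after loading
        binary,
    ]

def headless_env():
    """Copy the current environment and set TVHEADLESS=1 as in the logged command."""
    env = dict(os.environ)
    env["TVHEADLESS"] = "1"
    return env

cmd = build_ida_command(
    "/path/to/ida/idat",  # hypothetical placeholder path
    "./fg_cg_ie_table_ida.py",
    ["sample_bins/vul_bin/cg_ie_table.pkl"],
    "sample_bins/vul_bin/openssl-1.0.1j",
    "log/fg_cg_ie_table_ida.log",
)
print(cmd)
```

Such a list could be passed to `subprocess.run(cmd, env=headless_env(), check=True)`. Passing a list avoids shell quoting for the binary path, although script arguments that contain spaces would still need quoting inside the `-S` value.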
python asteria_pro.py ASN1_verify sample_bins/vul_bin/openssl-1.0.1j sample_bins/target_bin/libcrypto.so.1.0.0
- Output
load model from saved/models/Asteria/crossarch_train_100000_1659022264.018625.pt
2022-12-12 15:59:01,060 asteria_pro.py-line:85 <Asteria-Pro> [INFO] Generating callee graph and imports exports table for sample_bins/vul_bin/openssl-1.0.1j
cmd:TVHEADLESS=1 /mnt/d/IDAs/IDA7.5/idapro-7.5/idat -Llog/fg_cg_ie_table_ida.log -c -A -S"./fg_cg_ie_table_ida.py sample_bins/vul_bin/cg_ie_table.pkl" sample_bins/vul_bin/openssl-1.0.1j
2022-12-12 16:00:02,705 asteria_pro.py-line:92 <Asteria-Pro> [INFO] Generating callee graph and imports exports table for sample_bins/target_bin/libcrypto.so.1.0.0
cmd:TVHEADLESS=1 /mnt/d/IDAs/IDA7.5/idapro-7.5/idat -Llog/fg_cg_ie_table_ida.log -c -A -S"./fg_cg_ie_table_ida.py sample_bins/target_bin/cg_ie_table.pkl" sample_bins/target_bin/libcrypto.so.1.0.0
2022-12-12 16:00:41,375 asteria_pro.py-line:104 <Asteria-Pro> [INFO] Generating AST for sample_bins/vul_bin/openssl-1.0.1j
2022-12-12 16:01:40,432 asteria_pro.py-line:113 <Asteria-Pro> [INFO] Generating AST for sample_bins/target_bin/libcrypto.so.1.0.0, cand func num:1042
2022-12-12 16:02:44,407 asteria_pro.py-line:120 <Asteria-Pro> [INFO] Encoding for sample_bins/vul_bin/openssl-1.0.1j
encoding ast at <cuda:0>: 100%|██████████| 1/1 [00:18<00:00, 18.26s/it]
2022-12-12 16:03:02,842 asteria_pro.py-line:127 <Asteria-Pro> [INFO] Encoding for sample_bins/target_bin/libcrypto.so.1.0.0
encoding ast at <cuda:0>: 100%|██████████| 1030/1030 [02:20<00:00, 7.33it/s]
2022-12-12 16:05:24,651 asteria_pro.py-line:134 <Asteria-Pro> [INFO] Rank result with model similarity
2022-12-12 16:05:25,020 asteria_pro.py-line:137 <Asteria-Pro> [INFO] Reranking ....
----------after filter----------
1042
----------res by asteria model----------
[('sub_4E376D18', 0.9995975), ('sub_4E329E30', 0.9995621), ('CMS_decrypt_set1_key', 0.998976), ('sub_4E326C34', 0.99854267), ('ASN1_verify', 0.99774605), ('sub_4E31A0C8', 0.9972397), ('CMS_SignerInfo_sign', 0.9965063), ('sub_4E35E5CC', 0.99637306), ('sub_4E394024', 0.9931632), ('PKCS7_SIGNER_INFO_sign', 0.99287605), ('sub_4E39FC5C', 0.9920893), ('sub_4E319E20', 0.99161315), ('OBJ_txt2obj', 0.99155724), ('sub_4E31A3B0', 0.98687565), ('TS_RESP_verify_signature', 0.98483604), ('sub_4E32B3A0', 0.9842802), ('SRP_Calc_x', 0.98426867), ('RSA_padding_add_PKCS1_OAEP', 0.98423517), ('sub_4E37F530', 0.9781908), ('sub_4E360B68', 0.9693676), ('sub_4E328E2C', 0.9686148), ('PEM_ASN1_write_bio', 0.96269065), ('RAND_write_file', 0.9557861), ('sub_4E32A9DC', 0.9508539), ('sub_4E329BD8', 0.9488859), ('SRP_Calc_u', 0.9399687), ('OCSP_check_validity', 0.9365947), ('PEM_do_header', 0.93115205), ('OBJ_create', 0.9308302), ('PKCS12_item_i2d_encrypt', 0.9237301), ('PKCS7_SIGNER_INFO_set', 0.9132933), ('PKCS12_pbe_crypt', 0.9120831), ('sub_4E396A78', 0.90902764), ('RAND_load_file', 0.90252256), ('CMS_decrypt_set1_pkey', 0.8936748), ('PEM_SealFinal', 0.8898113), ('EVP_PBE_CipherInit', 0.882748), ('DH_new_method', 0.87721694), ('DSA_new_method', 0.87702715), ('sub_4E36C7FC', 0.87430894), ('PEM_SealInit', 0.8672825), ('RSA_sign', 0.86370206), ('sub_4E38E738', 0.8596324), ('PKCS7_simple_smimecap', 0.85718036), ('sub_4E3181FC', 0.8508231), ('PKCS7_RECIP_INFO_set', 0.8357091), ('EVP_read_pw_string_min', 0.8337895), ('PKCS12_pack_p7encdata', 0.82753694), ('X509_get_pubkey_parameters', 0.82029283), ('RSA_new_method', 0.81993234)]
----------res by reranking----------
[('ASN1_verify', 0.9997746050357819), ('CMS_SignerInfo_sign', 0.7919583196823414), ('PKCS7_SIGNER_INFO_sign', 0.7915952975933369), ('sub_4E39FC5C', 0.5145935425391565), ('SRP_Calc_x', 0.5138114819159875), ('PEM_ASN1_write_bio', 0.5116536805262932), ('SRP_Calc_u', 0.5093814859023461), ('sub_4E396A78', 0.5062873789897332), ('PEM_SealInit', 0.5021128663649926), ('sub_4E394024', 0.44547016895734354), ('PEM_SealFinal', 0.43513497389279887), ('EVP_PBE_CipherInit', 0.4344286469312815), ('sub_4E36C7FC', 0.4335847405286936), ('RSA_sign', 0.43252405203305766), ('sub_4E31A3B0', 0.3756106422497676), ('RSA_padding_add_PKCS1_OAEP', 0.3753465936734126), ('sub_4E37F530', 0.37474215489167434), ('sub_4E32A9DC', 0.37200846534508925), ('PKCS12_item_i2d_encrypt', 0.3692960844590114), ('PKCS12_pbe_crypt', 0.36813138585824234), ('DH_new_method', 0.3646447704388545), ('DSA_new_method', 0.36462579231995806), ('RSA_new_method', 0.3589163110806392), ('sub_4E376D18', 0.3076520566756909), ('sub_4E35E5CC', 0.30732961342884946), ('TS_RESP_verify_signature', 0.30617591188504145), ('sub_4E32B3A0', 0.3061203305537884), ('PEM_do_header', 0.3008075122649853), ('OBJ_create', 0.3007753257568066), ('PKCS12_pack_p7encdata', 0.2904460017497723), ('sub_4E329E30', 0.238417746929022), ('CMS_decrypt_set1_key', 0.23835913768181435), ('sub_4E326C34', 0.23831580510506264), ('sub_4E31A0C8', 0.23818550935158364), ('sub_4E319E20', 0.2376228534258329), ('OBJ_txt2obj', 0.23761726251015297), ('sub_4E360B68', 0.23539830079445473), ('sub_4E328E2C', 0.2353230201281034), ('sub_4E329BD8', 0.2333501302278959), ('OCSP_check_validity', 0.23212101092705362), ('PKCS7_SIGNER_INFO_set', 0.22979086866745585), ('CMS_decrypt_set1_pkey', 0.2278290175474607), ('sub_4E38E738', 0.22442477574715247), ('PKCS7_simple_smimecap', 0.22417957415947548), ('sub_4E3181FC', 0.22354384889969459), ('PKCS7_RECIP_INFO_set', 0.22203244796166055), ('X509_get_pubkey_parameters', 0.22049082150826088), ('RAND_write_file', 0.16480938012783344), 
('RAND_load_file', 0.15948302562420186), ('EVP_read_pw_string_min', 0.15260972197239217)]
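To read the two result lists, it can help to compare the position of the vulnerable function before and after re-ranking. The sketch below uses only the leading entries copied from the output above; `rank_of` is a hypothetical helper, not part of asteria_pro.py.

```python
def rank_of(name, results):
    """Return the 1-based position of `name` in a list of (function, score) pairs."""
    for position, (func, _score) in enumerate(results, start=1):
        if func == name:
            return position
    return None

# Leading entries of "res by asteria model" from the output above.
model_top = [
    ("sub_4E376D18", 0.9995975),
    ("sub_4E329E30", 0.9995621),
    ("CMS_decrypt_set1_key", 0.998976),
    ("sub_4E326C34", 0.99854267),
    ("ASN1_verify", 0.99774605),
]

# Leading entries of "res by reranking" from the output above.
rerank_top = [
    ("ASN1_verify", 0.9997746050357819),
    ("CMS_SignerInfo_sign", 0.7919583196823414),
    ("PKCS7_SIGNER_INFO_sign", 0.7915952975933369),
]

print(rank_of("ASN1_verify", model_top))   # 5
print(rank_of("ASN1_verify", rerank_top))  # 1
```

In this sample run, ASN1_verify moves from fifth place under the model score alone to first place after re-ranking.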
The large collection of compiled software binaries can be downloaded from Google Drive.
In each directory, around 200 software packages are compiled with different compilation settings.
We have released our vulnerability and patch dataset on Zenodo.