About the paper.

Question

Opened this issue 4 years ago · 2 comments

In Figure 2 of the paper, there seems to be an input sequence. What are those input sequences? They look like the raw bytes of disassembled code.

Answer 1 · 2021-05-11T08:31:54.000Z

Yes, just the bytes belonging to the .text section of the binary.
The entire data is divided into chunks of 2048 bytes and submitted to the network.
During training, a random portion of these chunks is replaced by zeroes, for the reason I explained in #1.
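
For anyone trying to reproduce this, here is a rough sketch of the chunking and zero-masking described above. It is not the repository's code. Only the 2048-byte chunk size comes from the answer; the function names, zero-padding the last chunk, the `mask_fraction` parameter, and masking one contiguous span inside each chunk are all assumptions made for illustration.

```python
import random
from typing import List

CHUNK_SIZE = 2048  # chunk size stated in the answer above


def split_into_chunks(text_bytes: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split the raw .text bytes into fixed-size chunks."""
    chunks = []
    for start in range(0, len(text_bytes), chunk_size):
        chunk = text_bytes[start:start + chunk_size]
        # Assumption: the final short chunk is zero-padded to full length.
        chunks.append(chunk.ljust(chunk_size, b"\x00"))
    return chunks


def mask_random_span(chunk: bytes, mask_fraction: float, rng: random.Random) -> bytes:
    """Replace one randomly placed contiguous span of the chunk with zeroes.

    Assumption: both the span shape and mask_fraction are hypothetical;
    the thread does not say how large or how contiguous the masked portion is.
    """
    span = int(len(chunk) * mask_fraction)
    if span == 0:
        return chunk
    start = rng.randrange(0, len(chunk) - span + 1)
    return chunk[:start] + bytes(span) + chunk[start + span:]
```

Masking would only be applied while building training batches, e.g. `mask_random_span(chunk, 0.25, random.Random())`; at evaluation time the chunks would be passed through unchanged.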

Answer 2 · 2021-05-13T16:45:22.000Z

Hi, I would like to generate the dataset. How do you think I should proceed? There is a binaryds.py file in there, along with similar test files. I think you used radare to retrieve the .text section as well. Am I correct?