linwhitehat/ET-BERT

data preprocessing

Closed this issue · 3 comments

Hello author, I’m sorry to bother you again. I didn’t find more detailed data-preprocessing information in the paper. During preprocessing, is the sliced data 256 bytes, 784 bytes, or 900 bytes? Looking forward to your reply.

Hello. In this paper the datagram is sliced, and the slice length is kept within 512. This does not refer to bytes, but to the length of the token sequence produced by slicing.
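
To make the distinction concrete, here is a minimal sketch of slicing a payload into a capped token sequence. The hex-bigram tokenization follows the paper's description of datagram encoding; the function name, the example payload, and the way the 512 cap is applied are illustrative assumptions, not code from this repository.

```python
# Minimal sketch (not the authors' code): slice a datagram payload into a
# token sequence capped at 512 *tokens* (not bytes), as clarified above.

MAX_SEQ_LEN = 512  # cap on the token-sequence length, per this thread


def payload_to_tokens(payload: bytes, max_len: int = MAX_SEQ_LEN) -> list[str]:
    """Turn raw payload bytes into hex-bigram tokens, truncated to max_len."""
    hex_str = payload.hex()  # e.g. b'\x16\x03\x01' -> '160301'
    # Slide one byte (two hex chars) at a time, taking two bytes per token,
    # which yields overlapping bigrams: '1603', '0301', ...
    tokens = [hex_str[i:i + 4] for i in range(0, len(hex_str) - 2, 2)]
    return tokens[:max_len]  # keep the sequence within the 512 cap


if __name__ == "__main__":
    sample = bytes(range(256)) * 3          # dummy 768-byte payload
    toks = payload_to_tokens(sample)
    print(len(toks), toks[:3])              # 512 ['0001', '0102', '0203']
```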

[screenshot of the provided dataset]
According to the dataset (packet-level) you provided, I found that there are 64 tokens in each category. How does this relate to the 512 tokens you mentioned? Looking forward to your reply, and sorry to bother you again.

The 512 mentioned in the paper refers to the maximum length of the final embedded representation used for pre-training. In preprocessing, it is sufficient, at both the packet level and the flow level, to keep the actual sequence length plus the special tokens from exceeding 512.
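
As a minimal sketch of that rule (assumed names, not the repository's actual code): the raw token sequence is truncated so that, together with BERT-style special tokens such as [CLS] and [SEP], the input never exceeds the 512-position embedding.

```python
# Sketch of "actual length + special tokens <= 512". The use of exactly
# [CLS] and [SEP] here is an assumption based on standard BERT-style input.

MAX_EMBED_LEN = 512       # final embedded-representation length (paper)
NUM_SPECIAL_TOKENS = 2    # [CLS] and [SEP] (assumed)


def build_input(tokens: list[str]) -> list[str]:
    """Truncate raw tokens so that tokens + special tokens stay within 512."""
    budget = MAX_EMBED_LEN - NUM_SPECIAL_TOKENS       # room for real tokens
    return ["[CLS]"] + tokens[:budget] + ["[SEP]"]    # total length <= 512


if __name__ == "__main__":
    seq = build_input(["abcd"] * 600)   # oversized packet-level sample
    assert len(seq) <= MAX_EMBED_LEN
    print(len(seq))                     # 512
```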