Because the full dataset is large (47 GB) and both VPN and campus network traffic are charged, we have uploaded only partial data (1,000 malicious samples and 5,000 benign samples per subdataset).

In addition, we apply hashing to mask data in the dataset and protect privacy.

An HTTP session sample consists of multiple request and response packets. In general, each request has a corresponding response, and the request packets are separated by "____". Each line is an HTTP field line, and each number represents one character.
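The layout described above can be read with a short parser. This is only a minimal sketch, not the authors' loader: it assumes a session is available as one text string, that the separator appears as the literal string "____", and that the numbers on a field line are separated by whitespace. The function name `parse_session` and the sample string are hypothetical and do not come from the dataset.

```python
SEPARATOR = "____"  # separator string as described in this README


def parse_session(text: str) -> list[list[list[str]]]:
    """Split one session into packets, field lines, and number tokens.

    Assumption: numbers on a field line are whitespace-separated.
    """
    packets = []
    for chunk in text.split(SEPARATOR):
        # Each non-empty line is treated as one HTTP field line.
        lines = [line.split() for line in chunk.splitlines() if line.strip()]
        if lines:
            packets.append(lines)
    return packets


# Made-up sample for illustration only; real samples will differ.
sample = "12 7 33\n5 9\n____\n41 2\n8 8 8\n"
packets = parse_session(sample)
print(len(packets))   # number of packets found in the session
print(packets[0])     # field lines of the first packet
```

If the actual files use a different delimiter between numbers, only the `line.split()` call should need to change.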

The demo dataset is composed as follows: HMCT-2020(18) has 1,000 malicious samples and 5,000 benign samples. HMCT-2020(19-20) has 1,000 malicious samples and 5,000 benign samples. Real traffic has 1,000 malicious samples, the list of real malware, and 5,000 benign samples.
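For a quick check of class balance, the counts above can be written down and summed. This is a small sketch using only the numbers stated in this README; the dictionary keys are labels chosen for illustration and are not guaranteed to match file or directory names in the repository.

```python
# Sample counts per subdataset, copied from the description in this README.
# Keys are illustrative labels, not actual paths.
DEMO_COMPOSITION = {
    "HMCT-2020(18)": {"malicious": 1000, "benign": 5000},
    "HMCT-2020(19-20)": {"malicious": 1000, "benign": 5000},
    "Real traffic": {"malicious": 1000, "benign": 5000},
}


def totals(composition):
    """Return (malicious, benign) totals across all subdatasets."""
    malicious = sum(c["malicious"] for c in composition.values())
    benign = sum(c["benign"] for c in composition.values())
    return malicious, benign


malicious, benign = totals(DEMO_COMPOSITION)
print(malicious, benign, malicious + benign)  # 3000 15000 18000
```

The demo data is therefore imbalanced at one malicious sample per five benign samples, which is worth keeping in mind when choosing evaluation metrics.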

If you want the complete dataset, please contact us (hmctdata2020@gmail.com).

This dataset is for academic research only; personal use is permitted. Permission to use it for any other purpose must be obtained from the authors by sending a request to hmctdata2020@gmail.com.