cisco/joy

Question about parsing TLS data

Applenice opened this issue · 8 comments

Hello! I am having problems parsing TLS data, and I don't know where the problem is.
Version: 4.0.0
Operating system: CentOS Linux release 7.5.1804 (Core), Ubuntu 16.04
Configuration file:

output=output/gz
bidir=1
dist=1
classify=1
tls=1
entropy=1
verbosity=1
logfile=output/log/20190129_01.log

I removed the pcap_setfilter() call from process_pcap_file() to avoid the "no VLAN support for data link type" error.

Command used:

bin/joy -x output/option_config.txt ../DATA/PCAP/

When viewing the output, I found some parsing errors:

"bytes_out":0,
"packets":[],
"byte_dist":[0,0,0.....],
or
"tls":{"error":"no role"},

However, when I process a single PCAP file from that directory, TLS parsing is normal.
The command in that case:

bin/joy -x output/option_config.txt ../DATA/PCAP/9956.pcap > output/gz/9956.gz

I have reproduced this problem many times. The PCAP folder contains 6 GB of HTTPS traffic. I don't know why this happens. Is there any solution?

Looking forward to your reply, thank you.

"tls":{"error":"no role"} means that the code could not determine whether the flow was a client flow or a server flow. As a result, some of the TLS data is not output because it was not collected properly. Could you send along the config file and pcap file you are using? Also, please update to the latest software, as some bugs related to turning on parsing options were fixed recently.
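To illustrate what "no role" means, here is a toy classifier that decides a flow's role from the first TLS handshake message it sees. It is a hypothetical sketch, not joy's implementation: `tls_role` and `classify_tls_role` are made-up names, and only the record layout (content type 22 = handshake, handshake type 1 = ClientHello, 2 = ServerHello) comes from the TLS specification. If the packets carrying the ClientHello or ServerHello are never attributed to the flow, a classifier like this can only answer "unknown".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical role values; joy's real data structures differ. */
typedef enum { ROLE_UNKNOWN = 0, ROLE_CLIENT, ROLE_SERVER } tls_role;

/* Hypothetical helper: guess a flow's role from the start of a TCP payload.
 * TLS record header (5 bytes): content type (22 = handshake),
 * 2-byte protocol version (major 3), 2-byte record length.
 * The next byte is the handshake message type:
 * 1 = ClientHello, 2 = ServerHello. */
tls_role classify_tls_role(const uint8_t *payload, size_t len) {
    assert(payload != NULL || len == 0);
    if (len < 6) {
        return ROLE_UNKNOWN;    /* too short for record header + type */
    }
    if (payload[0] != 22 || payload[1] != 3) {
        return ROLE_UNKNOWN;    /* not a TLS handshake record */
    }
    switch (payload[5]) {
    case 1:
        return ROLE_CLIENT;     /* ClientHello */
    case 2:
        return ROLE_SERVER;     /* ServerHello */
    default:
        return ROLE_UNKNOWN;    /* some other handshake message */
    }
}
```

For example, a payload starting with the bytes 22, 3, 1, 0, 5, 1 is classified as a client, while application data (content type 23) yields no role.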

I have shared the PCAP files and configuration files here:
https://drive.google.com/drive/folders/11V-eaHVeetmsxscCkPay6DJ-hHNFAFNm

When I use the configuration config.txt with the PCAP directory as input, using this command:

bin/joy -x output/config.txt ../DATA/PCAP/

and view the output with zcat, there are many errors: most TLS parsing results show "tls": {"error": "no role"}.

When I use the configuration config_single.txt with a single pcap file as input, using this command:

bin/joy -x output/config_single.txt ../DATA/PCAP/douban.pcap > output/gz/douban_single.gz

TLS data parsing is normal, no error occurs.

I will upgrade to see whether the issue still exists.

After upgrading to the latest version, the problem still exists. I think I have found the cause.

In the function process_pcap_file():

joy/src/joy.c, line 1726 in 79d925e:

more = pcap_dispatch(handle, NUM_PACKETS_IN_LOOP, libpcap_process_packet, (unsigned char *)&main_ctx);

joy/src/joy.c, line 111 in 79d925e:

#define NUM_PACKETS_IN_LOOP 5

NUM_PACKETS_IN_LOOP is passed as the cnt argument. When I changed cnt to -1 and ran joy with config.txt on the PCAP directory, TLS parsing was normal.

Why would you change the value of NUM_PACKETS_IN_LOOP to -1?

I based this on the libpcap documentation:
http://www.tcpdump.org/manpages/pcap.3pcap.html
https://www.tcpdump.org/manpages/pcap_loop.3pcap.html

A value of -1 or 0 for cnt causes all the packets received in one buffer to be processed when reading a live capture, and causes all the packets in the file to be processed when reading a ``savefile''.

With joy, you do not need to modify that parameter. We set the packet count per loop to 5, but we keep looping over packets until you stop the program (live capture) or the pcap file is exhausted (offline file). Returning from libpcap every 5 packets lets joy do some data analysis on the packets processed so far (expiring flows, classification, etc.).
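The pattern described above can be illustrated without libpcap. The sketch below uses a mock savefile and a hypothetical `mock_dispatch()` that follows the documented `pcap_dispatch()` contract for savefiles (process at most `cnt` packets, all of them when `cnt` is -1 or 0, and return 0 once the file is exhausted). `mock_savefile`, `mock_dispatch`, and `process_file` are made-up names, not joy's code. With a small `cnt`, the outer loop regains control after every batch, yet every packet is still processed exactly once, as with `cnt = -1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a savefile: an array of packet lengths. */
typedef struct {
    const int *pkt_len;
    size_t n;
    size_t next;            /* index of the next unread packet */
} mock_savefile;

typedef void (*packet_cb)(void *user, int len);

/* Hypothetical mock of pcap_dispatch() on a savefile:
 * process at most cnt packets (all remaining ones if cnt <= 0) and
 * return how many were processed; 0 means the file is exhausted. */
int mock_dispatch(mock_savefile *f, int cnt, packet_cb cb, void *user) {
    assert(f != NULL && cb != NULL);
    int done = 0;
    while (f->next < f->n && (cnt <= 0 || done < cnt)) {
        cb(user, f->pkt_len[f->next++]);
        done++;
    }
    return done;
}

typedef struct {
    long packets;
    long bytes;
    int batches;            /* times the between-batch step ran */
} stats;

/* Per-packet callback: accumulate simple counters. */
static void count_packet(void *user, int len) {
    stats *s = user;
    s->packets++;
    s->bytes += len;
}

/* Keep dispatching until the file is exhausted. After every call the
 * loop regains control; in joy this is where flow expiry and
 * classification would run (here it only counts the calls). */
stats process_file(mock_savefile *f, int cnt) {
    stats s = {0, 0, 0};
    int more;
    do {
        more = mock_dispatch(f, cnt, count_packet, &s);
        s.batches++;
    } while (more > 0);
    return s;
}
```

With 12 packets and cnt = 5, the loop runs batches of 5, 5, and 2 plus a final empty call; with cnt = -1 it reads everything in one call. The per-packet totals are identical either way, so in this model a difference in output could only come from what happens between batches.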

I understand what you mean. But, as I mentioned before, parsing the PCAP directory and parsing each file individually produce different results.

Before changing NUM_PACKETS_IN_LOOP, I processed every PCAP file in the directory one at a time, and then processed the whole directory at once. I used grep to count three keys (error, c_version, s_version) in the JSON output of each method. The counts were very different, which is what confuses me: I expect the two methods to give similar or identical results. After changing NUM_PACKETS_IN_LOOP, I compared the two methods again to verify this.
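The grep comparison can also be expressed as a small program that counts each quoted key in both outputs and reports which counts differ. This is a minimal sketch, assuming each `.gz` output has already been decompressed (e.g. with `zcat`) into a string; `count_key` and `count_mismatches` are hypothetical helpers. Note that `grep -c` counts matching lines rather than occurrences, whereas this sketch counts occurrences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of a quoted JSON key such as
 * "c_version" in a buffer of decompressed joy output. Requiring the
 * quotes avoids matching the word inside a longer string. */
size_t count_key(const char *text, const char *key) {
    assert(text != NULL && key != NULL);
    char pattern[128];
    size_t klen = strlen(key);
    size_t count = 0;
    if (klen + 3 > sizeof pattern) {
        return 0;               /* key too long for this sketch */
    }
    pattern[0] = '"';
    memcpy(pattern + 1, key, klen);
    pattern[klen + 1] = '"';
    pattern[klen + 2] = '\0';
    /* Advance past each match so occurrences are not double-counted. */
    for (const char *p = strstr(text, pattern); p != NULL;
         p = strstr(p + klen + 2, pattern)) {
        count++;
    }
    return count;
}

/* The three keys compared in the report above. */
static const char *const KEYS[] = {"error", "c_version", "s_version"};

/* Hypothetical helper: number of keys whose counts differ between a
 * per-file run and a whole-directory run over the same traffic. */
int count_mismatches(const char *per_file_run, const char *dir_run) {
    int mismatches = 0;
    for (size_t i = 0; i < sizeof KEYS / sizeof KEYS[0]; i++) {
        if (count_key(per_file_run, KEYS[i]) != count_key(dir_run, KEYS[i])) {
            mismatches++;
        }
    }
    return mismatches;
}
```

If both methods parse the traffic the same way, `count_mismatches` should return 0.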

Can you reproduce this with the data and configuration files I provided?

We did find a bug in processing a directory of files versus a single file. A fix for that will go in shortly.