cisco/joy

Joy’s JSON output is invalid - cannot be read in Python or other JSON parsing tools

Closed this issue · 2 comments

Issue Description

The Joy tool was used to process .pcap files obtained from a publicly available data set[1]. Processing went smoothly for the majority of the .pcap files, but one of the .gz files generated by Joy threw an unexpected error when read. Analysis showed that the error is caused by malformed JSON.

Steps to reproduce the issue

Below are the steps to reproduce this issue for further analysis.

  1. Download the malware capture file 2013-08-20_capture-win6.pcap from the Stratosphereips malware repository dataset[2][3].
  2. Process the .pcap file with Joy by executing the following command in your Joy instance:
    ./joy bidir=1 http=1 tls=1 dns=1 ppi=1 output=output.gz 2013-08-20_capture-win6.pcap
  3. Read each line of the output.gz file with the function LoadJoy():
import gzip
import json

def LoadJoy(filename):
    """Load a gzip file that Cisco's Joy tool produced from a PCAP.

    Arguments:
    filename -- gzip file to open
    """
    with gzip.open(filename) as infile:
        iterFile = iter(infile)
        # The first line is treated as a header record.
        headerData = json.loads(next(iterFile).decode('utf-8').replace('\\', ''))
        for bline in iterFile:
            line = bline.decode('utf-8')
            try:
                # Backslashes are stripped before parsing, as in the original report.
                dj = json.loads(line.replace('\\', ''))
            except json.JSONDecodeError as err:
                # Report the parser's message together with the offending line.
                print("Error {} in line {}".format(err, line))
    return 1

file = 'output.gz'
LoadJoy(file)

Note: This function reports an error for every line that fails to parse as JSON.

  4. The function caught multiple JSON parse errors.

  5. After checking one of the failing records with an online JSON beautifier tool[4], it turned out that all of the malformed JSON records give the same error:

Parse error on line 1:
...netAuthority.crt."},],"validity_not_befo
-----------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '[', got ']'

(There could be several other types of JSON read errors. This is the one I encountered after processing the 2013-08-20_capture-win6.pcap file.)
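
The same kind of error can be located offline with Python's standard json module instead of an online tool. The following is a minimal sketch only: the helper name describe_parse_error and the sample record are made up for illustration and are not taken from Joy's actual output. The exact message and position depend on the Python version.

```python
import json

def describe_parse_error(line, width=20):
    """Return None if `line` is valid JSON, else (message, position, context)."""
    try:
        json.loads(line)
    except json.JSONDecodeError as err:
        # err.pos is the character offset at which the parser gave up;
        # slice a small window around it to show where the record breaks.
        start = max(err.pos - width, 0)
        context = err.doc[start:err.pos + width]
        return err.msg, err.pos, context
    return None

# Hypothetical record with the same kind of trailing comma as in the report.
sample = '{"extensions": [{"name": "x"},], "validity_not_before": "y"}'
print(describe_parse_error(sample))

# A well-formed record parses cleanly, so this prints None.
print(describe_parse_error('{"ok": true}'))
```

Running this over each failing line of output.gz would show whether every failure is the same trailing comma or whether other kinds of errors are present as well.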

References

[1]https://www.stratosphereips.org/datasets-overview/
[2]https://www.stratosphereips.org/datasets-malware/
[3]https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-6/
[4]https://codebeautify.org/jsonviewer

What version of the code base was used to run this test?

I just ran the following from the latest on master (3.0.0) and did not get any JSON errors:

bin/joy bidir=1 http=1 tls=1 dns=1 ppi=1 output=output.gz 2013-08-20_capture-win6.pcap
./sleuth output.gz

Each entry in the output file is a complete JSON object. We do this on purpose so that one bad record won't invalidate the entire JSON file. Could you try this on the latest code from master?
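
To illustrate why one object per line limits the damage, here is a rough sketch of a reader that parses each line independently and simply counts the failures. It is not Joy's or sleuth's implementation; the function names and the three demo records are invented for this example.

```python
import gzip
import json
import os
import tempfile

def count_records(path):
    """Parse a gzip file of one-JSON-object-per-line; return (good, bad)."""
    good, bad = 0, 0
    with gzip.open(path, "rt", encoding="utf-8") as infile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            try:
                json.loads(line)
                good += 1
            except json.JSONDecodeError:
                # Only this record is lost; the following lines still parse.
                bad += 1
    return good, bad

def write_demo_file():
    """Write three made-up records, the second one malformed, to a temp .gz."""
    records = [
        '{"sa": "10.0.0.1"}',
        '{"tls": [{"a": 1},]}',  # trailing comma, invalid JSON
        '{"sa": "10.0.0.2"}',
    ]
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("\n".join(records) + "\n")
    return path

demo_path = write_demo_file()
print(count_records(demo_path))  # prints (2, 1)
os.remove(demo_path)
```

If the whole file were a single JSON array instead, the one malformed record would make the entire document unparseable.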

Assuming this is resolved with the latest code, since there has been no response.