activecm/rita

RITA slow / not working on ~500GB 24-hour dataset

kyleEeeEEeeee opened this issue · 3 comments

Hello,

Still doing some testing of the tool. We've seen great results so far. We are currently trying to run it against the following set of logs (massive network):

24 hours:
conn.log: 438GB
dns.log: 93GB
ssl.log: 23GB
http.log: 5.8GB

It's not working the way it has with smaller data sets. It could be a resource issue on our end, but I'm curious whether you're aware of a limit to what RITA can handle. Is there a certain size at which it starts becoming unreliable?

Thanks in advance :)

Hello,

I am now realizing that I have the same bug/issue that @Zalgo2462 is working on. Is there a specific unique hostname/IP count we need to stay under for this to work, like the value Zalgo changed from 200 to 150? We are trying to figure out whether there are customer networks that will simply be too large to run the tool on. Thanks :)

Hello, I believe that would be larger than any dataset I have personally tested RITA with. I imagine that RITA may take 24 hours or longer to process that much data. If the data can be cleanly partitioned (by Zeek sensor, by internal subnets, or by hour), I would recommend splitting the data and running RITA on each partition separately.
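For illustration, here is a minimal sketch of how an hourly split could be imported; the paths and dataset names below are placeholders, so adjust them to wherever your Zeek logs actually live:

```bash
# Placeholder paths and dataset names: one RITA dataset per hour of logs.
for hour in $(seq -w 0 23); do
    rita import "/data/zeek/2021-06-01/${hour}/" "bigsite-hour-${hour}"
done

# Each partition can then be analyzed on its own, e.g.:
rita show-beacons bigsite-hour-00
```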

In general, the FQDN beaconing analysis will take the longest out of the different analysis modules. I have had to disable this analysis in the RITA config file in the past when working with large datasets.
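If it helps, this is roughly where that setting lives; a minimal sketch, assuming the default /etc/rita/config.yaml path and that the module is toggled by an Enabled key under a BeaconFQDN section (key names can vary between RITA versions, so verify against your own config):

```bash
# Assumption: FQDN beaconing is controlled by a "BeaconFQDN" section in
# /etc/rita/config.yaml; confirm the exact key names for your RITA version.
grep -n -A 3 'BeaconFQDN' /etc/rita/config.yaml

# Setting that section roughly as follows before importing skips the
# FQDN beaconing analysis:
#
#   BeaconFQDN:
#       Enabled: false
```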

In ticket #759, the error causes RITA to skip the beaconFQDN analysis, so it is likely related to, but different from, the issues you are seeing.

Please post a copy of the output you are seeing from RITA. From there we can see where RITA is getting stuck and whether we can help it along somehow.

Additionally, the files in /var/lib/rita/logs might help us figure out what is going wrong.

Thank you for your time and interest in the project!

Hello,

So sorry about the delay. We realized that the issue seemed to be related to CPU/RAM resources: we could get through that giant 570 GB batch once we upped the resources on the test VM. The split-into-a-rolling-database method you recommended also worked, but outputting/writing the show-beacons-fqdn results to CSV (to put into Splunk) wouldn't work afterwards. So I'm thinking disabling the FQDN analysis on huge data sets may be the way to go, as you said. When it was failing we would get something like this (the error with the 16MB):

[image: screenshot of the RITA error output]
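For reference, the export step that was failing was roughly of this form (the dataset name is a placeholder):

```bash
# Placeholder dataset name; the show-beacons-fqdn output is redirected to a
# file so it can be ingested into Splunk.
rita show-beacons-fqdn bigsite-24h > beacons-fqdn.csv
```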