Converts Netwitness log parser configuration into Logstash configuration
Disclamer: RSA2ELK is being published as an independent project (by Vincent Maury) and is in no way associated with, endorsed, or supported by Elastic. RSA2ELK is hereby released to the public as unsupported, open source software. Vincent Maury or Elastic cannot be held responsible for the use of this script! Use it at your own risk
The purpose of this tool is to convert an existing configuration made for RSA Netwitness Log Parser software (ingestion piece of the RSA SIEM) into a Logstash configuration that can ingest logs to Elasticsearch.
RSA uses one configuration file per device source (product). For example, one file will handle F5 ASM, another one will handle F5 APM, etc.
RSA used to publish configuration files for 300 devices (see below). So if you are not an RSA user, you can still pass any of these configuration files to the rsa2elk tool to generate the corresponding Logstash pipeline.
Status as of March 2020: Elastic or Vincent Maury has no direct or indirect relation to RSA.
Until March 2020, RSA used to publish the configuration files for 300 devices on their github with the Apache 2.0 license. Since March 2020, this repo has been closed, but you can find a copy of these configuration files in the devices directory.
These instructions will get you a copy of the project up and running on your local machine.
This piece of python has no other pre-requisite than Python 3. No need for additional library. It should work on any platform (tested on Windows so far).
Just clone this repository and run the script on the sample zScaler file.
git clone https://github.com/blookot/rsa2elk.git
cd rsa2elk
python rsa2elk.py -h
python rsa2elk.py -p -q -e -f -r -t -z
# In the logstash-zscalernssmsg.conf file, replace the log line in the input by the following string: "data as a start ZSCALERNSS: time=hfld2 Jan 30 15:12:07 2020^^timezone=UTC^^action=action^^reason=result^^hostname=vincent.hostname^^protocol=tcp^^serverip=34.103.179.90^^url=https://www.elastic.co/blog/first-posts.php^^urlcategory=Awesome websites^^urlclass=Info on elastic.co^^dlpdictionaries=fld3^^dlpengine=fld4^^filetype=php^^threatcategory=None^^threatclass=No threat^^pagerisk=fld8^^threatname=N/A^^clientpublicIP=fld9^^ClientIP=2a01:cb04:a99:1700:cc1:94df:81c4:9dcd^^location=france^^refererURL=web_referer^^useragent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36^^department=user_dept^^user=username^^event_id=id^^clienttranstime=fld17^^requestmethod=GET^^requestsize=178^^requestversion=HTTP/1.0^^status=200^^responsesize=1589^^responseversion=fld23^^transactionsize=1812"
# In this logstash-zscalernssmsg.conf file, also configure the absolute path to the msgid2parserid-zscalernssmsg.json file (generated by this rsa2elk) in the dictionary_path translate.
logstash -f logstash-zscalernssmsg.conf
The script has several options:
-h
will display help.-i
or--input-file FILE
to enter the absolute patch to the RSA XML configuration file. Alternative is url. See the note below for custom XML file.-u
or--url URL
to enter the URL to the RSA XML configuration file. if no file or url is provided, this program will run on a sample XML file located in the RSA repo.-o
or--output-file FILE
to enter the absolute path to the Logstash .conf file (default:logstash-[device].conf
).-p
or--parse-url
adds a pre-defined filter block (see filter-url.conf) to parse URLs into domain, query, etc (default: false).-q
or--parse-ua
adds a pre-defined filter block (see filter-ua.conf) to parse User Agents (default: false).-e
or--enrich-geo
adds a filter block (see filter-geoip.conf) to enrich public IPs with geoip information (default: false).-f
or--enrich-asn
adds a filter block (see filter-asn.conf) to enrich public IPs with ASN information (default: false).-x
or--remove-parsed-fields
removes the event.original and message fields if correctly parsed (default: false).-r
or--rename-ecs
renames default RSA fields to ECS fields (default: false).-t
or--trim-fields
trims (strips left and right spaces) from all string fields (default: false).-n
or--no-grok-anchors
removes the begining (^) and end ($) anchors in grok (default: false, ie default is to have them).-a
or--add-stop-anchors
adds hard stop anchors in grok to ignore in-between chars, see explanation below. Should be set as a serie of plain characters, only escaping " and \. Example:\"()[]
(default: "").-m
or--single-space-match
to only match 1 space in the log if there is 1 space in the RSA parser (default: false, ie match 1-N spaces aka[\s]+
).-c
or--check-config
runs on check of the generated configuration withlogstash -t
(default: false).-l
or--logstash-path
to enter the absolute path to logstash bin executable (default is my local path!).-d
or--debug
to enable debug mode, more verbose (default: false).
The XML configuration file can be specified using the -i
option for a local file or -u
option for a URL.
When specifying a local file, for instance networkdevice.xml
, the script will also look for a related "custom" XML file named networkdevice-custom.xml
. If it exists, the script will take each entry (header & message) of the custom XML and insert them in the "main" XML tree. See RSA doc for more documentation.
The tool mostly generates the filter
part of the Logstash configuration. The input
and output
sections are copied from the input.conf and output.conf files that you can customize.
Note: the filter-url.conf file adds a section at the end of the Logstash configuration to deal with urls. The filter-ua.conf parses user agents. Both files can be customized, partially commented... In particular, the user-agent parsing can be resource intensive.
Note: the filter-geoip.conf and filter-asn.conf enrichments are also lookups on large tables which can be resource intensive.
The script generates 4 outputs:
logstash-[device].conf
which is the main Logstash pipeline configurationmsgid2parserid-[device].json
is a translation file to map message id2 to a parser ides-mapping.json
is the Elasticsearch mapping fileoutput-logstash-[device]-configtest.txt
in case the-c
option has been activated to test the configuration
You can grab the logstash-[device].conf
file (or custom name you defined) as well as the msgid2parserid-[device].json
and es-mapping.json
generated by this script.
The msgid2parserid-[device].json
file is used in the pipeline to translate the [rsa][message][id2]
(as set in the header) into a [logstash][msgparser][id]
expected in the message section. You should add the absolute path to this file in the dictionary_path
key (of the pipeline config) when you run Logstash.
If you use the Elasticsearch output of the generated pipeline, you will see the use of a template linking to the es-mapping.json
file. You should add the absolute path to this file in the template
key (of the pipeline config) when you run Logstash.
When the check-config
flag has been activated, this configuration file is automatically tested by Logstash. The output of Logstash can be checked in the output-logstash-[device]-configtest.txt
file that is created in the same directory than the rsa xml file input.
Considering some Logstash configuration files could be large (tens of thousands of lines), you may want to tune Logstash.
Beyond usual tuning, here are a few advices from @colinsurprenant (see this github issue):
Xms
andXmx
are for setting the global heap size (typically in Gb). There used to be a "golden rule" for setting maximum heap not more that 50% of the total physical memory, to leave room for OS buffers etc. This should be roughly followed in general, although you might be able to go up to 75% of the physical memory.-Xss
is for the stack size which is per-thread and is typically in Kb or low Mb (default is 1024Kb). If you get aStackOverflowError
error, you could try at-Xss2m
and doubling up. You should never need more that a 4 to 16mb stack size.
RSA Netwitness Log Parser is the piece of software ingesting data in the Netwitness platform. It comes with a nice UI (see the user guide). Elastic also provides 2 ways to ingest data into Elasticsearch: Logstash - as an ETL - and the Elasticsearch ingest pipelines. This tool focuses on Logstash, as a way to ease ingest (capturing data via syslog, files, etc and writing to elasticsearch or other destinations) but the plan is to port this tool to the Elasticsearch ingest pipeline (leveraging Filebeat as syslog termination).
The syntax of the XML configuration file is specific to RSA and falls into 2 parts mainly:
- headers, describing headers of logs, capturing the first fields that are common to many types of messages. These headers then point (using the
messageid
field) to the appropriate message parser - messages, parsing the whole log line, extracting fields, computing the event time (
EVNTTIME
function), concatenating strings and fields to generate new ones (STRCAT
function), setting additional fields with static or dynamic values, etc
In both, the content
attribute describes how the log is parsed. The syntax supports alternatives {a|b}
, field extraction <fld1>
and static strings.
The transform.py
module does the core of the conversion by reading this content line character after character and computing the corresponding grok or dissect pattern.
Dissect is prefered by default, as it performs faster and easily matches the RSA syntax. However, Dissect does not support alternatives {a|b}
and (specifically for headers) it does not support sub group capturing with the payload
field. So, for both cases, we fallback to grok.
When dissect is possible, transformation is easy: "just" replace with %{fld}! As simple as that. And performance should improve (see a feature & perf comparison).
The whole idea of the grok pattern is to capture fields with any character but the one after the field. For example, <fld1> <fld2>
in RSA will result in ?<fld1>[^\s]*)[\s]+(?<fld2>.*)
in grok. Note that the [\s]+
in the middle is quite permissive because many products use several spaces to tabularize their logs. The -s
flag can be used to change this behavior to strictly match the log according to the exact number of spaces in the RSA configuration. This flag will replace the [\s]+
by a simple \s
.
RSA can also handle missing fields when reading specific characters. For example, this RSA parser <fld1> "<fld99>"
will match both aaa "zzz"
(where fld1='aaa') and aaa bbb "zzz"
(where fld1='aaa bbb').
The -a
flag will let the user input specific characters that will serve as anchors, so that when they are found, the grok will jump over the unexpected fields. Using the above example, the grok will look like (?<fld1>[^\s]*)[\s]+(?<anchorfld>[^\"]*)\"(?<fld99>[^\"]+)\"
. Please note that we are adding a anchorfld
field to capture the possible characters before the anchor, so for aaa bbb "zzz"
, the anchorfld
field will only have 'bbb'). Which is what you would expect I think ;-)
RSA uses specific field names in the configuration files that map to meta keys, as described here. Elastic also defined a set of meta fields called ECS, see documentation.
The table-map.csv file is used to map RSA meta fields to ECS naming (as well as field types).
You can customize this file and change key mappings as you like, as long as you keep this csv format (with , or ; separator) and the correct column titles.
The support of table-map.xml
and table-map-custom.xml
is planned (see TODO below).
The latest added features are:
- when several messages share the same parser, group them! this is performed only when there is only 1 message (1 id1) per id2
- support the VendorProducts keys in cef.xml (yet to support the ExtensionKey renamings)
- manage geopoint in mapping when doing geoip
- support VALUEMAP, converted as translation filter
- generate a precise Elasticsearch index template mapping based on ECS or default RSA types
- explicitly display the reason of parsing errors in the configuration
A few bugs have recently been fixed as well :
- The
T
that sometimes is used in date formats and should be escaped - many other date format issues (%K %L %Q or %E used and not documented for instance)
- several EVNTTIME strings (as multiple possible matches)
- double field attributions
- double EVENTTIME
- quotes in message id
- payload being an integer in the RSA meta field types
The main changes since v1 are listed here:
- dissect is now used (instead of grok) when the RSA header parser doesn't have a specific field as payload, and when the message parser has no alternatives. Should result in performance increase.
- the script now also reads the
-custom
device XML file - generate the Elasticsearch index mapping (template)
- support ip geoloc & asn enrichment as new options
- mutate strip (whitespace removal) all text fields as a new option
- read XML headers to grab the configuration device name & group
- support PARMVAL & HDR functions to set message id value
- support functions in header parsing (content) string as well
- better handling of encoding (XML being in ISO-8859-1 and logstash output file in UTF-8)
- renaming rsa fields to ECS is now an option (is ECS is not mandatory, don't rename)
- add grok/dissect id to help monitoring, see pipeline viewer doc and logstash diag
There are still a few ideas to improve this rsa2elk:
- enrich events to get the message id1 and event category
- support enrichment files and custom metadata files
- input a custom
table-map.xml
andtable-map-custom.xml
(RSA customers) for custom fields - support additional custom enrichment with external files (RSA customers)
- port this converter to Elasticsearch ingest pipeline (see documentation), specially since Elasticsearch 7.5 added an enrichment processor
- Vincent Maury - Initial commit - blookot
This project is licensed under the Apache 2.0 License - see the LICENSE.md file for details
- First things first, I should thank RSA for sharing such content and helping the community with great resources!
- Many thanks to my Elastic colleagues for their support, in particular @andsel, @jsvd, @yaauie and @colinsurprenant from the Logstash team, as well as @webmat and @melvynator for the ECS mapping
- Thanks also to my dear who let me work at nights and week-ends on this project :-*