# FortiUnicorn (fortinet-2-elasticsearch)
## Scope

We will cover the whole road for squeezing all possible information out of Fortinet logs into Elasticsearch:
- ECS translation
- Logstash pipelines (including geo enrichment and other manipulations such as tenant enrichment, dropping guest networks, observer enrichment, etc.)
- index templates
- index patterns
- dashboards
- event alerts
- ML alerts
## Products

Our focus is to cover security solutions:
- Fortigate (of course!)
- Fortisandbox
- Fortiweb
- Fortimail....... someday
- Forticlient (EMS)....... is there even a way to get syslog from Forticlients?
- enSilo (that would be great!)
## Inputs

We want a full 360° view for monitoring and analysis:
- syslog
- snmp
- snmptraps
- ping (via Heartbeat)
- ssh command output (`diag sys top`)... someday
- streaming telemetry... someday, when it is supported
## ECS Translations

### Disclaimer

ECS is a work in progress, a baby just starting to breathe: it still lacks a lot of fields, especially for network security. However, ECS is probably the best effort out there for log normalization.

So don't expect to have every field translated to ECS. Fortigate alone has 500+ unique fields while ECS is just reaching 400, do the math!!!
### Translation Sheets

This is the start of the journey: we needed to fully understand the dataset we were facing before writing a single line of Logstash pipeline. So we took the Log Reference guides and turned them into spreadsheets so we could process the data. We needed to denormalize data, merge fields, verify field mappings (data types), look for fields that overlap with ECS fields, translate fields to ECS, and generate mapping and pipeline configs.

All Fortinet-to-ECS field translations are managed per product on Google Sheets.
#### Fortigate
FortiOS_6.2.X_Log_Reference - Public
Fortigate logs are an ugly beast, mainly because of their lack of (good) documentation. The current Log Reference has no field descriptions and no field examples, logids get removed without any notice, etc. Starting from 6.2.1, type "utm" was documented, although it had existed long before. On top of that, GTP events cause some field mapping mismatches, like:

- checksum: string | uint32
- from: ip | string
- to: ip | string
- version: string | uint32

As far as we are concerned, GTP is only part of Fortigate Carrier, which is a different product (¿?). How can Fortigate manage a field that has 2 different data types in its internal relational database? How does FortiAnalyzer do it? We have no idea, because we have never seen GTP events in a real scenario. In order to avoid any data type mismatch, GTP events are not considered, and unless you use Fortigate Carrier, you are not going to see them either.
- **Data 6.2.X** is the denormalized dataset obtained from the Log Reference Guide of version 6.2.X. You can look at it as the Log Reference in spreadsheet format.
- **Data** has all the datasets from the **Data 6.2.X** sheets. You can look at it as the denormalized version of all datasets of major release 6.2.
- On **Overlap ECS - Summary of fields**, we look for any field in the Fortigate dataset that could overlap with ECS fields. First we consolidate all fields with a dynamic table, and then look each one up against the root fields of **ECS 1.X**. For example, the Fortigate dataset has a field `agent`, which overlaps with the field `agent` in ECS. If we find an overlap, we propose a rename for the Fortigate field: `fortios.agent`. We are doing ECS "enrichment", keeping the original Fortinet fields as well and only renaming the ones that overlap with ECS.
- We have decided to attack the full dataset by splitting it by type, resulting in 3 datasets: traffic, utm and event. Each of them has its own translation. So, on the sheets **Summary of "traffic type" fields**, **Summary of "utm type" fields** and **Summary of "event type" fields**, we consolidate the fields of each dataset independently.
- **ECS Translation of "XXX type" fields** is where the magic happens. Based on our criteria, we translate to ECS the fields we consider a fit. Although Fortinet is moving utm type logs to a connection-oriented approach, we only consider client/source and server/destination for traffic logs.
- On **logstash - XXX** we consolidate the translated fields of the previous sheets and generate Logstash code.
- On **fortigate mapping** we filter all Fortigate fields that are not strings and, based on their types, generate template mapping code. The template we use considers keyword the default mapping, which is why we only explicitly define non-keyword fields. This sheet might need review because some Fortinet fields are longer than 1024 characters, which is our default length. We have not had any issue so far, though.
Translation is where we need the most help from the community!!! We manage hundreds of firewalls, but it is very likely we have not covered it all.
#### Fortisandbox
Current dataset: 3.1.2
FortiSandbox - Log Reference v3.1.2 - Public
Same logic as Fortigate. No type separation has been made, though.
#### Fortiweb
Current dataset: 6.2.0
FortiWeb_6.2.0_Log_Reference - Public
## Logstash

We have tried to make our pipelines as modular as possible, without any "hardcoding" inside them. For enrichment we manage "dictionaries", so we can dynamically enrich any data we want, and we can change each dictionary per Logstash instance. This gives us the flexibility we are looking for: we have a multitenant deployment, with many Logstash instances deployed all over, and no direct correlation between a Logstash instance and a tenant (logstash != tenant). So we need a very flexible pipeline architecture.

This might not be your case. No problem, just use the pipelines you need!
The overall pipeline flow is as follows:

The sequence of the pipelines is important, mainly for HA scenarios. We do some enrichments via dictionaries that then get overridden with log data. Take for example `observer.serial_number`: it gets populated in the `Observer Enrichment` pipeline, but it gets overridden in `FortiXXX 2 ECS` with the translation of the `devid` field. This is on purpose, because it allows having just one entry in the dictionary (in HA both devices are exactly the same) while keeping accurate data about the specific properties of each device in an HA pair (serial_number, name).
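One way to enforce that ordering is Logstash's pipeline-to-pipeline communication. Below is a minimal sketch, assuming the stages are wired through virtual addresses; the address name `fortigate_2_ecs` is hypothetical, not necessarily what this repo uses:

```
# tail of the observer enrichment pipeline: hand events to the next stage
output {
  pipeline { send_to => "fortigate_2_ecs" }   # hypothetical virtual address
}

# head of the FortiXXX 2 ECS pipeline: receive events from the previous stage
input {
  pipeline { address => "fortigate_2_ecs" }
}
```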
### Input Syslog

Just receives syslog logs and populates `event.module` depending on the UDP port.
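A minimal sketch of that idea, assuming one UDP listener per product (the port numbers here are made up):

```
input {
  # one listener per Fortinet product; ports are hypothetical
  udp { port => 5140 tags => ["fortigate"] }
  udp { port => 5141 tags => ["fortiweb"] }
  udp { port => 5142 tags => ["fortisandbox"] }
}

filter {
  # derive event.module from the receiving listener
  if "fortigate" in [tags] {
    mutate { add_field => { "[event][module]" => "fortigate" } }
  } else if "fortiweb" in [tags] {
    mutate { add_field => { "[event][module]" => "fortiweb" } }
  } else if "fortisandbox" in [tags] {
    mutate { add_field => { "[event][module]" => "fortisandbox" } }
  }
}
```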
### Observer Enrichment

Depending on the IP of the firewall (the IP sending the logs), it looks up two dictionaries. With the first one, it enriches observer (firewall) properties. With the second one, it enriches observer (firewall) location. These two dictionaries could be merged into one, because the key is the same: `[observer][ip]`.

If the IP is not found in the dictionary, it means we are receiving logs from an unknown firewall, and the event is tagged as such.
#### Properties Dictionary

```
"[observer][ip]" : "[observer][name]", "[observer][hostname]", "[observer][mac]", "[observer][product]", "[observer][serial_number]", "[observer][type]", "[observer][vendor]", "[observer][version]", "[organization][id]", "[organization][name]"
```
#### Geo Dictionary

```
"[observer][ip]" : "[observer][geo][city_name]", "[observer][geo][continent_name]", "[observer][geo][country_iso_code]", "[observer][geo][country_name]", "[observer][geo][location][lon]", "[observer][geo][location][lat]", "[observer][geo][name]", "[observer][geo][region_iso_code]", "[observer][geo][region_name]", "[event][timezone]", "[observer][geo][site]", "[observer][geo][building]", "[observer][geo][floor]", "[observer][geo][room]", "[observer][geo][rack]", "[observer][geo][rack_unit]"
```
We have added some fields to ECS geo so we can have the exact location: site, building, floor, room, rack, rack_unit.
Maybe this is not very critical for firewalls, because you usually have just a couple of firewalls per site. However, we added it as part of our inventory because we also manage switches and APs, and for those you do need the exact location.
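A minimal sketch of such a lookup with the translate filter. The dictionary path is hypothetical, and `source`/`target` are the option names in recent versions of the plugin (older versions call them `field`/`destination`):

```
filter {
  translate {
    source          => "[observer][ip]"
    target          => "[@metadata][observer_json]"
    dictionary_path => "/etc/logstash/dictionaries/observer_properties.yml"  # hypothetical path
  }
  if [@metadata][observer_json] {
    # dictionary values are JSON strings, so one lookup can carry many fields
    json { source => "[@metadata][observer_json]" }
  } else {
    # logs from a firewall we have not registered yet
    mutate { add_tag => ["unknown_observer"] }
  }
}
```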
### KV Syslog

Splits the original log into key-value pairs and sets the timestamp. The timezone is also dynamically obtained from a dictionary, since our firewalls live in different timezones.
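A minimal sketch, assuming Fortinet's usual `date`/`time` fields and a per-firewall timezone dictionary (same translate options as above; the dictionary path is hypothetical):

```
filter {
  # Fortinet logs are key=value pairs; the kv filter handles quoted values
  kv { source => "message" }

  # per-firewall timezone lookup
  translate {
    source          => "[observer][ip]"
    target          => "[event][timezone]"
    dictionary_path => "/etc/logstash/dictionaries/timezones.yml"  # hypothetical path
    fallback        => "UTC"
  }

  # combine Fortinet's date + time fields and parse them in that timezone
  mutate { add_field => { "[@metadata][ts]" => "%{date} %{time}" } }
  date {
    match    => ["[@metadata][ts]", "yyyy-MM-dd HH:mm:ss"]
    timezone => "%{[event][timezone]}"
  }
}
```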
### FortiXXX 2 ECS

Based on the spreadsheet, this pipeline does the following (a minimal sketch follows the list):

- Validates nulls on IP fields (Fortinet loves to fill null fields with "N/A", which turns into ingestion errors if your field has an IP mapping)
- Renames Fortinet fields that overlap with ECS
- Translates Fortinet fields to ECS. We are doing ECS "enrichment", keeping the original Fortinet fields as well. If you want to replace fields instead, just change "copy" to "rename".
- Populates other ECS fields based on ECS recommendations (related.ip, source.address, event.start, etc.)
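The sketch below illustrates each step with a couple of representative Fortigate fields (`srcip`, `dstip`, `agent`); it is an illustration, not the full generated pipeline:

```
filter {
  # 1) null validation: Fortinet writes "N/A" into empty IP fields, which
  #    would be rejected by an ip-mapped Elasticsearch field
  if [srcip] == "N/A" { mutate { remove_field => ["srcip"] } }

  # 2) rename Fortinet fields that collide with ECS root fields
  if [agent] { mutate { rename => { "agent" => "[fortios][agent]" } } }

  # 3) translate to ECS by copying, so the original Fortinet fields survive;
  #    change copy to rename if you prefer replacement
  mutate {
    copy => {
      "srcip" => "[source][ip]"
      "dstip" => "[destination][ip]"
    }
  }

  # 4) populate recommended ECS fields; adding to an existing field turns it
  #    into an array, which suits related.ip
  mutate { add_field => { "[related][ip]" => "%{[source][ip]}" } }
  mutate { add_field => { "[related][ip]" => "%{[destination][ip]}" } }
}
```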
### Geo Enrichment

`source.ip`/`source.nat.ip` and `destination.ip`/`destination.nat.ip` are inspected to decide whether they are public or private addresses, with the help of the `.locality` fields. If they are public, geo enrichment is applied.
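A minimal sketch of the public-only branch, assuming the flags live in `source.locality` and `destination.locality`:

```
filter {
  # only public addresses carry meaningful GeoIP data
  if [source][locality] == "public" {
    geoip {
      source => "[source][ip]"
      target => "[source][geo]"
    }
  }
  if [destination][locality] == "public" {
    geoip {
      source => "[destination][ip]"
      target => "[destination][geo]"
    }
  }
}
```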
### Drop

Security is a big data problem, and you have to pay for it. Here, guest networks (or any networks you define) are dropped: there is no need, at least in our case, to ingest guest network logs. Guest networks are looked up dynamically from a dictionary.
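A minimal sketch, assuming the dictionary is keyed on the Fortigate source interface (how you key your guest networks is up to you; the path and key are hypothetical):

```
filter {
  # flag events arriving from interfaces listed as guest networks
  translate {
    source          => "[srcintf]"
    target          => "[@metadata][is_guest]"
    dictionary_path => "/etc/logstash/dictionaries/guest_networks.yml"  # hypothetical path
  }
  if [@metadata][is_guest] {
    drop {}   # discard the event entirely
  }
}
```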
### Output

This is crucial for the index strategy:

`ecs-%{[event][module]}-%{[organization][name]}-write`
3 index templates rule it all, each template pointing to its specific index pattern:

- `ecs-*`: deals with ECS mapping.
- `%{[event][module]}`, which could be fortigate, fortisandbox or fortiweb: deals with the fortiXXX mapping.
- `%{[organization][name]}`: deals with the ILM policy and shard allocation specific to the tenant.
Because we have a multitenant scenario, we manage different retention policies per tenant, while the ECS mapping is the same for all indexes and every Fortinet product has its own mapping for its original fields.
## Dashboards

The Fortinet dataset has 500+ fields, so we need many dashboards for exploring it.

We have tried to follow Fortigate's Logs & Report section. The main objective of these dashboards is to explore the dataset in order to spot anomalies in it; they are not intended to be C-level reports in any way. We would use Canvas for C-level visualizations.

There are a lot of visualizations on each dashboard, so keep in mind that performance can be impacted (loading times).
### Structure

All dashboards are connected via their header structure, making it easy to navigate through them.

Dashboards follow a (max) 3-layer structure, going from more general to more specific.
- The top level references Fortinet's type field: traffic, utm or event. UTM is already disaggregated so it is easier to go to a specific UTM type, just like in Fortigate's Logs & Report section.
- The second level dives into traffic direction (if possible). For example, on the traffic dashboard we have Outbound | Inbound | LAN 2 LAN | VPN | FW Local. It makes total sense to analyze them separately. Firewalls have been configured with interface roles following this premise:
  - LAN interfaces = `LAN` interface role
  - WAN interfaces = `WAN` interface role
  - VPN interfaces = `undefined` interface role
  - MPLS interfaces = `LAN` interface role
- The third level refers to which metric we are using for exploring the dataset. We only use sessions and bytes. *We need to filter out logid=20, so we don't get duplicate data when running aggregations. You can filter out this logid on the firewall itself, but we make sure we don't use it:

  ```
  config log fortianalyzer filter
      set filter "logid(00020)"
      set filter-type exclude
  end
  ```

  - sessions: we consider each log a unique session.
  - bytes: we analyze `source.bytes` and `destination.bytes`, by both sum and average.
## Authors

- Logstash pipelines and Elasticsearch config: @hoat23
- Dataset analysis and Kibana: @enotspe