tudelft-cda-lab/FlexFringe

CSV Reader escaping special characters when parsing Web API Calls

ClintonCao opened this issue · 4 comments

I noticed that the current CSV reader escapes special characters when parsing the row values I want to use as event symbols. Though I understand that, security-wise, it is good to escape these characters, maybe we can first transform the values to a string and then process them further? I think not escaping some of the special characters would be helpful for use cases where we want to learn a behavioural model from web API calls.

I'll provide an example of a model that is learned from HTTP events collected from a Kubernetes cluster.

Say we have the following data:

| _source_source_ip | _source_destination_ip | _source_destination_port | _source_query | source_host_service | destination_host_service |
| --- | --- | --- | --- | --- | --- |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
| 192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |

And say I want to use the following columns to create the event symbol for FlexFringe: _source_destination_port, _source_query, _source_host_service, _destination_host_service. Then I will expect the following model to come out of FlexFringe:

image

But the actual model that comes out of FlexFringe is the following:
image

It looks like the CSV reader ignores spaces and "/" when parsing the data.
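For reference, the expected event symbols can be sketched in a few lines of Python. This is not FlexFringe code: the `make_symbol` helper and the `|` separator are assumptions made purely for illustration, and the column names are taken from the table header above. The point is only that the eight rows should yield two distinct symbols, each keeping its spaces, "/" and ":" intact.

```python
# Columns selected for the event symbol, named as in the table header.
SYMBOL_COLUMNS = [
    "_source_destination_port",
    "_source_query",
    "source_host_service",
    "destination_host_service",
]

GET = "GET /eureka/apps/delta"
PUT = "PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080"

# The eight rows from the table, in order; only the query column varies.
rows = [
    {
        "_source_source_ip": "192.168.84.159",
        "_source_destination_ip": "192.168.84.160",
        "_source_destination_port": "8761.0",
        "_source_query": query,
        "source_host_service": "catalog",
        "destination_host_service": "eureka",
    }
    for query in [GET, PUT, PUT, GET, PUT, GET, PUT, GET]
]


def make_symbol(row, sep="|"):
    """Hypothetical helper: join the selected columns, keeping every character."""
    return sep.join(row[col] for col in SYMBOL_COLUMNS)


symbols = [make_symbol(row) for row in rows]
alphabet = sorted(set(symbols))  # distinct event symbols in the trace
```

Here `symbols` holds one entry per row and `alphabet` holds the two distinct symbols, so a learned model should only ever need two transition labels for this data.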

Working on this in the refactor-inputdata branch. Will update once it's ready to go!

Is it csv or abbadingo? If I remember correctly, / is also a special character used to separate data from symbols, and the remaining part just gets stored in a node...

It could be the abbadingo reader then. IIRC when I had a look with Tom last time, the CSV reader transforms the parsed data into abbadingo format and then passes it on to the abbadingo reader 🤔

Yeah, the csv reader internally translates things to abbadingo format and then parses it as abbadingo. If any of the abbadingo delimiter characters are in the input data, things can break. One of the goals of the ongoing refactor is to separate the csv parsing from the abbadingo parsing completely so that this is no longer an issue.
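A minimal Python sketch of the failure mode described above, under stated assumptions: it is not FlexFringe's actual parser. It assumes a trace is written as a space-separated `label length sym1 sym2 ...` line and that "/" splits a symbol from attached data, as suggested earlier in this thread. All function names are hypothetical. It also shows one possible interim workaround: mapping each symbol to an integer id before writing, so no delimiter character can leak into the intermediate format.

```python
def to_abbadingo_line(symbols, label=1):
    """Write one trace as 'label length sym1 sym2 ...', space-separated."""
    return " ".join([str(label), str(len(symbols)), *symbols])


def naive_parse(line):
    """Split on spaces, then keep only the part before the first '/'."""
    _label, _length, *tokens = line.split(" ")
    return [tok.split("/", 1)[0] for tok in tokens]


def encode(symbols):
    """Map each distinct symbol to an integer id, returning ids and a decoder."""
    ids = {}
    encoded = [str(ids.setdefault(s, len(ids))) for s in symbols]
    return encoded, {str(v): k for k, v in ids.items()}


trace = [
    "GET /eureka/apps/delta",
    "PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080",
]

# Round trip with raw symbols: spaces and '/' are read as delimiters,
# so the two symbols come back as four mangled tokens.
broken = naive_parse(to_abbadingo_line(trace))

# Round trip with integer ids: the intermediate line holds no delimiters,
# and the original symbols are recovered through the decoder.
encoded, decoder = encode(trace)
restored = [decoder[tok] for tok in naive_parse(to_abbadingo_line(encoded))]
```

With raw symbols, `broken` no longer matches `trace`, while `restored` does. Pre-encoding the symbol column in the same way before handing the CSV to FlexFringe may sidestep the problem until the refactor lands, at the cost of having to decode the labels in the resulting model.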