CSV Reader escaping special characters when parsing Web API Calls
ClintonCao opened this issue · 4 comments
I noticed that the current CSV reader escapes special characters when parsing the row values I want to use as event symbols. Though I think that, security-wise, it is good to escape these characters, maybe we could first transform them into a string and then process that further? I think not escaping some of the special characters might be helpful for our use case, if we want to learn a behavioural model from web API calls.
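One way to read "first transform them to a string" is a reversible encoding: map each raw value to a string that contains no delimiter characters, let the learner work on that, and decode it again when displaying the model. Below is a minimal sketch using percent-encoding from Python's standard library. `encode_symbol` and `decode_symbol` are hypothetical helpers for illustration, not part of FlexFringe.

```python
from urllib.parse import quote, unquote


def encode_symbol(value: str) -> str:
    # Percent-encode everything except letters, digits and "_.-~",
    # so spaces, "/" and ":" can no longer act as delimiters.
    return quote(value, safe="")


def decode_symbol(encoded: str) -> str:
    # Invert encode_symbol to recover the original API call for display.
    return unquote(encoded)


raw = "PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080"
encoded = encode_symbol(raw)
print(encoded)
# PUT%20%2Feureka%2Fapps%2FCATALOG%2Fcatalog-7764455c7b-gcmrr%3Acatalog%3A8080
print(decode_symbol(encoded) == raw)  # True
```

Because "%" itself is encoded as "%25", the mapping stays reversible even if a raw value already contains percent signs.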
I'll provide an example of a model that is learned from HTTP events collected from a Kubernetes cluster.
Say we have the following data:
_source_source_ip | _source_destination_ip | _source_destination_port | _source_query | source_host_service | destination_host_service |
---|---|---|---|---|---|
192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
192.168.84.159 | 192.168.84.160 | 8761.0 | PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080 | catalog | eureka |
192.168.84.159 | 192.168.84.160 | 8761.0 | GET /eureka/apps/delta | catalog | eureka |
And say I want to use the following columns to create the event symbol for FlexFringe: _source_destination_port, _source_query, _source_host_service, _destination_host_service. Then I would expect the following model to come out of FlexFringe:
But the actual model that comes out of FlexFringe is the following:
It looks like the CSV reader ignores spaces and "/" when parsing the data.
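To illustrate what "ignores spaces and /" could look like in practice, here is a minimal sketch that tokenizes an abbadingo-style line under two rules assumed from this thread, not taken from FlexFringe's source: whitespace separates symbols, and "/" separates a symbol from attached data. Joining the selected columns with "," is also a hypothetical choice made only for this example.

```python
def naive_abbadingo_tokens(line: str) -> list[tuple[str, str]]:
    # Assumed line layout: "<label> <length> <sym1> <sym2> ...".
    fields = line.split()  # assumed rule 1: any whitespace separates symbols
    tokens = fields[2:]    # skip the label and length fields
    result = []
    for tok in tokens:
        # Assumed rule 2: text after the first "/" is data, not the symbol.
        symbol, _, data = tok.partition("/")
        result.append((symbol, data))
    return result


# Hypothetical composite symbol built from the four selected columns.
event = ",".join(["8761.0", "GET /eureka/apps/delta", "catalog", "eureka"])
line = f"1 1 {event}"
tokens = naive_abbadingo_tokens(line)
print(tokens)
# [('8761.0,GET', ''), ('', 'eureka/apps/delta,catalog,eureka')]
```

Under these assumptions, one intended event becomes two tokens: the first keeps only "8761.0,GET", and the second has an empty symbol with the rest of the query pushed into the data part.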
Working on this in the refactor-inputdata branch. Will update once it's ready to go!
Is it the CSV reader or the abbadingo reader? If I remember correctly, "/" is also a special character that separates data from symbols, and the remaining part just gets stored in a node...
It could be the abbadingo reader then. IIRC, when I had a look at it with Tom last time, the CSV reader transforms the parsed data into abbadingo format and then passes it on to the abbadingo reader 🤔
Yeah, the CSV reader internally translates things to abbadingo format and then parses it as abbadingo. If any of the abbadingo delimiter characters are in the input data, things can break. One of the goals of the ongoing refactor is to separate the CSV parsing from the abbadingo parsing completely, so that this is no longer an issue.
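A minimal sketch of the direction described here: parse the CSV directly, so field values never pass through an abbadingo tokenizer. This is not the refactor-inputdata implementation. `read_symbol_trace`, treating all rows as a single trace, and mapping each column combination to an integer id are assumptions for illustration. Column names follow the table header above.

```python
import csv
import io


def read_symbol_trace(csv_text: str, symbol_columns: list[str]):
    """Return (trace, alphabet); trace is a list of integer symbol ids."""
    # csv.DictReader keeps each field verbatim, so spaces, "/" and ":" survive.
    reader = csv.DictReader(io.StringIO(csv_text))
    alphabet: dict[tuple[str, ...], int] = {}
    trace: list[int] = []
    for row in reader:
        key = tuple(row[col] for col in symbol_columns)
        # The first occurrence of a column combination gets the next free id.
        trace.append(alphabet.setdefault(key, len(alphabet)))
    return trace, alphabet


CSV_TEXT = """_source_destination_port,_source_query,source_host_service,destination_host_service
8761.0,GET /eureka/apps/delta,catalog,eureka
8761.0,PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080,catalog,eureka
8761.0,PUT /eureka/apps/CATALOG/catalog-7764455c7b-gcmrr:catalog:8080,catalog,eureka
8761.0,GET /eureka/apps/delta,catalog,eureka
"""

columns = ["_source_destination_port", "_source_query",
           "source_host_service", "destination_host_service"]
trace, alphabet = read_symbol_trace(CSV_TEXT, columns)
print(trace)  # [0, 1, 1, 0]
for key, sym_id in alphabet.items():
    print(sym_id, " | ".join(key))
```

Keeping the raw field tuples as dictionary keys means a learned model can later be labelled with the original API calls instead of mangled fragments.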