Adding Line Filters before parsers results in an erroneous pipe
jamesc-grafana opened this issue · 2 comments
Problem
For certain searches we want to inject a line filter into the Loki pipeline before the parser, but without having to add a condition to the rule or pipeline. In these cases, pySigmaBackendLoki may receive a custom pipeline that looks something like:
```json
{
  "name": "DataSourceFilters",
  "priority": 10,
  "transformations": [
    {
      "id": "__internal_custom_data_source",
      "type": "set_custom_attribute",
      "attribute": "logsource_loki_selection",
      "value": "{job=\"dockerlogs\"}"
    },
    {
      "id": "__internal_custom_parser",
      "type": "set_custom_attribute",
      "attribute": "loki_parser",
      "value": "|=\"Something\" | json"
    }
  ]
}
```
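For context, here is a minimal sketch of how the stray pipe can arise; the names are illustrative, not the actual pySigma-backend-loki internals:

```python
# Hypothetical sketch of how the final query might be assembled.
# Function and variable names are illustrative only.

def build_query(selection: str, parser: str, conditions: str) -> str:
    # Joining every component with " | " assumes each part is a bare
    # pipeline stage, but a parser value that already starts with a
    # line filter (e.g. '|= "Something" | json') carries its own
    # leading operator, which yields the erroneous "| |=".
    return " | ".join([selection, parser, conditions])

query = build_query(
    '{job="dockerlogs"}',
    '|= "Something" | json',
    'eventSource=~`(?i)lambda\\.amazonaws\\.com`',
)
# query now contains '... | |= "Something" ...'
```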
The query that this results in is something like the following (I needed to use a rule, so I have used one from AWS):

```
{job="dockerlogs"} | |="Something" | json | eventSource=~`(?i)lambda\.amazonaws\.com` and eventName=~`(?i)UpdateFunctionConfiguration.*`
```
This should instead look more like:

```diff
- {job="dockerlogs"} | |="Something" | json | eventSource=~`(?i)lambda\.amazonaws\.com` and eventName=~`(?i)UpdateFunctionConfiguration.*`
+ {job="dockerlogs"} |="Something" | json | eventSource=~`(?i)lambda\.amazonaws\.com` and eventName=~`(?i)UpdateFunctionConfiguration.*`
```
Removing this erroneous pipe will enable more efficient queries to be used as base search criteria in Loki.
I think this would be "more properly" handled in Sigma with an AddCondition pipeline that introduces an unbound filter. Would automatically inferring such a pipeline stage from a user-entered query be feasible? If not, we could absolutely try to infer whether the Loki parser starts with a line filter and, if so, exclude the prefix `|` from the generated query, but that is a slight hack IMO.
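The "slight hack" could look something like this hedged sketch: only insert a joining `|` when the user-supplied parser does not already begin with one of LogQL's line filter operators (function name and structure are hypothetical):

```python
# Hedged sketch of the workaround: skip the joining "|" when the
# user-supplied parser already starts with a LogQL line filter.
LINE_FILTER_OPS = ("|=", "!=", "|~", "!~")

def join_selection_and_parser(selection: str, parser: str) -> str:
    parser = parser.strip()
    if parser.startswith(LINE_FILTER_OPS):
        # e.g. '{job="dockerlogs"} |= "Something" | json'
        return f"{selection} {parser}"
    # e.g. '{job="dockerlogs"} | json'
    return f"{selection} | {parser}"
```

This keeps the existing behaviour for plain parser stages like `json` while avoiding the double pipe for values such as `|= "Something" | json`.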
Sure, I think we can infer the pipeline stage from a user-entered query; however, I'm not sure how much work it would be.
At the recommendation of the logs team, we use logql-lezer to parse user-entered queries into an AST, in which we then look for the stream selector. Presumably, we could pull the sibling nodes from the tree and process those. I wouldn't have expected the stream selector's siblings to be aggregations, but they could be range expressions.
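As a rough illustration of the classification step (this does not use logql-lezer; a real implementation would walk its parse tree), one could locate the end of the stream selector and inspect what follows:

```python
import re

# Hedged sketch only: naively finds the stream selector's closing brace
# (assumes no "}" inside label values) and classifies the next stage.
# A real implementation would walk the logql-lezer AST instead.

def first_stage_after_selector(query: str) -> str:
    end = query.index("}") + 1
    rest = query[end:].strip()
    if not rest:
        return "none"
    if re.match(r"(\|=|!=|\|~|!~)", rest):
        return "line_filter"
    if rest.startswith("|"):
        return "pipeline_stage"  # e.g. a parser or label filter
    return "other"  # e.g. a range expression such as "[5m]"
```

For `'{job="x"} |= "y"'` this reports a line filter, while `'{job="x"}[5m]'` falls into the range-expression case, matching the concern that selector siblings may not be pipeline stages at all.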