grok should have a replace feature like the one in mutate
saurabh8585 opened this issue · 4 comments
Use case
I have a few custom regex patterns that look for sensitive information in log messages, such as credit card numbers and social security numbers.
I use these patterns in grok, matching each log message against the regexes I wrote in a file inside the patterns folder.
Each log message that matches a pattern gets a custom field named "Infosec_Pattern" whose value names the matched pattern, such as "CCN" or "SSN".
Logstash version: 2.3.1
Below is a sample filter config:
filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern" => "CCN" }
  }
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{SSN}" }
    add_field => { "Infosec_Pattern" => "SSN" }
  }
}
This works perfectly. What I want now is to:
Replace the matched string in the message with a value like "XXXXXXXX", since it contains sensitive information.
To do this, I have to use mutate, which means finding the pattern in the log message a second time and replacing it with the desired value using gsub.
Below is a sample filter config with a mutate section:
filter
{
  ...
  ... ## Some groks (see the filter config above for an example)
  ...
  mutate {
    remove_field => "tags"
    gsub => [
      # This regex is meant to match a 16-digit credit card number
      "message", "[0-9]{16}", "XXXXXXXXXXXXX"
    ]
  }
}
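To make the duplication concrete, here is a minimal Python sketch of what the grok and mutate filters do to one event. It assumes `%{CCN}` is equivalent to the `[0-9]{16}` regex used in gsub (the issue does not show the actual CCN definition), `process` is a hypothetical helper, and Python's `re` module stands in for the regex engine Logstash actually uses.

```python
import re

# Assumption: %{CCN} in the grok block is equivalent to the [0-9]{16}
# regex in mutate's gsub; the issue does not show the CCN definition.
GROK_CCN = re.compile(r"[0-9]{16}")   # pass 1: grok match + add_field
GSUB_CCN = re.compile(r"[0-9]{16}")   # pass 2: the same regex, repeated in gsub
MASK = "XXXXXXXXXXXXX"                # the replacement string from the config


def process(event):
    """Hypothetical model of the grok + mutate pipeline for one event."""
    # Pass 1: detection only; the message is left untouched.
    if GROK_CCN.search(event["message"]):
        event["Infosec_Pattern"] = "CCN"
    # Pass 2: the message is scanned again so the digits can be replaced.
    event["message"] = GSUB_CCN.sub(MASK, event["message"])
    return event


print(process({"message": "Saurabh ccn is 5123456789012345"}))
```

Both passes scan the whole message, and the regex has to be kept in sync in two places, which is the duplication this issue asks to remove.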
Output after applying the sample config above
Without the mutate section, the parsed log message looks like this:
{
    "message" => "Saurabh ccn is 5123456789012345",
    "@version" => "1",
    "@timestamp" => "2016-12-07T12:01:09.554Z",
    "host" => "d7231b98ec06",
    "Infosec_Pattern" => "CCN"
}
With the mutate section, the parsed log message looks like this:
{
    "message" => "Saurabh ccn is XXXXXXXXXXXXX",
    "@version" => "1",
    "@timestamp" => "2016-12-07T11:57:50.075Z",
    "host" => "d7231b98ec06",
    "Infosec_Pattern" => "CCN"
}
As you can see, the pattern has to be matched twice to replace the matched string in the original message field: once by grok and again by mutate.
I tried using overwrite inside grok, but that doesn't help much, because sensitive data can appear anywhere in the string. Also, overwrite would not let me replace the data with a fixed value like "XXXX".
Expectation
- Add functionality to grok itself to replace the matched string with a desired value.
- OR: add functionality to mutate so it can use custom regex patterns, as grok does.
Option 1 seems to be the best fit.
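The behaviour asked for here, detecting and masking in a single scan, can be sketched in Python by combining the patterns into one alternation with a named group per pattern and doing the replacement in a callback. This only illustrates the idea and is not a Logstash API; the CCN and SSN regexes and `detect_and_mask` are hypothetical.

```python
import re

# Hypothetical stand-ins for the reporter's CCN and SSN patterns.
SENSITIVE = {
    "CCN": r"\b[0-9]{16}\b",
    "SSN": r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b",
}

# One alternation with a named group per pattern. The member regexes contain
# no capturing groups of their own, so m.lastgroup names the pattern that hit.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, rx in SENSITIVE.items())
)


def detect_and_mask(message, mask="XXXXXXXX"):
    """Scan message once, masking each sensitive match and noting its type."""
    found = []

    def replace(m):
        if m.lastgroup not in found:
            found.append(m.lastgroup)
        return mask

    return COMBINED.sub(replace, message), found


masked, found = detect_and_mask("Saurabh ccn is 5123456789012345, ssn 123-45-6789")
print(masked)  # Saurabh ccn is XXXXXXXX, ssn XXXXXXXX
print(found)   # ['CCN', 'SSN']
```

Recording the group name inside the replacement callback is what removes the second pass: the same match both produces the "Infosec_Pattern"-style tag and drives the masking.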
Grok is primarily for parsing, not modifying data. The mutate filter (since it does text replacement already), or a new filter, feels like a better place to implement this proposal.
Otherwise, I am in favor of this feature.
Thanks @jordansissel for supporting this issue.
Since it interests you, here is one more point that might make it even more interesting.
Currently, we write one custom regex pattern per line, like this:
../my_pattern_directory/my_pattern_file
CCN_MASTER [1-2]{16}
CCN_VISA [2-3]{15}
CCN_AMEX [3-4]{14}
CCN_MAESTRO [4-5]{13}
To apply the above patterns to a log message, we need to write a filter like the following:
filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN_MASTER}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN_VISA}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}
As you can see, the number of grok blocks grows with the number of patterns. Also, the "Infosec_Pattern_Found" field is added redundantly in every block.
Proposed solution
Instead of defining custom patterns individually, we could group them like this:
../my_pattern_directory/my_pattern_file
CCN
{
  MASTER [1-2]{16}
  VISA [2-3]{15}
  AMEX [3-4]{14}
  MAESTRO [4-5]{13}
}
The corresponding filter would then look something like this:
filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}
OR
filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN.MASTER}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}
This way, we would get:
- The flexibility to apply multiple similar or different regexes in one go
- No need to write hundreds of grok blocks, one per regex pattern
- A smaller config file that is more readable and clear
- Fewer errors
Please consider this point as well if it seems feasible. Let me know if we should track it in a separate ticket.
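The grouped syntax could be prototyped without touching grok by a preprocessing step that turns each grouped block into ordinary one-per-line definitions plus an alternation under the group name. Below is a minimal Python sketch; `flatten_groups` is a hypothetical helper (nothing like it ships with Logstash), it assumes the grouped syntax exactly as proposed above, and a `%{CCN.MASTER}` reference would be written as `%{CCN_MASTER}` against its output.

```python
def flatten_groups(text):
    """Expand grouped pattern blocks into flat 'NAME regex' lines.

    Hypothetical preprocessor for the proposed syntax: a bare NAME line
    followed by "{" opens a group, each member "SUB regex" becomes
    NAME_SUB, and NAME itself becomes an alternation of its members.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        opens_group = (
            line
            and " " not in line
            and i + 1 < len(lines)
            and lines[i + 1] == "{"
        )
        if opens_group:
            group, members = line, []
            i += 2  # skip the name line and the "{" line
            while i < len(lines) and lines[i] != "}":
                if lines[i]:
                    sub, regex = lines[i].split(None, 1)
                    members.append(f"{group}_{sub}")
                    out.append(f"{group}_{sub} {regex}")
                i += 1
            # The group name matches any of its members.
            out.append(group + " " + "|".join("%{" + m + "}" for m in members))
        elif line:
            out.append(line)  # ordinary definitions and comments pass through
        i += 1
    return "\n".join(out)


grouped = """
CCN
{
  MASTER [1-2]{16}
  VISA [2-3]{15}
  AMEX [3-4]{14}
  MAESTRO [4-5]{13}
}
"""
print(flatten_groups(grouped))
```

The output uses only the one-pattern-per-line form that pattern files already accept, so the generated file could be placed in patterns_dir as-is.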
You can do this today:
CCN_MASTER [1-2]{16}
CCN_VISA [2-3]{15}
CCN_AMEX [3-4]{14}
CCN_MAESTRO [4-5]{13}
# Create a pattern called CCN that matches any of the above:
CCN %{CCN_MASTER}|%{CCN_VISA}|%{CCN_AMEX}|%{CCN_MAESTRO}
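Roughly why this works: when each `%{NAME}` reference is inlined as a named group, the alternation stays enclosed in the `CCN` group, and whichever member group took part in the match reveals the card type. A Python sketch under those assumptions; `expand` and `which_card` are hypothetical helpers that only approximate how grok compiles references, and the regexes are the placeholder ones from this thread, not real card formats.

```python
import re

# Toy definitions copied from the thread; they are not real card formats.
PATTERNS = {
    "CCN_MASTER": r"[1-2]{16}",
    "CCN_VISA": r"[2-3]{15}",
    "CCN_AMEX": r"[3-4]{14}",
    "CCN_MAESTRO": r"[4-5]{13}",
    "CCN": r"%{CCN_MASTER}|%{CCN_VISA}|%{CCN_AMEX}|%{CCN_MAESTRO}",
}
MEMBERS = ["CCN_MASTER", "CCN_VISA", "CCN_AMEX", "CCN_MAESTRO"]
REFERENCE = re.compile(r"%\{(\w+)\}")


def expand(name):
    """Recursively inline %{NAME} references, wrapping each in a named group."""
    body = REFERENCE.sub(lambda m: expand(m.group(1)), PATTERNS[name])
    return f"(?P<{name}>{body})"


CCN_REGEX = re.compile(expand("CCN"))


def which_card(message):
    """Return the member pattern that matched, or None if nothing matched."""
    m = CCN_REGEX.search(message)
    if m is None:
        return None
    # Only the alternative that participated in the match has a value.
    return next(name for name in MEMBERS if m.group(name) is not None)


print(which_card("Saurabh ccn is " + "12" * 8))  # CCN_MASTER
print(which_card("Saurabh ccn is " + "3" * 15))  # CCN_VISA
```

Because Python rejects duplicate group names, this naive expansion would fail for a pattern that references the same name twice.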