Marusyk/grok.net

Base64 content detection

nickproud opened this issue · 3 comments

Hi,

I'd like to add the ability to grok base64 strings from text.
I would add a Base64 detection pattern to grok-patterns, then run a validator over each match to confirm it really is Base64-encoded, using something like the method below, and filter out the matches for which it returns false:

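public static bool IsBase64String(string base64)
{
    // Decoding never yields more bytes than there are input characters,
    // so a buffer of base64.Length bytes is always large enough.
    Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
    // Convert.TryFromBase64String (.NET Core 2.1+) returns false instead of throwing on invalid input.
    return Convert.TryFromBase64String(base64, buffer, out int bytesParsed);
}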
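The two-step approach proposed above (a loose pattern finds candidates, then a decode check filters them) can be sketched in plain .NET without Grok. This is a minimal sketch under assumptions: the candidate regex and the `FindBase64` helper are hypothetical names for illustration, not part of grok.net, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical candidate pattern: a run of Base64 alphabet characters plus up to two '=' pads.
// It deliberately over-matches; the decode check below does the real filtering.
var candidatePattern = new Regex("[A-Za-z0-9+/]+={0,2}");

// Same check as IsBase64String above: decoded output is never longer than the input,
// so base64.Length bytes is always a big enough buffer.
static bool IsBase64String(string base64)
{
    Span<byte> buffer = new byte[base64.Length];
    return Convert.TryFromBase64String(base64, buffer, out _);
}

// Hypothetical helper: collect regex candidates, keep only those that decode.
List<string> FindBase64(string text) =>
    candidatePattern.Matches(text)
        .Select(m => m.Value)
        .Where(IsBase64String)
        .ToList();

// "Basic" is a candidate too, but its length (5) is not a multiple of 4, so it fails to decode.
Console.WriteLine(string.Join(", ", FindBase64("Basic YWRtaW46cGEkJHdvcmQ="))); // YWRtaW46cGEkJHdvcmQ=
```

Decoding success is a weak signal: any word of Base64 characters whose length is a multiple of four, such as `test`, also decodes, so the filter removes malformed candidates rather than proving a match was meant as Base64.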

As per the contributing guidelines, I'm raising an issue for discussion, and if it's approved, I'll put a PR together.
Thanks :)

Hello @nickproud,

To detect Base64 strings, you can add a custom pattern and use it like this:

using System;
using System.Collections.Generic;
using System.Linq;   // needed for grokResult.Any()
using GrokNet;       // Grok, GrokResult, GrokItem

var custom = new Dictionary<string, string>
{
    {"BASE64", "(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$"}
};

var grok = new Grok("Basic %{BASE64:credentials}", custom);
GrokResult grokResult = grok.Parse("Basic YWRtaW46cGEkJHdvcmQ=");

Console.WriteLine($"Does my text contain base64 string: {grokResult.Any()}");

foreach (GrokItem item in grokResult)
{
    Console.WriteLine($"{item.Key} : {item.Value}");
}

Output

Does my text contain base64 string: True
credentials : YWRtaW46cGEkJHdvcmQ=
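To see what the BASE64 pattern from the answer matches on its own, it can be run through plain System.Text.RegularExpressions, outside Grok. This is a sketch for illustration only. Because the pattern ends in `$` and its lookahead counts characters up to the end of the input, it only finds a token that runs to the end of the text, and since every part is optional it can also match the empty string.

```csharp
using System;
using System.Text.RegularExpressions;

// BASE64 pattern copied from the answer above, exercised with plain .NET Regex (not through Grok).
//   (?=(.{4})*$)    the rest of the input must be a multiple of 4 characters long
//   [A-Za-z0-9+/]*  Base64 alphabet characters
//   ={0,2}$         up to two '=' padding characters, then end of input
var base64 = new Regex("(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$");

Match atEnd = base64.Match("Basic YWRtaW46cGEkJHdvcmQ=");
Console.WriteLine($"'{atEnd.Value}'");     // 'YWRtaW46cGEkJHdvcmQ='

// With trailing text, no Base64 run reaches the end, so the only match is the empty string at the end.
Match notAtEnd = base64.Match("Basic YWRtaW46cGEkJHdvcmQ= (admin)");
Console.WriteLine($"'{notAtEnd.Value}'");  // ''
```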

Does it meet your needs?

Grok is designed to work with regular expressions. If you need to extend it, add a custom regex pattern.
We're not going to add more methods to Grok.cs.

I was interested in adding it as a native pattern in Grok, so you could just pass in 'BASE64' as the pattern name, but your solution is great. Thanks. 👍

Oh sure, go ahead. You can add it to the grok-patterns file:

USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
EMAILLOCALPART [a-zA-Z][a-zA-Z0-9_.+-=:]+
EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
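Each line of the file has the form `NAME REGEX`, and a regex may refer to other entries with `%{NAME}`, as USER and NUMBER do above, so a BASE64 line added there would be usable by name. The sketch below is a hypothetical illustration of that format, not grok.net's loader: `ParsePatterns` and `Expand` are made-up names, skipping blank lines and `#` comments is an assumption, and it drops the `:field` suffix that Grok uses to name captures (such as `credentials` in the earlier output).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical reader for "NAME REGEX" lines; not grok.net's own code.
static Dictionary<string, string> ParsePatterns(IEnumerable<string> lines)
{
    var table = new Dictionary<string, string>();
    foreach (string raw in lines)
    {
        string line = raw.Trim();
        if (line.Length == 0 || line.StartsWith("#")) continue; // assumed: skip blanks and comments
        int space = line.IndexOf(' ');
        if (space < 0) continue;                                // not a NAME REGEX line
        table[line[..space]] = line[(space + 1)..];
    }
    return table;
}

// Recursively replace %{NAME} and %{NAME:field} with "(?:regex)", ignoring the field name.
// An unknown NAME throws KeyNotFoundException.
static string Expand(string pattern, IReadOnlyDictionary<string, string> table) =>
    Regex.Replace(pattern, @"%\{(\w+)(?::\w+)?\}",
        m => "(?:" + Expand(table[m.Groups[1].Value], table) + ")");

var patterns = ParsePatterns(new[]
{
    "USERNAME [a-zA-Z0-9._-]+",
    "USER %{USERNAME}",
    "BASE64 (?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$",
});

// prints (?:(?:[a-zA-Z0-9._-]+)) (?:(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$)
Console.WriteLine(Expand("%{USER:login} %{BASE64:credentials}", patterns));
```

Wrapping each expansion in a non-capturing group keeps a referenced alternation or quantifier from binding to the surrounding text.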