tsaikd/gogstash

Cannot be parsed using grok

wang1219 opened this issue · 7 comments

The grok filter fails to parse my log, although the same pattern works in the Grok Debugger. Any help would be appreciated.
cat config.json

{
    "input": [
        {
            "type":"file",
            "path":"/var/log/nginx/access.log"
        }
    ],
    "debugch":true,
    "filter": [
        {
            "type":"grok",
            "source":"message",
            "match":["%{NGINXTEST}"],
            "patterns_path":"grok-patterns"
        }
    ],
    "output": [
        {
            "type": "stdout"
        }
    ]
}

cat grok-patterns

NGINXTEST %{HOST:upstream_addr}:%{HOST:upstream_port} %{IPORHOST:http_host} %{NUMBER:request_time} %{NUMBER:response_time} %{IPORHOST:remote_addr} - %{NGUSER:remote_user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:status} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-)"|%{QS:referrer}) (?:%{QS:agent}) %{QS:xforwardedfor}

Original log

127.0.0.1:12345 test.monitor.com 0.001 0.001 127.0.0.1 - - [28/Apr/2019:18:26:10 +0800] "GET /view/test.jpeg HTTP/1.1" 200 9444 "http://test.monitor.com/view/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36" "-" "-"

Log Format

'$upstream_addr $http_host $request_time $upstream_response_time $remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" "$request_body"';

And the Grok Debugger result:
(screenshot)

wenma commented

good 👍

You can try this example: #80
Then debug the pattern piece by piece, checking the result in the stdout output.
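
One way to go piece by piece is to try progressively longer prefixes of the pattern and see which one stops matching. Here is a minimal Go sketch that only generates those prefixes; `patternPrefixes` is a hypothetical helper, not part of gogstash, and it assumes no space falls inside a regex group (true for the NGINXTEST pattern above, not for every pattern).

```go
package main

import (
	"fmt"
	"strings"
)

// patternPrefixes splits a grok pattern on single spaces and returns
// progressively longer prefixes, so each one can be tried in turn.
// Assumption: no space falls inside a regex group.
func patternPrefixes(pattern string) []string {
	parts := strings.Split(pattern, " ")
	prefixes := make([]string, 0, len(parts))
	for i := range parts {
		prefixes = append(prefixes, strings.Join(parts[:i+1], " "))
	}
	return prefixes
}

func main() {
	// First few tokens of the NGINXTEST pattern from the issue.
	pattern := `%{HOST:upstream_addr}:%{HOST:upstream_port} %{IPORHOST:http_host} %{NUMBER:request_time}`
	for i, p := range patternPrefixes(pattern) {
		fmt.Printf("step %d: %s\n", i+1, p)
	}
}
```

Each printed prefix can be put into the patterns file in turn; the first step that stops producing fields in the stdout output points at the failing token.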

@tengattack can you help update the grok README to avoid this kind of issue?

yequ commented

😊

@wang1219 The problem with your config is that it uses a pattern, NGUSER, which is not predefined:
https://github.com/vjeantet/grok/blob/master/patterns/grok-patterns

- %{NGUSER:remote_user}
+ %{USER:remote_user}

You could change it to USER.
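
To catch this kind of mistake before running gogstash, a pattern can be scanned for `%{NAME}` references that are not in the set of known names. The following is a rough Go sketch using only the standard library; `undefinedRefs` is a hypothetical helper, and the `defined` map is a tiny hand-written subset for illustration (a real check would load the names from the pattern files in use).

```go
package main

import (
	"fmt"
	"regexp"
)

// refRe matches a grok reference such as %{NAME} or %{NAME:field}.
var refRe = regexp.MustCompile(`%\{(\w+)(?::[^}]*)?\}`)

// undefinedRefs returns, in order of first appearance, the pattern names
// referenced in pattern that are missing from defined.
func undefinedRefs(pattern string, defined map[string]bool) []string {
	var missing []string
	seen := map[string]bool{}
	for _, m := range refRe.FindAllStringSubmatch(pattern, -1) {
		name := m[1]
		if !defined[name] && !seen[name] {
			seen[name] = true
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Illustrative subset of predefined names, not the full list.
	defined := map[string]bool{"IPORHOST": true, "USER": true, "HTTPDATE": true}
	pattern := `%{IPORHOST:remote_addr} - %{NGUSER:remote_user} \[%{HTTPDATE:timestamp}\]`
	fmt.Println(undefinedRefs(pattern, defined)) // prints [NGUSER]
}
```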

BTW, if you need faster grok parsing (using a binding to Onigmo, a regexp library written in C), you can compile gogstash from source.

A Dockerfile example:

FROM golang:alpine

ARG version

RUN apk --update add --no-cache ca-certificates git tzdata build-base

# build onigmo
WORKDIR /src/build/
RUN git clone https://github.com/k-takata/Onigmo.git --depth=1 \
  && cd Onigmo && ./configure && make && make install

WORKDIR /go/src/github.com/tsaikd/gogstash
COPY . /go/src/github.com/tsaikd/gogstash
RUN sed -i -e 's/github.com\/vjeantet\/grok/github.com\/tengattack\/grok/' /go/src/github.com/tsaikd/gogstash/filter/grok/filtergrok.go \
  && go get -d -v ./...
RUN go build -ldflags "-X main.Version=$version"

@tsaikd No problem.

@tengattack Thank you very much.