This is an HTTP log parser. It reads log lines in Common Log Format (CLF) and prints useful statistics (top hits, bandwidth) to stdout at regular intervals. It also prints alerts on the console whenever certain thresholds are crossed.
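As a rough sketch of the parsing step, a CLF line can be matched with a regular expression (the regexp and field names here are illustrative, not the project's actual implementation):

```go
package main

import (
	"fmt"
	"regexp"
)

// clfRegexp matches the Common Log Format:
// host ident authuser [date] "request" status bytes
var clfRegexp = regexp.MustCompile(
	`^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$`)

func main() {
	line := `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326`
	m := clfRegexp.FindStringSubmatch(line)
	if m == nil {
		fmt.Println("not a CLF line")
		return
	}
	// m[1]=host, m[5]=request, m[6]=status, m[7]=bytes
	fmt.Println("host:", m[1], "status:", m[6], "bytes:", m[7])
}
```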
Build:

```
go build .
docker build -t monitor .
```

Run:

```
./monitor -logpath /var/log/nginx/access.log -alertThreshold 10
docker run monitor:latest -logpath /tmp/access.log
```

Test:

```
go test -v ./...
```
- Use a real time-series DB. Here I implemented an in-memory "data container" that is probably much slower and less efficient than a full-blown TSDB. Besides, in-memory storage has no data persistence :) But I felt that using a 3rd-party DB was cheating.
- Though the code is structured as a pipeline, with each step executing in a goroutine, some steps (`readlines` and `parselines`, for instance) could be parallelized, each executed in several goroutines.
- The aggregation step is a bottleneck. We could shard the aggregation with one aggregator per `sectionName`, but we need a "router" that routes each HTTP hit to a specific aggregator. Leveraging Kafka partitions and partitioning using the key `sectionName` would work.
- The current design requires that logs are written to the log file in chronological order. That is, if line `A` is written before line `B`, then `A` must represent an HTTP hit that happened before the one corresponding to `B`. Kafka's semantics can help here too.