/logreader

Take-home project for Datadog - perform alerting and statistics logging for typical HTTP service logs.

Primary language: Go

To run the program, use the following commands:
  > go mod tidy
  > go run . sample.csv

To run the tests, run the following (also requires `go mod tidy` first):
  > go test -count=1 -v

If you are running a Mac with an Apple CPU and don't have Go set up, you might
be able to just run `./logreader sample.csv`.

I think it is implied by the problem statement, but just to be sure, I
interpreted the 2-minute alert period as a rolling window, rather than a check
every 2 minutes.  I keep a cyclicBuffer with hit counts for each second so that
I can subtract hit counts that are more than 2 minutes in the past and add the
recent ones as I roll the window.

To deal with the out of order log messages in the input, I chose to make the
buffer slightly larger than 2 minutes, so that I can accumulate the hit counts
for the most recent timestamps for a bit before analyzing them for possible
alerts.  The reason I chose this route was because I first tried without the
extra buffer and there were a few edge cases that I thought made the code
hard to understand.  Adding the futureBufferSize additional buffer slots
succeeded in making the code simpler, IMO.

I didn't bother with the rolling window (nor the additional buffer) for the
periodic logger, because I figured it wasn't important enough for that use case
to add the complexity.  Some hits will be reported in the next 10s interval due
to the out-of-order arrivals, but it should average out in the end and still
give a reasonable idea of what is happening.

A note on variable names.  It is considered idiomatic Go to use very short
names for variables in the local scope whenever doing so is reasonably clear.
It was foreign to me when I started using Go, but I decided to follow the
practice here.  The weird spacing around + and - is also idiomatic Go.
¯\_(ツ)_/¯

Future Improvements:

* A decision I made about the periodic logger was not to bother using a heap to
  store the hit counts by section, which would have yielded better O() run-time
  complexity when reporting the hits by section.  Go does not have a simple
  built-in heap data structure, but one can be created by implementing
  heap.Interface from the standard library (https://pkg.go.dev/container/heap).
  This
  could be a potential improvement, but I'd want to run performance tests
  before deciding if it helps. I suspect it would help if there are enough
  distinct sections in the logs, but might not when there are fewer.

* For my little app to be useful at a larger scale, it would have to run
  in a distributed system - some form of map-reduce architecture.  The
  enlarged cyclic buffer could still be used to accumulate hits for the most
  recently reported timestamps before forwarding them to the next stage for
  aggregation.  And I would use some kind of Reporter interface similar to the
  ones I have, and implement sending the hit counts to a Kafka queue or another
  communication mechanism to support downstream aggregation nodes in the
  reduction.

* If there is any problem with the timestamps in the input data, the logic
  will fall apart. If a server has a system clock that is ahead of real world
  time, it will be particularly bad.  I think it could be handled reasonably
  well by filtering outliers.  If a timestamp comes in that is too far in the
  future from the current largestTs, maybe discount it, or hold it suspect
  if subsequent logs don't agree with it.  If running in live mode, maybe the
  scrub logic could use the local system clock as a sanity check.

* If we want to eke out as much performance as possible, the regex I used to
  extract the section string could be replaced with some manual, specialized
  string parsing that would perform better.  It would be quite simple to write.

* I don't know if the csv library I used does anything to scrub input
  strings for security threats, but I doubt it.  If there is any concern
  that the data might be tampered with or come from unknown sources,
  that might be something to look into as well.