fstab/grok_exporter

Reading multiple files?

filippog opened this issue · 34 comments

Hi,
thanks for grok_exporter, it looks really useful! I was wondering if there's support for reading multiple files within the same config/server?

fstab commented

Currently grok_exporter only supports a single file, so if you want to monitor multiple files you need to start multiple grok_exporter instances. However, I plan to implement support for multiple log files within the next few weeks, as soon as I find the time. I'm leaving this issue open until it's done.

Currently grok_exporter only supports a single file...

Which makes the most sense in the sidecar use case 👍

fstab commented

In case anyone wonders why this is not supported yet: Implementing support for multiple log files might require only a few code changes, but there could be unexpected side effects and corner cases, so I would like to implement an extensive automated test first:

  • What if events on different files occur at the same time? Are there any races or timing issues?
  • What if a file being monitored is renamed, overwriting another file that is also being monitored?
  • What are the implications if multiple files are on different filesystems vs on the same filesystem?
  • ... there are many more things that should be tested

My plan is to write automated tests first, and then implement support for monitoring multiple files. I will only release it if I am sure it works even with strange corner cases. It might take another few weeks until I find the time.

@fstab thanks, sounds great! I agree that watching multiple files isn't trivial to get right. For the filesystem-related part at least, Prometheus should contain some sample code that does this, since its file_sd_configs support wildcards.

@fstab is this still WIP? Is there like an "early access branch"?

fstab commented

It's still WIP, but I currently have very limited time. As soon as I find the time, it will be the first feature I work on (unless there are bugs; bugs have higher priority). I am hoping I will find some time around Easter, but I can't promise it. If someone wants to help out and create a pull request, please go ahead!

fstab commented

I created a branch multiple-logfiles and pushed an initial implementation. It works in a quick manual test on my machine, but it's not tested extensively.

The config is pretty simple: Set config_version: 3, and use inputs instead of input. Within inputs you can put a list of inputs, like this:

global:
  config_version: 3
inputs:
- type: file
  path: ./example/exim-rejected-RCPT-examples.log
- type: file
  path: ./example/exim-rejected-RCPT-examples2.log
...

By default, a new label input is added automatically to each metric, containing the path of the log file (you can reconfigure that using input_label_name in the global section and input_label_value in the inputs section; see the test code for examples).

Apart from tests, there are still some things to figure out.

  • Would it be useful to support wildcards for file names in the config?
  • Should metrics be restricted to specific files (as of now, all metrics are matched with all files)?

If you have feedback, please comment here.

AKYD commented

@fstab Thank you for your work on this.

Regarding your questions:

  • I think wildcards are a nice feature

  • Matches should be specified per file: "error" might not be interesting in one file, or "error" might be a counter in one file and a gauge in another.

Hi @fstab

Would it be useful to support wildcards for file names in the config?
Should metrics be restricted to specific files (as of now, all metrics are matched with all files)?

I think having a folder and setting a wildcard like "*.log" would be a better solution, since the application's log rotation settings sometimes zip the older logs and create logs with a timestamp appended to the file name, like application-9999_12_30_12:00:00.log

Wildcards would be very useful. It would also be great to support wildcards for directories.
Something like: /var/log/nginx/*/*.log

I don't need wildcards, but I do need metrics per file.

Hi =) Any progress on this? It would be a very useful feature.

fstab commented

I should really start working on it. I hope there will be some time for it in February.

Looking forward to wildcard support too!

Hi,
I think grok_exporter deserves wildcard support for the Kubernetes use case: all my logs on the node are in /var/log/containers, and I need to extract metrics from logs generated by a deployment (that is, pods whose names start with a common prefix). All the logs are named after the pod, so they share a common prefix but get a different suffix each time I redeploy or a new pod is scheduled.

Telegraf already supports paths with wildcards, but I find it overkill for this.

It would be wonderful if you could get us a viable alternative to the InfluxData product.
Thanks

Hi @fstab, is this feature already implemented?

fstab commented

Not yet, but I hope soon. I know I have been saying this for a long time now, but with two small children even finding a free hour for hobby software projects can be challenging. I am positive that I will start working on grok_exporter regularly again, but I think I need to stop making promises about when that will be (probably in the next month, but I thought that before and it didn't work out...)

Looking forward to this feature.

sc30 commented

Does the multiple-logfiles branch support "fail_on_missing_logfile: false/true"? I tried it locally, and grok_exporter still fails to start even if fail_on_missing_logfile is set to false. Is there any way to work around this?

fstab commented

Please don't use the multiple-logfiles branch. I started working on this on the master branch, and the implementation will not be based on that branch. Current status:

  • All file tailers now support multiple log files.
  • The pattern matcher still reads log lines from the old file tailer interface. The difference is that the old tailer produces lines, while the new tailer produces line/filename tuples. The matcher needs to be switched to the new interface.
  • Need tests (there are some corner cases, like moving a logfile so that it is still watched after the move, moving a logfile so that it overwrites another logfile, etc.)
  • When all tests are green, expose the new functionality in the config.

I currently manage to do one small commit per evening; if I keep this up, it should be done in two weeks or so.
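
The switch from plain lines to line/filename tuples can be illustrated with a minimal Go sketch (grok_exporter is written in Go). The names LogLine, tailFile, merge and collectSorted are hypothetical and not grok_exporter's actual API; the sketch assumes one goroutine per file, all fanning in to a single channel that one matcher reads.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// LogLine is a hypothetical line/filename tuple, not grok_exporter's actual type.
type LogLine struct {
	Line string
	File string
}

// tailFile stands in for a per-file tailer (hypothetical): it sends each line,
// tagged with the name of the file it came from, to the shared channel.
func tailFile(file string, lines []string, out chan<- LogLine, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, l := range lines {
		out <- LogLine{Line: l, File: file}
	}
}

// merge fans in all tailers into one channel, which is closed once every
// tailer is done, so a single matcher goroutine can consume all lines.
func merge(inputs map[string][]string) <-chan LogLine {
	out := make(chan LogLine)
	var wg sync.WaitGroup
	for file, lines := range inputs {
		wg.Add(1)
		go tailFile(file, lines, out, &wg)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// collectSorted plays the role of the matcher: it reads every tuple and
// records "file: line". Sorting is needed because lines from different
// files arrive in no guaranteed order.
func collectSorted(inputs map[string][]string) []string {
	var got []string
	for ll := range merge(inputs) {
		got = append(got, ll.File+": "+ll.Line)
	}
	sort.Strings(got)
	return got
}

func main() {
	inputs := map[string][]string{
		"a.log": {"error 1", "ok"},
		"b.log": {"error 2"},
	}
	for _, s := range collectSorted(inputs) {
		fmt.Println(s)
	}
}
```

Lines from one file keep their order because each file has a single sending goroutine; only the interleaving across files is unspecified, which is one of the timing questions raised earlier in the thread.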

Hey @fstab

I just had a related question. The log files I am trying to export data from have the day's date in their names and rotate daily. I want to tail the latest file, say access-20190410.log. Is there a workaround for this?

Thanks,
Sandeep

fstab commented

@sandeepdharembra you are commenting in the right ticket. When this ticket is done grok_exporter will support wildcards, like access-*.log. This will monitor all files (not just the latest), but as long as no process writes to the old files it will do what you want. However, I'm afraid you need to wait until this ticket is done.
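
To show why a pattern like access-*.log covers every dated file rather than just the latest one, here is a minimal Go sketch. It uses the standard library's filepath.Match as an approximation of the wildcard semantics; grok_exporter's own glob implementation may differ in details, and the helper name matching is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matching returns the names selected by a wildcard pattern, using Go's
// filepath.Match as a stand-in for grok_exporter's wildcard handling.
func matching(pattern string, names []string) []string {
	var out []string
	for _, n := range names {
		ok, err := filepath.Match(pattern, n)
		if err != nil {
			// A malformed pattern (e.g. an unclosed '[') matches nothing.
			return nil
		}
		if ok {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	files := []string{"access-20190409.log", "access-20190410.log", "error-20190410.log"}
	// Both dated access logs match, not only the newest one.
	fmt.Println(matching("access-*.log", files)) // [access-20190409.log access-20190410.log]
}
```

Since older rotated files are no longer written to, watching them produces no new lines, which is why matching all of them still behaves like tailing only the latest file.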

Can't wait for this! 👍

Hi, is there any progress on letting metrics define filters that specify which files they apply to? That would be really useful once multiple files are supported.

Hi, is this available only on the multiple-logfiles branch? I need to configure only a list of inputs with two files.

fstab commented

Please don't use this branch; start two grok_exporter instances instead. I'm still planning to finish support for multiple log files, but it will not be based on this branch.

@fstab Is this feature already implemented? If not, which branch should I use?

The last comment is still valid:

start two grok_exporter instances. I'm still planning to finish support for multiple log files, but it will not be based on this branch.

fstab commented

Good news: I just pushed an implementation that supports wildcards. Sorry that it took so long. I will need to add a few tests before building a release. If you want to try it, you can build the master branch from source.

The input section now supports wildcards in path, but only at the file level, not at the directory level. If you want more than one path, you can replace path with paths and configure a list of paths (they may also contain wildcards).
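
The file-level-only restriction can be made concrete with a small Go sketch of how a config loader might enforce it. The function validatePath is hypothetical and is not grok_exporter's actual validation code; it assumes that *, ? and [ are the wildcard characters and uses filepath.Split to separate the directory part from the file name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// hasWildcard reports whether s contains a glob metacharacter.
func hasWildcard(s string) bool {
	return strings.ContainsAny(s, "*?[")
}

// validatePath is a hypothetical check: wildcards are allowed in the
// file name but rejected anywhere in the directory part.
func validatePath(p string) error {
	dir, _ := filepath.Split(p)
	if hasWildcard(dir) {
		return fmt.Errorf("%q: wildcards are only supported in the file name, not in directories", p)
	}
	return nil
}

func main() {
	for _, p := range []string{"/var/log/containers/myapp-*.log", "/var/log/nginx/*/*.log"} {
		if err := validatePath(p); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", p)
		}
	}
}
```

Under this check, a Kubernetes-style pattern such as /var/log/containers/myapp-*.log (myapp is a placeholder) is accepted, while the directory wildcard /var/log/nginx/*/*.log requested earlier in the thread is rejected.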

By default, the metrics are applied to all log files. If you want to restrict a metric to one or more specific paths, you can add a path or paths option to a metric configuration. This is like a filter, the metric is then only applied when the path matches.
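
The per-metric filter described above can be sketched in Go as follows. metricAppliesTo is a hypothetical helper, not grok_exporter's code, and it assumes filepath.Match semantics for the metric's path/paths patterns: a metric with no patterns applies to every log file, otherwise the log file must match at least one pattern.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// metricAppliesTo is a hypothetical sketch of the per-metric path filter.
func metricAppliesTo(metricPaths []string, logfile string) bool {
	if len(metricPaths) == 0 {
		return true // default: the metric is applied to all log files
	}
	for _, pattern := range metricPaths {
		if ok, err := filepath.Match(pattern, logfile); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	filter := []string{"/var/log/app/*.log"}
	fmt.Println(metricAppliesTo(nil, "/var/log/other/x.log"))    // true: no filter
	fmt.Println(metricAppliesTo(filter, "/var/log/app/a.log"))   // true
	fmt.Println(metricAppliesTo(filter, "/var/log/other/x.log")) // false
}
```

This covers the case raised earlier in the thread where "error" should be a counter for one file and a gauge for another: two metrics with the same match but different path filters.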

I will describe it in more detail when I update the documentation.

fstab commented

Updated documentation is on the release branch https://github.com/fstab/grok_exporter/blob/release/CONFIG.md

fstab commented

Released it as v1.0.0.RC1. Documentation is merged to master. Please open a new issue if there are any problems with multiple log file support.

Thanks!

fstab commented