Provide a way to bundle error message/info into the combined log entry
qingling128 opened this issue · 10 comments
Originally from the conversation in #11.
It's a common case that developers use logging frameworks (e.g. logback
for Java) to help structure error logs. An additional error message or other info is frequently added in front of the actual stack trace. For example, the logback
framework for Java prepends a line starting with "ERROR " to the stack trace, like this:
ERROR [2017-03-21 07:16:42,627] u.s.d.Jobs {} uw.smartmetering.domain.Jobs: Job uw.smartmetering.tma.PollStatusFeedJob@7091121 failed
javax.ws.rs.ProcessingException: org.apache.http.conn.ConnectTimeoutException: Connect to ssp-api-stg01.tmadata.co.uk:443 [ssp-api-stg01.tmadata.co.uk/80.195.118.185, ssp-api-stg01.tmadata.co.uk/217.138.85.11] failed: connect
at io.dropwizard.client.DropwizardApacheConnector.apply(DropwizardApacheConnector.java:109)
at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:252)
at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:684)
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to ssp-api-stg01.tmadata.co.uk:443 [ssp-api-stg01.tmadata.co.uk/80.195.118.185, ssp-api-stg01.tmadata.co.uk/217.138.85.11] failed: connect timed out
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:150)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
... 26 common frames omitted
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:207)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
... 36 common frames omitted
It would be nice if we could provide an option for developers to bundle these error messages/info into the combined log entry. Note that:
- This solution should be language independent.
- The format of this piece of info (error message, timestamp, or other info) may vary from language to language and from framework to framework, so we need to make the format configurable and leave the regex up to developers (see the hypothetical sketch below).
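Purely as an illustration of what such an option could look like on a detect_exceptions match block, something along these lines (the preceding_line_regexp parameter below is made up for this sketch and does not exist today):
<match raw.**>
  @type detect_exceptions
  message log
  languages java
  # Hypothetical option: a user-supplied regex identifying the extra line
  # (error message, timestamp, etc.) that should be bundled into the same
  # combined log entry as the stack trace that follows it.
  preceding_line_regexp /^ERROR \[\d{4}-\d{2}-\d{2}/
</match>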
Hi,
Can you please share any progress on this?
Many thanks,
Javi
Hi @jcotado, sorry, we haven't yet had a chance to evaluate/prioritize it. It's on the agenda for our next weekly meeting. Are you somehow blocked by this?
CC @rbuskens, @igorpeshansky, @dhrupadb for priority. I wonder if it's something we would like to squeeze into this quarter. It's a nice and helpful feature to have, but not a blocking one, I guess?
(@jcotado: correct me if I'm wrong.)
@igorpeshansky Javi and I chatted a bit about using in_tail in the previous thread: #11. They are ingesting logs from containers in a Kubernetes cluster by mounting /var/log. Kubernetes has fluentd configs built into the add-on image fluentd-es-image. There is some ongoing effort to make it customizable by mounting configs to a certain path, but I believe that's not available yet.
@jcotado: One solution here, if we stick to using in_tail, would be to edit the td-agent.conf file and build a customized version of fluentd-es-image to use. That way we can "customize" the fluentd config indirectly.
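For reference, the stanza you would be editing is the container log source in td-agent.conf; from memory it looks roughly like this (the exact config shipped in fluentd-es-image may differ):
<source>
  type tail
  # Tails the symlinked container log files that Kubernetes writes under /var/log.
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  tag kubernetes.*
  format json
  read_from_head true
</source>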
@qingling128 We are already using our own version of the fluentd image; our config can be seen here: https://github.com/utilitywarehouse/fluentd/blob/master/fluent.conf
Could you please elaborate on the changes to in_tail to allow formatting, bearing in mind this source is shared by all applications (does that matter)?
Thanks!
Hi @george-angel,
Sorry, I missed this one in my mailbox. Good to know you are already using your own version of the image. The instructions for the multiline format of in_tail can be found here; look for multiline.
Quoting the section on Java exceptions here:
"You can parse Java like stacktrace logs with multiline. Here is a configuration example.
format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
If you have a following log:
2013-3-03 14:27:33 [main] INFO Main - Start
2013-3-03 14:27:33 [main] ERROR Main - Exception
javax.management.RuntimeErrorException: null
at Main.main(Main.java:16) ~[bin/:na]
2013-3-03 14:27:33 [main] INFO Main - End
It will be parsed as:
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":" Main - Start"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"ERROR","message":" Main - Exception\njavax.management.RuntimeErrorException: null\n at Main.main(Main.java:16) ~[bin/:na]"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":" Main - End"}
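For context, those format lines would sit inside an in_tail source stanza, roughly like this (the path, pos_file, and tag below are just placeholders):
<source>
  type tail
  path /var/log/app/java-app.log
  pos_file /var/log/java-app.log.pos
  tag java.app
  # Group a timestamped line plus any following stack trace lines into one record.
  format multiline
  format_firstline /\d{4}-\d{1,2}-\d{1,2}/
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
</source>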
@qingling128 First of all, I appreciate this isn't an issue with GoogleCloudPlatform, and I appreciate you are trying to help us.
Our input is not a Java stack trace, as we are ingesting Docker logs, and it's actually JSON:
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":"10.2.22.19 - - [21/Mar/2017:07:16:41 +0000] \"POST /trigger-jobs HTTP/1.1\" 200 20 \"-\" \"curl/7.52.1\" 2\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":"ERROR [2017-03-21 07:16:42,627] u.s.d.Jobs {} uw.smartmetering.domain.Jobs: Job uw.smartmetering.tma.PollStatusFeedJob@7091121b failed\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":"javax.ws.rs.ProcessingException: org.apache.http.conn.ConnectTimeoutException: Connect to ssp-api-stg01.tmadata.co.uk:443 [ssp-api-stg01.tmadata.co.uk/80.195.118.185, ssp-api-stg01.tmadata.co.uk/217.138.85.11] failed: connect timed out\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at io.dropwizard.client.DropwizardApacheConnector.apply(DropwizardApacheConnector.java:109)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:252)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:684)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:681)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.internal.Errors.process(Errors.java:315)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.internal.Errors.process(Errors.java:297)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.internal.Errors.process(Errors.java:228)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:444)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:681)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:411)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.tmaclient.TmaClient$_makeRequestNoValidation_closure7.doCall(TmaClient.groovy:144)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.circuitbreaker.Circuit.withBreaker(Circuit.groovy:33)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.tmaclient.TmaClient.makeRequestNoValidation(TmaClient.groovy:134)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.tmaclient.TmaClient$_makeRequest_closure6.doCall(TmaClient.groovy:112)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.circuitbreaker.Circuit.withBreaker(Circuit.groovy:33)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.tmaclient.TmaClient.makeRequest(TmaClient.groovy:110)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.tmaclient.TmaClient.doGet(TmaClient.groovy:107)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.tmaclient.TmaClient.get(TmaClient.groovy:86)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.tmaclient.TmaStatusClient.getSince(TmaStatusClient.groovy:31)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.smartmetering.tma.StatusFeedConsumer.doForEachNewStatus(StatusFeedConsumer.groovy:34)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.smartmetering.tma.StatusFeedConsumer.forEachNewStatus(StatusFeedConsumer.groovy:24)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.smartmetering.tma.PollStatusFeedJob.run(PollStatusFeedJob.groovy:44)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.smartmetering.domain.Jobs$_run_closure1$_closure2.doCall(Jobs.groovy:49)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.smartmetering.domain.Jobs$_run_closure1.doCall(Jobs.groovy:47)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.smartmetering.db.DbLock.ifOwned(DbLock.groovy:40)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.smartmetering.domain.Jobs.run(Jobs.groovy:43)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at uw.smartmetering.http.TriggerJobsResource.triggerJobs(TriggerJobsResource.groovy:27)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":"Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to ssp-api-stg01.tmadata.co.uk:443 [ssp-api-stg01.tmadata.co.uk/80.195.118.185, ssp-api-stg01.tmadata.co.uk/217.138.85.11] failed: connect timed out\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:150)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at io.dropwizard.client.DropwizardApacheConnector.apply(DropwizardApacheConnector.java:87)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" ... 26 common frames omitted\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":"Caused by: java.net.SocketTimeoutException: connect timed out\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:207)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at java.net.Socket.connect(Socket.java:589)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:337)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" ... 36 common frames omitted\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":"ERROR [2017-04-25 04:13:26,118] u.s.i.Jobs {} uw.smartmetering.infrastructure.Jobs: Job uw.smartmetering.common.adapter.tma.PollScheduledReadingsJob@c77718c failed\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":"javax.ws.rs.ProcessingException: java.net.UnknownHostException: tma-mock\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at io.dropwizard.client.DropwizardApacheConnector.apply(DropwizardApacheConnector.java:109)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:252)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:684)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:681)\n"}
{"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":" at org.glassfish.jersey.internal.Errors.process(Errors.java:315)\n"}
This works in the detect_exceptions plugin by specifying message log, so it selects only the string value from the JSON key log and then applies formatting to that.
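Roughly, the match block for that looks like this (the raw tag prefix here is only an example):
<match raw.**>
  @type detect_exceptions
  remove_tag_prefix raw
  # Only the string value of the "log" key is inspected for exceptions.
  message log
  languages java
  multiline_flush_interval 5
</match>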
Also, from my limited understanding, in_tail is a source; we already have an in_tail source, and the format on that has to remain json, as it's collecting all container logs. Can it act as a "filter"?
If it does, would we have to regex for the entire part of the JSON string: {"time":"2017-01-08T11:22:33.444Z", "stream": "stderr", "log":"ERROR [2017-03-21 07:16:42,627]
Does that make sense?
Yeah, the detect_exceptions plugin supports extracting the message from one entry of the JSON by specifying a message field. Unfortunately, we won't be able to make use of that when using in_tail.
You can have multiple in_tail sources with different paths that point to different log files. To support the case here, you would need a separate in_tail source for the JVM log files and a regex for the entire line, with multiline enabled.
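As a very rough sketch of that idea (the path glob is a placeholder and assumes the JVM containers' log files can be matched separately; the regexes would need tuning for the JSON wrapper):
<source>
  type tail
  path /var/log/containers/my-java-app*.log
  pos_file /var/log/my-java-app.log.pos
  tag java.raw
  format multiline
  # Treat a JSON-wrapped "ERROR [timestamp]" line as the start of a new record.
  format_firstline /"log":"ERROR \[\d{4}-\d{2}-\d{2}/
  format1 /^(?<log>.*)/
</source>
One caveat with this approach is that ordinary log lines that don't match format_firstline would get appended to the previous record, so the firstline regex would really need to match the start of every record, not just the error ones.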
@qingling128 But all logs are actually under a single path: /var/log/docker. I had a bit of a play, and I think that ultimately what we are trying to match is too generic to do with fluentd, or it would require a lot of logic.
We are going to investigate either modifying the application logs to make this easier or searching for a combination of plugins to do this.
Happy to close the Issue, thanks for your help!
Thanks for letting me know! I'll close this issue for now then. :)