goldmansachs/reladomo

Improve sql logging by adding additional context.

Opened this issue · 7 comments

Reladomo's sql logging is so useful that I sometimes leave it turned on in production.

DEBUG com.gs.fw.common.mithra.sqllogs.MyClass: connection:1030722971 find with: select t0.key,t0.number from MY_CLASS t0 where  t0.key = 'abcd' and t0.system_to = '9999-12-01 23:59:00.000'
DEBUG com.gs.fw.common.mithra.sqllogs.MyClass: retrieved 1 objects, 2.0 ms per

I have some ideas on how to make this logging even more useful.

It's currently hard to log only sql, or only timings, without the other. That's because both log statements using the same logger, com.gs.fw.common.mithra.sqllogs.MyClass in this example. I propose adding markers for each.

private static final Marker MARKER_SQL = MarkerFactory.getMarker("Reladomo SQL");
private static final Marker MARKER_TIMING = MarkerFactory.getMarker("Reladomo Timing");

And adding the markers to the relevant log statements. For example, change this:

logger.debug("source '" + source + "': " + text);

to

logger.debug(MARKER_SQL, "source '" + source + "': " + text);

In addition, the timing information could be even more useful if I didn't have to parse it out of the logged message. I propose adding the timing information to MDC. That way I more easily create dashboards of the amount of time spent on queries over time.

For example, change this:

this.sqlLogger.debug("retrieved " + this.rowCount + " objects, " +
        ((this.rowCount > 0) ? ((totalTime / this.rowCount) + " ms per") : (totalTime +
                " ms")));

to

MDC.put("rowCount", String.valueOf(this.rowCount));
MDC.put("totalTime", String.valueOf(totalTime));
try
{
    this.sqlLogger.debug(MARKER_TIMING, "retrieved " + this.rowCount + " objects, " + ((this.rowCount > 0) ? (totalTime / this.rowCount + " ms per") : (totalTime + " ms")));
}
finally
{
    MDC.remove("rowCount");
    MDC.remove("totalTime");
}

I'm happy to contribute these changes but I wanted to discuss here before working on the whole thing.

I'm starting with the assumption that changing the existing logging is undesirable because users already rely on the current behavior. So this proposal adds markers and MDC but doesn't change, remove, or combine any messages. What do you think?

I'm a little confused about the MDC, but before we go there, I want to make sure you're aware of two existing hooks here.

  1. There is aggregate performance data in the runtime, on the portal object MithraPerformanceData getPerformanceData();
  2. There is a way to snoop on the SQL via SqlLogSnooper. This might be closer to what you're doing above.

So the first question is, should we somehow improve the snooper?
And the second question: if the MDC key/value is getting cleared right after the log, what good is it? (worded differently: how're you going to use it)

Oh, I'm fine with the markers as long as the default is blank.

The more I read this, the more I'm convinced I misread the intent. Is this meant to be some sort of in-VM dashboard or is it an external process reading logs and you just want to make the parsing easier?

Thanks for the quick reply. I didn't know about getPerformanceData() and I think I'll make use of it. I think MDC would be useful too. I'll set up an example.

private static final Logger LOGGER = LoggerFactory.getLogger(ExampleApplication.class);
private static final Marker MARKER = MarkerFactory.getMarker("Example marker");
MDC.put("example metric", String.valueOf(123));
try
{
    LOGGER.info(MARKER, "Example metric: " + 123);
}
finally
{
    MDC.remove("example metric");
}

I set up a logback console appender with this pattern:

%highlight(%-5level) %cyan(%date{HH:mm:ss}) [%white(%thread)] <%cyan(%marker)> {%magenta(%mdc)} %green(%logger): %message%n

And I get output like:

INFO  21:29:52 [main] <Example marker> {example metric=123} com.example.ExampleApplication: Example metric: 123

If I remove the marker and mdc from my pattern:

%highlight(%-5level) %cyan(%date{HH:mm:ss}) [%white(%thread)] %green(%logger): %message%n

Then they simply aren't logged:

INFO  21:31:13 [main] com.example.ExampleApplication: Example metric: 123

Dropwizard comes with a json log format

The same log message gets output like this:

{"level":"INFO","logger":"com.example.ExampleApplication","thread":"main","message":"Example metric: 123","mdc":{"example metric":"123"},"timestamp":1567042492086}

I just now realized that the marker is missing. I'll try to fix that upstream in Dropwizard. Another problem is that all MDC values are strings. Downsides aside, the json format, as well as other tools like logstash-logback-encoder or elk appenders help me get this data into Kibana/Graphana where I can create dashboards. For example, a histogram that shows the sum of all time spent in database queries for each 5 minute window. For this data to be the most useful, I need it to be per query, and allow Kibana/Graphana to perform the aggregations.

is it an external process reading logs and you just want to make the parsing easier?

Exactly. I should have started with that. These additions would make it easier to use the logs in Kibana/Graphana and probably other tools too.

Ok, I understand the intent: external log crawler that prefers a standard parsable format (json).

Please submit the code 😄
A couple of notes: all work/garbage must stay inside isDebugEnabled(), and even then try to be frugal (e.g. reuse strings). There is also a lot of places we log. Try to get all the relevant ones.

Will do, thanks 👍