serilog-contrib/serilog-sinks-elasticsearch

Scoped variables of different types produce issues

LordMike opened this issue · 14 comments

Does this issue relate to a new feature or an existing bug?

  • Bug
  • New Feature

What version of Serilog.Sinks.Elasticsearch is affected? Please list the related NuGet package.
6.3.0

What is the target framework and operating system? See target frameworks & net standard matrix.

  • netCore 2.0
  • netCore 1.0
  • 4.7
  • 4.6.x
  • 4.5.x

Please describe the current behavior?
If I run code like this:

using (logger.BeginScope("My text"))
using (logger.BeginScope(new Dictionary<string, string> { { "Property 1", "Value 1" } }))
    logger.LogInformation("Hi");

I will get an error. In ES, the scope values are simply added as an array, so ES infers a mapping type from the first values it sees. In our case, a number of scopes were Dictionary<string, string>, so the Scope field ended up being mapped as object.

Then, later, a plain string scope like the one above came along, and we got an error like this (the string cannot be converted to an object):

2018-07-25T11:19:40.7254631Z Failed to store event with template 'Getting {AssemblyName}' into Elasticsearch. Elasticsearch reports for index logstash-2018.07.25 the following: {"type":"mapper_parsing_exception","reason":"failed to parse [fields.Scope]","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 1:392"}}

We tracked the issue down to using Dictionary<string, string> in one place and plain strings elsewhere - so some logs would simply not be stored (and as these are scopes, it can be a whole series of logs: anything within the scope).

The bug also goes the other way around: if the Scope value is initially some simple type like a string, ES will auto-map the field to text, and any later log whose scope value is a complex object will fail.

Please describe the expected behavior?
So. I know that using Dictionary<string, object> will "work as expected", as it doesn't add to Scope but adds the individual entries as properties instead.
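For reference, a minimal sketch of that workaround (the property names and values here are just placeholders):

// Sketch of the Dictionary<string, object> workaround: with Microsoft.Extensions.Logging,
// each entry becomes its own event property (e.g. fields.Property1) instead of being
// appended to the shared fields.Scope array.
using (logger.BeginScope(new Dictionary<string, object>
{
    ["Property1"] = "Value 1",
    ["ElapsedMs"] = 42   // hypothetical second entry, just to show mixed value types are fine
}))
{
    logger.LogInformation("Hi");
}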

Could we either:

  • Ignore Scopes when the scope value is not a simple value like a string.
  • Treat Dictionary<string, T> specially, so the T type is always handled (converted to string?)

Clarifications:

  • I'm using the Microsoft.Extensions.Logging package as well, since we're logging from ASP.NET. The BeginScope() above is the MS one.
  • The issue is present with other types as well; see below.
2018-07-26T11:43:43.0834520Z Failed to store event with template 'Completed process '{0}', PID {1}, in {2} - ExitCode: {3}' into Elasticsearch. Elasticsearch reports for index logstash-2018.07.26 the following: {"type":"mapper_parsing_exception","reason":"failed to parse [fields.2]","caused_by":{"type":"illegal_argument_exception","reason":"For input string: \"00:00:03.2551909\""}}

So here we probably have a property named "2", which I can see has been mapped as a long... Naturally, it could be anything, as "2" is quite generic (probably an old String.Format-style invocation).
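For illustration, a call along these lines (reconstructed from the template in that error; the argument values are made up) produces those numbered properties. Once some other event has caused fields.2 to be mapped as long, a later call with a TimeSpan in position 2 is rejected:

// Hypothetical reconstruction of the call behind the error above - positional
// placeholders become event properties named "0", "1", "2" and "3".
logger.LogInformation("Completed process '{0}', PID {1}, in {2} - ExitCode: {3}",
    "myprocess.exe", 1234, TimeSpan.FromSeconds(3.255), 0);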

@nblumhardt what does Seq do to deal with properties of different types?

@LordMike Seq was designed to handle this case, so no issue over there 😎

Well blast.

Ok, so just to get this clear: this is a problem that will affect all users of the ES sink if they reuse a property name with different types. It should also affect users who use Serilog directly, as their event properties can change between scalar values (of any type) and, for example, sequences (which again could contain one uniform type or many different types).

I would say this is a pretty critical thing for this sink.

Wouldn't you think so, @nblumhardt ?

I've tested using Serilog only.

        static void Main(string[] args)
        {
            SelfLog.Enable(Console.Out);

            using (var logger = new LoggerConfiguration()
                .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
                {
                    IndexFormat = "test-logstash-{0:yyyy.MM.dd}"
                })
                .CreateLogger())
            {
                logger.Information("Hello world {MyProperty}, {MyOtherProperty}", new object[] { 4, "string" }, 66);
            }
        }

Error (selflog):
2018-08-01T12:38:22.5428021Z Failed to store event with template 'Hello world {MyProperty}, {MyOtherProperty}' into Elasticsearch. Elasticsearch reports for index test-logstash-2018.08.01 the following: {"type":"illegal_argument_exception","reason":"mapper [fields.MyProperty] of different type, current_type [text], merged_type [long]"}

And also:

        static void Main(string[] args)
        {
            SelfLog.Enable(Console.Out);

            using (var logger = new LoggerConfiguration()
                .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
                {
                    IndexFormat = "test2-logstash-{0:yyyy.MM.dd}"
                })
                .CreateLogger())
            {
                logger.Information("Hello world {MyProperty}", 4);
                logger.Information("Hello world {MyProperty}", "string");
            }
        }

Error:
2018-08-01T12:40:19.2854591Z Failed to store event with template 'Hello world {MyProperty}' into Elasticsearch. Elasticsearch reports for index test2-logstash-2018.08.01 the following: {"type":"illegal_argument_exception","reason":"mapper [fields.MyProperty] of different type, current_type [long], merged_type [text]"}

Relevant: #47

Thanks for the follow-up @LordMike.

As I understand it, this is a limitation on the Elastic side rather than a Serilog issue - I don't think there's much that can be done about it at this end.

True. One drastic approach would be to ensure that all properties are always of the same type (render them to strings, for example).

As I see it right now, the ES sink is useless - it loses logs "randomly" and should not be used with structured logging.

Elasticsearch has some sort of type system (mapping based on a template when a new index is created), so storing an integer and an object in the same field (in the index) will not work.

I agree with you that submitting logs to a sink like ES is not explicit about this and will fail silently (unless you listen to the errors returned from the bulk post inside the sink). It is all a tradeoff between speed (the bulk insert) and convenience, and inside your code you need to be careful naming your properties.

I know some people implemented their own serializer to swap into the sink, which effectively adds the type to the field name (e.g. string_field1).

My best advice: carefully design your log messages so they make sense to query on. ES also has a default maximum number of fields per index, so just throwing in bunches of different fields might fail at some point anyway.

We've taken a stab at this to get things rolling, so we won't have to redo our logging (yet) while still keeping ES usable for queries.

This custom formatter renames properties based on their data type, so "MyValue", when it contains an integer, becomes "IMyValue" - thereby removing conflicts in ES. I'm putting the code here both to get @mivano's and @nblumhardt's feedback and so others can benefit from it.

Since this issue will always be a problem in ES sinks, it could also make sense to embed this as a default offering in the ES sink. This code currently only operates on ScalarValues, and not sequences. Sequences would be great to support, but it's more code than I care to write right now, since we don't need it :P.

    internal class EsPropertyTypeNamedFormatter : ElasticsearchJsonFormatter
    {
        // Use a property writer that can change the property names and values before writing to ES
        protected override void WritePropertiesValues(IReadOnlyDictionary<string, LogEventPropertyValue> properties, TextWriter output)
        {
            string precedingDelimiter = "";
            foreach (KeyValuePair<string, LogEventPropertyValue> property in properties)
            {
                char type;
                object value;

                // Modify property name
                if (property.Value is ScalarValue asScalar)
                {
                    if (asScalar.Value is DateTime || asScalar.Value is DateTimeOffset)
                    {
                        type = 'D';
                        value = asScalar.Value;
                    }
                    else if (asScalar.Value is long || asScalar.Value is int || asScalar.Value is short || asScalar.Value is byte ||
                             asScalar.Value is ulong || asScalar.Value is uint || asScalar.Value is ushort || asScalar.Value is sbyte)
                    {
                        type = 'I';
                        value = asScalar.Value;
                    }
                    else if (asScalar.Value is float || asScalar.Value is double || asScalar.Value is decimal)
                    {
                        type = 'F';
                        value = asScalar.Value;
                    }
                    else
                    {
                        type = 'S';
                        value = asScalar.Value?.ToString();
                    }
                }
                // TODO: Support sequences, problem: Sequences can contain elements of different types
                // OTOH: all properties in ES are sequences already, so it'd be great to utilize that -- some way of saying "if all elements are datetimes; store as datetimes"
                //else if (property.Value is SequenceValue asSequence)
                //{
                //    // Check types
                //    Type type = null;
                //    foreach (var item in asSequence.Elements)
                //    {

                //    }
                //}
                else
                {
                    // Convert to string
                    type = 'S';
                    value = property.Value?.ToString();
                }

                string key = type + property.Key;
                WriteJsonProperty(key, value, ref precedingDelimiter, output);
            }
        }
    }

Use it by setting the CustomFormatter option:

new ElasticsearchSinkOptions(config.Uri)
{
    CustomFormatter = new EsPropertyTypeNamedFormatter()
}
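For context, a complete configuration along the lines of the earlier test programs might look like this (the URL and index format are placeholders, and the usings assume the 6.x package layout):

using System;
using Serilog;
using Serilog.Sinks.Elasticsearch;

class Program
{
    static void Main()
    {
        using (var logger = new LoggerConfiguration()
            .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
            {
                IndexFormat = "logstash-{0:yyyy.MM.dd}",
                // Swap in the formatter above so property names carry a type prefix.
                CustomFormatter = new EsPropertyTypeNamedFormatter()
            })
            .CreateLogger())
        {
            // Both events are stored: the int becomes IMyProperty, the string SMyProperty.
            logger.Information("Hello world {MyProperty}", 4);
            logger.Information("Hello world {MyProperty}", "string");
        }
    }
}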

Thanks for sharing the code, @LordMike! Guessing that anyone who wants to use this will be able to modify it to handle whatever data types they care about.

Nice solution, thanks for sharing. I will create a link to this issue so the behavior is clear. There might be more sinks with this issue, where the mapping is not set per document/record but rather on the container the document is in.
Also be aware of mapping explosion (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings).

// TODO: Support sequences, problem: Sequences can contain elements of different types
// OTOH: all properties in ES are sequences already, so it'd be great to utilize that -- some way of saying "if all elements are datetimes; store as datetimes"

I've solved it the following way: if a sequence contains values of the same type, I store them as-is (which satisfies "if all elements are datetimes; store as datetimes"); otherwise I store all elements as strings (i.e., keywords).
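For anyone wanting to do the same, here is a rough sketch of that idea as a helper that could be called from the formatter above (not the exact code used here; it needs using System.Linq and Serilog.Events):

// Rough sketch: keep an all-scalar sequence whose elements share one type as-is,
// otherwise render every element to a string so the ES mapping stays consistent.
private static SequenceValue NormalizeSequence(SequenceValue sequence)
{
    var scalars = sequence.Elements.OfType<ScalarValue>().ToList();

    bool allScalar = scalars.Count == sequence.Elements.Count;
    bool sameType = allScalar && scalars.Count > 0 &&
                    scalars.Select(s => s.Value?.GetType()).Distinct().Count() == 1;

    if (sameType)
        return sequence; // homogeneous: store the elements with their original type

    // Mixed types or nested structures: fall back to strings (keywords in ES).
    return new SequenceValue(sequence.Elements.Select(e =>
        new ScalarValue(e is ScalarValue sv ? sv.Value?.ToString() : e.ToString())));
}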

Thanks for sharing the code @LordMike!