masesgroup/KEFCore

Testing the modification introduced from #64 seems the speed of LINQ query is very poor

Closed this issue · 1 comments

Testing the modification introduced from this PR seems the speed of LINQ query is very poor; the following code

context.Posts.Single(b => b.BlogId == 2)

takes more or less 16 seconds to be executed as reported from the following output:
Elapsed context.Posts.Single(b => b.BlogId == 2) 16299 ms. Result is PostId: 2 Title: title Content: 2 BlogId: 2

Originally posted by @mariomastrodicasa in #64 (comment)

Adding some measurement objects we found this:

KafkaEnumerator _moveNextSw: 00:00:00.2211977 _currentSw: 00:00:13.7689249 _valueSerdesSw: 00:00:00.1202403 _valueBufferSw: 00:00:13.4891879
Elapsed context.Posts.Single(b => b.BlogId == 2) 16586 ms. Result is PostId: 2 Title: title Content: 2 BlogId: 2

The highest time lost is measured from _valueBufferSw which is associated to:

var data = new ValueBuffer(entityTypeData.GetData(_entityType));

Since the enumerator iterates over 1000 elements, the mean time is 13 milliseconds in this test.

So the code wasting time is:

    public object[] GetData(IEntityType tName)
    {
        if (Data == null) return null;

        var array = Data.Select((o) => o.Value.Value).ToArray();

        return array;
    }

Updating the code with the following:

    public object[] GetData(IEntityType tName)
    {
#if DEBUG
        Stopwatch fullSw = new Stopwatch();
        Stopwatch newSw = new Stopwatch();
        Stopwatch iterationSw = new Stopwatch();
        try
        {
            fullSw.Start();
#endif
            if (Data == null) return null;
#if DEBUG
            newSw.Start();
#endif
            object[] array = new object[Data.Count];
#if DEBUG
            newSw.Stop();
            iterationSw.Start();
#endif
            for (int i = 0; i < Data.Count; i++)
            {
                array[i] = Data[i].Value;
            }
#if DEBUG
            iterationSw.Stop();
#endif
            return array;
#if DEBUG
        }
        finally
        {
            fullSw.Stop();
            Trace.WriteLine($"Time to GetData with length {Data.Count}: {fullSw.Elapsed} - new array took: {newSw.Elapsed} - Iteration took: {iterationSw.Elapsed}");
        }
#endif
    }

The previous code reports multiple occurrences like:

Time to GetData with length 4: 00:00:00.0115625 - new array took: 00:00:00.0000003 - Iteration took: 00:00:00.0000006

The time used from each invocation of GetData is variable between 10 and 22 milliseconds, however only the iteration needs between 5 and 10 microseconds and from the tests seems the invocation of GetData waste most of the time returning the array. Changing the signature of the GetData with following:

void GetData(IEntityType tName, ref object[] array);

a new test reports:

KafkaEnumerator _moveNextSw: 00:00:00.0862662 _currentSw: 00:00:00.1839915 _valueSerdesSw: 00:00:00.0620605 _valueBufferSw: 00:00:00.0006946
Elapsed context.Posts.Single(b => b.BlogId == 2) 1990 ms. Result is PostId: 2 Title: title Content: 2 BlogId: 2

Continuing the tests the output is:

KafkaEnumerator _moveNextSw: 00:00:00.1052086 _currentSw: 00:00:00.1974484 _valueSerdesSw: 00:00:00.0492837 _valueBufferSw: 00:00:00.0012247
Elapsed context.Posts.Single(b => b.BlogId == 1) 340 ms. Result is PostId: 1 Title: title Content: 1 BlogId: 1

More or less the KafkaEnumerator takes the same time, but the test is executed with an highest speed. The first test has a lower speed due to data retrieve from the Kafka cluster, while the second test uses the data stored in the KafkaEnumerator.