ironfede/openmcdf

Performance degradation in recent commits

michaelp opened this issue · 11 comments

I am trying to take a look at the performance degradation in the last few commits.
Regardless of the results, I would like to share the initial benchmark that I created.

Introduction
There are several test projects in the openmcdf solution whose names suggest they target performance. I might be missing something, but they appear to be internal tools used by the core maintainers rather than a benchmark suite of any kind. In addition, the lack of a standard format for benchmark results makes it difficult to reproduce results on different machines and to communicate changes, findings, and improvements.

Assumptions:

  1. To get stable results, I focus on a file that resides completely in memory.
  2. My first scenario focuses purely on reader performance.

Goals:

  1. Provide a simple way to point out irregularities.
  2. Establish a framework for justifying performance changes.

The initial scenario itself is very simple:

  • Create, in memory, a file with a single storage and a few streams of 1 MB each
  • Measure reading a stream using preallocated buffers of different sizes (ideally, no additional memory should be allocated and performance should be great); a sketch of such a benchmark follows this list
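
Here is a minimal sketch of what such a BenchmarkDotNet benchmark could look like. It illustrates the scenario rather than being the exact code that produced the tables below; the OpenMcdf calls, including the `CFStream.Read(buffer, position, count)` overload, are assumed from the 2.x API surface used elsewhere in this thread.

```csharp
using System.IO;
using BenchmarkDotNet.Attributes;
using OpenMcdf;

// Sketch: build a compound file entirely in memory (one storage, a 1 MB stream)
// and read it back through a preallocated buffer whose size is a benchmark parameter.
// Run with BenchmarkDotNet.Running.BenchmarkRunner.Run<StreamReadBenchmark>().
[MemoryDiagnoser]
public class StreamReadBenchmark
{
    private const int TotalStreamSize = 1048576; // 1 MB per stream

    private CompoundFile cf;
    private CFStream stream;
    private byte[] buffer;

    [Params(512, 1024, 4096, 131072, 262144, 524288, 1048576)]
    public int BufferSize { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        var ms = new MemoryStream();

        // Write a file with a single storage containing one 1 MB stream.
        var writer = new CompoundFile();
        writer.RootStorage
              .AddStorage("Storage")
              .AddStream("Stream0")
              .SetData(new byte[TotalStreamSize]);
        writer.Save(ms);
        writer.Close();

        // Reopen it from memory for the read benchmark.
        ms.Position = 0;
        cf = new CompoundFile(ms);
        stream = cf.RootStorage.GetStorage("Storage").GetStream("Stream0");

        buffer = new byte[BufferSize]; // preallocated once, reused for every read
    }

    [Benchmark]
    public void Test()
    {
        long position = 0;
        while (position < TotalStreamSize)
        {
            int read = stream.Read(buffer, position, buffer.Length);
            if (read <= 0)
                break;
            position += read;
        }
    }
}
```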

Here are the starting numbers, straight off master.
Note that every run reads the same amount of data and the results are sorted from fastest to slowest; the last rows are the problematic ones.

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.648 (1803/April2018Update/Redstone4)
Intel Core i7-8850H CPU 2.60GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview3-010431
  [Host] : .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT
  Core   : .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT

Job=Core  Runtime=Core  
| Method | BufferSize | TotalStreamSize | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |-----------:|----------------:|-----:|------:|-------:|------:|------:|------:|----------:|
| Test | 1048576 | 1048576 | 309.6 us | 6.064 us | 5.672 us | 188.4766 | 93.7500 | - | 1.11 MB |
| Test | 524288 | 1048576 | 333.9 us | 6.485 us | 7.468 us | 195.8008 | 97.6563 | - | 1.12 MB |
| Test | 262144 | 1048576 | 403.8 us | 5.643 us | 5.002 us | 213.8672 | 106.9336 | - | 1.16 MB |
| Test | 131072 | 1048576 | 436.4 us | 2.850 us | 2.666 us | 210.4492 | 104.9805 | - | 1.22 MB |
| Test | 4096 | 1048576 | 5,211.3 us | 58.252 us | 48.643 us | 1000.0000 | 382.8125 | - | 5.42 MB |
| Test | 1024 | 1048576 | 20,646.4 us | 409.656 us | 613.153 us | 3343.7500 | 593.7500 | 125.0000 | 18.42 MB |
| Test | 512 | 1048576 | 39,732.5 us | 791.207 us | 1,385.735 us | 7000.0000 | 500.0000 | 214.2857 | 35.74 MB |

A quick investigation led to the hypothesis that
`private static void EnsureUniqueSectorIndex(int nextSecID, HashSet<int> processedSectors)` in CompoundFile.cs is one of the causes of the slowdown. A quick review of the area shows that a `HashSet` created without a proper capacity will keep growing and reallocating its internal arrays, which in turn causes GC pressure and can slow things down.

To validate my hypothesis, I made a single change:

private static void EnsureUniqueSectorIndex(int nextSecID, HashSet<int> processedSectors)
{
    //if (processedSectors.Contains(nextSecID))
    //{
    //    throw new CFCorruptedFileException("The file is corrupted.");
    //}

    //processedSectors.Add(nextSecID);
}

And the results are way better:

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.648 (1803/April2018Update/Redstone4)
Intel Core i7-8850H CPU 2.60GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview3-010431
  [Host] : .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT
  Core   : .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT

Job=Core  Runtime=Core  
| Method | BufferSize | TotalStreamSize | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |-----------:|----------------:|-----:|------:|-------:|-------:|------:|------:|------:|----------:|
| Test | 1048576 | 1048576 | 263.8 us | 5.409 us | 6.012 us | 262.6 us | 181.6406 | 90.3320 | - | 1.09 MB |
| Test | 524288 | 1048576 | 290.8 us | 4.712 us | 4.407 us | 291.4 us | 183.1055 | 91.3086 | - | 1.1 MB |
| Test | 262144 | 1048576 | 347.4 us | 6.872 us | 9.407 us | 344.8 us | 190.4297 | 94.7266 | - | 1.11 MB |
| Test | 131072 | 1048576 | 399.2 us | 7.768 us | 13.399 us | 394.5 us | 199.7070 | 99.6094 | - | 1.13 MB |
| Test | 4096 | 1048576 | 3,337.0 us | 66.090 us | 127.332 us | 3,278.9 us | 398.4375 | 199.2188 | - | 2.27 MB |
| Test | 1024 | 1048576 | 12,268.9 us | 199.769 us | 186.864 us | 12,289.2 us | 1000.0000 | 453.1250 | - | 5.83 MB |
| Test | 512 | 1048576 | 23,864.0 us | 464.640 us | 456.339 us | 23,761.8 us | 1750.0000 | 375.0000 | 125.0000 | 10.57 MB |

Conclusion:

  • I would recommend reverting commit [de52b97](https://github.com/ironfede/openmcdf/commit/de52b9783864dceea3ca1c65334863ba63ab65b3). In general, validation should be an external feature that allows the user to check the integrity of the storage layer
  • I would like to push the benchmark into the repository (if @ironfede is OK with it)

One of the reasons for those changes was to avoid problems with the reader getting stuck in an infinite loop and/or going round in circles before throwing an OutOfMemoryException, which doesn't seem like something that should be left to callers?

Possibly it can be done in a better way, though (it seems a shame that HashSet doesn't have a means of reserving capacity until the latest .NET runtime versions).
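
For reference, here is a tiny sketch of the capacity idea on runtimes that do have the `HashSet<T>(int capacity)` constructor (.NET Framework 4.7.2+ / .NET Core 2.0+); `expectedSectorCount` is a placeholder name, not an existing field in the library:

```csharp
// Pre-size the visited-sector set from an upper bound on the chain length so that
// Add() never has to grow and rehash it mid-traversal. 'expectedSectorCount' is a
// placeholder for whatever bound the header or stream length provides.
int expectedSectorCount = 1024;
var processedSectors = new HashSet<int>(expectedSectorCount);
```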

In essence, we are facing the standard FAT filesystem consistency check. I don't remember it by heart, but I am confident we can come up with a solution ( :) this is a standard interview question: find a loop in a linked list).
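
Here is a minimal sketch of what a Floyd (tortoise-and-hare) check could look like over a FAT-style sector chain, assuming a `next` function that performs the FAT lookup and an `endOfChain` terminator value; the names are illustrative, not the library's API:

```csharp
// Floyd's cycle detection over a sector chain: 'slow' advances one link per
// iteration, 'fast' advances two; if the chain loops, they must eventually meet.
// Uses O(1) extra memory instead of a HashSet of visited indices.
static bool ChainHasLoop(int startSector, Func<int, int> next, int endOfChain)
{
    int slow = startSector;
    int fast = startSector;

    while (fast != endOfChain && next(fast) != endOfChain)
    {
        slow = next(slow);        // one step
        fast = next(next(fast));  // two steps

        if (slow == fast)
            return true;          // the walkers met inside a cycle
    }

    return false;                 // reached the end-of-chain marker: no loop
}
```

The trade-off versus the HashSet approach is that the loop is detected without any extra allocation, but each sector link is looked up roughly three times instead of once.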

Benchmark project committed. @michaelp, do you have time to try something different from the HashSet container to check for loops AND keep performance at optimal levels?

@ironfede yes, I will look at it this weekend. I will try Floyd's algorithm for loop detection.

@ironfede
Federico, it would be nice if we could discuss a few general topics, such as:

  • Block management
  • Fragmentation
  • Decoupling page & I/O management from the logic of the file system chains, which will improve performance and unblock concurrency

As you're looking in this area, and #40 is related to the checks anyway:
If I debug the Test_CorruptedSectorChain_Doc test, then it gets to

if (header.DIFATSectorsNumber != 0)

with header.DIFATSectorsNumber set to 268435456, which in #40 was causing it to run out of memory before the existing 'validationCount' check failed.
Is it possible in such cases to detect that the value is invalid before even going into the parse loop (e.g. because there aren't that many sectors available)? (My knowledge of the format is too basic to be sure of the best way to validate this.)
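
Something like the following is the kind of up-front bound check I'm imagining (a sketch only; `sourceStream` and `sectorSize` are placeholder names, not the library's actual fields, and 512 is the fixed size of the compound file header):

```csharp
// A file of N bytes cannot contain more than roughly (N - 512) / sectorSize sectors,
// so a DIFAT sector count beyond that bound is provably bogus before any chain is parsed.
long maxPossibleSectors = (sourceStream.Length - 512) / sectorSize;

if (header.DIFATSectorsNumber > maxPossibleSectors)
    throw new CFCorruptedFileException("DIFAT sector count exceeds what the file size allows.");
```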

@michaelp sorry for the delay in responses, but I'm currently really busy with main work activities. I'm really interested in your points on improving performance and architecture, especially the last one. Do you think a separate branch could work for exploring some ideas?
@Numpsy, your idea could be useful, but it would only work for local header corruption, not for sector chain corruption, where you can't predict block linking without traversing the linked list: this is the more subtle type of corruption and probably the hardest to detect without a lot of allocation in a lookup data structure (=> I'm really interested in a Floyd algorithm implementation as suggested by @michaelp).

@ironfede I am writing some tests and I need a few more evenings to come up with a solution to the current task (just keeping you in the loop). I want to deliver what I've started, and then we can talk.

I do realize it has been 5 years, but did anything ever come out of this?
We are currently looking into a performance issue when creating lots of Storages/Streams, and it seems to point to the EnsureUniqueSectorIndex function.

I have used this in the OpenMcdf.PerfTest project:


using System;
using System.Diagnostics;
using System.IO;
using OpenMcdf;

class Program
{
    static int MAX_STORAGE_COUNT = 500;
    static int MAX_STREAM_COUNT = 50;
    static String fileName = "PerfLoad.cfs";

    static void Main(string[] args)
    {
        var stopWatch = Stopwatch.StartNew();

        // Always recreate the file so the timing measures a full build.
        File.Delete(fileName);
        if (!File.Exists(fileName))
        {
            CreateFile(fileName);
        }

        Console.WriteLine($"Create took {stopWatch.Elapsed}");

        CompoundFile cf = new CompoundFile(fileName);
        stopWatch.Restart();
        CFStream s = cf.RootStorage.GetStream("Test1");
        Console.WriteLine($"Read took {stopWatch.Elapsed}");
        Console.Read();
    }

    private static void CreateFile(String fn)
    {
        CompoundFile cf = new CompoundFile();

        // 50 small streams directly under the root storage...
        for (int j = 0; j < MAX_STREAM_COUNT; j++)
        {
            cf.RootStorage.AddStream("Test" + j.ToString()).SetData(Helpers.GetBuffer(300));
        }

        // ...plus 500 storages, each with 50 small streams of its own.
        for (int i = 0; i < MAX_STORAGE_COUNT; i++)
        {
            var storage = cf.RootStorage.AddStorage("Storage" + i.ToString());

            for (int j = 0; j < MAX_STREAM_COUNT; j++)
            {
                storage.AddStream("Test" + j.ToString()).SetData(Helpers.GetBuffer(300));
            }
        }

        cf.Save(fn);
        cf.Close();
    }
}

Would you accept a PR for an additional Configuration flag in the CFSConfiguration enum that disables the Unique Sector Index check?
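
To make the proposal concrete, the call site might look something like this; `NoEnsureUniqueSectorIndex` is the hypothetical new flag and does not exist in the current enum, while `CFSVersion.Ver_3` and `CFSConfiguration.Default` are existing values:

```csharp
// Hypothetical usage of the proposed flag (the flag itself does not exist yet).
var cf = new CompoundFile(CFSVersion.Ver_3,
    CFSConfiguration.Default | CFSConfiguration.NoEnsureUniqueSectorIndex);
```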

I have done some improvements that help a tiny bit at least...
#132 and #133

My test on my machine looks like this, with validation (properly) enabled versus disabled:

Create with validation took 00:00:39.3890968
Create without validation took 00:00:30.0826849

That's a nice 30% improvement

I wanted to have a look at the earlier suggested algorithm for detecting loops at some point, but then fell over a number of bugs that needed to be fixed first, and never got to it :-(