Memory<T> and large memory mapped files
hexawyz opened this issue · 27 comments
I'm currently experimenting with `OwnedMemory<T>` and `Memory<T>` in an existing project that I'm trying to improve, and I ran into an issue with `OwnedMemory<T>` and `Memory<T>` being limited to `int.MaxValue`.
Scenario
I have a relatively big (> 2GB) data file that I want to fully map in memory (i.e. a database). My API exposes methods that return subsets of this big memory mapped file, e.g.
```csharp
public ReadOnlyMemory<byte> GetBytes(int something)
{
    // …
    return mainMemory.Slice(start, length).AsReadOnly();
}
```
Wrapping the `MemoryMappedFile` and its associated `MemoryMappedViewAccessor` into an `OwnedMemory<byte>` seemed like a good idea, since most of the tricky logic would then be handled by the framework.
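For context on what such a wrapper looks like: `OwnedMemory<T>` was the corefxlab prototype, and its role in the shipped API is played by `MemoryManager<T>`. Below is a minimal sketch under that assumption (the class name is mine, not a framework type); note the `int` length still caps a single instance at 2GB, which is exactly the limitation this issue is about.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical wrapper: exposes a memory-mapped view as Memory<byte> by
// deriving from MemoryManager<byte> (the shipped successor of OwnedMemory<T>).
unsafe sealed class MappedViewMemoryManager : MemoryManager<byte>
{
    private readonly MemoryMappedViewAccessor _accessor;
    private readonly byte* _pointer;
    private readonly int _length; // int: a single instance tops out at 2GB

    public MappedViewMemoryManager(MemoryMappedViewAccessor accessor, int length)
    {
        _accessor = accessor;
        _length = length;
        byte* p = null;
        accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref p);
        _pointer = p;
    }

    public override Span<byte> GetSpan() => new Span<byte>(_pointer, _length);

    // The mapped view never moves, so Pin only has to hand out the pointer.
    public override MemoryHandle Pin(int elementIndex = 0)
        => new MemoryHandle(_pointer + elementIndex);

    public override void Unpin() { }

    protected override void Dispose(bool disposing)
    {
        _accessor.SafeMemoryMappedViewHandle.ReleasePointer();
        if (disposing) _accessor.Dispose();
    }
}
```

With something like this in place, `manager.Memory.Slice(start, length)` hands out `Memory<byte>` windows over the file without copying, as long as the whole view fits in `int.MaxValue` bytes.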
Problem
The memory block that I want to wrap is bigger than 2GB and cannot currently be represented by a single Memory instance.
Since `Memory<T>` can only work with `T[]`, `string`, or `OwnedMemory<T>`, it seems that having to give up on the straightforward `OwnedMemory<T>` implementation also means having to give up on using `Memory<T>` at all.
(In this specific case, `Span<T>` being limited to 2GB would not be a problem, because the sliced memory blocks that my API returns would always be much smaller than that.)
Possible solutions with the currently proposed API
- Not using `Memory<T>` at all and implementing a much simplified version of `OwnedMemory<T>`/`Memory<T>` that would fit my use case
- Keeping many overlapping instances of `OwnedMemory<T>` around and using the one that best fits the current case
Question
Would it be possible to improve the framework so that such large memory blocks are easy to work with? (Maybe by implementing something like a `BigMemory<T>`?)
We will soon be adding `ReadOnlyBuffer`. See https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.Primitives/System/Buffers/ReadOnlyBuffer.cs
We would be interested in your feedback on this type. Would it support your scenarios?
I took some time to look into this new type and I think I could make it work (I haven't had the time to try it yet, though).
I quite like the idea of having a standardized buffer type, but I am a bit concerned about the complexity it induces in a case where all memory is contiguous by design (especially for the Seek operation).
In my current case, the file is approximately 3.5 GB, so I could create four `OwnedMemory<byte>` instances of 1 GB or less, backed by their owner, and I would have to chain those blocks by implementing `IMemoryList<byte>` on them.
If I'm not mistaken, using `ReadOnlyBuffer<byte>` would mean that creating a `Span<byte>` for a small part of the buffer, instead of being an O(1) operation such as `new Span<byte>(pointer + offset, length)`, would be a non-trivial O(log N) operation.
As soon as I have the time, I'll try creating a small benchmark for this use case, and compare possible implementations.
@pakrym, @davidfowl I think we could solve the O(log N) seek problem if `IMemoryList<T>` extended `ISequence<T>`. `ISequence<T>` has Seek, and it could be implemented as O(1) on some specialized data structures, e.g. an array of buffers of the same size.
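To make the O(1) claim concrete, here is an illustrative sketch (the `UniformSegmentList` type is mine, not a framework API) of an array of equally sized buffers, where Seek is one division and one remainder regardless of the segment count:

```csharp
using System;

// Purely illustrative: an "array of buffers of the same size". Four small
// segments stand in for the 1 GB chunks discussed above.
var list = new UniformSegmentList(segmentCount: 4, segmentSize: 1024);
var (segment, offset) = list.Seek(2500);
Console.WriteLine($"{segment}, {offset}"); // 2, 452

class UniformSegmentList
{
    private readonly byte[][] _segments;
    private readonly int _segmentSize;

    public UniformSegmentList(int segmentCount, int segmentSize)
    {
        _segmentSize = segmentSize;
        _segments = new byte[segmentCount][];
        for (int i = 0; i < segmentCount; i++)
            _segments[i] = new byte[segmentSize];
    }

    // One division and one remainder: no list walk, so O(1) in the
    // number of segments.
    public (int Segment, int Offset) Seek(long position)
        => ((int)(position / _segmentSize), (int)(position % _segmentSize));

    // Slice a span out of the located segment (this simplified sketch
    // assumes the requested range does not cross a segment boundary).
    public Span<byte> GetSpan(long position, int length)
    {
        var (segment, offset) = Seek(position);
        return _segments[segment].AsSpan(offset, length);
    }
}
```

The general `IMemoryList<T>` case has to walk the chain because segments may have arbitrary sizes; the uniform-size restriction is what buys the constant-time lookup.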
N is the number of segments. So I don't see how this has a big impact if buffers are large.
As I said before I don't like two sources of Positions (ROB and IML)
If IMemoryList extends ISequence, there would not be two sources of position. There would only be APIs on ISequence (Start, TryGet, Seek)
What about `ReadOnlyBuffer`? It edits `Index` to put bits into it; how would it know that `IMemoryList` does not rely on that bit? It's the same conversion as in the previous `IMemoryList` redesign.
I created a benchmark comparing approaches for accessing a large memory block:
https://github.com/GoldenCrystal/MemoryLookupBenchmark
I tried to get it as close as possible to my real use-case:
- Find the index and length of the data (I cheated a bit by using constant-length items there)
- Create a reference to that data for later use (e.g. `Span<T>`)
- Copy the item to a buffer (e.g. for sockets)
Assuming I didn't make any mistakes in the benchmark code, the numbers tell me that using ReadOnlyBuffer would be ~1.95 times slower than implementing a custom slice type:
BenchmarkDotNet=v0.10.12, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Intel Core i7-4578U CPU 3.00GHz (Haswell), 1 CPU, 4 logical cores and 2 physical cores
Frequency=2929690 Hz, Resolution=341.3330 ns, Timer=TSC
.NET Core SDK=2.1.4
[Host] : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
DefaultJob : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
Method | Mean | Error | StdDev | Scaled | ScaledSD |
---|---|---|---|---|---|
'Copy a random item to the stack using a locally generated Span.' | 160.455 ns | 1.7740 ns | 1.6594 ns | 1.00 | 0.00 |
'Copy a random item to the stack using the custom implemented SafeBufferSlice<T> struct.' | 168.540 ns | 3.3838 ns | 4.5172 ns | 1.05 | 0.03 |
'Copy a random item to the stack using the ReadOnlyBuffer<T> struct.' | 329.546 ns | 3.3078 ns | 3.0941 ns | 2.05 | 0.03 |
I'm not sure how much implementing `ISequence<T>` would improve the performance there. I tend to think it would be difficult to match the performance reached by the more direct uses of `(ReadOnly)Span<T>`. 🤔
FYI: We are adding `IMemoryList.GetPosition(long)`. It will enable O(1) random access on some `IMemoryList` implementations (those with uniform-size segments).
cc: @pakrym
Using PR dotnet/corefx#27499
Method | Mean | Op/s | Scaled |
-------------------------------------------------------- |-----------:|------------:|-------:|
'MM item. Local Span' | 148.554 ns | 6,731,567.6 | 1.00 |
'MM item. BufferSlice<T>' | 154.868 ns | 6,457,113.1 | 1.04 |
'MM item. ReadOnlySequence<T> (current)' | 272.563 ns | 3,668,870.8 | 1.84 |
'MM item. ReadOnlySequence<T> (PR dotnet/corefx#27455)' | 254.244 ns | 3,933,232.7 | 1.71 |
'MM item. ReadOnlySequence<T> (PR dotnet/corefx#27499)' | 211.564 ns | 4,726,706.1 | 1.43 |
Improved to 1.43× the local span. The benchmark changes used for this test are in hexawyz/MemoryLookupBenchmark#1.
Bear in mind that `SafeBufferSlice` works directly off a pointer to create its `Span`, so it couldn't be contained in the `ReadOnlySequence` data structure or return a `ReadOnlyMemory`, as it doesn't use `OwnedMemory` and isn't an array or `string`.
Also, `ReadOnlySequence` does bounds checking on `Slice`, which `SafeBufferSlice` doesn't do; it just adds the `offset` to the pointer and returns a `Span` of `length`, so it's pretty unsafe.
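For illustration, here is roughly what that trade-off looks like. This paraphrases the benchmark's `SafeBufferSlice` idea rather than quoting its exact code; `SliceUnchecked` is the "just add the offset" version.

```csharp
using System;

// Illustrative sketch of a pointer-backed slice over a large native buffer.
unsafe readonly struct BufferSlice
{
    private readonly byte* _pointer;
    private readonly long _totalLength;

    public BufferSlice(byte* pointer, long totalLength)
    {
        _pointer = pointer;
        _totalLength = totalLength;
    }

    // The "pretty unsafe" version: pure pointer arithmetic, O(1),
    // nothing stops the caller from reaching past the buffer.
    public Span<byte> SliceUnchecked(long offset, int length)
        => new Span<byte>(_pointer + offset, length);

    // A bounds-checked variant for comparison; the unsigned casts fold
    // the negative-value checks into the range checks.
    public Span<byte> Slice(long offset, int length)
    {
        if ((ulong)offset > (ulong)_totalLength ||
            (uint)length > (ulong)(_totalLength - offset))
            throw new ArgumentOutOfRangeException();
        return new Span<byte>(_pointer + offset, length);
    }
}
```

The checked version adds two compares and a subtraction per slice, which is the kind of cost the benchmark comparison below probes.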
*edit updated with tweaks
Update to benchmarks: PR dotnet/corefx#27499 doesn't scale badly for 100-1000 segments, as shown below
Method | Categories | Mean | Op/s | Scaled |
-------------------------------- |-------------- |------------:|-------------:|-------:|
'ReadOnlySequence<T> (current)' | 1 segment | 103.83 ns | 9,630,807.9 | 1.00 |
'ReadOnlySequence<T> (PR dotnet/corefx#27455)' | 1 segment | 85.50 ns | 11,696,574.0 | 0.82 |
'ReadOnlySequence<T> (PR dotnet/corefx#27499)' | 1 segment | 74.30 ns | 13,458,594.1 | 0.72 |
| | | | |
'ReadOnlySequence<T> (current)' | 100 segments | 1,293.73 ns | 772,961.6 | 1.00 |
'ReadOnlySequence<T> (PR dotnet/corefx#27455)' | 100 segments | 969.20 ns | 1,031,774.4 | 0.75 |
'ReadOnlySequence<T> (PR dotnet/corefx#27499)' | 100 segments | 248.77 ns | 4,019,825.1 | 0.19 |
| | | | |
'ReadOnlySequence<T> (current)' | 1000 segments | 1,375.86 ns | 726,820.1 | 1.00 |
'ReadOnlySequence<T> (PR dotnet/corefx#27455)' | 1000 segments | 1,026.54 ns | 974,149.4 | 0.75 |
'ReadOnlySequence<T> (PR dotnet/corefx#27499)' | 1000 segments | 286.20 ns | 3,494,079.8 | 0.21 |
| | | | |
Span<T> | MM item | 147.97 ns | 6,758,249.9 | 0.54 |
BufferSlice<T> | MM item | 152.01 ns | 6,578,374.7 | 0.56 |
'ReadOnlySequence<T> (current)' | MM item | 273.28 ns | 3,659,196.5 | 1.00 |
'ReadOnlySequence<T> (PR dotnet/corefx#27455)' | MM item | 252.47 ns | 3,960,792.4 | 0.92 |
'ReadOnlySequence<T> (PR dotnet/corefx#27499)' | MM item | 211.79 ns | 4,721,555.1 | 0.78 |
> Also ReadOnlySequence does bounds checking on Slice which the SafeBufferSlice doesn't do, it just adds the offset to the pointer and returns a Span of length - so its pretty unsafe.
You're right about that… I just tried adding bounds checking before the creation of `BufferSlice<T>` to have a fairer comparison, and at least on my machine, it seems to actually increase the throughput 🤨
Method | Mean | Error | StdDev | Op/s | Scaled | Allocated |
---|---|---|---|---|---|---|
Span<T> | 161.9 ns | 1.951 ns | 2.921 ns | 6,178,403.9 | 0.52 | 0 B |
BufferSlice<T> | 151.8 ns | 2.123 ns | 3.178 ns | 6,589,287.8 | 0.49 | 0 B |
'BufferSlice<T> no Bounds Checking' | 166.2 ns | 1.419 ns | 2.124 ns | 6,015,079.6 | 0.54 | 0 B |
'ReadOnlySequence<T> (current)' | 310.0 ns | 1.916 ns | 2.868 ns | 3,226,296.4 | 1.00 | 0 B |
I may have made a mistake somewhere, or maybe it simply plays well with the JIT inlining, but I don't know what to conclude.
Anyway, good job with the improvements. The new results are great 🙂
Latest in dotnet/corefx#27499 is much closer still
Method | Categories | Mean | Op/s | Scaled |
---|---|---|---|---|
Span<T> | MM item | 145.45 ns | 6,875,297.6 | 0.55 |
BufferSlice<T> | MM item | 147.68 ns | 6,771,233.9 | 0.55 |
ReadOnlySequence<T> (previous) | MM item | 266.73 ns | 3,749,147.7 | 1.00 |
ReadOnlySequence<T> (current) | MM item | 246.94 ns | 4,049,523.6 | 0.93 |
ReadOnlySequence<T> (this PR) | MM item | 198.30 ns | 5,042,838.1 | 0.74 |
Nice! These results are so close that I doubt the differences will matter outside of microbenchmarks, i.e. once the program starts doing something interesting with the data in the buffers.
I am going to close this. If there is data showing that ROS still cannot support real apps with multi-segmented buffers, we can think how to improve the perf further. @GoldenCrystal thanks for bringing this scenario to our attention.
Copying conversation over from https://github.com/dotnet/coreclr/issues/5851#issuecomment-370276484
From @kstewart83:
What is the possibility of adding a `Span`/`Memory` constructor for working with memory mapped files? Currently, it looks like I have to use unsafe code in order to do this:
```csharp
var dbPath = "test.txt";
var initialSize = 1024;
var mmf = MemoryMappedFile.CreateFromFile(dbPath);
var mma = mmf.CreateViewAccessor(0, initialSize).SafeMemoryMappedViewHandle;

Span<byte> bytes;
unsafe
{
    byte* ptrMemMap = (byte*)0;
    mma.AcquirePointer(ref ptrMemMap);
    bytes = new Span<byte>(ptrMemMap, (int)mma.ByteLength);
}
```
Also, it seems like I can only create `Span`s, as there aren't public constructors for `Memory` that take a pointer (maybe I'm missing the reason for this). But since the view accessors have safe memory handles that implement `System.Runtime.InteropServices.SafeBuffer` (i.e., they have a pointer and a length), it seems natural to be able to leverage this for `Span`/`Memory`. So what would be nice is something like this:
```csharp
var dbPath = "test.txt";
var initialSize = 1024;
var mmf = MemoryMappedFile.CreateFromFile(dbPath);
var mma = mmf.CreateViewAccessor(0, initialSize).SafeMemoryMappedViewHandle;

var mem = new Memory(mma);
var span = mem.Span.Slice(0, 512);
```
I also noticed that the indexer and internal length of `Span` use `int`. With memory mapped files (especially for database scenarios), it is reasonable that the target file will exceed the upper limit of `int`. I'm not sure about the performance impact of `long`-based indexing, or whether there is some magic way to have it both ways, but it would be convenient for certain scenarios.
From @kstewart83:
Unfortunately, looking at https://github.com/dotnet/corefx/issues/26603 along with the referenced code in the benchmarks didn't clear things up for me. It seems like that particular use case is geared to copying small bits of the memory mapped files into `Span`s and `ReadOnlySegment`s. It looks like the solution still involves unsafe code with `OwnedMemory<T>`, which is what I'd like to avoid. I don't have experience with manual memory management in C#, so some of this is a little difficult to grasp. What I found appealing about `Span`/`Memory` is that I could access additional performance and reduce/eliminate copying data around without the headache of manual memory management and the issues that come with it. Memory mapped files seem to fit into the target paradigm of `Span`/`Memory` (unifying the APIs around contiguous random access memory), so hopefully some type of integration of memory mapped files and `Span`/`Memory` makes it in at some point.
From @davidfowl:
@KrzysztofCwalina I think we should create something first class with Memory mapped files and the new buffer primitives (ReadOnlySequence).
@kstewart83 all we have right now are extremely low level primitives that you have to string together to make something work. That specific issue was about the performance gap between using Span directly and using the ReadOnlySequence (the gap has been reduced for that specific scenario).
To deal with anything bigger than an `int`, you'll need to use `ReadOnlySequence<T>`, which is just a view over a linked list of `ReadOnlyMemory<T>`.
It is not generally possible to slice large files into 1GB span segments. For example, a file could contain a large stream of small serialized items; then it's not possible to know where to cut the file, and slicing it could lead to torn items.
So it's no longer possible to create a span and pass it to some API of the form `IEnumerable<MyItem> DeserializeStream(Span<byte> span)`, because the caller cannot know the slicing boundaries.
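This torn-item case is where `ReadOnlySequence<T>` earns its keep: an item that straddles a segment boundary cannot be exposed as a single `Span<byte>`, but the sequence can still slice it and copy it out. A small self-contained sketch (the `Segment` helper is illustrative; it assumes the shipped `ReadOnlySequenceSegment<T>` API, which replaced `IMemoryList<T>`):

```csharp
using System;
using System.Buffers;

// Two 4-byte segments stand in for the large mapped chunks; a 4-byte "item"
// starts at absolute offset 2 and straddles the segment boundary.
var first = new Segment(new byte[] { 0, 1, 2, 3 }, runningIndex: 0);
var last = first.Append(new byte[] { 4, 5, 6, 7 });
var sequence = new ReadOnlySequence<byte>(first, 0, last, 4);

ReadOnlySequence<byte> item = sequence.Slice(2, 4);
Console.WriteLine(item.IsSingleSegment); // False: no single Span covers it

var buffer = new byte[4];
item.CopyTo(buffer); // …but it can still be copied out contiguously
Console.WriteLine(string.Join(", ", buffer)); // 2, 3, 4, 5

// Illustrative segment type: the framework's ReadOnlySequenceSegment<T>
// holds the Memory/RunningIndex/Next links, the subclass only wires them up.
class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(byte[] data, long runningIndex)
    {
        Memory = data;
        RunningIndex = runningIndex;
    }

    public Segment Append(byte[] data)
    {
        var next = new Segment(data, RunningIndex + Memory.Length);
        Next = next;
        return next;
    }
}
```

A deserializer written against `ReadOnlySequence<byte>` therefore never needs to know where the backing file was cut.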
It would be really good if spans supported a `long` length. Some .NET users are already bumping against the 2GB array size limitation. For that reason the limit was increased to 2G items, but that's only a short-term remedy. As main memory sizes continue to grow, any 2GB limit will make .NET look like ancient technology.
But I assume the `int` span length was consciously chosen... Unfortunately, I did not readily find a discussion about that, but I would be interested to read it if somebody has a URL at hand.
Wouldn't it be better for the API to be built to handle chunks and therefore work with streaming scenarios as well?
> But I assume the `int` span length was consciously chosen... Unfortunately, I did not readily find a discussion about that but I would be interested to read it if somebody has a url to it at hand.
If I understand correctly, the problem here would be more with `Memory<T>` than with `Span<T>`:
The current version of `Memory<T>` packs nicely into 16 bytes on x64, while `Span<T>` seems to have room for replacing the `int _length` with an `IntPtr _length` and still fitting into 8/16 bytes.
However, widening the `Length` property of `Span<T>` requires doing the same with `Memory<T>`.
If I'm not mistaken, increasing the size of `Memory<T>` (from 16 bytes to 24 bytes) might have consequences on the performance of the code, which would impact everyone, not just those of us who are playing with large regions of memory.
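The 16-byte figure can be checked directly on a given runtime; `Unsafe.SizeOf` reports the struct size (`Span<T>` itself can't easily be measured this way because byref-like types can't be used as generic type arguments):

```csharp
using System;
using System.Runtime.CompilerServices;

// On a 64-bit runtime, Memory<byte> is one object reference (8 bytes) plus
// an int index and an int length: 16 bytes total. Widening both ints to
// IntPtr would grow it to 24 bytes.
Console.WriteLine(Unsafe.SizeOf<Memory<byte>>());
Console.WriteLine(Unsafe.SizeOf<ReadOnlyMemory<byte>>());
Console.WriteLine(IntPtr.Size); // 8 in a 64-bit process
```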
It is true that in the case I presented, `ReadOnlySequence<T>` acts as a valid replacement for a 64-bit-enabled `Memory<T>`/`Span<T>`, because all I needed was to copy the data somewhere.
But when you need to read/decode without copying, the API might indeed be less straightforward. 🤔
I suspect, though, that since `Memory<T>` is allocated on the heap, the performance impact would be different than for, say, `Span<T>`. Passing a `Memory<T>` object around shouldn't be any different, so I think the only performance impact would be in creating `Span<T>`s, or maybe the fill routines?
A compelling use case I see for combining memory mapped files with `Memory<T>`/`Span<T>` is specifically to enable zero-copy databases with only safe C#. It allows for a very understandable and uniform API by being able to present read-only slices as well as read-write slices. This could be combined with data formats such as FlatBuffers, which don't require explicit parsing/unpacking to access the data.
Memory is not allocated on the heap (necessarily). It's a struct.
@KrzysztofCwalina, there is no API proposal for MMF Memory/Span overloads; should this issue be converted to api-needs-work? It will help downstream projects (serializers and other data processors, etc.) waiting to update to .NET Core 2.1 if MMF also joins the `Span<T>` and `Memory<T>` club. Thanks!
@kasper3, please open a separate issue for adding span support to MMF. This issue was about Memory's length property not being able to deal with large files.
@kasper3 @KrzysztofCwalina is there a separate issue for MMF/Span? I was not able to find it, and it is not linked here.
I am not aware.
@attilah, related https://github.com/dotnet/corefx/issues/29562#issuecomment-388182098 and overarching idea https://github.com/dotnet/corefx/issues/30174.
In the case of `MemoryMappedFile.CreateFromMemory`, the file IO operation caused by every `.WriteX(..)` would need to be replaced by a memory IO operation. The use case I was thinking of: a user downloads a data file, and without persisting it to the filesystem, the file can be mapped to memory and sent back on the wire. If you have better ideas about how the API should be structured in terms of competing/related proposals, please send a proposal.
Sorry to be late to this, but it is not very clear to me from the above what is currently the recommended way to turn a `MemoryMappedFile` into a `ReadOnlySequence<byte>` (or `ReadOnlySpan<byte>`)?
@miloush, you can use a third-party library. ReadOnlySequenceAccessor is probably what you need.