Getting out of memory exception for large files
avnet78 opened this issue · 2 comments
Here's a little background about what I am trying to do:
I receive an encrypted file and use a StreamReader to read and decrypt it line by line.
Each line of the encrypted file expands to about 35,000 lines of CSV data after decryption, roughly 5 MB per encrypted line.
After decrypting every line in the file, the total decrypted data is about 2 GB.
I need to do this in memory; I cannot save/write the decrypted data to a file on disk.
After only a few lines of data are decrypted and written to the stream, an OutOfMemoryException is thrown.
Please help!
Here's my code:
private RecyclableMemoryStreamManager GetMemoryStreamManager()
{
    int blockSize = 1024 * 5;                     // 5 KB blocks in the small pool
    int largeBufferMultiple = 1024 * 1024;        // large buffers sized in 1 MB multiples
    int maxBufferSize = 16 * largeBufferMultiple; // 16 MB cap on pooled large buffers
    var manager = new RecyclableMemoryStreamManager(blockSize,
        largeBufferMultiple,
        maxBufferSize);
    manager.GenerateCallStacks = true;
    manager.AggressiveBufferReturn = true;
    manager.MaximumFreeLargePoolBytes = maxBufferSize * 4; // keep at most 64 MB of free large buffers
    manager.MaximumFreeSmallPoolBytes = 100 * blockSize;   // keep at most ~500 KB of free small blocks
    return manager;
}
public void ProcessStream(string filePath)
{
    var memoryStreamManager = GetMemoryStreamManager();
    var memoryStream = memoryStreamManager.GetStream();
    StreamWriter dataStream = new StreamWriter(memoryStream, Encoding.UTF8)
    {
        AutoFlush = true
    };
    using (StreamReader reader = new StreamReader(filePath))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var data = DecryptString(line);
            // This line throws an OutOfMemoryException after writing a few lines of decrypted data to the stream
            dataStream.WriteLine(data);
        }
    }
    // do something with the memory stream after all lines of data are decrypted
}
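For what it's worth, if I could hand each ~5 MB chunk to my downstream consumer as it is produced, instead of accumulating everything into one 2 GB stream, I imagine something like the sketch below might avoid the growth. ProcessChunk and DecryptString here are hypothetical stand-ins for my actual consumer and decryption routine:

using System.IO;
using System.Text;
using Microsoft.IO;

public class ChunkedProcessor
{
    private static readonly RecyclableMemoryStreamManager Manager =
        new RecyclableMemoryStreamManager();

    // Hypothetical stand-in for the real decryption routine
    private static string DecryptString(string encryptedLine) => encryptedLine;

    // Hypothetical downstream consumer of one ~5 MB decrypted chunk
    private static void ProcessChunk(Stream chunk)
    {
        // e.g. parse the ~35,000 CSV lines produced by one encrypted line
    }

    public static void ProcessFile(string filePath)
    {
        using (var reader = new StreamReader(filePath))
        {
            string encryptedLine;
            while ((encryptedLine = reader.ReadLine()) != null)
            {
                // One short-lived pooled stream per chunk; disposing it returns
                // its blocks to the pool before the next iteration starts.
                using (var chunk = Manager.GetStream("chunk"))
                {
                    using (var writer = new StreamWriter(chunk, Encoding.UTF8, 4096, leaveOpen: true))
                    {
                        writer.WriteLine(DecryptString(encryptedLine));
                    }
                    chunk.Position = 0;
                    ProcessChunk(chunk);
                }
            }
        }
    }
}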
@benmwatson Can you please help me with this?
I attempted to repro your issue locally with the following sample code:
using System.Linq;
using System.Text;
using Microsoft.IO;
using Newtonsoft.Json;

var memoryStreamManager = GetMemoryStreamManager();
var memoryStream = memoryStreamManager.GetStream();
StreamWriter dataStream = new StreamWriter(memoryStream, Encoding.UTF8)
{
    AutoFlush = true
};

long written = 0;
const int lineLength = 1024 * 1024 * 5; // 5 MB per line, matching the issue
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
Random random = new Random();

// Write 100 random 5 MB lines (500 MB total) into the pooled stream
while (written < lineLength * 100)
{
    dataStream.WriteLine(new string(Enumerable.Repeat(chars, lineLength)
        .Select(s => s[random.Next(s.Length)])
        .ToArray()));
    written += lineLength;
}

memoryStreamManager.Dump();

RecyclableMemoryStreamManager GetMemoryStreamManager()
{
    int blockSize = 1024 * 5;
    int largeBufferMultiple = 1024 * 1024;
    int maxBufferSize = 16 * largeBufferMultiple;
    var manager = new RecyclableMemoryStreamManager(blockSize,
        largeBufferMultiple,
        maxBufferSize);
    manager.GenerateCallStacks = true;
    manager.AggressiveBufferReturn = true;
    manager.MaximumFreeLargePoolBytes = maxBufferSize * 4;
    manager.MaximumFreeSmallPoolBytes = 100 * blockSize;
    return manager;
}

public static class Dumper
{
    public static string ToPrettyString(this object value)
    {
        return JsonConvert.SerializeObject(value, Formatting.Indented);
    }

    public static T Dump<T>(this T value)
    {
        Console.WriteLine(value.ToPrettyString());
        return value;
    }
}
I had no issue writing out 500 MB of data to the stream. Since WriteLine allocates in small chunks (compare the SmallPoolInUseSize value to LargePoolInUseSize in the dump below), you may want to pre-allocate if you can, but even without pre-allocation I hit no OOM. Note that pre-allocating would push you into unpooled buffers, since your maximum pooled buffer size is set to 16 MB.
{
  "BlockSize": 5120,
  "LargeBufferMultiple": 1048576,
  "UseMultipleLargeBuffer": true,
  "UseExponentialLargeBuffer": false,
  "MaximumBufferSize": 16777216,
  "SmallPoolFreeSize": 0,
  "SmallPoolInUseSize": 524293120,
  "LargePoolFreeSize": 0,
  "LargePoolInUseSize": 0,
  "SmallBlocksFree": 0,
  "LargeBuffersFree": 0,
  "MaximumFreeSmallPoolBytes": 512000,
  "MaximumFreeLargePoolBytes": 67108864,
  "MaximumStreamCapacity": 0,
  "GenerateCallStacks": true,
  "AggressiveBufferReturn": true,
  "ThrowExceptionOnToArray": false
}
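If you do want to try pre-allocating, a minimal sketch of what I mean is below (the tag string and size are just illustrative). GetStream with a requiredSize hint asks the manager for capacity up front; requesting a single contiguous buffer larger than your 16 MB MaximumBufferSize is what would put you into unpooled allocations, as mentioned above:

using Microsoft.IO;

var manager = new RecyclableMemoryStreamManager();

// Hint the expected size so the stream acquires pooled capacity up front
// instead of growing incrementally on every WriteLine.
const int expectedSize = 500 * 1024 * 1024; // ~500 MB, as in the repro above
using (var stream = manager.GetStream("preallocated", expectedSize))
{
    // write the decrypted data here
}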
Thank you @cryolithic, it is working now!