BOM Gone?
r2d2Proton opened this issue · 1 comments
r2d2Proton commented
As best I can tell, Loader.cs, StorageFileLoader.cs, and LottieCompositionReader.cs in Lottie-Windows make an effort to handle both UTF and non-UTF files:
public static LottieComposition? ReadLottieCompositionFromJsonStream(Stream stream, Options options, out IReadOnlyList<(string Code, string Description)> issues)
{
    ReadStreamToUTF8(stream, out var utf8Text);
    return ReadLottieCompositionFromJson(utf8Text, options, out issues);
}
static void ReadStreamToUTF8(Stream stream, out ReadOnlySpan<byte> utf8Text)
{
    // This buffer size is chosen to be about 50% larger than
    // the average file size in our corpus, so most of the time
    // we don't need to reallocate and copy.
    var buffer = new byte[150000];
    var bytesRead = stream.Read(buffer, 0, buffer.Length);
    var spaceLeftInBuffer = buffer.Length - bytesRead;
    while (spaceLeftInBuffer == 0)
    {
        // Might be more to read. Expand the buffer.
        var newBuffer = new byte[buffer.Length * 2];
        spaceLeftInBuffer = buffer.Length;
        var totalBytesRead = buffer.Length;
        Array.Copy(buffer, 0, newBuffer, 0, totalBytesRead);
        buffer = newBuffer;
        bytesRead = stream.Read(buffer, totalBytesRead, buffer.Length - totalBytesRead);
        spaceLeftInBuffer -= bytesRead;
    }
    utf8Text = new ReadOnlySpan<byte>(buffer);
    NormalizeTextToUTF8(ref utf8Text);
}
static void NormalizeTextToUTF8(ref ReadOnlySpan<byte> text)
{
    if (text.Length >= 1)
    {
        switch (text[0])
        {
            case 0xEF:
                // Possibly start of UTF8 BOM.
                if (text.Length >= 3 && text[1] == 0xBB && text[2] == 0xBF)
                {
                    // UTF8 BOM. Step over the UTF8 BOM.
                    text = text.Slice(3, text.Length - 3);
                }
                break;
        }
    }
}
As best I can tell, when loading UTF-8 files with:
var filePicker = new FileOpenPicker{};
StorageFile? file = await filePicker.PickSingleFileAsync();
The BOM has already been consumed by some function before this code is called. The buffer begins directly with the opening "{" of the JSON.
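One way to confirm where the BOM goes is to look at the first three bytes of the stream just before it is handed to the reader. The sketch below is a hypothetical diagnostic helper (`BomProbe` and `StartsWithUtf8Bom` are names invented here, not part of Lottie-Windows). It assumes a seekable stream, rewinds to position 0 to inspect the head, and then restores the caller's position.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper; not part of Lottie-Windows.
public static class BomProbe
{
    // Returns true if the stream's first three bytes are the UTF-8 BOM (EF BB BF).
    // Assumes the stream is seekable; the caller's position is restored afterwards.
    public static bool StartsWithUtf8Bom(Stream stream)
    {
        var originalPosition = stream.Position;
        stream.Position = 0;

        var head = new byte[3];
        var read = 0;
        while (read < head.Length)
        {
            // Stream.Read may return fewer bytes than requested.
            var n = stream.Read(head, read, head.Length - read);
            if (n == 0)
            {
                break; // End of stream.
            }
            read += n;
        }

        stream.Position = originalPosition;
        return read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
    }
}
```

If this returns true while `stream.Position` is already 3 on entry, something upstream read past the BOM. If it returns false, the file on disk probably has no BOM at all. The stream could come from something like `file.OpenStreamForReadAsync()` on the picked `StorageFile`, though that is an assumption about how the caller obtains it.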
Simplified version:
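static void ReadStreamToUTF8(Stream stream, out ReadOnlySpan<byte> utf8Text)
{
    // Size the buffer from the stream length (requires a seekable stream), so no reallocation is needed.
    var buffer = new byte[stream.Length];
    var totalBytesRead = 0;
    int bytesRead;
    // Stream.Read may return fewer bytes than requested, so loop until the buffer is full or the stream ends.
    while (totalBytesRead < buffer.Length &&
           (bytesRead = stream.Read(buffer, totalBytesRead, buffer.Length - totalBytesRead)) > 0)
    {
        totalBytesRead += bytesRead;
    }
    utf8Text = new ReadOnlySpan<byte>(buffer, 0, totalBytesRead);
    NormalizeTextToUTF8(ref utf8Text);
}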
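Sizing the buffer from `stream.Length` only works when the stream is seekable. For streams that are not, a rough alternative is to copy into a growable `MemoryStream`. This is a minimal sketch, not the library's approach; `StreamBytes` and `ReadAll` are hypothetical names, and it assumes files well under 1 GB.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of Lottie-Windows.
public static class StreamBytes
{
    // Reads every remaining byte of the stream, whether or not it is seekable.
    public static byte[] ReadAll(Stream stream)
    {
        // Pre-size the buffer when the remaining length is known and reasonable;
        // otherwise start empty and let MemoryStream grow as needed.
        var capacity = 0;
        if (stream.CanSeek)
        {
            var remaining = stream.Length - stream.Position;
            if (remaining > 0 && remaining <= (1L << 30))
            {
                capacity = (int)remaining;
            }
        }

        using var memory = new MemoryStream(capacity);
        stream.CopyTo(memory);
        return memory.ToArray();
    }
}
```

The returned array holds exactly the bytes that were read, so a span built over it has no unused tail before the BOM check runs.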
r2d2Proton commented
Also, please note that the Lottie file I am testing with happens to be 1,812,872 bytes. Many others, though, are less than 100 KB. Checking more...
Other files in the same folder are also larger than the 150,000 bytes allocated above (793 KB, 329 KB, 259 KB, 257 KB, ..., 223 KB).
Another is 2,136,832 bytes.