Provide sync variant of OutputFileStream.close()
Opened this issue · 6 comments
Currently the OutputFileStream.close()
method is async, awaiting on the async RandomAccessFile.close()
method.
However there is a RandomAccessFile.closeSync()
variant that allows to have OutputFileStream.closeSync()
variant to not break the current API.
Or maybe just make the current close()
method not asynchronous. Especially given that all other operations on the RandomAccessFile
are the sync ones.}
This will also allow to make the ArchiveFile.close()
method non async (currently the only async method there) or have the ArchiveFile.closeSync()
variant.
InputFileStream has closeSync, so I'll add that to OutputFileStream. Not sure why it doesn't have it already, just neglect. I agree the whole API is a mess with sync / async. One of these days I'll do a big 4.0 update and clean up the mess. Probably won't happen for a while.
Thank you. If you could add ArchiveFile.closeSync()
that would be great too...
On a side note, I'm somewhat puzzled what is the most performant way to zip bunch of files on a local file system (i.e. dart:io). It looks like, using InputFileStream
with ArchiveFile
seems slower than File.readAsBytesSync()
and then pass the bytes to ArchiveFile
. Similarly, the ZipEncoder
seems to be faster with in-memory OutputStream
and then File.writeAsByteSync()
vs OutputFileStream
.
Additionally, it is unclear when to close that InputFileStream
passed to ArchiveFile
that was added to a ZipEncoder
. Maybe the ZipEncoder.addFile()
could use something like optional autoclose
flag to release these opened files. Currently it is really easy to get "too many open files" error with those files unclosed.
File streaming will always be slower than just reading all the bytes into memory. File IO is particularly slow, especially with Dart. The problem is that big files take lots of memory, too much memory a lot of times. So it is a game to play; do you read the entire file into memory, or do you stream it?
I'll take a look at an autoclose option.
I pushed to git adding closeSync to OutputFileStream and ArchiveFile. And I added autoClose named arg to ZipEncoder encode and addFile methods.
File streaming will always be slower than just reading all the bytes into memory. File IO is particularly slow, especially with Dart. The problem is that big files take lots of memory, too much memory a lot of times. So it is a game to play; do you read the entire file into memory, or do you stream it?
I know the IOSink
is slow and there are several stale issues in the dart-lang about it, e.g. dart-lang/sdk#32874 or dart-lang/sdk#17951
However, if you are in the dart:io
namespace, you could use File.openSync(mode: FileMode.write)
that returns a RandomAccessFile
and then use RandomAccessFile
methods writeFromSync()
, readSync()
and readIntoSync()
instead of using IOSink/Stream
from the File.openRead/openWrite()
methods.
In my tests, writes with RandomAccessFile
are over 10x faster compared to IOSink (that is on OSX MBP with SSD).
Another approach is to buffer IOSink. I get significant improvement with many small writes. Though it is about 2..3x slower then RandomAccessFile
using 4..8mb buffer.
Here is my naive (not fully tested) BufferedSink
POC:
class BufferedSink implements IOSink {
BufferedSink(this._sink, [this._bufferSize = 4 * 1024 * 1024])
: encoding = _sink.encoding,
_buffer = Uint8ClampedList(_bufferSize);
final IOSink _sink;
final int _bufferSize;
Uint8ClampedList _buffer;
int _offset = 0;
@override
Encoding encoding;
@override
void add(List<int> data) => _addInternal(data);
@override
void addError(Object error, [StackTrace? stackTrace]) => _sink.addError(error, stackTrace);
@override
Future addStream(Stream<List<int>> stream) => stream.listen(add, onError: addError, cancelOnError: true).asFuture();
@override
Future get done => _sink.done;
@override
Future flush() {
_flushInternal();
return _sink.flush();
}
@override
Future close() {
_flushInternal();
return _sink.close();
}
@override
void write(Object? object) => add(utf8.encode(object.toString()));
@override
void writeAll(Iterable objects, [String separator = '']) {
var isFirst = true;
for (var object in objects) {
if (isFirst) {
isFirst = false;
} else {
write(separator);
}
write(object);
}
}
@override
void writeCharCode(int charCode) => write(String.fromCharCode(charCode));
@override
void writeln([Object? obj = '']) {
if (obj != '') {
write(obj);
}
write('\n');
}
void _addInternal(List<int> data) {
if(data.length > _bufferSize) {
if(_offset > 0) {
_sink.add(_buffer.sublist(0, _offset) + data);
_buffer = Uint8ClampedList(_bufferSize);
_offset = 0;
} else {
_sink.add(data);
}
} else {
int maxLength = _bufferSize - _offset;
int length = min(data.length, maxLength);
int end = _offset + length;
_buffer.setRange(_offset, end, data.sublist(0, length));
_offset = end;
if (end == _bufferSize) {
_flushInternal();
}
if(data.length > length) {
_buffer.setRange(0, length, data.sublist(length));
_offset = length;
}
}
}
void _flushInternal() {
if(_buffer.isNotEmpty) {
_sink.add(_buffer);
_buffer = Uint8ClampedList(_bufferSize);
_offset = 0;
}
}
}
And the benchmark is like this:
void main() async {
File file = File("foo.bin");
Stopwatch sw = Stopwatch()..start();
// IOSink out = BufferedSink(file.openWrite(), 8 * 1024 * 1024);
RandomAccessFile out = file.openSync(mode: FileMode.write);
for (int i = 0; i < 40000; i++) {
out.add(Uint8List(65536));
}
await out.close();
print("total time: ${sw.elapsedMilliseconds}");
}
extension _RAF on RandomAccessFile {
void add(List<int> data) => writeFromSync(data);
}
The RandomAccessFile is already used with the InputFileStream and OutputFileStream... So, to clarify, the above comment was mostly about current limitations of the existing implementations of InputStreamBase
. I have to either provide data buffer or a file (e.g. to be added to archive).
But it would be nice to have one that takes an IOSink + an in-memory buffer, then consumer could write to it without loading entire content into memory or creating a file.