dart-lang/sdk

performance: writing to file using IOSink(add) is very slow

Closed this issue · 19 comments

This issue was originally filed by @tatumizer


Writing to a file is almost 10 times slower than in Java:
  import 'dart:io';
  import 'dart:typed_data';

  void main() {
    var sw = new Stopwatch()..start();

    IOSink out = new File("c:/temp/foo.txt").openWrite();
    // Report the elapsed time once the sink has finished writing.
    out.done.then((v) => print("total time: ${sw.elapsedMilliseconds}"));
    for (int i = 0; i < 4000; i++) {
      out.add(new Uint8List(65536));
    }
    out.close();
  }

This takes 1813 ms in Dart; the same in Java completes in 212 ms.
The above code writes roughly 250 MiB (4000 × 64 KiB). I tested how the result might be affected by size by reducing the iterations from 4000 to 1000.
That took 484 ms in Dart and 55 ms in Java. Same ratio.
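
As a sanity check, the payload size and the slowdown ratios can be recomputed from the numbers reported above. This is only a small sketch; the helper names `payloadBytes` and `ratio` are made up for illustration.

```dart
// Recompute the payload size and slowdown ratios from the reported timings.
int payloadBytes(int iterations, int chunkSize) => iterations * chunkSize;

double ratio(int dartMs, int javaMs) => dartMs / javaMs;

void main() {
  final bytes = payloadBytes(4000, 65536);
  print('$bytes bytes = ${bytes / (1024 * 1024)} MiB'); // 262144000 bytes = 250.0 MiB
  print(ratio(1813, 212).toStringAsFixed(2)); // 8.55 (4000 iterations)
  print(ratio(484, 55).toStringAsFixed(2)); // 8.80 (1000 iterations)
}
```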

Added Area-IO, Triaged labels.

Set owner to @Skabet.
Added Accepted label.

Tatumizer, can you attach the Java program for completeness? That would make it easier to understand the differences and why we appear to be that much slower.

I have a CL that fixes some of this, but not a factor of 8x.

Thanks.

This comment was originally written by @tatumizer


I will send in a couple of hours, it's on my work comp.

This comment was originally written by @tatumizer


Hi Anders,
To double-check, I tested on home comp (not nearly as fast as work comp - generic Dell for mass production)
Java program is
    static void writeFile() throws Exception {
        long start = System.currentTimeMillis();
        FileOutputStream fstr = new FileOutputStream("c:/temp/foo.txt");
        for (int i = 0; i < 4000; i++) {
            byte[] buf = new byte[65536];
            fstr.write(buf);
        }
        fstr.close();
        System.out.println(System.currentTimeMillis() - start);
    }

The ratio of the results is more or less the same:
java: 397 ms
dart: 3258 ms

This comment was originally written by @tatumizer


Maybe it's a windows-only phenomenon?

This comment was originally written by @tatumizer


For I/O requests, all the "action" occurs in the driver inside the OS. The library just has to call the correct OS function, and that should cost more or less the same in all languages.

Because "add" in IOSink is asynchronous, there's an additional optimization: the library doesn't wait. It stores the buffer, and when the I/O interrupt allows writing, it writes. (At least, in principle it should be implemented like this.) It can be faster, not slower, than Java's OutputStream.
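
One consequence of `add` being asynchronous is that the loop in the original report hands all 4000 buffers to the sink before it yields to the event loop. A minimal sketch of a variant that waits for the sink to drain periodically is shown below. The function name `writeWithFlush`, the `flushEvery` parameter, and the temp-file location are assumptions made for illustration; whether this changes the timing is not something the thread measured.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Writes [chunks] buffers of [chunkSize] zero bytes to [path], awaiting
/// `flush()` every [flushEvery] chunks so the amount of pending data is bounded.
/// Returns the number of bytes handed to the sink.
Future<int> writeWithFlush(String path,
    {int chunks = 4000, int chunkSize = 65536, int flushEvery = 64}) async {
  final sink = File(path).openWrite();
  var handedOver = 0;
  for (var i = 0; i < chunks; i++) {
    sink.add(Uint8List(chunkSize));
    handedOver += chunkSize;
    if ((i + 1) % flushEvery == 0) {
      // Wait until everything added so far has been passed on by the sink.
      await sink.flush();
    }
  }
  await sink.close();
  return handedOver;
}

Future<void> main() async {
  final dir = Directory.systemTemp.createTempSync('iosink_flush');
  final path = '${dir.path}/foo.bin';
  final sw = Stopwatch()..start();
  final bytes = await writeWithFlush(path);
  print('wrote $bytes bytes in ${sw.elapsedMilliseconds} ms');
  dir.deleteSync(recursive: true);
}
```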

"asynchronous" functionality exists in java, too - in java.nio package.

tatumizer, I was able to run the program, with the following results (with my fix, which just landed):

File does not exist:

Java(Sync): 321ms
Dart(Sync): 264ms
Dart(Async): 1282ms

File exists:

Java(Sync): 1049ms
Dart(Sync): 933ms
Dart(Async): 2048ms

So, it's clear that async writing is slower. This is due to two things: doing the copy for the writing isolate, and the extra delay in sending messages between isolates. My fix helps with the former.

But when comparing with sync code, Dart is actually a little bit faster than Java.

Also, there is a HUGE difference depending on whether the file already exists or not. Be sure you are testing the same case in both languages.
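
The program behind these numbers is not included in the thread. A comparison along the same lines might look like the sketch below, which times a synchronous `RandomAccessFile` write against the `IOSink` version and deletes the target first so that both runs start from a non-existent file. The function names, the `chunks` parameter, and the temp-file location are assumptions for illustration, not the code that produced the figures above.

```dart
import 'dart:io';
import 'dart:typed_data';

const chunkSize = 65536;

/// Removes [path] if present so every run starts from the same state.
void ensureAbsent(String path) {
  final f = File(path);
  if (f.existsSync()) f.deleteSync();
}

/// Synchronous write: blocks the calling isolate until each chunk is written.
/// Returns the elapsed time in milliseconds.
int writeSync(String path, {int chunks = 4000}) {
  final sw = Stopwatch()..start();
  final raf = File(path).openSync(mode: FileMode.write);
  for (var i = 0; i < chunks; i++) {
    raf.writeFromSync(Uint8List(chunkSize));
  }
  raf.closeSync();
  return sw.elapsedMilliseconds;
}

/// Asynchronous write through IOSink, as in the original report.
/// Returns the elapsed time in milliseconds.
Future<int> writeAsync(String path, {int chunks = 4000}) async {
  final sw = Stopwatch()..start();
  final sink = File(path).openWrite();
  for (var i = 0; i < chunks; i++) {
    sink.add(Uint8List(chunkSize));
  }
  await sink.close();
  return sw.elapsedMilliseconds;
}

Future<void> main() async {
  final dir = Directory.systemTemp.createTempSync('iosink_bench');
  final path = '${dir.path}/foo.bin';

  ensureAbsent(path);
  print('Dart(Sync):  ${writeSync(path)} ms');
  ensureAbsent(path);
  print('Dart(Async): ${await writeAsync(path)} ms');

  dir.deleteSync(recursive: true);
}
```

To approximate the "File exists" case, skip the `ensureAbsent` call before a run so that the file left by the previous run is overwritten.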

This comment was originally written by @tatumizer


The results I got from your program in Dart are
Fastest: 4060
Fast : 6251
Slow : 9667
It will be like this in java or any language. Writing N blocks at once is faster than N times writing 1 block. Mezoni, please learn how HW works before insulting.

This comment was originally written by @tatumizer


Anders, sorry, the previous post was directed at mezoni.
Wait a sec, I will try to make sense of your results

This comment was originally written by @tatumizer


Anders,
I can't find ANY difference in java for the case where file exists or doesn't.
Maybe it depends on OS or HW or something, but intuitively, it doesn't make any sense.
When we write to "existing" file, it gets kind-of deleted anyway - so new data in general will be stored in different locations on disk - though even that makes no difference whatsoever.
Are you sure you ran the same test? Maybe your file is 4 times bigger in one case?
Anyway, my timing is absolutely the same in java

Interesting, this is on Linux. I'll try out on Windows, when I get a chance.

This comment was originally written by @tatumizer


What affects the speed of writing is the fragmentation of the disk and, of course, the strategy of block distribution implemented by the OS.

This comment was originally written by @tatumizer


Anders,
The reason I brought up async operations is that in dart, there's no parity between random access and stream files. You just can't use sync operations on stream files - no such thing.
And in all popular benchmarks, output is written into standard out. I don't know any way to write to standard out using random access files.

Mezoni: my apologies. You are a good guy. Just a bit rude. You have to learn, if not HW, then good manners.

This comment was originally written by @tatumizer


Mezoni: if block size is not multiple of 65536 (e.g. 8192), then java works 1.5-2 times faster on my comp on bigger blocks. It depends on too many factors (fragmentation, OS, block size) - no way to compare "objectively".

This comment was originally written by @tatumizer


Anders: the mystery about speed of writes on Windows can be resolved by this:
http://support.microsoft.com/kb/324805

It caches writes by default!
The speed of writes on Windows was a bit fishy to me to begin with. Physically, it can't do it as fast as benchmarks show. SCSI, of course, is faster, but still...
It's caching! Read is always cached by default, that's a matter of course. That write is cached, too, is not that obvious - it can lead to loss of data. Article above explains that.
 

Ah, yes, that's quite common. I'm sure it happens on Linux as well. The extra cost probably comes from actual HD activity, where we start out by deleting the existing file.

This comment was originally written by @tatumizer


Anders: turns out, the issue is more complicated. I just tested writing to "nul" device on Windows. It shouldn't depend on any properties of hardware, of whether the file is new or old - there's no file. Data is just discarded.
For the same amount of data, Java completes in 73 ms, and Dart ... 1760 ms!!!
I'm using Dart SDK version 1.3.0-dev.7.12 - not sure your latest fixes are there, but the difference in timing should be explained somehow.
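
A null-device test of this kind might be sketched in Dart as follows, assuming `NUL` on Windows and `/dev/null` elsewhere. The helper names `nullDevicePath` and `timeNullWrite` are made up for illustration, and this is not the exact program that produced the timings above. Because the data is discarded, the measured time is mostly library and OS call overhead rather than disk activity.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Path of the platform's null device
/// (assumption: NUL on Windows, /dev/null elsewhere).
String nullDevicePath() => Platform.isWindows ? 'NUL' : '/dev/null';

/// Streams [chunks] buffers of [chunkSize] bytes into the null device through
/// an IOSink and returns the elapsed time in milliseconds.
Future<int> timeNullWrite({int chunks = 4000, int chunkSize = 65536}) async {
  final sw = Stopwatch()..start();
  final sink = File(nullDevicePath()).openWrite();
  for (var i = 0; i < chunks; i++) {
    sink.add(Uint8List(chunkSize));
  }
  await sink.close();
  return sw.elapsedMilliseconds;
}

Future<void> main() async {
  print('IOSink to ${nullDevicePath()}: ${await timeNullWrite()} ms');
}
```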

tatumizer, can you clarify what you are comparing. Are you comparing IOSink with synchronous Java?

It's important that we fully understand what async writing means. Async writing does exactly the same work as synchronous writing, except that it copies the data to another isolate (thread) and lets that isolate perform the action. Once done, it notifies the isolate that issued the write. It's very obvious that writing has a higher overhead when async; that cannot change (though we are trying to minimize the overhead). However, what it allows is to not block the isolate issuing the write. This is very important for programs where we have many simultaneous operations, e.g. an HTTP server.

If the results you have (73 vs 1760) were gathered from the two programs shown in this issue, we are comparing apples and oranges.
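
The model described above (copy the data to another isolate, let it perform the blocking write, then notify the caller) can be imitated in user code. The sketch below uses `Isolate.run`, which is available in recent Dart SDKs and did not exist when this issue was filed. It illustrates the copy-and-notify overhead only and is not how `dart:io` implements `IOSink` internally; it also pays for spawning a fresh isolate on every call. The function name `writeOnOtherIsolate` is hypothetical.

```dart
import 'dart:io';
import 'dart:isolate';
import 'dart:typed_data';

/// Hands [data] to a short-lived isolate that performs a blocking write.
/// The bytes are copied when sent to the worker; the returned future completes
/// when the worker reports back, so the calling isolate is not blocked.
Future<int> writeOnOtherIsolate(String path, Uint8List data) {
  return Isolate.run(() {
    File(path).writeAsBytesSync(data);
    return data.length;
  });
}

Future<void> main() async {
  final dir = Directory.systemTemp.createTempSync('iso_write');
  final path = '${dir.path}/foo.bin';
  final data = Uint8List(65536);

  final sw = Stopwatch()..start();
  File(path).writeAsBytesSync(data); // direct write, blocks this isolate
  final syncUs = sw.elapsedMicroseconds;

  sw.reset();
  final n = await writeOnOtherIsolate(path, data); // spawn + copy + round trip
  final viaIsolateUs = sw.elapsedMicroseconds;

  print('sync: $syncUs us, via isolate: $viaIsolateUs us ($n bytes)');
  dir.deleteSync(recursive: true);
}
```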