mapbox/tilelive

Progress reports are not honest

rclark opened this issue · 3 comments

During tilelive.copy operations, progress reporting tells us the status of reads from the source, which can run much faster than writes to the destination. In the log below, the copy reports 100% more than a minute before it actually finishes writing all the tiles.

[  1s] 100.0000%   1.7k/  1.7k @ 1.4k/s | ✓ 1.7k □ 15.8k | 0s left
[Fri, 12 Dec 2014 19:49:56 GMT] progress reports 100% complete
[Fri, 12 Dec 2014 19:49:56 GMT] put stream puts a tile
[Fri, 12 Dec 2014 19:49:56 GMT] put stream puts a tile
[Fri, 12 Dec 2014 19:49:56 GMT] put stream puts a tile
[Fri, 12 Dec 2014 19:49:56 GMT] put stream puts a tile
...
[Fri, 12 Dec 2014 19:51:07 GMT] put stream emits 'stop'
[Fri, 12 Dec 2014 19:51:07 GMT] tilelive.copy exits

cc @GretaCB @willwhite

#108 helps a lot here. Progress reporting is still, strictly speaking, telling us about read speeds, but the addition of a highWaterMark on the write streams means that the Node.js stream API is going to do a much better job of letting slow writes throttle fast reads. Qualitatively, it looks as though it takes roughly 2000 tile copy operations before read and write speeds are more or less equivalent.
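For anyone following along, here is a minimal sketch of the mechanism a highWaterMark relies on. It is not the code from #108: `createSlowWritable` and `probeBackpressure` are hypothetical names, and the 1 ms delay just stands in for a slow tile write. The point is that `write()` starts returning `false` once the number of buffered objects reaches the highWaterMark, and that return value is what `pipe()` uses to pause the source.

```javascript
const { Writable } = require('stream');

// Hypothetical slow destination: every "tile" takes delayMs to write.
function createSlowWritable(highWaterMark, delayMs) {
  return new Writable({
    objectMode: true,
    highWaterMark: highWaterMark,
    write(tile, encoding, callback) {
      setTimeout(callback, delayMs);
    }
  });
}

// Write n tiles back to back and record what write() returns each time.
// A return value of false is the signal for the producer to stop reading.
function probeBackpressure(highWaterMark, n) {
  const writable = createSlowWritable(highWaterMark, 1);
  const signals = [];
  for (let i = 0; i < n; i++) {
    signals.push(writable.write({ id: i }));
  }
  writable.end();
  return signals;
}

console.log('highWaterMark 2: ', probeBackpressure(2, 4));
console.log('highWaterMark 16:', probeBackpressure(16, 4));
```

With a small highWaterMark the destination pushes back almost immediately. With a larger one, more tiles are accepted before the source is told to wait, which is why the reported read speed can run ahead of the real write speed for a while.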

> much better job of letting slow writes throttle fast reads

Dumb question: Would it make sense / be viable to have the output display both read and write speed separately? If slow writes are throttling reads how would this be evident?

@springmeyer right now we measure progress as the speed at which information moves from the readable stream to the writable stream. Basically:

readable.pipe(measureSpeed).pipe(writable);

To actually track write speed, we would need to do one of two things. Either rely on "private" events and properties of the writable stream (or make them public) and write a custom progress reporting function, or convert the writable stream to a transform stream so that we can reuse the progress reporter we already have, like:

readable.pipe(readSpeed).pipe(writable).pipe(writeSpeed);
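A rough sketch of that second option, under stated assumptions: `createCounter`, `createSlowSink` and `copy` are hypothetical names, not tilelive APIs, and the sink fakes a slow write with a timer. The sink is a transform that only passes a tile downstream once its "write" has finished, so a counter placed after it counts completed writes rather than reads.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical pass-through that counts the objects flowing through it.
function createCounter() {
  const counter = new Transform({
    objectMode: true,
    transform(tile, encoding, callback) {
      counter.count += 1;
      callback(null, tile);
    }
  });
  counter.count = 0;
  return counter;
}

// Hypothetical destination written as a transform: it "writes" the tile
// (simulated with a timer) and only then hands it on downstream.
function createSlowSink(delayMs) {
  return new Transform({
    objectMode: true,
    highWaterMark: 2,
    transform(tile, encoding, callback) {
      setTimeout(() => callback(null, tile), delayMs);
    }
  });
}

// Copy tileCount fake tiles and report both counters when everything is done.
function copy(tileCount, delayMs, done) {
  let next = 0;
  const readable = new Readable({
    objectMode: true,
    read() {
      this.push(next < tileCount ? { id: next++ } : null);
    }
  });
  const readSpeed = createCounter();
  const writeSpeed = createCounter();

  readable.pipe(readSpeed).pipe(createSlowSink(delayMs)).pipe(writeSpeed);
  writeSpeed.resume(); // nothing consumes the last stream, so drain it
  writeSpeed.on('end', () => done(readSpeed.count, writeSpeed.count));
  return { readSpeed: readSpeed, writeSpeed: writeSpeed };
}

copy(20, 5, (read, written) => {
  console.log('tiles read: ' + read + ', tiles written: ' + written);
});
```

Sampling `readSpeed.count` and `writeSpeed.count` on an interval while the copy runs would show the gap between the two, which is the number the current progress report hides.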

While building #108 I wrote a script that gives you a visual sense of how the Node.js stream API keeps read/write speeds roughly equal (by stopping reads when there's too much still waiting to be written). I may try to build this out into some kind of walkthrough -- it's actually a nice way to have a look at how the stream API works. Also hit me up in chat if you want me to show you sometime.