Many range requests performance issue
Najji opened this issue ยท 6 comments
Hello!
I have noticed there are performance issues when using range requests with many entries as the library makes twice as many calls to the server as there are entries. I was wondering if this could be solved if the central directory was parsed to get all entry metadata in 1 go rather than many range requests, however, when fetching individual entry data range requests are utilised? This should reduce the number of small range requests that are made. I'm not certain if I have misunderstood something, please let me know if that is the case. Otherwise, I'm more than happy to implement this if this seems like a good approach
Thanks,
Najji
That sounds like a pretty big change. I'm trying to imagine what the new API would look like.
IIRC there are 2 requests per file, one for the header of the file, and another for the content
I'm wondering if there is a what to do it from outside like (pseudo code)
class UnboundPromise {
constructor() {
this.promise = new Promise((resolve, reject) => {
this.resolve = resolve;
this.reject = reject;
});
}
};
class HTTPMultiPartRequestReader extends HTTPRangeReader {
constructor(url) {
super(url);
// save off the existing single range reader
this.singleRequestRead = this.read;
}
async readMultipleEntriesAsArrayBuffers(arrayOfEntries) {
this.read = this.multiRequestRead;
this.numExpectedRequests = arrayOfEntries.length;
this.requests = [];
const promises = arrayOfEntries.map(e => e.arrayBuffer());
await Promise.all(promises);
this.read = singleRequestRead;
}
multiRequestRead(offset, length) {
const unboundPromise = new UnboundPromise();
this.requests.push({ offset, length, unboundPromise });
if (this.request.length === this.numExpectedRequest) {
const requests = this.requests;
this.request = [];
// build a multi-part range request from the entries in requests
// when the data is read, call request[id].unboundPromise.resolve(dataForThatRequest)
}
return unboundPromise.promise;
}
Which you'd use like this
async main() {
const reader = new HTTPMultiPartRequestReader('some/url');
const {entries} = await unzip(reader);
const arrayOfArrayBuffers = async reader.readMultipleEntriesAsArrayBuffers([
entries['foo.png'],
entries['bar.jpg'],
entries['data.bin'],
]);
I can see the above is fragile in that it assumes there will be 2 calls per entry and it assumes no one will ask for any other entries until this set of entries has finished. My point isn't to suggest this as a solution so much as to think about the issues.
For example if the code reading a particular entry cached a copy to the read functions as in
const readFn = reader.read.bind(reader);
For each entry read request then the above code would be safe-ish from other requests and would work with no changes to the API.
I'm mostly just thinking out loud. I don't like the idea that reading one entries has to care about any other entry.
Hi!
I think there may have been a misunderstanding, what I was referring to more specifically was this code:
async function readEntries(reader, centralDirectoryOffset, entryCount, comment, commentBytes) {
let readEntryCursor = centralDirectoryOffset;
const entries = [];
for (let e = 0; e < entryCount; ++e) {
const buffer = await readAs(reader, readEntryCursor, 46);
Here for each entry it makes a range request. Ideally, we only make requests beforehand to fetch the central directory (can be 1 request or multiple range requests for larger central directories), and pass it into this function readEntries
I hope that makes sense.
Yea, I can see how that's an issue but I don't think I see how it can be fixed by multi-part-http requests. Since each entry of variable size you can't know how to read the next entry without reading the current entry so you can't make a multi-part request since you have idea what the next part is.
It seems like the solution would change that code is it reads the entire central directory in one request. I think I can make that change.
Ok. I pushed v1.2.0 that should be less requests.
Oh, I guess I got distracted by the original issue title ๐ I see now this is what you were suggesting in the first place. Fortunately it was a pretty small change. I should have done that in the first place.
Ha! The title did have a few iterations ๐. Thank you very much for this change - it's lightning quick now. Amazing job ๐