datopian/ckanext-blob-storage

[upload] browser crashes while trying to upload large file (~600Mb)

mbeilin opened this issue · 6 comments

The browser crashes while trying to upload a large CSV file (~600Mb) for a resource in NHS Staging (https://ckan.nhs.staging.datopian.com/)

Please see the screencast

Acceptance

  • The large files (currently up to ~6Gb) are uploaded directly through the GUI portal.

Analysis

@shevron as per our discussion earlier today, and from what I checked afterwards, it looks like the problem here is
that we are sending the actual file through Giftless rather than directly to the cloud storage (Google Cloud specifically here).
This approach works fine with small files (up to ~100Mb), but not with the big ones.

@Daniellappv @mbeilin I really doubt this is related to anything going through Giftless - there is nothing in that system that passes file data through. It is much more likely to be a browser issue, e.g. the browser struggling to hash the file...

The site you pointed to (NHS) is configured to upload directly to Google Cloud. Giftless is only used to get the signed URL, and CKAN is not part of the process at all. This is a client-side issue: something the JS code does (probably either hashing the file or inspecting it for a schema?) is causing the browser to crash.
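
To make that flow concrete, here is a rough sketch of what the upload path looks like from the browser's side. The request/response shape follows the Git LFS batch API that Giftless implements, but the endpoint, headers and field values below are illustrative assumptions, not the actual ckan-client-js code:

```js
// Sketch of the direct-to-cloud upload flow (illustrative, not the SDK's actual code).
// Giftless only hands out a signed upload URL via the Git LFS batch API;
// the file body goes straight from the browser to Google Cloud storage.
async function uploadViaSignedUrl (batchUrl, file, oid) {
  // 1. Ask Giftless for an upload action. Only the object id and size are sent here.
  const batchRes = await fetch(batchUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/vnd.git-lfs+json',
      'Accept': 'application/vnd.git-lfs+json'
    },
    body: JSON.stringify({
      operation: 'upload',
      transfers: ['basic'],
      objects: [{ oid, size: file.size }]
    })
  })
  const { objects } = await batchRes.json()
  const upload = objects[0].actions.upload

  // 2. PUT the file body directly to the signed cloud storage URL.
  //    Neither Giftless nor CKAN ever sees the file contents.
  await fetch(upload.href, {
    method: 'PUT',
    headers: upload.header || {},
    body: file
  })
}
```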

Definitely a client-side issue - notice the "Uncaught (in promise) out of memory" error:

[Screenshot from 2020-10-08 20-27-06]

(The first error seems to be logged but ignored, and is most likely an unrelated / separate issue.)

This happens when trying to upload a 1GB file on Firefox / Linux locally, and it occurs before Giftless is even hit.

Suggest moving this issue to datapub / ckan3-js-sdk or at least opening an issue there that references this one. /cc @mariorodeghiero

Just from taking a quick look at the ckan3-js-sdk code, I see that https://github.com/datopian/ckan-client-js/blob/master/lib/file.js#L56 loads the entire file contents into memory. I don't know if this is exactly where we crash, but it must be something like this - a big no-no if we need to support large files.
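
For reference, a memory-bounded alternative would be to hash the file incrementally over slices rather than reading it whole. This is only a sketch: the function name, chunk size and use of crypto-js are assumptions for illustration, not the SDK's actual implementation.

```js
const CryptoJS = require('crypto-js')

// Sketch of incremental SHA-256 hashing over Blob slices, so only one chunk
// is held in memory at a time. Assumes crypto-js with typed-array support
// (lib-typedarrays); names and chunk size are illustrative.
async function hashFileInChunks (file, chunkSize = 4 * 1024 * 1024) {
  const sha256 = CryptoJS.algo.SHA256.create()
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    const buffer = await file.slice(offset, offset + chunkSize).arrayBuffer()
    // Feed each chunk into the incremental hasher instead of hashing the whole file at once
    sha256.update(CryptoJS.lib.WordArray.create(new Uint8Array(buffer)))
  }
  return sha256.finalize().toString(CryptoJS.enc.Hex)
}
```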

Yes, the root cause is probably the hashing, but the SDK is using data.js now: https://github.com/datopian/data.js/blob/master/src/index.js#L191

The hash issue should be fixed in data.js by this PR: frictionlessdata/frictionless-js#90

Note: I'm not sure whether the rest of the data.js methods, like file.rows() and file.addSchema(), will work for large files, so if you get a different error please report it in the data.js repository.
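
As an aside, if file.rows() / addSchema() also read the whole file, one memory-bounded option would be to infer the schema from a sample of the file only. The sketch below is purely illustrative (a naive newline/comma split, not the data.js implementation):

```js
// Illustrative only: read just the first maxBytes of the file for schema
// inspection instead of materialising every row. The naive split on
// newlines/commas ignores quoted fields; a real implementation would use a
// streaming CSV parser.
async function sampleCsvHead (file, maxBytes = 1024 * 1024) {
  const text = await file.slice(0, maxBytes).text()
  const lines = text.split('\n')
  lines.pop() // drop the last, possibly truncated, line
  return lines.map(line => line.split(','))
}
```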

I am closing the hash issue here. @mbeilin, if you get an error about generating the hash, please follow frictionlessdata/frictionless-js#89

cc: @shevron