WebReflection/sqlite-worker

OOM

javieranton-zz opened this issue · 31 comments

Hi,

I am load testing and have noticed that the following happens when a row containing a single 1.3MB blob is written:

RuntimeError: memory access out of bounds
RuntimeError: table index is out of bounds
    at sql-wasm.wasm:0x36f48
    at sql-wasm.wasm:0x4259e
    at sql-wasm.wasm:0xb8643
    at za (sql-wasm.wasm:0xfa419)
    at e._sqlite3_close_v2 (VM26073 sql-wasm.js:2791)
    at c.export (VM26073 sql-wasm.js:420)
    at index.js:103
    at new Promise (<anonymous>)
    at index.js:102

This is possible on other sqlite3 implementations so I wonder if there is anything that can be done to make it work here as well

Many thanks

Edit: I originally posted 8.4MB but then found it happens also with as little as 1.3MB. I know it doesn't happen with 1MB, somewhere between 1MB and 1.3MB it hits a limit. This was tested on Chrome (Opera)

sounds like a bug for sql.js as here nothing deals with any memory size limitation so there's likely nothing I can do?
https://github.com/sql-js/sql.js

P.S. without a use case to reproduce, without knowing which browser it is, without knowing if it's via worker or main, ... without knowing anything, it's impossible to also debug this, so I'll close as this cannot be worked on my side by any mean.

As I said, the browser is Chrome (Opera). Tested via main. Use case would be as simple as trying to run a write query that writes a string longer than 1MB (1,000,000 characters)

If it's a sql.js problem then don't worry about it

I am working on a workaround that detects these cases and splits the queries into partial UPDATEs. I'll let you know if this works

Use case would be as simple as trying to run a write query that writes a string longer than 1MB (1,000,000 characters)

yeah, but I have little time these days, so any pre-made failing example would surely help ... or better, I won't work on issues that don't provide test cases in general, as I have too many things going on these days.

Happy to have a look through an example though, but I have no hard limit set anywhere in here, so I don't think I can fix it in a ny way.

Sure, no problem. Not trying to waste your time, just thought you might have a pre-made test case with a string. But here is the code that would throw this error:

window.initSQL({name: 'SomeDB'}).then(async ({all, get, query, raw}) => {
    await query`CREATE TABLE IF NOT EXISTS SomeTable (field text)`; 
    let longText = "";
    for(let i = 0; i< 100000; i++)//this will build up a 2MB string
         longText += "20        characters";
    let writeQuery = "INSERT INTO SomeTable (field) VALUES ('"+longText +"')";
    await query([writeQuery]);//this will break
});

thanks, but why do you keep writing SQL injection prone code though?

have you tried this instead? this is how you should write queries with this module

initSQL({name: 'SomeDB'}).then(async ({all, get, query, raw}) => {
    await query`CREATE TABLE IF NOT EXISTS SomeTable (field text)`; 
    const longText = "20        characters".repeat(100000);
    await query`INSERT INTO SomeTable (field) VALUES (${longText})`;
});

I believe there is a limit for queries bigger than 1.5MB, and I am glad it exists, because nobody should ever write queries the way you did in your example. Use the template, which uses the right mechanism for the library to work in a safe way.

This is also a classic case where me producing what you described would've not revealed your issue with your code ... which is why I am not working on anything that hasn't a use case to read before.

Very interesting, and thank you for taking your time to walk me through this.

why do you keep writing SQL injection prone code though

I am not concerned about injection hazards because this is purely a client app and the local DB doesn't contain any sensitive information. If the client decides to mess up his own local DB that is fine. The local DB is only a copy of an online repository that has 0 relation with server side functionality. Even if a client decided to artificially increase his local DB and push that to their cloud copy, that is also fine, as they have to pre-pay for cloud storage

With this knowledge in mind I can now make use of the template to prevent these problems. It was just simpler for me to feed the entire query through as that is the form it comes in from the repository. Thanks again

It’s not a client/server issue, it’s how any database related operation should be performed. There is no circumstance where concatenating strings, to provide values, is the way to go, and for every possible reason, including data type, SQLite injection, driver SQL parsing overhead, you name it.

do not ever do that anywhere, and use the right tool to do operations the right way.

I appreciate the advise. In my use case, I need to process literally any type SQL statement. I have no previous knowledge of what it can be. So building templates for that is not as straight forward, but can be done with a bit of extra work

If queries come with values, it’s trivial to remove at least strings, return a ?, split by ?, use that as template, and spread all strings as interpolations … now, I don’t know what kind of escape you have there, but it’s valid SQLite, looking for ' and skip intermediate '' sequences, if any, before the next ' should work already

If you find the time, would you mind showing an example of how the following query can be converted into a template by the means you described? Sorry to bother you, I will understand if you don't :-)

let query = "INSERT INTO Sometable (field1,field2) VALUES ('Value1','Value2')";

something like this should work:
#9 (comment)

P.S. mind the edit

Thank you ever so much, it works. Do you think I should be doing the same for UPDATEs, DELETEs and SELECTs?

I don’t know what you’re doing. If you are importing a dump once then yes, if you’re planning to execute that each time then I’d ask why … I mean, a one off dump makes sense, but I am pretty sure you don’t have thousands selects, delete, or updates SQL, you can rewrite those as templates and pass dynamic values instead?

Yes, I mostly meant dumps. Just wanted to know your opinion on the safety of using/not using templates for those things (I still don't understand what it means to be "unsafe" in this context. For example, what can the consequences be of not using a template for a SELECT etc)

Thanks

for dumps you can use that helper (and should) for every operation.

about safe, you got bitten already by misusing the SQLite driver, so I suggest you avoid passing plain strings around every time you can, including SELECTS, as these might reach that parsing hard limit for SQL strings already.

--- this is GOOD!
SELECT id FROM book WHERE sentence LIKE ?

--- this SUCKS!
SELECT id FROM book WHERE sentence LIKE '%nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope%'

OK, got it

I will try to tweak your code to make sure the below query doesn't break (I think the hex field needs something special)

Error: near "?": syntax error. Original query: INSERT OR IGNORE INTO HiddenLinks (Id,SourceId,TargetId) VALUES (X'8E66A9A4225852717B8B7955D23FFC8E',X'F498AE602584A3FABE8C1282EB9AF8A5',X'8EFF4DB423AAA432130EAC6DFE2A170F')

Rule of thumbs (last comment from my side):

  • should I always use the template literal way with interpolations within it for values? Yes
  • should I avoid at all costs passing concatenated strings or any SQL that contains strings as value within the statement? Also Yes.

Like I've said, there is not a single use case for not using ? when values are meant ... there's no maybe, there's no statement, if you have a value, the value should never be there, a ? should be there instead.

I think the hex field needs something special)

I've no idea what's the X field there but it's clearly not part of my helper, as I wrote that in a minute and it surely doesn't cover the world ... you can improve that though 👋

It's a GUID field.. part of the original query. Sorry for having taken so much of your time, have a good one

P.S. it is also very possible with X use case you need to ignore it as the ? won't work there.

this might do the trick:
#9 (comment)

A modified version of your code that handles both single and double quotes, including the case where they live inside each other (i.e. '"..."' or "'...'")

#9 (comment)

AFAIK SQLite deals/prefers single quotes so I am not sure why the double quote is needed ... but glad you solved.

Have a nice weekend 👋

P.S. very repetitive ... I am sure it could be simplified somehow through a callback where you pass either ' or " as char/counter-char

Thanks man, you too have a nice weekend wherever you are. Thanks again for your help. One fringe type of query wrongly used double quotes for one field... that is now everywhere so need to account for that

I am sure you could improve it ;) but mortals need to live with what "just works"

@javieranton-zz you made me laugh, but I swear I'm mortal too ... anyway, this is the simplification I had in mind, and one of the reasons I love JS too ;-)

const asTemplateArguments = query => {
  const template = [];
  const args = [template];
  const {length} = query;
  const re = {"'": /''/g, '"': /""/g};
  let open = false;
  let quote = '';
  let i = 0;
  let k = 0;
  while (k < length) {
    switch (query[k++]) {
      case "'":
      case '"':
        if (!open) {
          quote = query[k - 1];
          if (1 < k && /x/i.test(query[k - 2])) {
            k = query.indexOf(quote, k) + 1;
            break;
          }
          open = true;
          template.push(query.slice(i, k - 1));
          i = k;
        }
        else if (k < length && query[k] === quote)
          k++;
        else if (quote === query[k - 1]) {
          open = false;
          args.push(query.slice(i, k - 1).replace(re[quote], quote));
          i = k;
        }
    }
  }
  template.push(query.slice(i));
  return args;
};

Tested via:

console.log(asTemplateArguments(`INSERT OR IGNORE INTO HiddenLinks (Id,SourceId,TargetId) VALUES (X'8E66A9A4225852717B8B7955D23FFC8E',X'F498AE602584A3FABE8C1282EB9AF8A5',X'8EFF4DB423AAA432130EAC6DFE2A170F')`));
console.log(asTemplateArguments(`INSERT INTO Sometable (field1,field2) VALUES ('Value1','Value2')`));
console.log(asTemplateArguments(`INSERT INTO Sometable (field1,field2) VALUES ('Value1',"Value2")`));
console.log(asTemplateArguments(`INSERT INTO Sometable (field1,field2) VALUES ('''Val"ue1''',"""Val'ue2""")`));
console.log(asTemplateArguments(`INSERT OR IGNORE INTO SomeTable (Payload) VALUES ('{"Order":"150"}')`));

I'm glad... seems I haven't completely lost my sense of humor then :-D

Really cool how you can simplify/optimize code in js like this. The new code works but fails for the case of double quotes contained within single quotes. For example, the following query fails;

INSERT OR IGNORE INTO SomeTable (Payload) VALUES ('{"Order":"150"}')

Thanks

fair enough, but the edited version should work: #9 (comment)

4NC
Awesome Andrea, I am in awe! :-D

@javieranton-zz FYI I've fine-tuned the logic around asTemplateArguments which is now faster and smaller than ever: #9 (comment)

Nice one! Tested and it works great