Is the primary key constraint correctly implemented?
EtoDemerzel0427 opened this issue · 5 comments
UPDATE:
The errors mentioned here and below are likely due to limited memory, as the queries worked when using only 1/10 of the data.
However, another grammar issue has caused some confusion: it appears that Mutable does not support the AS keyword, even though the shell provides syntax highlighting for it and does not raise any errors when it is used in a query.
Also, I am seeing the process being killed when running this query:
badge is a table with 79851 rows, while users has 40326 rows. Do you have any idea why this would happen? I am running these on my M2 Pro MacBook with 16GB memory.
I am using the STATS dataset from this repo.
OK, so there are a bunch of things you mentioned in this issue. Let me address them one after the other.
We do not implement primary key constraints, but we accept the syntax. Kinda weird state right now, I know.
The AS keyword should be supported. At least, it used to be. You can use the flag --ast to see the AST and check for parsing errors (or --astdot if you fancy a rendered AST graph).
The table sizes are well within the sizes we support. We are, however, limited to the 4 GiB memory for WebAssembly modules in V8. This means that the combined intermediate results and final result, as well as your tables, must fit into these 4 GiB. Very unpleasant, I admit. This is something we want to address in various ways, ultimately aiming for supporting arbitrarily large tables and intermediate results (well, up to 128 TiB, but that should suffice for most scenarios). I created an issue for this already (see GitLab).
Anyhow, if your data set or query goes beyond this 4 GiB memory bound, then we have OOB accesses in V8 that will lead to a crash.
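To make the 4 GiB bound concrete, here is a rough back-of-the-envelope sketch. It is not mutable code. The row counts come from this issue, but the 8 bytes per row and the worst-case (cross-product-sized) intermediate result are assumptions for illustration only, since the actual query and schema are not shown here. The only hard facts used are that a WebAssembly page is 64 KiB and that a 32-bit memory has at most 65536 pages.

```python
# Back-of-the-envelope check against the 4 GiB limit of a 32-bit WebAssembly memory.
WASM_PAGE_SIZE = 64 * 1024        # bytes per WebAssembly page (64 KiB)
WASM32_MAX_PAGES = 65536          # 2**32 bytes / 64 KiB per page
WASM32_LIMIT = WASM_PAGE_SIZE * WASM32_MAX_PAGES  # 4 GiB in bytes


def fits_in_wasm32(num_rows: int, bytes_per_row: int) -> bool:
    """Return True if num_rows rows of bytes_per_row bytes fit into 4 GiB."""
    return num_rows * bytes_per_row <= WASM32_LIMIT


BADGE_ROWS = 79851       # from the issue
USERS_ROWS = 40326       # from the issue
ASSUMED_ROW_BYTES = 8    # hypothetical row width, NOT taken from the real schema

# The base tables alone are tiny compared to the limit.
base_tables_fit = fits_in_wasm32(BADGE_ROWS + USERS_ROWS, ASSUMED_ROW_BYTES)

# Worst case: an intermediate result as large as the cross product.
worst_case_join_rows = BADGE_ROWS * USERS_ROWS
worst_case_join_fits = fits_in_wasm32(worst_case_join_rows, ASSUMED_ROW_BYTES)

print(f"limit: {WASM32_LIMIT} bytes")
print(f"base tables fit: {base_tables_fit}")
print(f"worst-case join rows: {worst_case_join_rows}, fits: {worst_case_join_fits}")
```

Under these assumptions the tables themselves fit easily, while a large intermediate result does not. That would be consistent with the queries succeeding on 1/10 of the data.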
Sadly, I am very short on time rn and hence there is no progress on the memory issue. I have been working at a cloud DB vendor since December, and since then my contribution frequency to mutable has severely declined :(
Hope I can find some time again to work on some major boulders.
Thank you for your reply! I apologize for writing so much content in a single issue - I posted them as soon as I encountered a problem. Thank you for not missing any of the points.
Not implementing primary key constraints would be a bit strange for me, because people usually consider it a basic feature of a relational database, one that should be implemented first before others. Are there any specific design or technical reasons why mutable hasn't implemented it?
Regarding AS, I haven't looked at the AST, but you should be able to see from my screenshots that after using AS, the results of all queries become 0. This happens even when there are no joins. TBH, since the benchmark can run, I haven't pulled new commits and recompiled recently, but if there's no content about it in your updates in the past month, I think it's very likely that mutable on the master branch will also behave this way.
Regarding V8's memory limitations, I'd say I'm a bit surprised: for a DBMS, 4GB is indeed a bit too small. Actually, I was curious from the beginning as to why V8 was used. If it's for JIT, LLVM can also do it, but in contrast, the limitations brought by V8 are really too many. This limitation almost makes mutable a toy, even when we consider it a research-oriented DBMS. Do you already have an idea about how you are going to resolve this limitation?
Not implementing primary key constraints would be a bit strange for me, because people usually consider it a basic feature of a relational database, one that should be implemented first before others. Are there any specific design or technical reasons why mutable hasn't implemented it?
This requires index support, which is an ongoing effort. @marcelmaltry is currently working on indexing. When his changes are merged, we will be very close to supporting primary keys, I think.
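As a generic illustration of why primary keys depend on an index, here is a minimal sketch. It is not how mutable does or will implement it. The `Table` class, its `pk_index` dictionary, and the `insert` method are hypothetical names. The point is only that checking uniqueness on every insert needs a fast key lookup, which is what an index provides. Without one, each insert would have to scan the whole table.

```python
class Table:
    """Toy table that enforces a primary key through a hash index (hypothetical)."""

    def __init__(self, key_column: str):
        self.key_column = key_column
        self.rows = []       # stored rows, in insertion order
        self.pk_index = {}   # key value -> position of the row in self.rows

    def insert(self, row: dict) -> None:
        key = row.get(self.key_column)
        if key is None:
            raise ValueError("primary key must not be NULL")
        # The index turns the uniqueness check into a single lookup.
        if key in self.pk_index:
            raise ValueError(f"duplicate primary key: {key!r}")
        self.pk_index[key] = len(self.rows)
        self.rows.append(row)


users = Table(key_column="id")
users.insert({"id": 1, "name": "a"})
users.insert({"id": 2, "name": "b"})

try:
    users.insert({"id": 1, "name": "c"})  # violates the primary key
except ValueError as err:
    print(err)
```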
The AS keyword, and its use for aliasing expressions and tables, should be fully supported.
V8 memory is going to increase. We have an issue on our GitLab to upgrade to 64-bit addressing and 16 GiB memories. Also, we want to change the way host and embedded code share data, allowing for arbitrarily large allocations. This is WIP and not a small task.
You can read about that here: https://gitlab.cs.uni-saarland.de/bigdata/mutable/mutable/-/issues/168