MDB_MAP_FULL after a moderate number of assertions
From a clean database, I assert simple facts that look like this:
[ { "/": cid }, "count", count ]
Depending on how many I assert per batch (and whether I also assert other large content), I quickly get the following error after a few hundred to a few thousand assertions:
{"message":"Task was aborted\nError: MDB_MAP_FULL"}
You can use this script to reproduce:
import { CID } from "npm:multiformats@13.3.0/cid";
import * as json from "npm:multiformats@13.3.0/codecs/json";
import { sha256 } from "npm:multiformats@13.3.0/hashes/sha2";

const SYNOPSYS_URL = Deno.env.get("SYNOPSYS_URL") || "http://localhost:8080";

// Minimal local stand-in for the fact shape: [entity link, attribute, value].
type Fact = [{ "/": string }, string, unknown];

// Derive a CID (v1, JSON codec, sha2-256) for arbitrary JSON data.
export async function cid(data: any) {
  const bytes = json.encode(data);
  const hash = await sha256.digest(bytes);
  const cid = CID.create(1, json.code, hash);
  return cid.toString();
}

// Assert a batch of simple { count } facts against the /assert endpoint.
export async function import_fake(start: number, batch_size: number) {
  let facts: Fact[] = [];
  for (let i = start; i < start + batch_size; i++) {
    facts.push([{ "/": await cid({ count: i }) }, "count", i]);
  }
  // console.log(JSON.stringify(facts));
  return await fetch(`${SYNOPSYS_URL}/assert`, {
    method: "PATCH",
    body: JSON.stringify(facts),
  }).then((r) => r.json());
}

if (import.meta.main) {
  const batch_size = 1;
  let count = 1;
  while (true) {
    console.log(`Importing ${count}...`);
    let result = await import_fake(count, batch_size);
    if (!result.ok) {
      console.log(`Error importing ${count}: ${JSON.stringify(result.error)}`);
      break;
    }
    count += batch_size;
  }
}
This version adds additional key/values for each cid, causing the crash to happen even earlier:
import { CID } from "npm:multiformats@13.3.0/cid";
import * as json from "npm:multiformats@13.3.0/codecs/json";
import { sha256 } from "npm:multiformats@13.3.0/hashes/sha2";

const SYNOPSYS_URL = Deno.env.get("SYNOPSYS_URL") || "http://localhost:8080";

// Minimal local stand-in for the fact shape: [entity link, attribute, value].
type Fact = [{ "/": string }, string, unknown];

// Derive a CID (v1, JSON codec, sha2-256) for arbitrary JSON data.
export async function cid(data: any) {
  const bytes = json.encode(data);
  const hash = await sha256.digest(bytes);
  const cid = CID.create(1, json.code, hash);
  return cid.toString();
}

// Assert a batch of facts; each entity gets a "count" plus every key/value from `base`.
export async function import_fake(start: number, batch_size: number, base: any) {
  let facts: Fact[] = [];
  for (let i = start; i < start + batch_size; i++) {
    let id = { "/": await cid({ count: i }) };
    facts.push([id, "count", i]);
    Object.keys(base).forEach((k) => {
      facts.push([id, k, base[k]]);
    });
  }
  // console.log(JSON.stringify(facts));
  return await fetch(`${SYNOPSYS_URL}/assert`, {
    method: "PATCH",
    body: JSON.stringify(facts),
  }).then((r) => r.json());
}

function generateRandomString(length: number): string {
  const characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  let result = "";
  for (let i = 0; i < length; i++) {
    result += characters.charAt(Math.floor(Math.random() * characters.length));
  }
  return result;
}

if (import.meta.main) {
  const batch_size = 100;
  let count = 1;
  while (true) {
    console.log(`Importing ${count}...`);
    let base = {
      "extra": generateRandomString(1000),
      "extra2": generateRandomString(1000),
      "extra3": generateRandomString(1000),
      "extra4": generateRandomString(1000),
      "extra5": generateRandomString(1000),
      "extra6": generateRandomString(1000),
      "extra7": generateRandomString(1000),
      "extra8": generateRandomString(1000),
      "extra9": generateRandomString(1000),
      "extra10": generateRandomString(1000),
    };
    let result = await import_fake(count, batch_size, base);
    if (!result.ok) {
      console.log(`Error importing ${count}: ${JSON.stringify(result.error)}`);
      break;
    }
    count += batch_size;
  }
}
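Rough arithmetic on why this variant fails sooner: each entity carries the count plus ten ~1000-character strings, so a batch of 100 is on the order of 1 MB of attribute values before any indexing overhead.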
Thanks @anotherjesse for the script illustrating the issue, I have turned it into a test case in the PR. Running it locally I get 4131 import rounds and then a crash. Looking at the data.mdb, it appears slightly larger than 10 MB. That looks like about 2.53 KB per import, which is not great, but we also did not attempt to optimize any of this, so perhaps that is not too surprising.
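A quick sanity check on that per-import figure, using the numbers reported above (the ~10 MB file size is an approximation, not a re-measurement):

const rounds = 4131;                  // import rounds before MDB_MAP_FULL
const fileBytes = 10.2 * 1024 * 1024; // data.mdb, "slightly larger than 10 MB"
console.log(`${(fileBytes / rounds / 1024).toFixed(2)} KB per import`); // ≈ 2.53 KB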
I did a little test where I wrote a JSON file with all the asserts that went into the DB, and it came out around 16x smaller. Some napkin math to assess whether the current overhead is within reason (without any optimizations):
- We store each fact 3 times, one per indexing strategy.
- Each record contains the assertion plus the path encoding, so roughly 2 * 3 = 6x per record.
- For each transaction we also store transaction data with a triple index, which is another 3x per transaction, putting us at around 9x per record.
- Our datoms additionally carry a tx link, which adds some overhead; roughly 3 links (one per index), which is probably around the size of the record itself, so about 10x record overhead.
The Okra tree representation has overhead of its own, and probably so does LMDB. In this scenario we generate completely unique data, so it is a pretty pathological case. All in all, 16x overhead is perhaps within expectations until we take time to optimize things.
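A minimal sketch tallying those multipliers against the observed numbers (the constants are the rough estimates above, not measurements):

// Tally of the rough multipliers estimated above (assumptions, not measurements):
const indexCopies = 3;                    // each fact stored once per indexing strategy
const withPathEncoding = 2 * indexCopies; // assertion + path encoding -> ~6x
const withTxData = withPathEncoding + 3;  // transaction data, triple indexed (one fact per tx, as in the first repro) -> ~9x
const withTxLinks = withTxData + 1;       // ~3 tx links ≈ one more record -> ~10x
// The gap between ~10x and the observed ~16x is attributed to Okra tree + LMDB overhead.
console.log({ indexCopies, withPathEncoding, withTxData, withTxLinks });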
Am I holding this wrong?
I have a checkout of main at the latest commit, with no local changes:
jesse@fourteen synopsys % git pull
Already up to date.
jesse@fourteen synopsys % git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
I deleted all my node_modules and re-installed (I would have used npm ci if a lock file had been committed):
jesse@fourteen synopsys % rm -rf node_modules
jesse@fourteen synopsys % npm i
npm WARN deprecated inflight@1.0.6: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
npm WARN deprecated rimraf@3.0.2: Rimraf versions prior to v4 are no longer supported
npm WARN deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported
added 257 packages, and audited 258 packages in 2s
89 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
I started up synopsys with an empty store:
jesse@fourteen synopsys % rm -rf service-store
jesse@fourteen synopsys % npm run start
> synopsys@1.4.1 start
> node src/main.js
And then I ran the count script above, and it fails after ~500 items:
Error importing 493: {"message":"Task was aborted\nError: MDB_MAP_FULL"}
(I see a pnpm lock file - I could try using it? Although at this point we have npm, Deno, and now pnpm; do we need all three?)
Trying with pnpm
jesse@fourteen synopsys % rm -rf node_modules
jesse@fourteen synopsys % pnpm ci
ERR_PNPM_CI_NOT_IMPLEMENTED The ci command is not implemented yet
jesse@fourteen synopsys % pnpm i
Lockfile is up to date, resolution step is skipped
Packages: +246
Progress: resolved 246, reused 246, downloaded 0, added 246, done
dependencies:
+ @canvas-js/okra 0.4.5
+ @canvas-js/okra-lmdb 0.2.0
+ @canvas-js/okra-memory 0.4.5
+ @ipld/dag-cbor 9.2.1
+ @ipld/dag-json 10.2.2
+ @noble/hashes 1.3.3
+ @types/node 22.5.5
+ datalogia 0.8.0
+ multiformats 13.3.0
devDependencies:
+ @web-std/fetch 4.2.1
+ @web-std/stream 1.0.3
+ c8 8.0.1
+ entail 2.1.2
+ playwright-test 14.0.0
+ prettier 3.1.0
+ typescript 5.3.3
Done in 2.1s
jesse@fourteen synopsys % pnpm run start
> synopsys@1.4.1 start /Users/jesse/ct/synopsys
> node src/main.js
Unfortunately it still errors out after ~500 with the MDB issue:
Importing 557...
Importing 558...
Error importing 558: {"message":"Task was aborted\nError: MDB_MAP_FULL"}
@anotherjesse one thing that occurred to me: the map size may be fixed at DB creation time. So if you had a small DB at /Users/jesse/ct/synopsys, restarting synopsys with a new size may not have an effect.
see #31
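For illustration only, this is a sketch of where a map size gets chosen, using the generic lmdb npm package under Node; it is not how synopsys itself is configured (synopsys goes through @canvas-js/okra-lmdb, whose option names and defaults may differ):

// Hypothetical sketch with the generic `lmdb` package, not synopsys code.
import { open } from "lmdb";

const db = open({
  path: "./service-store",      // hypothetical path, mirroring the repro setup
  mapSize: 1024 * 1024 * 1024,  // request a ~1 GiB map instead of a small default
});

await db.put("example", { ok: true });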
This was fixed