jvail/spl.js

Too many parameters to `exec` method causes memory corruption error

duvifn opened this issue · 19 comments

Hi again @jvail ,

Possibly I'm doing something wrong again, but running the following code gives me this error:

database disk image is malformed

I was able to reproduce this most of the time, though sometimes the problem doesn't happen.
Executing other operations after this error sometimes gives me memory-access-out-of-bounds errors.
The code:

const spl = await SPL(undefined, {
    autoGeoJSON: {
      precision: 8,
      options: 0
    }
  });
// In this line I load the entire proj.db (instead of fetching it in parts).
// The 'projDbStr' var is created using your 'stringify' script and imported as a module.
const projdbBinary = pako.inflate(Uint8Array.from(atob(projDbStr), c => c.charCodeAt(0))).buffer;
const db = await spl.mount(
        'proj', [{ name: 'proj.db', data: projdbBinary }]).db(undefined);  

// UPDATE: the following line was added later
await db.exec("SELECT InitSpatialMetadata(1);");

const srid = 32636;
const tableName = "test";
const script = `
                SELECT InitSpatialMetadata(1);
                CREATE TABLE ${tableName} (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, field INTEGER, src_id TEXT);
                SELECT AddGeometryColumn('${tableName}', 'geometry', ${srid}, 'GEOMETRY', 'XY');
            `;
await db.read(script);
const batchSize = 10000;
const geom = '{"type": "Polygon", "coordinates": [[[35.172522345781324,31.807637007387367],[35.1730225777626,31.807379406789376],[35.17296088695526,31.807292779878278],[35.17246065497398,31.807550380717725],[35.172522345781324,31.807637007387367]]]}';
    
const parameters = [];
for (let i = 0; i < batchSize; ++i) {
    parameters.push({
      "@field": i,
      "@src_id": i.toString(),
      "@geometry": geom
    });
}
const statement = `INSERT OR REPLACE INTO ${tableName} (field, src_id, geometry) VALUES (@field, @src_id, ST_Transform(SetSRID(GeomFromGeoJSON(@geometry), 4326), ${srid}));`;
await db.exec(statement, parameters);

Changing batchSize to 10 works fine (even if repeated 1,000 times for an overall total of 10,000 rows).
Also, when using the read method with 10,000 insert statements, everything works fine.
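Given that small batches work, a possible workaround until this is fixed is to split the parameter array into chunks and call exec once per chunk. A minimal sketch (the chunk helper is my own, not part of spl.js):

```javascript
// Hypothetical helper: split an array into consecutive chunks of `size` elements.
function chunk(array, size) {
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

// Usage sketch, assuming `db`, `statement` and `parameters` from the code above:
// for (const batch of chunk(parameters, 10)) {
//   await db.exec(statement, batch);
// }
```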

Hope this is not another mistake of mine.

Thank you very much!

jvail commented

Hi @duvifn,

thank you for the bug report - I am really grateful you discovered these issues. I think this is the same issue as with the .read function. I'll try to fix it ASAP.
Just out of interest: Could you describe the use case for importing such a large number of GeoJSON objects? Maybe there are other ways I can propose to do it more efficiently.

Jan

Thanks @jvail for all the hard work!

Just out of interest: Could you describe the use case for importing such a large number of GeoJSON objects? Maybe there are other ways I can propose to do it more efficiently.

I'm using this to load dynamically fetched OpenStreetMap data for a given area and do some basic geographic operations on it.
I would be glad to hear about better options for loading this amount of data.
Thanks again!

jvail commented

Do you have a little example of the payload (array of feature collections or a single collection or array of features...) you fetch?

I'm using overpass api.

You can fetch JSON for buildings, for example, using the following query (typed into the search box at the link above):

[out:json][timeout:25];
// gather results
(
  // query part for: “building=*”
  way["building"](48.46455220998526,35.0202915072441,48.46612084741067,35.023121237754815);
);
// print results
out body;
>;
out skel qt;

Open the Chrome DevTools network panel and you can see the JSON object that is returned.
I'm using this library to convert from the OSM data model to GeoJSON.
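For reference, the Overpass QL query above can also be built programmatically from a bounding box. A small sketch (the function name and the [south, west, north, east] bbox convention are my own; Overpass expects coordinates in that order):

```javascript
// Hypothetical helper: build an Overpass QL query fetching all buildings in a bbox.
// bbox = [south, west, north, east], the coordinate order Overpass expects.
function buildingsQuery(bbox) {
  const [s, w, n, e] = bbox;
  return [
    '[out:json][timeout:25];',
    `(way["building"](${s},${w},${n},${e}););`,
    'out body;',
    '>;',
    'out skel qt;'
  ].join('\n');
}
```

The resulting string can then be sent to an Overpass endpoint and the JSON response converted to GeoJSON as described.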

An example of a (very small) result set expressed as GeoJSON. These are only buildings, but it could be any other OSM entity:

{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "way/61042647",
            "properties": {
                "type": "way",
                "id": 61042647,
                "tags": {
                    "building": "yes"
                },
                "relations": [],
                "meta": {}
            },
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [
                            35.0984989,
                            48.5126686
                        ],
                        [
                            35.0984363,
                            48.5120265
                        ],
                        [
                            35.0994644,
                            48.5119825
                        ],
                        [
                            35.099513,
                            48.5124812
                        ],
                        [
                            35.0997808,
                            48.5124698
                        ],
                        [
                            35.0997651,
                            48.5123084
                        ],
                        [
                            35.1001132,
                            48.5122935
                        ],
                        [
                            35.1001382,
                            48.5125501
                        ],
                        [
                            35.0998636,
                            48.5125619
                        ],
                        [
                            35.0998837,
                            48.5127677
                        ],
                        [
                            35.0991328,
                            48.5127998
                        ],
                        [
                            35.0991174,
                            48.5126422
                        ],
                        [
                            35.0984989,
                            48.5126686
                        ]
                    ]
                ]
            }
        },
        {
            "type": "Feature",
            "id": "way/284715946",
            "properties": {
                "type": "way",
                "id": 284715946,
                "tags": {
                    "building": "yes"
                },
                "relations": [],
                "meta": {}
            },
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [
                            35.0968207,
                            48.5123026
                        ],
                        [
                            35.096796,
                            48.5120733
                        ],
                        [
                            35.0974695,
                            48.5120414
                        ],
                        [
                            35.0974628,
                            48.5119798
                        ],
                        [
                            35.0975431,
                            48.511976
                        ],
                        [
                            35.097542,
                            48.511966
                        ],
                        [
                            35.0977537,
                            48.511956
                        ],
                        [
                            35.0977611,
                            48.5120248
                        ],
                        [
                            35.0978679,
                            48.5120197
                        ],
                        [
                            35.0978929,
                            48.5122518
                        ],
                        [
                            35.0968207,
                            48.5123026
                        ]
                    ]
                ]
            }
        },
        {
            "type": "Feature",
            "id": "way/284715953",
            "properties": {
                "type": "way",
                "id": 284715953,
                "tags": {
                    "building": "yes"
                },
                "relations": [],
                "meta": {}
            },
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [
                            35.0970016,
                            48.5113257
                        ],
                        [
                            35.096988,
                            48.5112162
                        ],
                        [
                            35.097402,
                            48.5111935
                        ],
                        [
                            35.0974157,
                            48.5113031
                        ],
                        [
                            35.0970016,
                            48.5113257
                        ]
                    ]
                ]
            }
        }
    ]
}

jvail commented

Thank you, interesting. I'll look into it over the weekend - maybe a good opportunity to play a bit with observable notebooks.

Are you building up a database of those features (e.g. a table for each feature type, like "buildings"), or are you just deriving something from the data and then throwing it away?

How do you make sure you do not fetch the same data again? Do you memorize the bbox of previously fetched data (I guess you can apply some sort of filter in the OSM query..)?

Are you building up a database of those features (e.g. a table for each feature type, like "buildings"), or are you just deriving something from the data and then throwing it away?

How do you make sure you do not fetch the same data again? Do you memorize the bbox of previously fetched data (I guess you can apply some sort of filter in the OSM query..)?

The table is built from scratch per user request.
I'm doing topological checks on the data before importing it to OSM. So there is one table for the data to be imported and another table for the data already in OSM. The exact types of data depend on the checks the user has to do.
For example, for buildings there are checks like 'ensure that to-be-imported buildings don't intersect already existing buildings or roads, with some buffer', and so on.
This is roughly equivalent to osmose, but it's intended to check the data locally before importing.

jvail commented

Thanks for sharing - I have no experience with this "overpass" api but I'll play a bit with it.

Meanwhile two findings & questions:

Are you sure you are using the latest version of spl.js? When I fixed the .read function to work with large scripts, I also added a test for a 10,000-row insert with parameters: https://github.com/jvail/spl.js/blob/main/test/browser.js#L269
I even tested it with 100,000 rows and it still gives no error. Could you please check?

I see that you transform to UTM. With the latest version you do not have to import proj.db anymore because I have embedded a minimal proj.db. See the updated README.

Thanks. I didn't notice the proj.db-related update. This is great.
I use UTM in order to create buffers measured in projected units.
I have now upgraded to the latest version.
The code above still fails, but if I remove the manual proj.db loading it works fine. So I think there might still be a problem somewhere, but it surfaces in the wrong place and is triggered only by some operations (the proj.db that I loaded manually seems to work fine in other operations).
I also checked with only 1,000 rows at once and the code fails too (after the upgrade and after I reinserted the manual proj.db loading).

jvail commented

.., but if I remove the manual proj.db loading it works fine. So I think there might still be a problem somewhere, but it surfaces in the wrong place and is triggered only by some operations (the proj.db that I loaded manually seems to work fine in other operations).

Do you have issues with transformations from/to UTM? I have (quite brutally) reduced the size of proj.db. Maybe I missed something. Can you track down the proj.db error (maybe with PROJ_GetLastErrorMsg?) or the query where it fails?
In case you want to use your own proj.db, you have to set the path per db connection using spatialite's PROJ_SetDatabasePath function. So even if you mounted your own proj.db, it was not used with the latest spl.
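As a sketch of that suggestion: pointing a connection at a manually mounted proj.db would look roughly like the SQL below (the '/proj/proj.db' path is my assumption based on the mount call earlier in the thread, not something stated here):

```sql
-- Assumed path: spl.mount('proj', [{ name: 'proj.db', ... }]) suggests '/proj/proj.db'.
SELECT PROJ_SetDatabasePath('/proj/proj.db');
-- Sanity check: should echo the path that was just set.
SELECT PROJ_GetDatabasePath();
```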

I also checked with only 1,000 rows at once and the code fails, too (after the upgrade and after I reinserted the manual proj.db loading).

Hm, difficult if I cannot reproduce it. What OS/browser do you use? Maybe you run out of memory for wasm? .. wait, now I can reproduce it :)

Do you have issues with transformations from/to UTM? I have (quite brutally) reduced the size of proj.db. Maybe I missed something. Can you track down the proj.db error (maybe with PROJ_GetLastErrorMsg?) or the query where it fails?
In case you want to use your own proj.db, you have to set the path per db connection using spatialite's PROJ_SetDatabasePath function. So even if you mounted your own proj.db, it was not used with the latest spl.

Something weird...
I now noticed that I have another call to InitSpatialMetadata after creating the db, so I updated the code above... Without it everything works fine for me, even if I load proj.db manually.
PROJ_GetLastErrorMsg is null.

The exact batchSize at which it fails is 30 (29 is fine). It doesn't matter if I start the loop at another number. For example, the following code produces the same error:

const batchSize = 30;
for (let i = batchSize; i < batchSize * 2; ++i) {
  ...
}

I was able to reproduce this error on Firefox, Chrome and Opera (on Ubuntu 20).

jvail commented

After some debugging it turns out that the problem is the combination of AUTOINCREMENT and AddGeometryColumn. Maybe an issue with spatialite...
If you remove AUTOINCREMENT from the table definition, or remove AddGeometryColumn and insert the JSON as TEXT, then it works.

It seems you don't need AUTOINCREMENT: the id will still be incremented automatically without it.
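To make that concrete, the working schema from the script above would then look like this (a sketch; in SQLite an INTEGER PRIMARY KEY is an alias for the rowid, so ids are still assigned automatically on insert):

```sql
-- Same schema as in the original script, minus AUTOINCREMENT.
CREATE TABLE test (
  id INTEGER NOT NULL PRIMARY KEY,  -- rowid alias: auto-assigned on insert
  field INTEGER,
  src_id TEXT
);
SELECT AddGeometryColumn('test', 'geometry', 32636, 'GEOMETRY', 'XY');
```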

jvail commented

Thought about ways to work with the OSM data more efficiently...

First I thought it could be interesting to move all the fetching and processing into an extension (https://github.com/jvail/spl.js#extensions-api-browser-only), but Firefox still does not support dynamic imports.

Did you take a look at the ImportGeoJSON and VirtualGeoJSON functions? https://www.gaia-gis.it/fossil/libspatialite/wiki?name=Supporting+GeoJSON

You could mount a large GeoJSON file, make it available as a virtual table, and then drop the virtual table and unmount the data after processing. The disadvantage is that you cannot append data to a virtual table, but on the other hand you do not need to take care of geometry column / table creation.
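Based on the linked wiki page, that workflow might look roughly like the following (a sketch only; the table name and the '/data/features.json' mount path are my assumptions, and the exact VirtualGeoJSON arguments should be checked against the spatialite docs):

```sql
-- Expose a mounted GeoJSON file as a read-only virtual table.
CREATE VIRTUAL TABLE osm_features USING VirtualGeoJSON('/data/features.json');
-- ...query/process osm_features here...
-- Then drop the virtual table (and unmount the file) when done.
DROP TABLE osm_features;
```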

Thanks for all the efforts and the suggestions, @jvail!
This looks promising. I will take a look at this.

jvail commented

I did a few more tests: it is a problem with sqlite versions >= 3.36 and spatialite 5.0.1. If I build with sqlite 3.34, using AUTOINCREMENT does not give the database error.

jvail commented

Hi @duvifn,

I have updated the build in the repo. Could you give it a try and see if your app now works without error using this build (ideally relying on the embedded proj.db)? If it does, I'll file a new release.

Thanks, Jan

@jvail Thanks.
Do you know where exactly the problem was? If not, I'm a bit afraid that we are missing the real problem here. Even with the current sqlite version it's hard to reproduce. While I'm pretty sure the problem is not in the spl library (it might be sqlite, spatialite, emscripten, or even the WebAssembly standard), I think we should at least know where the problem is.

That said, upgrading sqlite is a good idea anyway.

jvail commented

I am pretty sure this was the problem: https://groups.google.com/g/spatialite-users/c/SnNZt4AGm_o/m/GypX8ypABAAJ
I have now updated sqlite and set the -DSQLITE_ALLOW_ROWID_IN_VIEW flag as proposed. Seems to work - at least my test does not fail.

I debugged this now. The problem is caused by spl.js: it frees already-freed heap memory. Opening a PR.

jvail commented

Thank you @duvifn! I'll file a new release. The previous one was a bit premature :)