LoneRifle/wa-sqlite-s3vfs

SQLITE_CORRUPT After 100,000 Inserts

Closed this issue · 6 comments

When I create a table with one column and insert about 100,000 records, the subsequent open results in SQLITE_CORRUPT (11). If I create more columns, the failure occurs before reaching 40,000 inserts, so it seems to depend on the amount of data.

The last file name I see in S3 is '0000001015'.

That's all the information I have at the moment; I'm having trouble setting up a debugger with this module.

Thanks

I was able to do some debugging and found that the S3 listObjectsV2 API returns at most 1,000 results per call, so pagination with ContinuationToken is required to retrieve the rest.

https://github.com/LoneRifle/wa-sqlite-s3vfs/blob/main/src/s3vfs.ts#L108-L112
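For reference, the truncation can be confirmed with a standalone call like the sketch below, assuming the AWS SDK v3 S3 client the VFS appears to use; the bucket, prefix and region here are placeholders. IsTruncated and NextContinuationToken in the response indicate that more than 1,000 keys exist under the prefix.

import { S3 } from '@aws-sdk/client-s3'

const s3 = new S3({ region: 'us-east-1' }) // placeholder region

// A single un-paginated call returns at most MaxKeys (default 1,000) objects.
const page = await s3.listObjectsV2({
  Bucket: 'my-database-bucket', // placeholder bucket name
  Prefix: 'my-database/',       // placeholder key prefix
})
console.log(page.KeyCount, page.IsTruncated, page.NextContinuationToken)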

@rrmckinley thanks for bringing this up! I haven't had the chance to maintain this given work commitments.

How important is this for the things you are planning to work on?

@LoneRifle I was able to improve the situation in a patch I am working on. S3 always lists object keys in ascending lexicographic order, so I implemented base-36 filenames in which the first block is named zzzzzzzzzz and the last is 0000000000. With that ordering it is only necessary to list a single key, always the most recent and lexicographically lowest, to determine the database size.
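A minimal sketch of what such a descending base-36 naming scheme could look like (the helper names and the 10-character key width are illustrative assumptions, not necessarily the exact code in the patch):

// Largest 10-character base-36 value; small enough to stay within
// Number.MAX_SAFE_INTEGER, so plain number arithmetic is exact.
const KEY_WIDTH = 10
const MAX_KEY = parseInt('z'.repeat(KEY_WIDTH), 36)

// Block 0 maps to 'zzzzzzzzzz' and higher block numbers map to
// lexicographically smaller keys, so the first key S3 lists is always
// the highest-numbered block.
function blockKey(blockNumber: number): string {
  return (MAX_KEY - blockNumber).toString(36).padStart(KEY_WIDTH, '0')
}

// Recover the block number from a listed key, e.g. the single key returned
// by listObjectsV2 with MaxKeys: 1.
function blockNumber(key: string): number {
  return MAX_KEY - parseInt(key, 36)
}

With keys ordered this way, one listObjectsV2 call with MaxKeys: 1 yields the highest-numbered block, from which the database size can be derived without paginating through every key.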

With this scheme, I have been able to test storing and querying fifteen million rows across ten columns. Would you be interested in a PR?

@rrmckinley yes please! Thanks for looking into this. Send it over! The more comprehensive approach would be to properly implement pagination with ListObjectsV2, checking IsTruncated and NextContinuationToken, but the two approaches can coexist.

Pull request here: #2

TODO: actually fetch all the objects from S3 by changing the following block:

const { Contents: objects } = await this.s3.listObjectsV2({
  Bucket: this.bucketName,
  Prefix: `${prefix}/`,
})
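A minimal sketch of what the paginated replacement could look like, assuming this.s3 is the aggregated AWS SDK v3 S3 client (the element type and variable names are illustrative):

// Accumulate every page of results; listObjectsV2 returns at most 1,000 keys per call.
const objects: { Key?: string; Size?: number }[] = []
let continuationToken: string | undefined
do {
  const page = await this.s3.listObjectsV2({
    Bucket: this.bucketName,
    Prefix: `${prefix}/`,
    ContinuationToken: continuationToken,
  })
  objects.push(...(page.Contents ?? []))
  // NextContinuationToken is only present when IsTruncated is true.
  continuationToken = page.IsTruncated ? page.NextContinuationToken : undefined
} while (continuationToken)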