[Bug]: SQLite: Error: Unknown Error
Describe the bug
I'm trying to load an SQLite database that's around 100MB.
It seems like I'm hitting this line when trying to access a table in the database that's bigger than 32MB:
https://github.com/evidence-dev/evidence/blob/main/packages/lib/sdk/src/plugins/datasources/wrapSimpleConnector.js#L52
Reading the code, I can't see a workaround for this.
Note: all the .sql files query the same database file.
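For anyone reading along, here is a rough sketch of the kind of size guard I believe that line implements (the names EAGER_LOAD_LIMIT and loadSourceFile are mine, not Evidence's actual internals; only the warning text is taken from the debug log below):

```js
import { stat, readFile } from 'node:fs/promises';

// Hypothetical reconstruction of the eager-load guard in wrapSimpleConnector.js.
const EAGER_LOAD_LIMIT = 32 * 1024 * 1024; // 32 MB

async function loadSourceFile(path) {
  const { size } = await stat(path);
  if (size > EAGER_LOAD_LIMIT) {
    // Matches the "Will not eagerly load files larger than 32 Megabytes."
    // message that shows up in the debug output.
    console.warn('Will not eagerly load files larger than 32 Megabytes.');
    return null;
  }
  return readFile(path);
}
```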
Steps to Reproduce
- Try to load a SQLite database that's bigger than 32MB
- Try to access a table that's bigger than 32MB
- Run npm run sources
Logs
> evidence sources
✔ Loading plugins & sources
-----
[Processing] articles_connection
article_categories ✔ Finished, wrote 108594 rows.
article_classification ✔ Finished, wrote 64421 rows.
articles-database ⚠ No results returned.
articles ✖ Error: Unknown Error
categories ✔ Finished, wrote 155 rows.
-----
Evaluated sources, saving manifest
✅ Done!
With debug enabled, I get the following output:
$ NODE_OPTIONS="--max-old-space-size=4096" npm run sources -- --debug;
> my-evidence-project@0.0.1 sources
> evidence sources --debug
Evidence running with debug logging
✔ Loading plugins & sources
-----
[Processing] articles_connection
article_categories ◢ Processing...[DEBUG]: Building parquet file article_categories.parquet
[DEBUG]: Reading rows from a generator object
article_categories ◥ Processing...[DEBUG]: Measure: "buildMultipartParquet" {
duration: 2868.07013,
meta: { 'batch number': 0 },
parents: [ 'buildMultipartParquet' ]
}
[DEBUG]: Flushing batch 0 with 108594 rows
[DEBUG]: Flushing batch 0 with 108594 rows
article_categories ◢ Processing...[DEBUG]: Measure: "flush" {
duration: 418.50450400000045,
meta: { 'batch number': 0 },
parents: [ 'buildMultipartParquet' ]
}
[DEBUG]: Flushed batch 0 with 108594 rows
article_categories ◣ Processing...[DEBUG]: Measure: "buildMultipartParquet" {
duration: 3679.8906589999997,
meta: { 'output filename': 'article_categories.parquet' },
parents: []
}
article_categories ✔ Finished, wrote 108594 rows.
article_classification ◢ Processing...[DEBUG]: Building parquet file article_classification.parquet
[DEBUG]: Reading rows from a generator object
article_classification ◢ Processing...[DEBUG]: Measure: "buildMultipartParquet" {
duration: 2006.8502520000002,
meta: { 'batch number': 0 },
parents: [ 'buildMultipartParquet' ]
}
[DEBUG]: Flushing batch 0 with 64421 rows
[DEBUG]: Flushing batch 0 with 64421 rows
article_classification ◣ Processing...[DEBUG]: Measure: "flush" {
duration: 806.4016789999996,
meta: { 'batch number': 0 },
parents: [ 'buildMultipartParquet' ]
}
[DEBUG]: Flushed batch 0 with 64421 rows
article_classification ◤ Processing...[DEBUG]: Measure: "buildMultipartParquet" {
duration: 3263.2427499999994,
meta: { 'output filename': 'article_classification.parquet' },
parents: []
}
article_classification ✔ Finished, wrote 64421 rows.
Will not eagerly load files larger than 32 Megabytes.
articles-database ⚠ No results returned.
articles ✖ Error: Unknown Error
categories ◢ Processing...[DEBUG]: Building parquet file categories.parquet
[DEBUG]: Reading rows from a generator object
[DEBUG]: Measure: "buildMultipartParquet" {
duration: 4.869833999999173,
meta: { 'batch number': 0 },
parents: [ 'buildMultipartParquet' ]
}
[DEBUG]: Flushing batch 0 with 155 rows
[DEBUG]: Flushing batch 0 with 155 rows
[DEBUG]: Measure: "flush" {
duration: 3.3371879999995144,
meta: { 'batch number': 0 },
parents: [ 'buildMultipartParquet' ]
}
[DEBUG]: Flushed batch 0 with 155 rows
[DEBUG]: Measure: "buildMultipartParquet" {
duration: 14.027275999998892,
meta: { 'output filename': 'categories.parquet' },
parents: []
}
categories ✔ Finished, wrote 155 rows.
-----
Evaluated sources, saving manifest
Updating schema 'articles_connection'
| Schema exists already
| 4 queries found
| article_categories
| article_classification
| articles-database
| categories
| 0 queries are new
| 3 queries already exists
| static/data/articles_connection/article_categories/article_categories.parquet
| static/data/articles_connection/article_classification/article_classification.parquet
| static/data/articles_connection/categories/categories.parquet
| 3 queries to be rendered
| static/data/articles_connection/article_categories/article_categories.parquet
| static/data/articles_connection/article_classification/article_classification.parquet
| static/data/articles_connection/categories/categories.parquet
✅ Done!
System Info
System:
OS: macOS 15.1.1
CPU: (12) x64 Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
Memory: 217.51 MB / 16.00 GB
Shell: 3.2.57 - /bin/bash
Binaries:
Node: 22.11.0 - ~/.nvm/versions/node/v22.11.0/bin/node
npm: 10.5.0 - ~/.nvm/versions/node/v22.11.0/bin/npm
pnpm: 8.7.5 - ~/Library/pnpm/pnpm
Watchman: 2023.07.03.00 - /usr/local/bin/watchman
npmPackages:
@evidence-dev/bigquery: ^2.0.8 => 2.0.8
@evidence-dev/core-components: ^4.8.13 => 4.8.13
@evidence-dev/csv: ^1.0.13 => 1.0.13
@evidence-dev/databricks: ^1.0.7 => 1.0.7
@evidence-dev/duckdb: ^1.0.12 => 1.0.12
@evidence-dev/evidence: ^39.1.17 => 39.1.17
@evidence-dev/motherduck: ^1.0.3 => 1.0.3
@evidence-dev/mssql: ^1.1.1 => 1.1.1
@evidence-dev/mysql: ^1.1.3 => 1.1.3
@evidence-dev/postgres: ^1.0.6 => 1.0.6
@evidence-dev/snowflake: ^1.2.1 => 1.2.1
@evidence-dev/sqlite: ^2.0.6 => 2.0.6
@evidence-dev/trino: ^1.0.8 => 1.0.8
Severity
blocking all usage of Evidence
Additional Information, or Workarounds
This is my connection.yaml file:
name: articles_connection
type: sqlite
options:
  filename: articles-database.db
  readonly: true
This is articles.sql:
select * from articles limit 1;
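As a sanity check outside Evidence, the connection options above translate to roughly this direct call (a minimal sketch assuming the sqlite plugin wraps better-sqlite3; the script itself is mine):

```js
import Database from 'better-sqlite3';

// Open the same file the connector uses, with the same readonly option.
const db = new Database('articles-database.db', { readonly: true });

// Run the exact query from articles.sql.
console.log(db.prepare('select * from articles limit 1').get());

db.close();
```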
Can you also confirm what columns and column types are in your sqlite file, in the articles and articles-database tables?
@archiewood This is the query that creates the articles table:
CREATE TABLE IF NOT EXISTS articles (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
published TEXT NOT NULL,
abstract TEXT NOT NULL,
conclusion TEXT,
link TEXT UNIQUE NOT NULL,
input_token INTEGER,
output_token INTEGER,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
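If it helps, the declared types can also be confirmed straight from the file with better-sqlite3's pragma helper (this snippet is mine, not part of the project):

```js
import Database from 'better-sqlite3';

const db = new Database('articles-database.db', { readonly: true });

// Each row has cid, name, type, notnull, dflt_value and pk; for the schema
// above, every 'type' should come back as TEXT or INTEGER.
console.log(db.pragma('table_info(articles)'));

db.close();
```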
There is no articles-database table. But this is the structure of the folder:
$ cd sources/articles_connection/
$ ls -l
total 122832
-rw-r--r-- 1 user staff 33 Nov 21 19:06 article_categories.sql
-rw-r--r-- 1 user staff 37 Nov 21 19:06 article_classification.sql
-rw-r--r--@ 4 user staff 61861888 Nov 21 13:40 articles-database.db
-rw-r--r-- 1 user staff 23 Nov 21 19:53 articles.sql
-rw-r--r-- 1 user staff 25 Nov 21 19:05 categories.sql
-rw-r--r-- 1 user staff 98 Nov 21 19:50 connection.yaml
I think this is the expected behaviour regarding the loading of the files. We don't need to read the contents of the SQLite file as text; we just need to query it.
However, the error is happening when running the query in articles.sql, and the error message that comes back is not helpful!
@archiewood I see. Indeed, the 32MB log line probably comes from trying to load the .db file, not from the query! It makes total sense!
But my biggest problem is indeed the articles.sql file. Please let me know what I can do to help provide more relevant data! I'm very interested in solving this problem.
I assume this same query runs successfully against sqlite in some other client?
@archiewood Yes, it does:
SQLite version 3.42.0 2023-05-16 12:36:15
Enter ".help" for usage hints.
sqlite> select * from articles limit 1;
1653afcd-92ea-4468-8891-99c9fcc7275e|Randomized Autoregressive Visual Generation|2024-11-01|This paper presents Randomized AutoRegressive modeling (RAR) for visual generation, which sets a new state-of-the-art performance on the image generation task while maintaining full compatibility with language modeling frameworks. The proposed RAR is simple: during a standard autoregressive training process with a next-token prediction objective, the input sequence-typically ordered in raster form-is randomly permuted into different factorization orders with a probability r, where r starts at 1 and linearly decays to 0 over the course of training. This annealing training strategy enables the model to learn to maximize the expected likelihood over all factorization orders and thus effectively improve the model's capability of modeling bidirectional contexts. Importantly, RAR preserves the integrity of the autoregressive modeling framework, ensuring full compatibility with language modeling while significantly improving performance in image generation. On the ImageNet-256 benchmark, RAR achieves an FID score of 1.48, not only surpassing prior state-of-the-art autoregressive image generators but also outperforming leading diffusion-based and masked transformer-based methods. Code and models will be made available at https://github.com/bytedance/1d-tokenizer||https://arxiv.org/abs/2411.00776|2024-11-04 17:28:49|2024-11-07 09:26:35||
@archiewood Seems like I managed to debug this and put together a fix:
#2849
Please modify as you see fit!
If possible, can you let me know when this will be released? This is blocking a page that I'm trying to build, and I would love to release my code as soon as possible.
Thank you!
Hi @luanmuniz - thanks for the PR.
We'll look at this next week. Next release is scheduled for Thursday 28th.
If you want to get unblocked faster, you could release your version as a community plugin.
https://docs.evidence.dev/plugins/create-source-plugin/
Since you have written all the code already, I imagine it will just be a bit of copy-pasting.
You can then install your plugin in Evidence and drive on!
There should be decent instructions in the template:
https://github.com/evidence-dev/datasource-template
If you have any questions or issues with the template, let me know!
More context on this from one user:
In SQLite you are able to define a column without a datatype (argh). Those columns are causing the error in Evidence.
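For illustration (my own toy example, not the reporter's schema), a typeless column looks like this, and its declared type comes back as an empty string, which is the kind of value a connector's type mapping can trip over:

```js
import Database from 'better-sqlite3';

const db = new Database(':memory:');

// 'a' is declared without a datatype; SQLite allows this and assigns it
// BLOB affinity. 'b' gets ordinary TEXT affinity.
db.exec('CREATE TABLE t (a, b TEXT)');

// The typeless column is reported with type: '' here.
console.log(db.pragma('table_info(t)'));
```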
@archiewood Just to make it clear, this is not the problem I was having. You can see in the table schema I sent a few messages back that all the columns have types.
Am I missing something?
Ah, good point, I misremembered. I thought these might have the same cause.