Problem with square bracket in CSV column name
foscoj opened this issue · 7 comments
testing some data from european power information (entsoe.eu), the title of the csv contains square brackets.
as I am playing with glitch, sqlite-utils are used for creating the db.
Traceback (most recent call last):
File "/app/.local/bin/sqlite-utils", line 8, in
sys.exit(cli())
File "/app/.local/lib/python3.7/site-packages/click/core.py", line 764, in call
return self.main(*args, **kwargs)
File "/app/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/app/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/app/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/app/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/app/.local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 434, in insert
default=default,
File "/app/.local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 384, in insert_upsert_implementation
docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs
File "/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py", line 997, in insert_all
extracts=extracts,
File "/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py", line 618, in create
extracts=extracts,
File "/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py", line 310, in create_table
self.conn.execute(sql)
sqlite3.OperationalError: unrecognized token: "]"
entsoe_2016.csv
renamed to txt for uploading compatibility
code is remixed directly from your https://glitch.com/edit/#!/datasette-csvs repo
Thanks for the example file - looks like it can be trimmed down to just these two lines to replicate the bug:
"MTU (CET)","Day-ahead Price [EUR/MWh]"
"01.01.2016 00:00 - 01.01.2016 01:00","23.86"
https://stackoverflow.com/a/22694438 looks like the answer:
When using square brackets, it is not possible to have these characters in the identifier.
When using double quotes, you can escape them in the name by doubling them:
CREATE TABLE "hello ""world"""(key INTEGER PRIMARY KEY);
There's something weird about this. I created a test database file like so:
sqlite3 /tmp/demo.db <<EOF
BEGIN TRANSACTION;
CREATE TABLE "data" (
"MTU (CET)" TEXT,
"Day-ahead Price [EUR/MWh]" TEXT
);
INSERT INTO "data" VALUES('01.01.2016 00:00 - 01.01.2016 01:00','23.86');
COMMIT;
EOF
Then confirmed that it works as expected in SQLite:
$ sqlite3 /tmp/demo.db
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE IF NOT EXISTS "data" (
"MTU (CET)" TEXT,
"Day-ahead Price [EUR/MWh]" TEXT
);
sqlite> .headers on
sqlite> select * from data;
MTU (CET)|Day-ahead Price [EUR/MWh]
01.01.2016 00:00 - 01.01.2016 01:00|23.86
sqlite>
BUT... if I open the same database in Python, something weird happens:
In [1]: import sqlite3
In [2]: conn = sqlite3.connect("/tmp/demo.db")
In [3]: cursor = conn.cursor()
In [4]: cursor.execute("select * from data")
Out[4]: <sqlite3.Cursor at 0x10c70a0a0>
In [5]: cursor.fetchall()
Out[5]: [('01.01.2016 00:00 - 01.01.2016 01:00', '23.86')]
In [6]: cursor.description
Out[6]:
(('MTU (CET)', None, None, None, None, None, None),
('Day-ahead Price', None, None, None, None, None, None))
In [7]: conn.row_factory = sqlite3.Row
In [8]: cursor = conn.cursor()
In [9]: cursor.execute("select * from data")
Out[9]: <sqlite3.Cursor at 0x10c7a8490>
In [10]: row = cursor.fetchall()
In [12]: row
Out[12]: <sqlite3.Row at 0x10c3fe670>
In [15]: row.keys()
Out[15]: ['MTU (CET)', 'Day-ahead Price']
Note that in cursor.description AND in row.keys() above the second column is displayed as 'Day-ahead Price' - when we would expect it to be displayed as Day-ahead Price [EUR/MWh]
So.... it looks like there may be a bug in Python's sqlite3 module where columns with square braces in them have that portion of the name stripped out!
I'm not sure what to do about this one.
I can't fix it: this bug in Python's sqlite3 module means that even if I write a database out with column names that include [] I won't be able to read them back again.
So... I could do one of the following:
- Throw an error if a column name includes those characters. That's my preferred option I think.
- Automatically replace
[in column names with(and]with) - Do the automatic replacement but show a user-visible warning when I do it
- Throw an error, but give the user an option to run with e.g.
--fix-column-nameswhich applies that automatic fix.
Since this is likely to be an incredibly rare edge-case I think I'd rather minimize the amount of code that deals with it, so my preferred option is to just throw that error and stop.
Probably the best option to just throw the error.
Is there any active dev chan where we could post the issue to python sqlite3?
I filed a bug in the Python issue tracker here: https://bugs.python.org/issue39652
I pushed a branch with my experiment in it, but I'm going to fix this by throwing an error on [ or ] in a column name instead - I won't implement the changes from that branch.