Columns with spaces cause double quotes when filtering, getting errors
paf91 opened this issue · 1 comments
Describe the bug
Hi. I'm facing issue using PySpark+JDBC driver, when columns have spaces, like "Test Column". E.g. when trying to filter table using pure jdbc driver or spark-clickhouse-connector+same jdbc driver:
tt = spark.table("test.table1")
tt.where(col("Test Column") == '123').count()
Translates into:
SELECT COUNT(*) FROM test.table1 WHERE (1=1) AND ((1=1) AND (``some ID`` = '123') AND (1=1))
Which causes error in clickhouse because it wraps column names with `` even if we already did it
Meanwhile tt.show() works fine
Steps to reproduce
- Initialize spark session
spark = SparkSession.builder.getOrCreate()
- Connect to clickhouse
host = 'somehost'
port = '8443'
database = 'test'
url = f'jdbc:clickhouse://{host}:{port}/{database}?&ssl=true'
user = 'test.user'
dbtable = 'test.test_table'
driver = 'com.clickhouse.jdbc.ClickHouseDriver'
test_table = (
spark.read.format('jdbc')
.option('driver', driver)
.option('url', url)
.option('user', user)
.option('password', password)
.option('dbtable', dbtable)
.load()
)
- Execute query
test_table.where(col("Column Test").isNotNull()).show()
Expected behaviour
No error, dataframe return results
Code example
tt = spark.table("test.table1")
tt.select(col("Test Column")).where(col("Test Column") == '123').count()
Error log
<Error> DynamicQueryHandler: Code: 62. DB::Exception: Syntax error: failed at position 83 ('``') (line 3, col 29): ``Test Column`` = '123') AND (1=1))
LIMIT 21
. Expected one of: literal, NULL, number, Bool, true, false, string literal, SELECT query, possibly with UNION, list of union elements, SELECT query, subquery, possibly with UNION, SELECT subquery, SELECT query, WITH, FROM, SELECT, EXPLAIN, token, Comma, ClosingRoundBracket, CAST operator, NOT, INTERVAL, CASE, DATE, TIMESTAMP, tuple, collection of literals, array, asterisk, qualified asterisk, compound identifier, list of elements, identifier, COLUMNS matcher, COLUMNS, qualified COLUMNS matcher, substitution, MySQL-style global variable. (SYNTAX_ERROR), Stack trace (when copying this message, always include the lines below):
- DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c6d5d7b in /usr/bin/clickhouse
- DB::Exception::createDeprecated(String const&, int, bool) @ 0x000000000c72de4d in /usr/bin/clickhouse
- DB::parseQueryAndMovePosition(DB::IParser&, char const*&, char const*, String const&, bool, unsigned long, unsigned long) @ 0x0000000012ebc03c in /usr/bin/clickhouse
- DB::executeQueryImpl(char const*, char const*, std::shared_ptrDB::Context, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x00000000117221c5 in /usr/bin/clickhouse
- DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, std::shared_ptrDB::Context, std::function<void (DB::QueryResultDetails const&)>, DB::QueryFlags, std::optionalDB::FormatSettings const&, std::function<void (DB::IOutputFormat&)>) @ 0x0000000011729bea in /usr/bin/clickhouse
- DB::HTTPHandler::processQuery(DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::optionalDB::CurrentThread::QueryScope&) @ 0x0000000012616f8d in /usr/bin/clickhouse
- DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0x000000001261bdb6 in /usr/bin/clickhouse
- DB::HTTPServerConnection::run() @ 0x0000000012696c12 in /usr/bin/clickhouse
- Poco::Net::TCPServerConnection::start() @ 0x00000000150f4e52 in /usr/bin/clickhouse
- Poco::Net::TCPServerDispatcher::run() @ 0x00000000150f5c51 in /usr/bin/clickhouse
- Poco::PooledThread::run() @ 0x00000000151ece67 in /usr/bin/clickhouse
- Poco::ThreadImpl::runnableEntry(void*) @ 0x00000000151eb45c in /usr/bin/clickhouse
- ? @ 0x00007f2c94dbcac3 in ?
- ? @ 0x00007f2c94e4e850 in ?
(version 23.12.2.59 (official build))
Configuration
PySpark 3.3.4
Clikchouse JDBC Driver 0.6
Environment
- Client version: Clikchouse JDBC Driver 0.6
- Language version: Python 3.10
- OS: Ubuntu
ClickHouse server
- ClickHouse Server version: 23.12.2.59