sfu-db/connector-x

special characters on password gives db error

scho-git opened this issue · 4 comments

What language are you using?

Python

What version are you using?

0.3.1

What database are you using?

PostgreSQL

What dataframe are you using?

Arrow / Polars

Can you describe your bug?

I have both "#" and "(" characters in my credentials. From the other bug reports (#319), it looks like urllib.parse.quote_plus helps with the "#". But the "(" is now throwing "RuntimeError: db error: ERROR: syntax error at or near '('". The parenthesis is nowhere else except in my credentials.

It works fine with sqlalchemy. Currently using polars but doesn't seem to be on the polars side since the error gets thrown at the cx.read_sql() portion.

What are the steps to reproduce the behavior?

See example query.

Database setup if the error only happens on specific data or data type
Example query / code
import polars as pl
import urllib
import cred

pw = urllib.parse.quote_plus(cred.password)
uri = f'postgresql://{cred.username}:{pw}@{cred.host}:{cred.port}/{cred.dbname}'

query = "SELECT * from db LIMIT 5"
df = pd.read_database_uri(query=query, uri=uri)

What is the error?

RuntimeError                              Traceback (most recent call last)

Cell In[35], line 24
 19 query = """
 20     SELECT *
 21     FROM db
 22     LIMIT 5
 23     """
---> 24 df = pl.read_database_uri(query=query, uri = uri)


File /opt/jupyterhub/lib64/python3.8/site-packages/polars/io/database.py:707, in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides)

704     engine = "connectorx"
706 if engine == "connectorx":
--> 707     return _read_sql_connectorx(
708         query,
709         connection_uri=uri,
710         partition_on=partition_on,
711         partition_range=partition_range,
712         partition_num=partition_num,
713         protocol=protocol,
714         schema_overrides=schema_overrides,
715     )
716 elif engine == "adbc":
717     if not isinstance(query, str):


File /opt/jupyterhub/lib64/python3.8/site-packages/polars/io/database.py:755, in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)

752 except BaseException as err:
753     # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
754     errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
--> 755     raise type(err)(errmsg) from err
757 return from_arrow(tbl, schema_overrides=schema_overrides)


RuntimeError: db error: ERROR: syntax error at or near "("

I am having this exact same issue -- have tried several versions of connectorx and polars, but cannot successfully connect to a redshift database. I get an identical error, though I don't have a "(" in my password -- I have a "[".

If I try using cx.read_sql instead of pl.read_database_uri, I get the below error message regardless of which type of dataframe I set as the return_type (and I do have pandas installed).

RuntimeError Traceback (most recent call last)
in
----> 1 cx.read_sql(conn=uri,
2 query=f"SELECT * FROM {table} LIMIT 1",
3 return_type="pandas")

C:\ProgramData\Anaconda3\lib\site-packages\connectorx_init_.py in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
262 raise ValueError("You need to install pandas first")
263
--> 264 result = _read_sql(
265 conn,
266 "pandas",

RuntimeError: db error: ERROR: syntax error at or near "("

The raised exception lives in Polars, https://github.com/pola-rs/polars/blob/835d1981dd13b84f466b4bb3ae75e0ea347f8c4e/py-polars/polars/io/database/_utils.py#L78 but I cannot find where it comes from (the connector-x section)

docker run -e POSTGRES_PASSWORD='·$)(%·!"$%("·$(%="!![[][·$%#("·/$%!"($&' -d -p 5432:5432 --rm --name postgres postgres

import connectorx
import urllib.parse

password = '·$)(%·!"$%("·$(%="!![[][·$%#("·/$%!"($&'
cn = connectorx.read_sql(f'postgres://postgres:{urllib.parse.quote_plus(password)}@localhost:5432',
                         'SELECT 1;')

Works fine.

my solution:

import connectorx as cx
from urllib.parse import quote_plus

pwd = quote_plus('mypassword##')
SQLALCHEMY_GAUSSDB_PROD_CONNECT_STR = f'postgresql://myusername:{pwd}@xxx:xxx/xxx' # not working
SQLALCHEMY_GAUSSDB_PROD_CONNECT_STR = f'redshift://myusername:{pwd}@xxx:xxx/xxx' # working!
sql = "SELECT 1 FROM dual;"
start = time.time()
try:
    result = cx.read_sql(SQLALCHEMY_GAUSSDB_PROD_CONNECT_STR, sql)
    logger.warning(len(result))
except Exception as e:
    print(f"查询失败: {e}")

But I do not know if the server has limit this connection. My database is GAUSSDB

these workarounds aren't working on redshift....ignore: redshift:// instead of postgres:// worked