Microsoft SQL Server Connection with ConnectorX throwing timed out bb8
ryanvdw1997 opened this issue · 7 comments
What language are you using?
Python
What version are you using?
polars version 0.18.15
connectorx version 0.3.1
What database are you using?
Microsoft SQL Server
What dataframe are you using?
Arrow
Can you describe your bug?
It takes a while to run read_database and then ultimately times out
What are the steps to reproduce the behavior?
Just using an MSSQL server connection uri and running read_database with a basic query
Database setup if the error only happens on specific data or data type
N/A
Example query / code
db_str = f"mssql://{username}:{password}@{server}:1433/{database}?driver=SQL+Server&trusted_connection=no"
pl.read_database(query="select * from foo", connection=db_str)
What is the error?
RuntimeError Traceback (most recent call last)
Input In [9], in <cell line: 7>()
5 driver = '{ODBC Driver 17 for SQL Server}'
6 db_str = f"mssql://{username}:{password}@{server}:1433/{database}?driver=SQL+Server&trusted_connection=no"
----> 7 pl.read_database(query="select * from player", connection=db_str)
File ~\anaconda3\lib\site-packages\polars\utils\deprecation.py:93, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
88 @wraps(function)
89 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
90 _rename_keyword_argument(
91 old_name, new_name, kwargs, function.__name__, version
92 )
---> 93 return function(*args, **kwargs)
File ~\anaconda3\lib\site-packages\polars\io\database.py:122, in read_database(query, connection, partition_on, partition_range, partition_num, protocol, engine)
119 engine = "connectorx"
121 if engine == "connectorx":
--> 122 return _read_sql_connectorx(
123 query,
124 connection,
125 partition_on=partition_on,
126 partition_range=partition_range,
127 partition_num=partition_num,
128 protocol=protocol,
129 )
130 elif engine == "adbc":
131 if not isinstance(query, str):
File ~\anaconda3\lib\site-packages\polars\io\database.py:153, in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol)
148 except ImportError:
149 raise ImportError(
150 "connectorx is not installed. Please run `pip install connectorx>=0.3.1`."
151 ) from None
--> 153 tbl = cx.read_sql(
154 conn=connection_uri,
155 query=query,
156 return_type="arrow2",
157 partition_on=partition_on,
158 partition_range=partition_range,
159 partition_num=partition_num,
160 protocol=protocol,
161 )
162 return from_arrow(tbl)
File ~\anaconda3\lib\site-packages\connectorx\__init__.py:257, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
254 except ModuleNotFoundError:
255 raise ValueError("You need to install pyarrow first")
--> 257 result = _read_sql(
258 conn,
259 "arrow" if return_type in {"arrow", "polars"} else "arrow2",
260 queries=queries,
261 protocol=protocol,
262 partition_query=partition_query,
263 )
264 df = reconstruct_arrow(result)
265 if return_type in {"polars", "polars2"}:
RuntimeError: Timed out in bb8
I have the same issue. Even with trusted_connection=true
it seems like it cannot properly use the company Microsoft login
I am the owner of 509, which was actually related to a different database.
But I came across this issue with a mssql database in Azure, where I could successfully connect before I upgraded my connectorx version.
I pinned the version to
connectorx==3.1.0
And then I don't get the timeout error.
Oh, thanks for the info @mavestergaard -- Can you clarify what version you mean, though? Do you mean 0.3.1, possibly? I see the current version of connectorx at 0.3.2. When I try this with 0.3.1, I still get the timeout error. Notably, as is said in #233, I don't see some debug logs indicating that trusted connection is working.
In [1]: import os
...: os.environ["RUST_LOG"]="connectorx=debug,connectorx_python=debug"
...: import connectorx as cx
In [2]: server="..."; port=...; database="..."
In [3]: query="SELECT TOP 1 * FROM ..."; conn=f"mssql://{server}:{port}/{database}?encrypt=true&trust
...: ed_connection=true"
In [4]: df = cx.read_sql(conn, query)
[2023-09-01T14:17:06Z DEBUG connectorx::pandas] Protocol: binary
[2023-09-01T14:17:06Z DEBUG connectorx::dispatcher] Run dispatcher
[2023-09-01T14:17:06Z DEBUG connectorx::dispatcher] Fetching metadata
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-4-2f60f68fff90> in <cell line: 1>()
----> 1 df = cx.read_sql(conn, query)
~/.conda/envs/Inovalon/lib/python3.10/site-packages/connectorx/__init__.py in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
222 raise ValueError("You need to install pandas first")
223
--> 224 result = _read_sql(
225 conn,
226 "pandas",
RuntimeError: Timed out in bb8
I also still have the issue at connectorx==0.3.1
Hello,
I encountered the same issue (RuntimeError: Timed out in bb8
) while attempting to fetch data from a Microsoft SQL Server into a DataFrame using the following Python script:
import connectorx as cx
import polars as pl
query = make_filtered_query(min_date, max_date)
conn = f'mssql://{username}:{parse.quote_plus(password)}@{host}:{port}/{db_name}?trusted_connection=true&encrypt=true'
query = "SELECT * from table"
pl_df = cx.read_sql(conn, query)
# Polar was also throwing the same exception
# pl_df = pl.read_database_uri(query, conn)
However, I was able to resolve the issue and successfully read the database using the modified code below (notably, I used the parameters ?trusted_connection&encrypt
instead of ?trusted_connection=true&encrypt=true
):
import connectorx as cx
import polars as pl
query = make_filtered_query(min_date, max_date)
conn = f'mssql://{username}:{parse.quote_plus(password)}@{host}:{port}/{db_name}?trusted_connection&encrypt'
query = "SELECT * from table"
pl_df = cx.read_sql(conn, query)
# Now, fetching data into a Polar DataFrame is successful as well
# pl_df = pl.read_database_uri(query, conn)
This code was executed using python==3.10.12
, connectorx==0.3.2
, and polars==0.19.11
.
I couldn't find any documentation or reference for this particular solution (omitting =true
for boolean parameters), so I'm sharing it here in the hope that it might assist others facing a similar issue.
Thanks @adeljaouen ! Unfortunately, that solution does not work for me. I suspect part of the reason it works for you is that you are providing a username/password in the connection string directly, so I don't think your solution actually needs to use kerberos trusted connection certificates. In my case, where I am relying on kerberos, I get the timed out error in both the =true
and the no =true
settings. Thanks regardless, though!