sfu-db/connector-x

Microsoft SQL Server Connection with ConnectorX throwing timed out bb8

ryanvdw1997 opened this issue · 7 comments

What language are you using?

Python

What version are you using?

polars version 0.18.15
connectorx version 0.3.1

What database are you using?

Microsoft SQL Server

What dataframe are you using?

Arrow

Can you describe your bug?

It takes a while to run read_database and then ultimately times out

What are the steps to reproduce the behavior?

Just using an MSSQL server connection uri and running read_database with a basic query

Database setup if the error only happens on specific data or data type

N/A

Example query / code
db_str = f"mssql://{username}:{password}@{server}:1433/{database}?driver=SQL+Server&trusted_connection=no"
pl.read_database(query="select * from foo", connection=db_str)

What is the error?

RuntimeError                              Traceback (most recent call last)
Input In [9], in <cell line: 7>()
      5 driver = '{ODBC Driver 17 for SQL Server}'
      6 db_str = f"mssql://{username}:{password}@{server}:1433/{database}?driver=SQL+Server&trusted_connection=no"
----> 7 pl.read_database(query="select * from player", connection=db_str)

File ~\anaconda3\lib\site-packages\polars\utils\deprecation.py:93, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     88 @wraps(function)
     89 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     90     _rename_keyword_argument(
     91         old_name, new_name, kwargs, function.__name__, version
     92     )
---> 93     return function(*args, **kwargs)

File ~\anaconda3\lib\site-packages\polars\io\database.py:122, in read_database(query, connection, partition_on, partition_range, partition_num, protocol, engine)
    119     engine = "connectorx"
    121 if engine == "connectorx":
--> 122     return _read_sql_connectorx(
    123         query,
    124         connection,
    125         partition_on=partition_on,
    126         partition_range=partition_range,
    127         partition_num=partition_num,
    128         protocol=protocol,
    129     )
    130 elif engine == "adbc":
    131     if not isinstance(query, str):

File ~\anaconda3\lib\site-packages\polars\io\database.py:153, in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol)
    148 except ImportError:
    149     raise ImportError(
    150         "connectorx is not installed. Please run `pip install connectorx>=0.3.1`."
    151     ) from None
--> 153 tbl = cx.read_sql(
    154     conn=connection_uri,
    155     query=query,
    156     return_type="arrow2",
    157     partition_on=partition_on,
    158     partition_range=partition_range,
    159     partition_num=partition_num,
    160     protocol=protocol,
    161 )
    162 return from_arrow(tbl)

File ~\anaconda3\lib\site-packages\connectorx\__init__.py:257, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    254 except ModuleNotFoundError:
    255     raise ValueError("You need to install pyarrow first")
--> 257 result = _read_sql(
    258     conn,
    259     "arrow" if return_type in {"arrow", "polars"} else "arrow2",
    260     queries=queries,
    261     protocol=protocol,
    262     partition_query=partition_query,
    263 )
    264 df = reconstruct_arrow(result)
    265 if return_type in {"polars", "polars2"}:

RuntimeError: Timed out in bb8
lyngc commented

I have the same issue. Even with trusted_connection=true it seems like it cannot properly use the company Microsoft login

I suspect this may be related to #233

Also this is likely another duplicate: #509

I am the owner of 509, which was actually related to a different database.

But I came across this issue with a mssql database in Azure, where I could successfully connect before I upgraded my connectorx version.

I pinned the version to
connectorx==3.1.0

And then I don't get the timeout error.

Oh, thanks for the info @mavestergaard -- Can you clarify what version you mean, though? Do you mean 0.3.1, possibly? I see the current version of connectorx at 0.3.2. When I try this with 0.3.1, I still get the timeout error. Notably, as is said in #233, I don't see some debug logs indicating that trusted connection is working.

In [1]: import os
   ...: os.environ["RUST_LOG"]="connectorx=debug,connectorx_python=debug"
   ...: import connectorx as cx

In [2]: server="..."; port=...; database="..."

In [3]: query="SELECT TOP 1 * FROM ..."; conn=f"mssql://{server}:{port}/{database}?encrypt=true&trust
   ...: ed_connection=true"

In [4]: df = cx.read_sql(conn, query)
[2023-09-01T14:17:06Z DEBUG connectorx::pandas] Protocol: binary
[2023-09-01T14:17:06Z DEBUG connectorx::dispatcher] Run dispatcher
[2023-09-01T14:17:06Z DEBUG connectorx::dispatcher] Fetching metadata
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-2f60f68fff90> in <cell line: 1>()
----> 1 df = cx.read_sql(conn, query)

~/.conda/envs/Inovalon/lib/python3.10/site-packages/connectorx/__init__.py in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    222             raise ValueError("You need to install pandas first")
    223 
--> 224         result = _read_sql(
    225             conn,
    226             "pandas",

RuntimeError: Timed out in bb8
lyngc commented

I also still have the issue at connectorx==0.3.1

Hello,

I encountered the same issue (RuntimeError: Timed out in bb8) while attempting to fetch data from a Microsoft SQL Server into a DataFrame using the following Python script:

import connectorx as cx
import polars as pl
query = make_filtered_query(min_date, max_date)
conn = f'mssql://{username}:{parse.quote_plus(password)}@{host}:{port}/{db_name}?trusted_connection=true&encrypt=true'
query = "SELECT * from table"
pl_df = cx.read_sql(conn, query)
# Polar was also throwing the same exception
# pl_df = pl.read_database_uri(query, conn) 

However, I was able to resolve the issue and successfully read the database using the modified code below (notably, I used the parameters ?trusted_connection&encrypt instead of ?trusted_connection=true&encrypt=true):

import connectorx as cx
import polars as pl
query = make_filtered_query(min_date, max_date)
conn = f'mssql://{username}:{parse.quote_plus(password)}@{host}:{port}/{db_name}?trusted_connection&encrypt'
query = "SELECT * from table"
pl_df = cx.read_sql(conn, query)
# Now, fetching data into a Polar DataFrame is successful as well
# pl_df = pl.read_database_uri(query, conn) 

This code was executed using python==3.10.12, connectorx==0.3.2, and polars==0.19.11.

I couldn't find any documentation or reference for this particular solution (omitting =true for boolean parameters), so I'm sharing it here in the hope that it might assist others facing a similar issue.

Thanks @adeljaouen ! Unfortunately, that solution does not work for me. I suspect part of the reason it works for you is that you are providing a username/password in the connection string directly, so I don't think your solution actually needs to use kerberos trusted connection certificates. In my case, where I am relying on kerberos, I get the timed out error in both the =true and the no =true settings. Thanks regardless, though!