mkleehammer/pyodbc

accessToken support?

adw555 opened this issue ยท 43 comments

I was wondering if pyodbc will support connecting to an Azure SQL DB using the AD access token instead of user/password?

Unfortunately I don't think so without some code changes

The ODBC connection string is passed as-is to the driver, so companies can support any keywords they want. Unfortunately MS didn't do the easy, obvious thing and allow an "accessToken=xyz" keyword in the connection string, at least as far as I can tell.

This indicate you need to set it as a connection attribute before connecting. Unfortunately it also shows you need to pass a binary structure (thanks MS!) which pyodbc does not support yet.

I could add this constant and recognize it as a pointer type, but it would probably be better to come up with Python data type that is known to be passed as a pointer (SQL_IS_POINTER). In Python 3, bytes would be a natural choice, but I'm not sure what I'd use in Python 2. buffer? Perhaps ctypes?

SQL_COPT_SS_ACCESS_TOKEN = 1256
token3 = b'...'
pyodbc.connect("Driver={..};Database=xyz", attrs_before={
   SQL_COPT_SS_ACCESS_TOKEN: token  
})

Encoding the token properly would require using the struct module, but I think an example would make it tolerable.

Unfortunately MS didn't do the easy, obvious thing and allow an "accessToken=xyz" keyword in the connection string, at least as far as I can tell.

The access token is usually very long (>1KB, and several KB is not uncommon), can contain arbitrary data, and driver managers may have hardcoded limits on how long a connection string may be. Even pyodbc currently has hardcoded limit of 600 (why? A quick glance through the code doesn't reveal any fixed-length buffers.)

This page describes how to use an access token. I believe bytes would be OK for Python 2 as well, despite the fact that it is not really a string, or perhaps bytearray for 2.6+ . I will look into trying to add this.

Give this a try... https://github.com/v-chojas/pyodbc/tree/connattrs

I implemented the following type mapping:

Python object Value type
buffer(2.x) SQL_IS_POINTER
bytearray(>=2.6) SQL_IS_POINTER
bytes(2.x) string length
bytes(3.x) SQL_IS_POINTER
integers SQL_IS_INTEGER or SQL_IS_UINTEGER depending on sign
unicode string length

This works with AAD access tokens. Example code to expand the token and prepend the length as described on the page linked above, in Python 2.x:

token = "eyJ0eXAiOi...";
exptoken = "";
for i in token:
 exptoken += i;
 exptoken += chr(0);
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken;
conn = pyodbc.connect(connstr, attrs_before = { 1256:bytearray(tokenstruct) });

3.x is only slightly more involved due to annoying char/bytes split:

token = b"eyJ0eXAiOi...";
exptoken = b"";
for i in token:
 exptoken += bytes({i});
 exptoken += bytes(1);
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken;
conn = pyodbc.connect(connstr, attrs_before = { 1256:tokenstruct });

(SQL_COPT_SS_ACCESS_TOKEN is 1256; it's specific to msodbcsql driver so pyodbc does not have it defined, and likely will not.)

Hi! Something new about this issue? Could anyone merge the connattrs to master?

also interested having this feature, any news?

It's waiting for #259 to be merged.

The #259 has been merged, this issue can be closed now.

@adw555 @Pcosmin @hexadite-idan you can try this now in pyODBC 4.0.24.

This particular reply from @v-chojas really helped me to succeed in getting token based access to the database.

Give this a try... https://github.com/v-chojas/pyodbc/tree/connattrs

I implemented the following type mapping:

Python object Value type
buffer(2.x) SQL_IS_POINTER
bytearray(>=2.6) SQL_IS_POINTER
bytes(2.x) string length
bytes(3.x) SQL_IS_POINTER
integers SQL_IS_INTEGER or SQL_IS_UINTEGER depending on sign
unicode string length
This works with AAD access tokens. Example code to expand the token and prepend the length as described on the page linked above, in Python 2.x:

token = "eyJ0eXAiOi...";
exptoken = "";
for i in token:
 exptoken += i;
 exptoken += chr(0);
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken;
conn = pyodbc.connect(connstr, attrs_before = { 1256:bytearray(tokenstruct) });

3.x is only slightly more involved due to annoying char/bytes split:

token = b"eyJ0eXAiOi...";
exptoken = b"";
for i in token:
 exptoken += bytes({i});
 exptoken += bytes(1);
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken;
conn = pyodbc.connect(connstr, attrs_before = { 1256:tokenstruct });

(SQL_COPT_SS_ACCESS_TOKEN is 1256; it's specific to msodbcsql driver so pyodbc does not have it defined, and likely will not.)

Listed below you will find an alternative implementation for creating the tokenstruct. It takes a (Python 3) string as input. This works for Python 3.5+, otherwise you will have to remove the type annotations or convert them to comments.

As you can see, I changed the struct.pack parameter to "<i" following the line of reasoning that if you encode the characters Little-Endian, you will probably want to code the length as well Little-Endian.

def str2mswin_bstr(value: str) -> bytes:
    """Convert a string to a (MS-Windows) BSTR.

    See https://github.com/mkleehammer/pyodbc/issues/228#issuecomment-319190980
    for the original code (although the input in that example are bytes,
    assumed to contain characters in the range 0..255).  It appears the string
    is converted to an MS-Windows BSTR in 'Little-endian' format.

    Please note that neither this routine (nor the original code) will produce
    correct output if any code point in the input string is not in the
    ISO/IEC 10646 Basic Multilingual Plane (BMP), assuming the resulting
    structure is expected to hold UCS-2 characters.

    See https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp\
       /692a42a9-06ce-4394-b9bc-5d2a50440168
    for more info on BSTR.  It does not state however what kind of encoding
    it contains.

    See https://en.wikipedia.org/wiki/Universal_Coded_Character_Set for info
    on UCS-2.

    :param value: the string to convert
    :return: the converted value
    """
    # The original code from the provided github URL applies 'Little-endian'
    # to the character conversion in a python for-loop.
    # The following is much faster and works safely for the whole ASCII range.
    # BTW, there is no Byte Order Mark (BOM) inserted when specifying an
    # explicit endianness.
    encoded_bytes = value.encode("utf_16_le")
    # The original code applies native endianness for encoding the length
    # prefix. My working assumption that also the length should be encoded
    # 'Little-endian', regardless of the CPU (and OS) this code is running on.
    return struct.pack("<i", len(encoded_bytes)) + encoded_bytes

In due time, I will provide this function as a small utility package in PyPI.

There is no guarantee that values in an access token will be ASCII.
The correct specification of the token is in the TDS protocol:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/f40cca29-29b8-43ec-b9ff-2f8682486c29

The client then generates and sends a tokenless Federated Authentication Token message that contains binary authentication data that is generated by the federated authentication library.

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/827d9632-2957-4d54-b9ea-384530ae79d0

Ooh, that's interesting!

First of all - for completeness' sake - I regrettably have to withdraw my statement that I got it working. I didn't check thoroughly enough. Still working on it...

But regarding your remark about binary authentication data, if the token could be binary, how is it possible that MSIAuthentication provides me a token (using Python 3.6) of type str?

I use the following code fragment to get the token that I'm passing as the value parameter in the function listed above:

    credentials = MSIAuthentication(resource="https://database.windows.net/")
    access_token = credentials.token["access_token"]

access_token has type str (not bytes). I think it is impossible to encode an arbitrary binary in a py3 str type.

[But perhaps I'm using the wrong code to obtain the token...]

The fact that an access-token-producing API is only producing ASCII tokens does not mean that consumers of tokens can assume that they are, given that this is a very generic authentication method. Or put another way, ASCII-only is just a subset of binary.

Also, note that msodbcsql17 since 17.3 supports MSI authentication itself:

https://docs.microsoft.com/en-us/sql/connect/odbc/using-azure-active-directory

(Use Authentication=ActiveDirectoryMSI in the connection string, and optionally for a user-assigned identity, set the UID to the object ID.)

The fact that an access-token-producing API is only producing ASCII tokens does not mean that consumers of tokens can assume that they are, given that this is a very generic authentication method. Or put another way, ASCII-only is just a subset of binary.

In fact you are absolutely right about that. The producing API I am using (i.e. MSIAuthentication) returns a str however, which means that I have to encode the returned string into a binary blob if I change my conversion function into something that consumes bytes instead of str. Looking at the contents of the string (looks a bit like a base-64 encoding) my guess is that indeed 'ascii' is probably the best encoding to choose.

And this approach is working now (in a (Linux) Docker container)!

Having said all this, the earlier listed conversion function can be replaced by the following one, which is about a factor of 13 faster, compared to the original code:

import struct
from itertools import chain, repeat

def bytes2mswin_bstr(value: bytes) -> bytes:
    """Convert a sequence of bytes into a (MS-Windows) BSTR (as bytes).

    See https://github.com/mkleehammer/pyodbc/issues/228#issuecomment-319190980
    for the original code.  It appears the input is converted to an
    MS-Windows BSTR (in 'Little-endian' format).

    See https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp\
       /692a42a9-06ce-4394-b9bc-5d2a50440168
    for more info on BSTR.

    :param value: the sequence of bytes to convert
    :return: the converted value (as a sequence of bytes)
    """
    # The original code from the provided github URL applies 'Little-endian'
    # to the byte conversion in a python for-loop.
    # The following is code is assumed to be faster.
    encoded_bytes = bytes(chain.from_iterable(zip(value, repeat(0))))
    # The original code applies native endianness for encoding the length
    # prefix. My working assumption that also the length should be encoded
    # 'Little-endian', regardless of the CPU (and OS) this code is running on.
    return struct.pack("<i", len(encoded_bytes)) + encoded_bytes

Do you agree that also the length should be encoded 'Little-endian', regardless of the CPU (and OS) this code is running on?

Also, note that msodbcsql17 since 17.3 supports MSI authentication itself:

https://docs.microsoft.com/en-us/sql/connect/odbc/using-azure-active-directory

(Use Authentication=ActiveDirectoryMSI in the connection string, and optionally for a user-assigned identity, set the UID to the object ID.)

I have not tried this out, nor do I intend to (a bit frightened by the Note about MacOS and Linux support), now that I have it working.

But thanks again for this solution and your support! :-)

Do you agree that also the length should be encoded 'Little-endian', regardless of the CPU (and OS) this code is running on?

The only platforms that the ODBC Driver for SQL Server is available on are LE.

I am having issues connecting via pyodbc with app-registration credentials.

import pyodbc
import struct

# credentials generated by app-registration
USER = "XXXXXXXXXXXXXXXXXX"
PW = "XXXXXXXXXXXXXXXXXX"

database_url = "https://database.windows.net/"

authority_url = "https://login.microsoftonline.com"
tenantId = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
authority_url = authority_url + "/" + tenantId

context = adal.AuthenticationContext(authority_url, api_version=None)

token = context.acquire_token_with_client_credentials(
    database_url,
    USER,
    PW
)
print(token)

tokenb = bytes(token["accessToken"], "UTF-8")

exptoken = b''
for i in tokenb:
    exptoken += bytes({i})
    exptoken += bytes(1)
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
tokenstruct

SQL_COPT_SS_ACCESS_TOKEN = 1256
CONNSTRING = "DRIVER={};SERVER={};DATABASE={}".format("ODBC Driver 17 for SQL Server", SERVER, DATABASE)

conn = pyodbc.connect(CONNSTRING, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenstruct })

cursor = conn.cursor()
cursor.execute(QUERY)
row = cursor.fetchone()```

The token is generated (`accessToken` below):

```{'tokenType': 'Bearer', 'expiresIn': 3600, 'expiresOn': '2019-06-27 10:36:58.175894', 'resource': 'https://database.windows.net/', 'accessToken': 'XXXXXXXXXXXXXXXXXXXXXXXXXX', 'isMRRT': True, '_clientId': 'XXXXXXXXXXXXXXXXXXXXXX', '_authority': 'https://login.microsoftonline.com/XXXXXXXXXXXXXXX'}

But I get this error when trying to make connection:

 conn = pyodbc.connect(CONNSTRING, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenstruct })
pyodbc.InterfaceError: ('28000', "[28000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Login failed for user ''. (18456) (SQLDriverConnect)")

or, from a Windows machine:

    conn = pyodbc.connect(CONNSTRING, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenstruct })
pyodbc.InterfaceError: ('28000', "[28000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Login failed for user 'NT AUTHORITY\\ANONYMOUS LOGON'. (18456) (SQLDriverConnect); [28000] [Microsoft][ODBC Driver 17 for SQL
Server][SQL Server]Login failed for user 'NT AUTHORITY\\ANONYMOUS LOGON'. (18456)")

In my case, the solution was explained here, but basically, my SQL admin had to add permissions for the app-registration to the specific database that I was trying to access.

There is lot of communication happened in the thread, however, it is very difficult to come to conclusion based on the above errors, here you go a full working code:

Python 2.x:

Using sqlalchemy

from sqlalchemy import create_engine
from flask import Flask
import urllib
import requests
import struct

Retrieve token

headers = {'Metadata': 'true'}
params = {('api-version', '2018-02-01'), ('resource', '$your_token_resouce')}

response = requests.get('http://169..../metadata/identity/oauth2/token', headers=headers, params= params)

token = response.json().get('access-token')
token = token.encode('utf-8')
exptoken = ""
for i in token:
exptoken += i
exxptoken += chr(0)
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken;

CONNSTRING = "DRIVER={"+yourdriver+"};SERVER="+your_server+"DATABASE="+your_database+";Authentication=ActiveDirectoryMsi"
params = urllib.quote_plus(CONNSTRING)
db = create_engine('mssql+pyodbc:///?odbc_connect=%s" %params, connect_args={'attrs_before': {1256: bytearray(tokenstruct)})
conn = db.connect()
result = conn.execute("your sql")
rows = [row[0] for row in result]
print rows


CONNSTRING is main issue, and you need to carefully construct it

you can implement it using pyodbc as well, please message if you still have problems

I have tried to do a variant of the preceding. First in a terminal, I log in to our Azure environment. Then I create a python script whose purpose is to access one of our Azure SQL db's and do some processing.

Since we have multi-factor auth I need a token as well as user credentials. The python script begins by executing a CLI command to get a token.

command = "az account get-access-token --resource https://xxx.database.windows.net"
token = subprocess.check_output(command, shell=True)

The access token that gets returned is a <class 'bytes'> with a length of 1942. Since it's a 'bytes' class I didn't think I'd have to further encode it? That said, if I take the access token returned by 'az account get-access-token ...' and pass it into the following the process simply times out.

driver= 'ODBC Driver 17 for SQL Server'
server = 'xxx.database.windows.net'
database = 'tenant_stats'
username = '{xxx@xxx}'
password = '{.. xxx ..}'
url = 'DRIVER={' + driver + '};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password + ';Authentication=ActiveDirectoryPassword;'

SQL_COPT_SS_ACCESS_TOKEN = 1256

conn = pyodbc.connect(url, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:token })
cursor = conn.cursor()

I seem to be doing what has been suggested except for the tokenstruct creation. Any thoughts or guidance would be appreciated.

except for the tokenstruct creation

Yes, you need to pass the token in the correct format to the driver, which includes prepending a length field and expanding the bytes with 0-padding if necessary.

@v-chojas Yeah, I tried this from @michaelcapizzi.:

tokenb = bytes(token, "UTF-8")

exptoken = b''
for i in tokenb:
    exptoken += bytes({i})
    exptoken += bytes(1)
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
tokenstruct

SQL_COPT_SS_ACCESS_TOKEN = 1256

conn = pyodbc.connect(url, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenb })

cursor = conn.cursor()

It ends the same way ie. login timeout

Check that your access token structure is in the correct format, you can hexdump it to verify; e.g. if it was 1942 bytes before the expansion, and began with "eyJ0e...", then it should be 3884 after and the token structure should start with 2C 0F 00 00 65 00 79 00 4A 00 30 00 65 00 ...

Also, do not specify UID, PWD, nor Authentication when using an access token - the access token is the credential to access the database, and the only one you need.

@sivagopal I have the access token. I use the following to get it:

command = "az account get-access-token --resource https://ossrdbms-aad.database.windows.net"
response = subprocess.check_output(command, shell=True)
token = json.loads(response)['accessToken']
print(len(token))

The length of the 'reponse' is 1942. I see that the 'response' is an object with an attribute 'accessToken' but also others such as 'tenant' and 'expiresOn'. I've assumed I just need the 'accessToken'. The length of 'token' is 1738 and it begins with 'eyJ0...' .

@v-chojas after the expansion loop the length is 3480. So:

1738 
eyJ0eXAiOiJK ...

And after expansion:

3480 
b'\x94\r\x00\x00e\x00y ...

Does that look right? In either case the login fails. If I don't include UID/PSSWD the error is as follows which looks like it doesn't recognize the user(?!):

InterfaceError: ('28000', "[28000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Login failed for user ''. (18456) (SQLDriverConnect)")

This seems very difficult for what I expected to be straightforward. I greatly appreciate
the help :-)

assumed I just need the 'accessToken'

Correct.

after the expansion loop the length is 3480

If the token ('eyJ0...' string) is 1738 bytes long, then expanding it should've resulted in 3476, and prepending the 4-byte length gets you to 3480, so if that's what you meant, it seems right to me.

Use a connection string with only Driver, Server, and Database when connecting using the access token. The token is supplied as a connection attribute so you should not supply any other credentials. The "login failed for user '' " suggests the driver might not be new enough to support access token; an invalid token usually gives "login failed for 'NT AUTHORITY\ANONYMOUS LOGON'". Make sure you are using latest version of the ODBC driver.

@Siva - an example might be good. When I run 'az account get-access...' the object returned has 'tokenType:Bearer', 'tenant', etc. I have only be using 'accessToken' and didn't think the rest was required.

@v-chojas - I have the latest ODBC (17.5.1). But about the length, if the length is 3480 then should that be the value for SQL_COPT_SS_ACCESS_TOKEN instead of 1256?

SQL_COPT_SS_ACCESS_TOKEN = 1256
conn = pyodbc.connect(url, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenb })

Again, thank you for the time and advice.

1256 is the identifier of the connection attribute, not the length of the access token.

17.5.1 is not the latest but should be new enough for access token support.

I tried changing to "driver= '/usr/local/lib/libtdsodbc.so'" instead of "driver='ODBC Driver 17 for SQL Server'". When using that I get:

OperationalError: ('08S01', '[08S01] [FreeTDS][SQL Server]Unable to connect: Adaptive Server is unavailable or does not exist (20009) (SQLDriverConnect)')

If I use:

exptoken = ""
for i in token:
    exptoken += i <---- errors out here
    exptoken += chr(0)
 tokenstruct = struct.pack("=i", len(exptoken)) + exptoken;

The error says "TypeError: can only concatenate str (not "int") to str".

There is no guarantee that values in an access token will be ASCII.
The correct specification of the token is in the TDS protocol:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/f40cca29-29b8-43ec-b9ff-2f8682486c29

The client then generates and sends a tokenless Federated Authentication Token message that contains binary authentication data that is generated by the federated authentication library.

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/827d9632-2957-4d54-b9ea-384530ae79d0

At least for a JWT access token -- which is what Azure AD vends - the UTF_16_LE encoding approach is both correct and very elegant. JWT is comprised of three BASE64 encoded sections joined by dots. Since BASE64 is a strict subset of ASCII and since the dot character is also ASCII, the entire payload will be ASCII.

Hi,
I am getting the below error. Does anyone have any idea?

here is the code I am using
context = adal.AuthenticationContext(authority_url, api_version=None)
token = context.acquire_token_with_client_credentials(
"https://database.windows.net/",
'xxxxxxxxxxxxxxxxxxx',
'xxxxxxxxxxxxxxxxxxx')
SQL_COPT_SS_ACCESS_TOKEN = 1256
connString = "Driver={ODBC Driver 17 for SQL Server};SERVER={xxxxxxxxx};DATABASE={xxxxxxx}"
#get bytes from token obtained
tokenb = bytes(token["accessToken"], "UTF-8")
exptoken = b'';
for i in tokenb:
exptoken += bytes({i});
exptoken += bytes(1);
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken;

conn = pyodbc.connect(connString, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenstruct});

"
TypeError Traceback (most recent call last)
in
18 tokenstruct = struct.pack("=i", len(exptoken)) + exptoken;
19
---> 20 conn = pyodbc.connect(connString, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenstruct});

TypeError: connection string too long
"

That suggests you are trying to put the access token in the connection string (which it shouldn't be in), and are also using an older pyODBC version which had that limit on the connection string length.

Hi @michaelcapizzi , thanks a lot for sharing these snippets. I want to know about this:

In my case, the solution was explained here, but basically, my SQL admin had to add permissions for the app-registration to the specific database that I was trying to access.

I also need to convince my DBA about setting this up. They're insisting that there's no such thing - that there's no linking the registered applications to DBs. How does one do this? Some documentation links, screenshots etc would be very helpful. The MS docs I've seen go deep into regsitering applications and all, but don't show how to link the application to MSSQL.

You need to create a database user that corresponds to the identity of the token; see the docs for CREATE USER FROM EXTERNAL PROVIDE for more info.

In my case, the solution was explained here, but basically, my SQL admin had to add permissions for the app-registration to the specific database that I was trying to access.

How to add permission for the app-registration to the specific database?

Hi @michaelcapizzi , thanks a lot for sharing these snippets. I want to know about this:

In my case, the solution was explained here, but basically, my SQL admin had to add permissions for the app-registration to the specific database that I was trying to access.

I also need to convince my DBA about setting this up. They're insisting that there's no such thing - that there's no linking the registered applications to DBs. How does one do this? Some documentation links, screenshots etc would be very helpful. The MS docs I've seen go deep into regsitering applications and all, but don't show how to link the application to MSSQL.

I am also on the same page. How did you register or solved this issue?

Has anyone ever seen an issue in performance degradation when connecting with an access_token vs username and password? From analysis it seems that pooling of connections is not working correctly on an azure sql database with an access token vs username and password.

assert exptoken == access_token.encode("UTF-16-LE")

Just sayin'

Prepending each byte with a zero byte, in a case where the original text is ASCII text (base64 + dot . is), is equivalent to UTF-16 encoding, where each letter is encoded by 16 bits (2 bytes). The extra byte will be 0 for ASCII characters. The BE or LE means where that "big" byte will be (which is 0 for all ASCII characters) - at the end or not.

Effectively that counter-intuitive loop that I've been seeing everywhere, can be forgotten by just writing

exptoken = token["accessToken"].encode("UTF-16-LE")

in the first place, and not manually to UTF-8 and then add a zero byte to each byte.

Prepending each byte with a zero byte, in a case where the original text is ASCII text (base64 + dot . is), is equivalent to UTF-16 encoding, where each letter is encoded by 16 bits (2 bytes). The extra byte will be 0 for ASCII characters. The BE or LE means where that "big" byte will be (which is 0 for all ASCII characters) - at the end or not.

Yes, that will work provided you're input is ASCII. Note: don't forget to prepend the length of the string. See my str2mswin_bstr() example for the complete solution.

However, as the maintainer states in #228 (comment), there is no guarantee that the input is ASCII. That's why I changed my solution to bytes2mswin_bstr(), listed in #228 (comment).

Effectively that counter-intuitive loop that I've been seeing everywhere, can be forgotten by just writing

There's as much looping in my solution as there is in yours ๐Ÿ˜‰

Yes, that will work provided you're input is ASCII

It's base64 str. It is always ASCII. The struct packing is not solved by encoding in UTF-16-LE, but otherwise if you lose the ugly Python loop that tries to prepare data that's compatible with MSODBC driver (that uses UTF-16-LE internally apparently), and do explicitly what needs to be done (encoding in UTF-16-LE), rather than manually playing with bytes, then it works.

The fact that you are currently seeing base64 strings as tokens is not a guarantee that that is what they will remain. The access token mechanism supports arbitrary binary data.

I am not a maintainer of pyODBC, but I do maintain the official Microsoft ODBC Driver for SQL Server.

Hi, no idea how it worked, but by using redhat's python image in the Dockerfile, the AzureAD login is working for me. I did not need to do any of this extra stuff; it connected to the DB in the same way non-AD connections happen; just with an extra "Authentication=ActiveDirectoryPassword" added in the connection string.

Dockerfile:

FROM registry.access.redhat.com/ubi8/python-38
USER root
RUN curl https://packages.microsoft.com/config/rhel/8/prod.repo -o /etc/yum.repos.d/mssql-release.repo
RUN yum remove unixODBC-utf16 unixODBC-utf16-devel 
ENV ACCEPT_EULA=Y
RUN yum install -y unixODBC unixODBC-devel 
RUN yum download -y msodbcsql17
RUN rpm -Uvh --nodeps msodbcsql17*rpm

# Python dependencies
RUN pip install --upgrade pip setuptools wheel
RUN pip install pyodbc SQLAlchemy pandas

EXPOSE 8000
COPY . /app/
RUN chmod +777 /app
WORKDIR /app

CMD python /app/launch.py

Now I'm not sure if at your end you'll be able to pull this red hat docker image or not - pls try and see.

In the program, credentials are loaded like so:

DRIVER: "{ODBC Driver 17 for SQL Server}"
SERVER: "tcp:<my-database>.database.windows.net,1433"
DATABASE: "officeride-reporting-qa"
UID: "<the org email account>",
PWD: "<the password>",
OTHER: "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;Authentication=ActiveDirectoryPassword"

Note the "Authentication=ActiveDirectoryPassword" part at the end.

Code in program, assuming above creds are loaded into a "params" dict:

import sqlalchemy as db
import pyodbc
import urllib.parse
import pandas as pd

def initiateDB():
    dbString =  urllib.parse.quote_plus(r'DRIVER={};SERVER={};DATABASE={};UID={};PWD={};OTHER={}'\
        .format(params['DRIVER'], params['SERVER'], params['DATABASE'], params['UID'], params['PWD'], params['OTHER']))
    conn_str = r'mssql+pyodbc:///?odbc_connect={}'.format(dbString).strip()
    engine = db.create_engine(conn_str,echo=False, pool_recycle=3600)
    return engine

engine = initiateDB()
df = pd.read_sql('table1', engine)
print(df) 

So, somehow in the redhat image it looks like pyodbc is taking care of AD auth under-the-hood, while on using regular python image or ubuntu image like FROM ubuntu:16.04 I was never able to get through. No idea why this difference is there.

I can confirm that that the following works for me with bytes2mswin_bstr as suggested in #228 (comment)

import pyodbc
from azure.identity import AzureCliCredential

credential = AzureCliCredential()

database_token = credential.get_token("https://database.windows.net/.default")

connection_string = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=<your-db-server>.database.windows.net;DATABASE=<your-database>;Encrypt=Yes;"

tokenstruct = bytes2mswin_bstr(database_token.token.encode())
connection = pyodbc.connect(conn_str, attrs_before={1256: tokenstruct})