Teradata/PyTd

DB API cursor.description information is incorrect

Opened this issue · 1 comments

Python DB API standards defines here what values a DB API compliant driver can return via cursor.description.

From what I observed:

  • precision and scale are wrong. The actual scale value is stored in precision and precision value is stored in internal_size.
  • internal_size attribute on other databases returns the amount of internal storage used by the column. So for example, column of int type would return 4, bigint 8 and so on.
  • internal_size actually has the values that are more appropriate for display_size attribute (which is always null)
  • For SQL interval types, attribute type_code from cursor.description returns str but in reality an instance of teradata.datatypes.Interval is returned.
  • For JSON types, again str is stored in type_code, but the actual values are instances of python dict or list
  • When using tdrest module, none of the metadata except name and type_code are returned. It would be nice to have access to the same information that tdodbc module provides
  • tdrest module doesn't support xml type at all whereas tdodbc has no problems returning xml as python str

Here is a small Python program to reproduce above observations:

import teradata
from collections import OrderedDict

td_types = OrderedDict([
( "Char5"      , "cast('abc'  as char(5))"         ),
( "VarChar5"   , "cast('abc'  as varchar(5))"      ),

( "Int"        , "cast(123456 as integer)"         ),
( "BigInt"     , "cast(1234567890123 as bigint)"   ),
( "SmallInt"   , "cast(1234   as smallint)"        ),
( "ByteInt"    , "cast(123    as byteint)"         ),

( "Dec52"      , "cast(123.56 as decimal(5,2))"    ),
( "Dec384"     , "cast(123.56 as decimal(38,4))"   ),

( "Date"       , "current_date"                    ),
( "time"       , "current_time"                    ),
( "timestamp0" , "Current_timestamp(0)"            ),
( "timestamp6" , "Current_timestamp(6)"            ),

( "HourToMin"  , "interval '05:06' hour to minute" ),
( "DayToMin"   , "interval '03 05' day to hour"    ),
( "YearToMonth", "interval '05-11' year to month"  ),

( "JsonObj"    , "cast('{\"ABC\":[1,2]}' as json)" ),
( "JsonArray"  , "cast('[1,2]' as json)"           ),
( "xml"        , "cast('<a/>' as xml)"             )])

# with teradata.tdrest.connect(host='xxxx', system='xxxx', username='xxxx', password='xxxx') as conn:
with teradata.tdodbc.connect(system='xxxx', username='xxxx', password='xxxx') as conn:
    with conn.cursor() as csr:

        fmt = '{:<20}|{:<38}|{:<28}|{:>5}|{:>10}|{:>4}|{:>5}|{:^4}'

        print(fmt.format('name','py type','csr type','dsize','isize','prec','scale','null'))
        print(fmt.replace('{:','{:-').format(*['']*8))

        for name, expr in td_types.items():
            csr.execute('SELECT {} AS Col_{}'.format(expr,name))
            desc = csr.description[0]
            col  = csr.fetchone()[0]
            line = [desc[0], type(col)] + list(desc[1:])
            print(fmt.format(*[str(c) for c in line]))

I should add that the integral SQL data types are better represented by python int (also long in python2) and python float to represent SQL real data types instead of current implementation that uses decimal.Decimal for all numeric data types.