datafold/data-diff

[SQL Server] TRIM() is not available in versions of sql server prior to 17

MiConnell opened this issue · 1 comments

Describe the bug
data diff runs a query with TRIM() but sql server <17 does not support this.

Make sure to include the following (minus sensitive information):
command:

data-diff -d  <redshift_uri>   <schema>.<table> <mssql_uri>   dbo.<TABLE>   -k <KEY_COLUMN> 

output:

        DEBUG    Available mutual columns: set()                                                                                                              __main__.py:578
         INFO     Diffing using columns: key=('COLUMN',) update=None extra=().                                                                   __main__.py:598
         DEBUG    Running SQL (Redshift): ('schema', 'table')                                                                                       base.py:980
                  SELECT * FROM (SELECT TRIM("column") FROM "schema"."table") AS LIMITED_SELECT LIMIT 64                                         
         INFO     [Redshift] Schema = {'column': String_VaryingAlphanum(_notes=[], collation=None)}                                                 schema.py:44
         DEBUG    Running SQL (MsSQL): ('dbo', 'TABLE')                                                                                                base.py:980
                  SELECT TRIM([COLUMN]) FROM [dbo].[TABLE] ORDER BY 1 OFFSET 0 ROWS FETCH NEXT 64 ROWS ONLY                                         
         ERROR    ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]'TRIM' is not a recognized built-in function name. (195)            __main__.py:344
                  (SQLExecDirectW)")                                                                                                                                          
Traceback (most recent call last):
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/mssql.py", line 227, in _query_cursor
    return super()._query_cursor(c, sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1173, in _query_cursor
    c.execute(sql_code)
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]'TRIM' is not a recognized built-in function name. (195) (SQLExecDirectW)")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/ingestion/venv/bin/data-diff", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 342, in main
    _data_diff(dbt_project_dir=project_dir_override, dbt_profiles_dir=profiles_dir_override, state=state, **kw)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 600, in _data_diff
    segments = [
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 601, in <listcomp>
    TableSegment(db, table_path, key_columns, update_column, columns, **options)._with_raw_schema(raw_schema)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/table_segment.py", line 145, in _with_raw_schema
    schema = self.database._process_table_schema(self.table_path, raw_schema, self.relevant_columns, self._where())
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1108, in _process_table_schema
    self._refine_coltypes(path, col_dict, where)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1127, in _refine_coltypes
    samples_by_row = self.query(
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 996, in query
    res = self._query(sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1248, in _query
    return r.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1254, in _query_in_worker
    return self._query_conn(self.thread_local.conn, sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1188, in _query_conn
    return apply_query(callback, sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 211, in apply_query
    return callback(sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/mssql.py", line 229, in _query_cursor
    raise QueryError(e)
data_diff.databases.base.QueryError: ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]'TRIM' is not a recognized built-in function name. (195) (SQLExecDirectW)")

If possible, please paste these as text, and not a screenshot.

Describe the environment

Ubuntu v0.11.1 Microsoft SQL Server 2016

Hi @MiConnell,

Thank you for trying out data-diff and for taking the time to open this issue. We made a hard decision to sunset the data-diff package and won't provide further development or support. Diffing functionality will continue to be available in Datafold Cloud. Over the past few months, we have completely rewritten the diffing engine in the cloud and solved the fundamental issues with the original algorithm used in the data-diff package. We also substantially improved the experience for SQL Server 2019+. Feel free to take it for a trial or contact us at support@datafold.com if you have any questions.

-Gleb