/tap-mssql

Singer Tap for MS SQL built with Meltano Singer SDK.

Primary LanguagePythonMIT LicenseMIT

tap-mssql

tap-mssql is a Singer tap for mssql. !!! Warning !!! work in progress. It works ok 😐 for full loads.
It works maybe 🤷‍♀️🤷‍♂️ for Incremntal loads.

Built with the Meltano Tap SDK for Singer Taps.

Whats New 🛳️🎉

2023-05-03 Incremental Replication: Equipped with the Singer-SDK documentation on how to implement Incremental Replication and davert0 issue filled with great details I headed off on a coding adventure. There were twists, turns, and backtracking but in the end you can now setup Incremental Replication and it might work. If you are using Meltano here is the documentation to follow to setup Key-based Incremental Replication and manage Incremental Replication State. Skim over the documentation and head off on you own Incremental Replication adventure.

2023-04-26 New HD JSON Schema Types: XML, IMAGE, BINARY, and VARBINARY types have been give definitions. This is Thanks🙏 to Singer-SDK 0.24.0 which allows for JSON Schema contentMediaType and contentEncoding. Currently all binary data types are Base64 encoded before being sent to a tartget. The buzzcutnorman target-mssql and target-postgres are both able to translate them back into SQL data types.

2023-04-19 Upgraded to Meltano Singer-SDK 0.24.0

2023-02-22 Batch Message Can Handle More Data Types: You use batch message when a table contains DECIMAL or NUMERIC columns. Plus better batch message support for TIME, DATE, and DATETIME columns.

2023-02-08 Higher Defined(HD) JSON Schema types: This is my interpretation of how to define MS SQL data types using the JSON Schema. You can give it a try by setting hd_jsonschema_types to True in your config.json or meltano.yml. The buzzcutnorman target-mssql and target-postgres are both able to translate them back into SQL data types.

Installation

Prerequisites

You will need to install the SQL Server Native Driver or ODBC Driver for SQL Server if you plan to use the driver_type of pyodbc. These drivers are not needed when opting to use pymssql.

Installing Microsoft ODBC Driver for SQL Server

Install from GitHub:

pipx install git+https://github.com/BuzzCutNorman/tap-mssql.git

Meltano CLI

You can find this tap at Meltano Hub. Which makes installation a snap.

Add the tap-stackoverflow-sampledata extractor to your project using meltano add :

meltano add extractor tap-mssql --variant buzzcutnorman

Configuration

The simplest way to configure tap-mssql is to use the Meltano interactive configuration.

meltano config tap-mssql set --interactive

You can quickly set configuration options 1 - 7 this way:

  1. dialect: The Dialect of SQLAlchemy
  2. driver_type: The Python Driver you will be using to connect to the SQL server
  3. host: The FQDN of the Host serving out the SQL Instance
  4. port: The port on which SQL awaiting connection
  5. user: The User Account who has been granted access to the SQL Server
  6. password: The Password for the User account
  7. database: The Default database for this connection

WARNING: Do not attempt setting any other configuration options via interactive. Doing so has lead to incomplete configurations that fail when the tap is run.

Options 8 - 15 can be setup via meltano config tap-mssql set. Examples for the most commonly needed configurations options are given below.

When using pyodbc sqlalchemy_url_query.driver passes SQLAlchemny the installed ODBC driver.

meltano config target-mssql set sqlalchemy_url_query.driver "ODBC Driver 18 for SQL Server"

When using pyodbc sqlalchemy_url_query.TrustServerCertificate let SQLAlchemy know whether to trust server signed certificates when connecting to SQL Server.

meltano config target-mssql set sqlalchemy_url_query.TrustServerCertificate yes

The pyodbc driver has added support for a “fast executemany” mode of execution which greatly reduces round trips. You can trun the option on or off by setting sqlalchemy_eng_params.fast_executemany to "True" or "False"

meltano config target-mssql set sqlalchemy_eng_params.fast_executemany "True"

Accepted Config Options

Setting Required Default Description
dialect False None The Dialect of SQLAlchamey
driver_type False None The Python Driver you will be using to connect to the SQL server (either "pyodbc" or "pymssql")
host False None The FQDN of the Host serving out the SQL Instance
port False None The port on which SQL awaiting connection
user False None The User Account who has been granted access to the SQL Server
password False None The Password for the User account
database False None The Default database for this connection
sqlalchemy_eng_params False None SQLAlchemy Engine Paramaters: fast_executemany, future
sqlalchemy_url_query False None SQLAlchemy URL Query options: driver, TrustServerCertificate
batch_config False None Optional Batch Message configuration
start_date False None The earliest record date to sync
hd_jsonschema_types False 0 Turn on Higher Defined(HD) JSON Schema types to assist Targets
stream_maps False None Config object for stream maps capability. For more information check out Stream Maps.
stream_map_config False None User-defined config values to be used within map expressions.
flattening_enabled False None 'True' to enable schema flattening and automatically expand nested properties.
flattening_max_depth False None The max depth to flatten schemas.

A full list of supported settings and capabilities for this tap is available by running:

tap-mssql --about

Configure using environment variables

This Singer tap will automatically import any environment variables within the working directory's .env if the --config=ENV is provided, such that config values will be considered if a matching environment variable is set either in the terminal context or in the .env file.

Usage

You can easily run tap-mssql by itself or in a pipeline using Meltano.

SDK Dev Guide

See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.