/mssql_dataframe

Update, Upsert, and Merge from Python dataframes to SQL Server and Azure SQL database.

Primary LanguagePythonMIT LicenseMIT

mssql_dataframe

Test Status Coverage Status Flake8 Status Bandit Security PyPI Code style: black Build Status Open in Visual Studio Code

A data engineering package for Python pandas dataframes and Microsoft Transact-SQL. It provides more advanced methods for writting dataframes including update, merge, upsert.

  1. Update: updates records in SQL table
  2. Upsert: insert or update records in SQL table
  3. Merge: update, insert, or delete records in SQL table

These more advanced methods are designed to provide more funcationality than is offered by the pandas.DataFrame.to_sql method. pandas.DataFrame.to_sql offers 3 options if the SQL table already exists with the parameter if_exists={'fail', 'replace', 'append'}.

See QUICKSTART for a full overview of functionality.

Initialization and Sample SQL Table

import env
import pandas as pd
from mssql_dataframe import SQLServer

# connect to database using pyodbc
sql = SQLServer(database=env.database, server=env.server)

Update

Records in an SQL table are updated by simply providing a dataframe. By default a match on the SQL table's primary key is required for a record to be updated.

# create demo SQL table
df = sql.create.table(
    table_name = '##mssql_update',
    columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
    primary_key_column = 'PK'
)

# create a demo dataframe
df = pd.DataFrame({
    'Column1': ['A_Initial', 'B_Initial'],
    'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))

# perform an initial insert
sql.write.insert('##mssql_update', df)

# update records
update_df = pd.DataFrame({
        'Column1': ['A_Updated'],
}, index=pd.Index(['A'], name='PK'))
# update data in the SQL table
update_df = sql.write.update('##mssql_update', update_df)

# validate the result
result = sql.read.table('##mssql_update')
assert result.at['A', 'Column1'] == 'A_Updated'
assert result.at['A', 'Column2'] == 1
assert result.at['B', 'Column1'] == 'B_Initial'
assert result.at['B', 'Column2'] == 2

Merge

Records can be inserted/updated/deleted by providing a dataframe to the merge method. Again the primary key in the SQL table is used by default.

  1. dataframe column value doesn't match SQL column value -> insert record into SQL
  2. dataframe column value matches SQL column value -> update record in SQL
  3. SQL column value not in dataframe column -> delete record in SQL
# create demo SQL table
df = sql.create.table(
    table_name = '##mssql_merge',
    columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
    primary_key_column = 'PK'
)

# create a demo dataframe
df = pd.DataFrame({
    'Column1': ['A_Initial', 'B_Initial'],
    'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))

# perform an initial insert
sql.write.insert('##mssql_merge', df)

# perform merge
sql.write.merge(
        '##mssql_merge',
        pd.DataFrame.from_records([
                {'Column1': 'C_New', 'Column2': 3, 'PK': 'C'},
                {'Column1': 'B_Updated', 'Column2': 0, 'PK': 'B'},
        ]).set_index('PK')
)

# validate the results
result = sql.read.table('##mssql_merge')
assert 'A' not in result.index
assert result.at['C', 'Column1'] == 'C_New'
assert result.at['B', 'Column1'] == 'B_Updated'
assert result.at['B', 'Column2'] == 0

Upsert

The merge method can be restricted to not delete records in SQL by specifying the upsert flag. Records in SQL are then only inserted or updated.

# create demo SQL table
df = sql.create.table(
    table_name = '##mssql_upsert',
    columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
    primary_key_column = 'PK'
)

# create a demo dataframe
df = pd.DataFrame({
    'Column1': ['A_Initial', 'B_Initial'],
    'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))

# perform an initial insert
sql.write.insert('##mssql_upsert', df)

# perform upsert
sql.write.merge(
        '##mssql_upsert',
        pd.DataFrame.from_records([
                {'Column1': 'C_New', 'Column2': 3, 'PK': 'C'},
                {'Column1': 'B_Updated', 'Column2': 0, 'PK': 'B'},
        ]).set_index('PK'),
        upsert = True
)

# validate the results
result = sql.read.table('##mssql_upsert')
assert result.at['A', 'Column1'] == 'A_Initial'
assert result.at['A', 'Column2'] == 1
assert result.at['C', 'Column1'] == 'C_New'
assert result.at['B', 'Column1'] == 'B_Updated'
assert result.at['B', 'Column2'] == 0

Installation

pip install mssql-dataframe

Dependancies

pandas: Python DataFrames.

pyodbc: ODBC driver used for executing Transact-SQL statements.

Contributing

See CONTRIBUTING.