/PandasSqlWrapper

Provides upsert and schema updating capabilities and wraps basic functionality expected when communicating between dataframes and sql tables.

Primary LanguagePython

PandasSQLWrapper

Provides upsert and schema updating capabilities when writing dataframes to existing sql tables. This also serves as a wrapper for basic functionality such as creating new tables from dataframes, removing tables, updating table schemas, and listing tables/schemas. Currently works with mysql and postgres databases.

Installation / Dependencies

Dependencies: pandas, numpy, sqlalchemy, pymysql, pg8000

pip install pandas-sql-wrapper

Getting Started

Create an instance of PandasSQLWrapper supplying a database connection configuration. Either supply a dict or a string path to a json file as shown below.

sql_data = PandasSQLWrapper({
    host: '<host name>',
    db: '<database name>',
    user: '<database username>',
    password: '<database password>'
})

sql_data = PandasSQLWrapper(configured_from='sql_config.json')

Additional Options

sql_data = PandasSQLWrapper(
    configured_from='sql_config.json',
    postgres=False, # set to true if connecting to a postgres database
    verbose=False, # set to true for console logging of sql actions
    echo=False # set to true for sql database to communicate back performed actions
)

Update Table

Performs an upsert on a sql table and updates table schema by adding columns if necessary. Additionally, user can specify the option to remove rows and columns as part of the update. NOTE: The upsert is implemented differently depending on whether the database is mysql or postgres. For mysql, upsert is done using REPLACE INTO. For postgres, upsert is done using INSERT INTO ... ON CONFLICT.

    sql_data.update_table(
        '<table name>',
        df,
        permit_deletes=False # set to True if it is ok to delete rows or columns from sql table not found in dataframe
    )

Other Helper Functions

    # Create a new table
    sql_data.to_new_table(
        '<table name>',
        df
    )

    # Remove a table
    sql_data.remove_table(
        '<table name>'
    )

    # Get a table from database as a dataframe
    df = sql_data.get_table(
        '<table name>',
        cols=['*'], # provide an array of column names as strings to select only the ones you want
        limit=<int> # get only the first n rows from table
    )

    # List all tables in database
    sql_data.all_tables()

    # Add / Remove columns from a table
    sql_data.add_column(
        '<table name>',
        '<col name>',
        '<data type>' # must be an acceptable sql data type
    )

    sql_data.remove_column(
        '<table name>',
        '<col name>'
    )

    # Custom SQL query
    sql_data.query(
        '<sql expression or statement>'
    )