MSSQL to AWS Redshift data transfer tool written in Python 3.7
- Introduction
- How this tool works
- How to install and run the tool
- How to set up a new MSSQL to Redshift data migration project
This tool is able to migrate data from your MSSQL database to AWS Redshift. It consumes arguments defining:
- `--databasename` or `-dn` ( the name of the database with the tables in MSSQL you wish to migrate over; this argument needs to be aligned with the values in the column DatabaseName inside the configuration table MSSQL_to_Redshift.mngmt.ControlTable )
- `--schemaname` or `-sn` ( the name of the database schema with the tables in MSSQL you wish to migrate over; this argument needs to be aligned with the values in the column SchemaName inside the configuration table MSSQL_to_Redshift.mngmt.ControlTable )
- `--generated_csv_files_target_directory` or `-td` ( the local folder where you wish to store your .csv files; if the folder does not exist, it will be created for you during the runtime )
- `--dryrun` or `-dr` ( True | False allowed; this argument lets you run a dry run for testing purposes, filtering the SQL Server data to 0 rows )
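The four arguments above can be sketched with a minimal argparse parser. This is an illustrative sketch only; the actual entry-point script defines its own parser, which may differ in defaults and validation:

```python
import argparse

def build_parser():
    # Sketch of the four command-line arguments described above;
    # the real tool's parser is the source of truth.
    parser = argparse.ArgumentParser(
        description="MSSQL to AWS Redshift data transfer tool")
    parser.add_argument("-dn", "--databasename", required=True,
                        help="source MSSQL database name (must match "
                             "mngmt.ControlTable.DatabaseName)")
    parser.add_argument("-sn", "--schemaname", required=True,
                        help="source MSSQL schema name (must match "
                             "mngmt.ControlTable.SchemaName)")
    parser.add_argument("-td", "--generated_csv_files_target_directory",
                        required=True,
                        help="local folder for the generated .csv files; "
                             "created at runtime if it does not exist")
    parser.add_argument("-dr", "--dryrun", choices=["True", "False"],
                        default="False",
                        help="dry run, filtering SQL Server data to 0 rows")
    return parser

# Example invocation mirroring the sample command further below:
args = build_parser().parse_args(
    ["-dn", "AdventureWorksDW2016", "-sn", "dbo",
     "-td", "./files", "-dr", "True"])
```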
To install:
- git clone https://www.github.com/datahappy1/mssql_to_redshift_data_transfer_tool mssql_to_redshift_data_transfer_tool
- cd c:\mssql_to_redshift_data_transfer_tool
- create and activate a virtual environment
- download the Windows Postgres driver from https://www.enterprisedb.com/downloads/postgres-postgresql-downloads
- pip3 install -r requirements.txt
- run the database build script mssql_database_buildscript.sql located here
Set up these environment variables:
- odbc_mssql_dsn_name - mandatory MSSQL DSN name for PyODBC
- odbc_mssql_uid - optional MSSQL DSN user name, not needed if you set the user in the Windows ODBC connection
- odbc_mssql_pwd - optional MSSQL DSN password, not needed if you set the password in the Windows ODBC connection
- aws_access_key_id - mandatory AWS Access key ID
- aws_secret_access_key - mandatory AWS Secret access key
- redshift_host - mandatory AWS Redshift host name ( for example redshift-cluster-1.xxx.us-east-1.redshift.amazonaws.com )
- redshift_port - mandatory AWS Redshift port ( integer )
- redshift_user - mandatory AWS Redshift user name
- redshift_pass - mandatory AWS Redshift password
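The mandatory variables can be verified up front with a small helper along these lines (a sketch under the assumption that failing fast with a clear message is wanted; the tool itself may read the variables differently):

```python
import os

# Mandatory environment variables listed above; the optional
# odbc_mssql_uid / odbc_mssql_pwd pair is deliberately excluded.
MANDATORY_ENV_VARS = (
    "odbc_mssql_dsn_name",
    "aws_access_key_id",
    "aws_secret_access_key",
    "redshift_host",
    "redshift_port",
    "redshift_user",
    "redshift_pass",
)

def check_mandatory_env_vars(env=os.environ):
    # Return the mandatory variables that are missing or empty,
    # so the caller can abort before opening any connections.
    return [name for name in MANDATORY_ENV_VARS if not env.get(name)]

# Example with only the DSN set; everything else is reported missing:
missing = check_mandatory_env_vars({"odbc_mssql_dsn_name": "my_dsn"})
```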
On Windows, you need to set the PYTHONPATH like this:
set PYTHONPATH=%PYTHONPATH%;C:\mssql_to_redshift_data_transfer_tool\
A permanent solution is to:
- Go to the Windows menu, right-click on “Computer” and select “Properties”
- From the computer properties dialog, select “Advanced system settings” on the left
- From the advanced system settings dialog, choose the “Environment variables” button
- In the Environment variables dialog, click the “New” button in the top half of the dialog, to make a new user variable
- Give the variable name as PYTHONPATH and the value is the path to the code directory. Choose OK and OK again to save this variable
- Now open a cmd Window (Windows key, then type cmd and press Return). Type:
echo %PYTHONPATH%
to confirm the environment variable is correctly set
https://bic-berkeley.github.io/psych-214-fall-2016/using_pythonpath.html
On Linux, you need to set the PYTHONPATH like this:
export PYTHONPATH=$PYTHONPATH:/home/your_user/your_git_projects/mssql_to_redshift_data_transfer_tool/
A permanent solution is to:
- Open your favorite terminal program
- Open the file ~/.bashrc in your text editor – e.g. atom ~/.bashrc
- Add the following line to the end:
export PYTHONPATH=/home/my_user/code
- Save the file
- Close your terminal application
- Start your terminal application again, to read in the new settings, and type:
echo $PYTHONPATH
You should see something like /home/my_user/code
You need to set the required arguments:
- `--databasename` or `-dn`
- `--schemaname` or `-sn`
- `--generated_csv_files_target_directory` or `-td`
- `--dryrun` or `-dr`
Run these commands to execute:
cd mssql_to_redshift_data_transfer_tool
python mssql_to_redshift_data_transfer_tool.py -dn AdventureWorksDW2016 -sn dbo -td C:\mssql_to_redshift_data_transfer_tool\files -dr False
- Set up the database tables with their columns that you need to transfer over to AWS Redshift in the MSSQL configuration table MSSQL_to_Redshift.mngmt.ControlTable. Note that this tool's internal database MSSQL_to_Redshift has to be installed on the same host where your source MSSQL databases are located. Another option is to use Linked Servers.
- Don't forget to set up the project-scoped settings like the AWS S3 bucket name, the maximum .csv file size in MB, database names, logging level and others using settings.py. Allowed logging level values in settings.py are 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'.
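A settings.py covering the values mentioned above could look roughly like this, with a guard on the logging level. All names and sample values here are illustrative assumptions; the settings.py shipped with the repository is the source of truth:

```python
import logging

# Illustrative project-scoped settings (names and values are assumptions):
S3_BUCKET_NAME = "my-transfer-bucket"   # your AWS S3 bucket name
CSV_MAX_FILE_SIZE_MB = 100              # maximum generated .csv file size
LOGGING_LEVEL = "INFO"

# The allowed logging level values listed above:
ALLOWED_LOGGING_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")

if LOGGING_LEVEL not in ALLOWED_LOGGING_LEVELS:
    raise ValueError(
        "LOGGING_LEVEL must be one of %s" % (ALLOWED_LOGGING_LEVELS,))

# Map the string to the numeric level understood by the logging module.
numeric_level = getattr(logging, LOGGING_LEVEL)
```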
- Make sure you've got your AWS Redshift tables ready. The Redshift tables need to be named exactly like the tables configured in the MSSQL_to_Redshift.mngmt.ControlTable MSSQL table
- Set the PYTHONPATH environment variable
- Try running this tool with the --dryrun argument first set to True
- Now you can go and configure the databases, schemas and table names that will be transferred over to AWS Redshift in the MSSQL_to_Redshift.mngmt.ControlTable MSSQL table