csv_import_update.py README
=======================================================================
2020-07-17 Evgueni.Antonov@gmail.com
Written in Python 3.5.3
https://github.com/StrayFeral/csv_import_update

This Python script is a quick-n-dirty tool for importing/updating a
CSV file into a database. It is a fully generic tool.

The intention is that this script could be used by developers, but
also by any beginner, so this README file is written for users of
all levels.

USAGE: csv_import_update.py [path]<FILENAME>[.csv] [debug|verbose|diff]

EXAMPLE:
./csv_import_update.py ./mycsv.csv verbose
./csv_import_update.py mycsv
./csv_import_update.py dir1/dir2/mycsv.csv debug
./csv_import_update.py dir1/dir2/mycsv.csv diff

verbose    Enables VERBOSE MODE - more messages on screen.

debug      Enables DEBUG MODE and VERBOSE MODE - more messages on
           screen, and UPDATE and INSERT queries will not be executed.

diff       Only displays differences between the CSV file and the
           table values. UPDATE and INSERT queries will not be
           executed.

SUPPORTED DATABASES: PostgreSQL, MySQL, Oracle, MS SQL Server
TESTED ON: !!!PostgreSQL ONLY!!!

FOR EXAMPLE INI FILE, CHECK conf/example.ini

HOW IT WORKS:
For convenience in this file, these will be used:

PYPATH: The path where the executable Python script (the main
        script) is located
CSV:    The CSV input file
INI:    The INI input file (it must have the same file name as the
        CSV, but with extension .ini. EXAMPLE: myproject.csv
        myproject.ini)

1) Reads PYPATH/conf/defaults.ini
2) Reads PYPATH/conf/INI
3) Reads the CSV

This script is written with large files in mind, so the CSV will be
read line by line. Each row will be tested for presence in the
database and, if present, the values will be compared. If the values
to be updated differ (at least one of them), the database row will
be updated. If the values are the same, this row will be skipped and
the next CSV row will be read. If no such record is found in the
database, the CSV row will be inserted as a new record. (A minimal
sketch of this loop appears after the database notes below.)

For convenience, if something does not work and you need to debug,
just run with either the "verbose" or the "debug" parameter after
the filename (but not both! only "verbose" or "debug" -- "debug"
turns on VERBOSE MODE too). They do the same thing - you will see
many more messages on screen - except that in DEBUG MODE the
database will not be updated in any way (no INSERT or UPDATE queries
will be executed).

IMPORTANT: In VERBOSE/DEBUG MODE each SQL query will be shown on
screen before being executed, which is useful for database
debugging.

*** TESTED ONLY ON POSTGRESQL on Debian Linux ***
Yep. To everybody else - my apologies, I have neither the time nor
the resources at the moment, but you can test and report the result
to me.

WHAT ABOUT OTHER DATABASES (aside from PostgreSQL):

Database   : Python connector module
----------------------------------------
postgresql : psycopg2
mysql      : mysql.connector
oracle     : cx_Oracle
sqlserver  : pymssql

I never tried it, but here is what I know - the table above lists
the connector modules you need to have. For each database you first
need to install these modules yourself, manually. Use your Linux
package manager (never tried on Windows).

After you install the connector module, open the script itself
(csv_import_update.py) in an editor; at the very beginning you will
see a few "import" clauses which are commented out (they have "#" in
front). Remove the comment symbol for the modules you have
installed. Also go to the line "def get_database_connection" and a
few lines down you will see a few "if (db == " lines, also commented
out. Remove the comment for the databases you use. That's it. This
is how to enable the use of other databases.
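For orientation, the dispatch looks roughly like this (a minimal
sketch under assumed names, not the actual source):

# Minimal sketch of a connection dispatcher (assumed names, not the
# actual source). Uncomment the import and the branch you need.
import psycopg2
# import mysql.connector
# import cx_Oracle
# import pymssql

def get_database_connection(db, host, port, name, user, password):
    """Return a DB-API connection for the configured database type."""
    if db == "postgresql":
        return psycopg2.connect(host=host, port=port, dbname=name,
                                user=user, password=password)
    # if db == "mysql":
    #     return mysql.connector.connect(host=host, port=port,
    #                                    database=name, user=user,
    #                                    password=password)
    # if db == "oracle":
    #     return cx_Oracle.connect(user, password,
    #                              "{0}:{1}/{2}".format(host, port, name))
    # if db == "sqlserver":
    #     return pymssql.connect(server=host, port=port, database=name,
    #                            user=user, password=password)
    raise ValueError("Unsupported database type: {0}".format(db))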
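And here is the promised minimal sketch of the row-by-row loop from
HOW IT WORKS. The function and parameter names are assumptions for
illustration, and it uses parameterized %s placeholders (psycopg2
style) rather than the quote-by-type query building the script
itself does:

# Minimal sketch of the row-by-row import/update loop (assumed
# helper and parameter names; illustrative only).
import csv

def import_csv(path, conn, table, fields, mapping, keys,
               delimiter=",", quotechar='"'):
    """fields  - fields_list, the CSV field order
    mapping - [database_mapping], CSV field name -> table column name
    keys    - keys_list, the columns used in the WHERE predicate
    """
    cur = conn.cursor()
    with open(path, newline="") as fh:     # streamed, not loaded whole
        reader = csv.DictReader(fh, fieldnames=fields,
                                delimiter=delimiter, quotechar=quotechar)
        for row in reader:
            data = {mapping[f]: row[f] for f in mapping}
            cols = sorted(data)
            where = " AND ".join("{0} = %s".format(k) for k in keys)
            # Is the row already there? SELECT by the key columns.
            cur.execute("SELECT {0} FROM {1} WHERE {2}".format(
                ", ".join(cols), table, where), [data[k] for k in keys])
            found = cur.fetchone()
            if found is None:              # not found: INSERT it
                cur.execute(
                    "INSERT INTO {0} ({1}) VALUES ({2})".format(
                        table, ", ".join(cols),
                        ", ".join(["%s"] * len(cols))),
                    [data[c] for c in cols])
            elif list(found) != [data[c] for c in cols]:  # differs: UPDATE
                sets = ", ".join("{0} = %s".format(c) for c in cols)
                cur.execute("UPDATE {0} SET {1} WHERE {2}".format(
                    table, sets, where),
                    [data[c] for c in cols] + [data[k] for k in keys])
            # else: values identical, skip to the next CSV row
    conn.commit()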
THE INI FILES:

PYPATH/conf/defaults.ini

This file is straightforward. It defines all the possible databases
and how to access them. Here is how:

[mysql]
port = 3306
user = mysql
pass = password_goes_here

Obviously each [section] is a database name. Then follow settings
for the default port, user and password. This INI is read first. If
you set the same setting in another INI under the section
[database], it will overwrite these settings.

THE OTHER INI FILES:

When you run the script, after reading defaults.ini it expects to
find a CSV file (you may put these anywhere and specify the path on
the command line) and an INI file with the same name but extension
.ini, located in the conf/ subdirectory of where this script is
located.

The CSV file is straightforward. This INI file, however, defines the
fine details of how the CSV file should be read, which database to
connect to, which table to use and what to do with the CSV data we
find, or do not find, in this table. All sections in this INI are
mandatory.

[database]
host = myhost.com
type = postgresql
name = mydatabase
table = mytable001

This is straightforward. The thing is - "type" refers to
defaults.ini, so this value is an actual section there. Check
conf/example.ini and the provided conf/defaults.ini to see what I am
talking about.

[csv]
fields_list = first_name_csv,last_name_csv,account_csv,city_csv
delimiter = ,
quotechar = "

Delimiter and quotechar are not mandatory. This must be pretty clear
too. "fields_list" simply tells us what fields are in the CSV file
and, most important -- the order in which the fields are to be found
in the CSV file!!

******IMPORTANT: ANY INI SETTING NAME ENDING IN "_list" INDICATES
THAT ITS VALUE IS A LIST. In this case the "fields_list" value is a
CSV value itself. A few other values are lists too.

[get_database_data]
columns_list = first_name_varchar,last_name_varchar,account_number,city_varchar
keys_list = first_name,last_name,city
null_equals_to_empty = yes

"null_equals_to_empty" is not mandatory, but if present and set to
"yes" it will enforce, when comparing varchar values from the CSV
and the database, that a NULL value in the database is interpreted
as an empty string.

This section defines what data we are looking for in the database
table.

IMPORTANT: Each CSV file will be imported/updated into only ONE
database table. If you need to import into multiple tables - create
a CSV and an INI file for each table.

"columns_list" describes which table columns we are interested in.
Data will be pulled from these (SELECT) and will be compared to the
currently read CSV row.

********VERY IMPORTANT: "columns_list" in [get_database_data] ALSO
DEFINES THE COLUMN DATA TYPE! In this case we need to know only one
thing - should we quote the value or not - so only the VARCHAR type
is specified, by appending "_varchar" to the column name.

"keys_list" is important - it defines the columns which will be used
to create the SQL query predicate, also in the UPDATE and INSERT
queries.

So based on the above, the SELECT will be:

SELECT first_name, last_name, account_number, city
FROM mytable001
WHERE first_name = 'blah'
  AND last_name = 'bleh'
  AND city = 'some city';
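As a recap of how the two INI files are read and merged, here is a
minimal sketch (assumed function names, illustrative only - the
script's actual internals may differ):

# Minimal sketch of reading defaults.ini plus the per-CSV INI and
# applying the overlay rule (assumed names; illustrative only).
import configparser

def read_config(defaults_path, ini_path):
    defaults = configparser.ConfigParser()
    defaults.read(defaults_path)          # e.g. conf/defaults.ini
    project = configparser.ConfigParser()
    project.read(ini_path)                # e.g. conf/myproject.ini
    dbtype = project["database"]["type"]  # e.g. "postgresql"
    # Start from the defaults for this database type...
    settings = dict(defaults[dbtype])
    # ...then let [database] in the project INI overwrite them.
    settings.update(project["database"])
    return settings, project

def as_list(value):
    """Any setting whose name ends in "_list" holds a CSV value."""
    return [item.strip() for item in value.split(",")]

# Example: as_list("first_name,last_name,city")
# -> ['first_name', 'last_name', 'city']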
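And a sketch of how such a SELECT could be assembled from
columns_list, keys_list and the "_varchar" quoting rule (build_select
and quote are assumed names, not the script's actual internals):

# Minimal sketch of building the SELECT from the INI settings
# (assumed function names; illustrative only).

def quote(column, value):
    """Quote the value only if the column was declared as _varchar."""
    if column.endswith("_varchar"):
        return "'{0}'".format(value.replace("'", "''"))
    return str(value)

def build_select(table, columns_list, keys_list, csv_row):
    # Strip the "_varchar" type suffix to get the real column names.
    names = [c[:-len("_varchar")] if c.endswith("_varchar") else c
             for c in columns_list]
    types = dict(zip(names, columns_list))
    predicate = " AND ".join(
        "{0} = {1}".format(k, quote(types[k], csv_row[k]))
        for k in keys_list)
    return "SELECT {0} FROM {1} WHERE {2};".format(
        ", ".join(names), table, predicate)

# build_select("mytable001",
#              ["first_name_varchar", "last_name_varchar",
#               "account_number", "city_varchar"],
#              ["first_name", "last_name", "city"],
#              {"first_name": "blah", "last_name": "bleh",
#               "city": "some city"})
# produces exactly the SELECT shown above.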
Note that "columns_list" is a list (it contains "_list" in its name and will be interpreted as a list, even with a single column value), because while currently I need to update only one column, I suspect somebody might need to update multiple columns. To UPDATE multiple columns feature is not tested, but I think it should be fine and working. "increment_column" is not mandatory, but it is very important. Usually in a table we have a primary key column, which we either auto-increment or increment ourselves. If we need to increment it ourselves, here we specify the name of this column, so the INSERT query is created accordingly. So to be precise, what will happen is before creating the INSERT query, a SELECT will be executed to check MAX(column_name)+1 value. This value then will be used in the INSERT. [database_mapping] first_name_csv = first_name last_name_csv = last_name city_csv = city account_csv = account This is the last section. It defines the relation (mapping) of each CSV field to each database table column. *********DEBUGGING********** If you get some exception and you need to debug, just run the script either with "verbose" or "debug" (but not both) parameter, after the filename parameter. "verbose" just enables verbose printing (more messages on screen) and "debug" prevents any INSERT or UPDATE query to be executed. It also enables the "verbose" mode. In this mode, lot will be printed on screen, so better don't log it to file and better use small CSV file with only few rows. However you will see every SQL query being created, printed on screen. Again, UPDATE or INSERTs won't be executed in DEBUG mode, but you will see the queries on the screen anyway. If not in DEBUG mode, you will also see the database results. FINAL WORDS: Hope this script will be useful to you, as it is to me. Importing data in the database is a common task and there are some tools, however I did not found any data-update tools which to be convenient to me, so I wrote one myself. ***EOF
[database_mapping]
first_name_csv = first_name
last_name_csv = last_name
city_csv = city
account_csv = account

This is the last section. It defines the relation (mapping) of each
CSV field to each database table column.

*********DEBUGGING**********

If you get some exception and you need to debug, just run the script
with either the "verbose" or the "debug" parameter (but not both)
after the filename parameter. "verbose" just enables verbose
printing (more messages on screen) and "debug" prevents any INSERT
or UPDATE query from being executed. It also enables "verbose" mode.

In this mode a lot will be printed on screen, so better not to log
it to a file, and better to use a small CSV file with only a few
rows. You will see every SQL query being created, printed on screen.
Again, UPDATEs and INSERTs won't be executed in DEBUG mode, but you
will see the queries on the screen anyway. If not in DEBUG mode, you
will also see the database results.

FINAL WORDS:

I hope this script will be as useful to you as it is to me.
Importing data into a database is a common task and there are some
tools for it, however I did not find any data-update tool that was
convenient for me, so I wrote one myself.

***EOF