A do-it-yourself data integration starter kit. (Treat it as a work of art rather than something useful.)
## Version
Name | OS | Platform | data-buddy (GUI) | DataBuddy CLI (command line) |
---|---|---|---|---|
DataBuddy.exe | Windows | 32-bit | 0.3.5 beta | [1.23.9 beta](https://github.com/QueryCopy/QueryCopy-for-Oracle/releases/tag/v1.23.9) |
There are two components. Upon "Run", the DataBuddy GUI (`DataBuddy.exe`) kicks off the DataBuddy CLI (`qc32\qc.exe`) out of process.
## Purpose
- DataBuddy is data integration software used to define technical processes that combine data from different sources. Data is moved across RDBMS borders using CSV files.
- It facilitates data delivery from multiple relational data sources, including Oracle, SQLServer, DB2, SAP Sybase, Informix, MySQL, Infobright, MariaDB, PostgreSQL, TimesTen, and SQLite.
- It requires minimal initial configuration and lets you manage data integration processes from the GUI or the command line.
- Your data integration processes are stored as session files and can be scripted into your ETL pipelines or used in an ad-hoc manner.
- It lets you develop Extract-Copy-Load processes to scrub and ingest large, distinct data sets from multiple sources into a unified data warehouse.
- It provides structured and ad-hoc access to large datasets.
- It enables seamless, bi-directional integration between all major RDBMSs, such as Oracle, SQLServer, DB2, SAP Sybase, Informix, MySQL, Infobright, MariaDB, and PostgreSQL. In addition, it supports self-service data extraction, preparation, and cleansing by database developers.
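DataBuddy drives each vendor's native spooler and bulk loader rather than running Python code like this, but the extract-to-CSV / load-from-CSV pattern it automates can be sketched in a few lines (illustrative only; sqlite3 stands in for a production RDBMS):

```python
import csv
import sqlite3

def extract_to_csv(conn, query, csv_path):
    """Spool a query result to a CSV file (header line + data rows)."""
    cur = conn.execute(query)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([d[0] for d in cur.description])  # header line
        writer.writerows(cur)

def load_from_csv(conn, table, csv_path):
    """Bulk-load a CSV file into an existing target table."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # skip the header line
        placeholders = ",".join("?" * len(header))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()

# Demo: copy a table between two databases through a CSV file.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
src.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE emp (id INTEGER, name TEXT)")

extract_to_csv(src, "SELECT * FROM emp", "emp.csv")
load_from_csv(tgt, "emp", "emp.csv")
print(tgt.execute("SELECT COUNT(*) FROM emp").fetchone()[0])  # 2
```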
## Other scripts
- Oracle -> Redshift data loader
- PostgreSQL -> Redshift data loader
- MySQL -> Redshift data loader
- Oracle -> S3 data loader
- CSV -> Redshift data loader
## Audience
Database developers, ETL developers, data integrators.
## Designated environment
Pre-Prod (UAT/QA/DEV)
## Databases supported
### SQL
Database | GUI (data-buddy) | Command line (DataBuddy CLI) |
---|---|---|
DB2 | yes | yes |
Informix | yes | yes |
MariaDB | yes | yes |
MySQL | yes | yes |
Infobright | yes | yes |
Oracle | yes | yes |
PostgreSQL | yes | yes |
SQLite | yes | yes |
SQLServer | yes | yes |
Sybase | yes | yes |
TimesTen | yes | yes |
### noSQL
Database | GUI (data-buddy) | Command line (DataBuddy CLI) |
---|---|---|
MongoDB | yes | yes |
## Data Tools
Name | GUI (data-buddy) | Command line (DataBuddy CLI) |
---|---|---|
CURL | yes | yes |
## File formats
### SQL
Database | DDL extract | CSV extract | CSV load |
---|---|---|---|
DB2 | | yes | yes |
Informix | | yes | yes |
MariaDB | | yes | yes |
MySQL | | yes | yes |
Oracle | Table | yes | yes |
PostgreSQL | | yes | yes |
SQLite | | yes | yes |
SQLServer | | yes | yes |
Sybase | | yes | yes |
TimesTen | | yes | yes |

DDL extract works only for Oracle tables.
### noSQL
Database | CSV extract | CSV load | JSON extract | JSON load |
---|---|---|---|---|
MongoDB | yes | yes | yes | yes |
### Data Tools
Name | CSV extract | CSV load | JSON extract | JSON load |
---|---|---|---|---|
CURL | yes | yes | | |
## Components
- GUI - data-buddy (wxPython, PyInstaller).
- Command line - DataBuddy CLI (Python, PyInstaller).
## Features
### Front end, GUI (data-buddy)
- Session management.
### Command line (DataBuddy CLI)
- Multi-query load.
- Partition/sub-partition copy
- Sharded copy (turbo mode)
- Custom spool location (config/user_conf.py)
- config/include/oracle.py - configurable SQL*Loader args.
- 3 generic arguments (use them to pass job_id or timestamp and process in config/user_config.py)
- Added all use cases.
- lame_duck/limit fix for trial runs
- keep_data_file param (set it to 1 if you want to keep data dump)
- White-space control.
- Header line control.
- Truncate target table/partition/subpartition
- Ask to truncate.
- No client (url) connect.
- Supports CSV file load from multiple dirs.
- --exit_on_key - lets you keep the exec window open after the load job is done.
- file download using curl.exe
Legend for the tables below: E = Extract, L = Load, C = Copy.
### SQL
Database | Table | Partition | Subpartition | CSV Files | Queries |
---|---|---|---|---|---|
DB2 | E/L/C | | | L/E | E/C |
Informix | E/L/C | | | L/E | E/C |
MariaDB | E/L/C | | | L/E | E/C |
MySQL | E/L/C | | | L/E | E/C |
Oracle | E/L/C | E/L/C | E/L/C | L/E | E/C |
PostgreSQL | E/L/C | E/L/C | E/L/C | L/E | E/C |
SQLite | E/L/C | | | L/E | E/C |
SQLServer | E/L/C | E/L/C | | L/E | E/C |
Sybase | E/L/C | | | L/E | E/C |
TimesTen | E/L/C | | | L/E | E/C |
### noSQL
Database | Collection | CSV Files | JSON Files | Queries |
---|---|---|---|---|
MongoDB | E/L/C | E/L | E/L | E/C |
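Several of the command-line features above are Python config hooks (config/user_conf.py, config/include/oracle.py) fed by the three generic arguments. A hypothetical user_conf.py that routes spool files using those arguments might look like this; the function name and argument shape are illustrative, not DataBuddy's actual API:

```python
# Hypothetical sketch of config/user_conf.py -- real hook names may differ.
import os
import time

def custom_spool_dir(base_dir, generic_args):
    """Route spool files into a per-job directory using the generic arguments.

    generic_args is assumed to arrive as (job_id, timestamp, tag); any of the
    three may be an empty string when not supplied on the command line.
    """
    job_id, ts, tag = generic_args
    ts = ts or time.strftime("%Y%m%d")           # default timestamp if omitted
    parts = [p for p in (job_id, ts, tag) if p]  # drop empty slots
    return os.path.join(base_dir, *parts)

print(custom_spool_dir(r"C:\spool", ("job42", "20240101", "")))
```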
## Tools used to extract, load, and query data
- DbShell - queries the target and source for table metadata.
- Spooler - extracts data from the source to a temp file.
- Loader - loads the temp file to the target using a bulk loader.
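Each of these roles is an external tool launched out of process. That pattern can be sketched as a small runner; in the demo the Python interpreter stands in for the sqlplus.exe/sqlldr.exe-style tools listed below so the skeleton runs anywhere:

```python
import subprocess
import sys

def run_role(cmd):
    """Run one pipeline role (DbShell, Spooler, or Loader) out of process,
    capture its output, and fail fast on a non-zero exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{cmd[0]} failed: {result.stderr}")
    return result.stdout

# A real run would invoke the vendor tool, e.g. sqlldr.exe with a control
# file; here the interpreter acts as a harmless stand-in.
out = run_role([sys.executable, "-c", "print('spooled 42 rows')"])
print(out.strip())
```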
#### SQL stores
DB family | Database | Spooler | Loader | DbShell |
---|---|---|---|---|
DB2 | DB2 Advanced Enterprise Server | db2.exe | db2.exe | db2.exe |
| | DB2 Advanced Workgroup Server | db2.exe | db2.exe | db2.exe |
| | DB2 Developer Edition | db2.exe | db2.exe | db2.exe |
| | DB2 Express | db2.exe | db2.exe | db2.exe |
| | DB2 Express C | db2.exe | db2.exe | db2.exe |
| | DB2 Enterprise Server | db2.exe | db2.exe | db2.exe |
| | DB2 Workgroup Server | db2.exe | db2.exe | db2.exe |
MySQL | MySQL | mysql.exe | mysql.exe | mysql.exe |
| | Infobright | mysql.exe | mysql.exe | mysql.exe |
| | MariaDB | mysql.exe | mysql.exe | mysql.exe |
Informix | Informix IDS | dbaccess.exe | dbaccess.exe | dbaccess.exe |
| | Informix Innovator C | dbaccess.exe | dbaccess.exe | dbaccess.exe |
Oracle | Oracle 12c | sqlplus.exe | sqlldr.exe | sqlplus.exe |
| | Oracle 11g | sqlplus.exe | sqlldr.exe | sqlplus.exe |
| | Exadata | sqlplus.exe | sqlldr.exe | sqlplus.exe |
| | Oracle XE | sqlplus.exe | sqlldr.exe | sqlplus.exe |
PostgreSQL | PostgreSQL | psql.exe | psql.exe | psql.exe |
SQL Server | SQL Server Enterprise | sqlcmd.exe | sqlcmd.exe | sqlcmd.exe |
| | SQL Server Express | sqlcmd.exe | sqlcmd.exe | sqlcmd.exe |
SAP Sybase | Sybase SQL Anywhere | dbisql.com | dbisql.com | dbisql.com |
| | SAP Sybase ASE | dbisql.com | dbisql.com | dbisql.com |
| | Sybase IQ | dbisql.com | dbisql.com | dbisql.com |
TimesTen | TimesTen | ttBulkCp.exe | ttBulkCp.exe | ttIsql.exe |
SQLite | SQLite | sqlite3.exe | sqlite3.exe | sqlite3.exe |
#### noSQL stores
Database | Spooler | Loader | DbShell |
---|---|---|---|
MongoDB | mongoexport.exe | mongoimport.exe | mongo.exe |
#### Data Tools
Name | Spooler | Loader | DbShell |
---|---|---|---|
CURL | curl.exe | n/a | n/a |
- It does not create the target table.
- It does not pipe data (it extracts into a file, then loads).
- It should not be used in Prod; trial/ad-hoc use only.
## Implementation
- Written in Python (command line) and wxPython (GUI).
- Compiled with PyInstaller.
## Configuration
- Modify the default host_map.py:
- Set your 'source' and 'target' dirs for local clients for each database:

```python
'host_list': {0: {'db_env': {'ORA11G': {'source': 'C:\\app\\alex_buz\\product\\11.2.0\\dbhome_2\\BIN',
                                        'target': 'C:\\app\\alex_buz\\product\\11.2.0\\dbhome_2\\BIN'},
```

- You are good to go.
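For reference, a fully closed version of that host_map.py fragment might look like this (the paths and the 'ORA11G' key are examples, not shipped defaults):

```python
# Illustrative host_map.py entry: one host slot (0) mapping a database
# environment name to the client 'source' and 'target' binary directories.
HOST_MAP = {
    'host_list': {
        0: {
            'db_env': {
                'ORA11G': {
                    'source': r'C:\app\alex_buz\product\11.2.0\dbhome_2\BIN',
                    'target': r'C:\app\alex_buz\product\11.2.0\dbhome_2\BIN',
                },
            },
        },
    },
}
```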
## Execution
- Run data-buddy.exe from the Windows command line or File Explorer.
# Target object Truncate
- A warning window pops up every time you try to run the DataBuddy CLI with truncation of your target object.
# Templates vs. free argument entry
## Pros
- All templates are tested with presets.
## Cons
- The user has to create a new session if an argument needs to be added or removed.
- There is no way to add or remove arguments to your choosing; argument combos come as templates which you select in the "Create new session" window.
# TODO
- argument values reuse from existing session. DONE
- clean uargs.db. DONE
- nls_format* duplication. DONE
- test UI.
- add --log_dir to backend. DONE
- add "source" and "target" datasources to "New Session". DONE
- Copy/Paste of argument values between sessions.
- generic "New Session" so user not limited by source and target templates. DONE
- history of values for each argument.
- cleanup all other databases but Oracle.
- more templates and better templates hierarchy.
- init templates to open session for "New Session". DONE
- validate args on Run. DONE.
- smaller test_api files (get default args from test routines, not canned files).
- fix template filters. DONE
- create "Menu" button and hide "About".
- create templates tab DONE
- create sessions tab DONE
- validate session name for chars not usable in file name. DONE
- allow user to create multiple session libraries. DONE
- fix malformed path from MDD.MultiDirDialog. DONE
- make sure all paths are Windows friendly. DONE
- open dir and open file use stale values. DONE
- filter control keys from ones affecting field value. DONE
- close all existing shells/shell groups upon exit.
- highlight running sessions. DONE
- beep on failing sessions.
- free argument entry. DONE
- let user change arg list for a given group. DONE
- detect DONE/FAILED from cmd window. DONE
- let user disable post-etl email (Common: email_to). DONE
- change copy_vector format from db2db to db-db DONE
- fix flicker on frame freeze DONE
- save on close DONE
- Table DDL export for Oracle DONE
- validate all path arguments before run DONE
- add create template menu DONE
- add "--host_map" arg to set hosts for each thread DONE
- let user change host mapping DONE
- save as template DONE
- save as DONE
- create main menu DONE
- let user configure default argument values
- let user change frame size DONE
- test ezconnect
- add "Output" tab DONE
- add '--compress_spool' arg for zipped output
- add sqlloader.py DONE
- add curl.exe for file download DONE
- load/save app state.
- remove wx.BitmapButton memory leak
- fix session sort DONE
- fix session refresh DONE
- fix awg.flatmenu memory leak DONE
- fix awg.scrollablepanel memory leak
- let user set primary dbs, source/target dbs
- add scp/ftp/sftp as data source/target
- make all paths relative to transport_home DONE
- create wizard for "New Session" template selection
- distinguish A-Templates from B-Templates
- let data copy be executed on Linux (bash via ssh) DONE
- let data spool/load to be executed on Linux (bash via ssh)
- use netcat for job status instead of keeping open ssh conn
- use named file for QC status report instead of CLI title.
- add Oracle 12c Release 1 DONE
- add Oracle 11g Release 2 DONE
- add SQLServer DONE
- add MySQL DONE
- add PostgreSQL DONE
- add SQLite DONE
- add Informix DONE
- add SAP ASE Sybase DONE
- add DB2 DONE
- add dBase
- add MS Access
- add Pandas
- add [Apache Spark](https://spark.apache.org/docs/latest/)
- add [Apache Storm](https://storm.apache.org/documentation/Home.html)
- add Hadoop
- add ZooKeeper
- add HDFS
- add MariaDB DONE
- add Infobright DONE
- add TimesTen DONE
- add [MongoDB](http://docs.mongodb.org/manual/) DONE
- add [HBase](http://hbase.apache.org/)
- add [Cassandra](http://cassandra.apache.org/)
- add [Bigtable](https://cloud.google.com/bigtable/docs/)
- add [Teradata](http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/Storage_Management/B035_2492_071A/2492ch01.084.17.html)
- add [Apache Hive](http://doc.mapr.com/display/MapR/Hive)
- add Vertica
- add Netezza
- add Parstream
- add ParAccel
- add Redis
- add ZODB
- add MonetDB
- add Vectorwise
- add MarkLogic
- add Amazon SimpleDB
- add Twitter Streaming API (extract)
- add Google Analytics
- add Crate Data
- add Facebook (extract)
- add Taleo HR (extract)
- add Salesforce (load, extract)
- add Greenplum
- add Actian Ingres
- add McObject
- add NuoDB
- add Disco
- add Clustrix
- add OrientDB
- add KDB+
- add Volt DB
- add Elasticsearch
- add Azure DocumentDB
- add Arango DB
- add Foundation DB
- add Enterprise DB
- add Altibase HDB
- add EXASOL
- add Aster
- add Kinesis (extract)
- add Amazon Redshift
- add Amazon Aurora (RDS)
- add RethinkDB
- add Amazon DynamoDB
- add Couchbase
- add Aerospike
- add Riak
- add MemSQL
- add InterSystems Caché, export
- add QV (QlikView file format)
- add MDX (file format)
- add SQream DB
- add [1010data](https://www.1010data.com/technology)
- add Kognitio
- add SAND Technology
- add InfiniDB
- add SQL Data Warehouse
- Create tests using PyAutoGUI
- Add Amazon Simple Storage Service [S3](https://aws.amazon.com/s3/)
- Add [Apache Derby](https://db.apache.org/derby/)
## Limitations
- Tested to run only on Windows for now (even though it's wxPython).
- CSV dump files are uncompressed (zip compression will be added as an option).
- Physical copy is done on Windows; only Oracle copy can be executed on Linux (bash via ssh).
## Performance
- Data copy speed mostly depends on your NIC (Ethernet) speed and other factors, such as how far you are from the target and source servers (in terms of network topology and physical distance). I've seen a 10x performance improvement when running on a DEV Linux server (10Gb Ethernet) vs. my office Windows desktop (100Mb Ethernet).
## References
- DataBuddy CLI for Oracle -- `qc32\qc32.exe`
## Vendor Docs
[MongoDB Architecture Guide](http://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf)

Data Tools | Version |
---|---|
[cURL](http://curl.haxx.se/download.html) | 7.33 |
- New Session Menu:
- Add New Session:
- Run Session:
- Executed Session:
- Output Tab:
## Copy vector matrix (via CSV)
Database | DB2 | Informix | MariaDB | MySQL | Oracle | PostgreSQL | SQLite | SQLServer | Sybase | TimesTen | MongoDB |
---|---|---|---|---|---|---|---|---|---|---|---|
DB2 | x | x | x | x | x | x | x | x | x | x | x |
Informix | x | x | x | x | x | x | x | x | x | x | x |
MariaDB | x | x | x | x | x | x | x | x | x | x | x |
MySQL | x | x | x | x | x | x | x | x | x | x | x |
Oracle | x | x | x | x | x | x | x | x | x | x | x |
PostgreSQL | x | x | x | x | x | x | x | x | x | x | x |
SQLite | x | x | x | x | x | x | x | x | x | x | x |
SQLServer | x | x | x | x | x | x | x | x | x | x | x |
Sybase | x | x | x | x | x | x | x | x | x | x | x |
TimesTen | x | x | x | x | x | x | x | x | x | x | x |
MongoDB | x | x | x | x | x | x | x | x | x | x | x |

'x' means the feature is implemented and is part of release 0.3.3.
## Copy vector matrix (via JSON)
Database | MongoDB |
---|---|
DB2 | x |
Informix | x |
MariaDB | x |
MySQL | x |
Oracle | x |
PostgreSQL | x |
SQLite | x |
SQLServer | x |
Sybase | x |
TimesTen | x |
MongoDB | x |
## Databuddy Docs
- General docs
- SQL Server
- MongoDB
- Oracle
- CURL
## Download

```shell
git clone https://github.com/data-buddy/DataBuddy/
```

- DataBuddy latest release -- data-buddy 0.3.5
# FAQ
Yes, it is the main purpose of this tool.
Yes, assuming they are doing it on Windows.
It is as fast as SQL*Loader, because I'm using SQL*Loader in DIRECT mode.
#### How to increase upload speed?
Parallelize your load by loading multiple partitions in parallel or sharding your data.
You can use SQL*Loader, or execute insert statements via ODBC/JDBC or SQL*Plus.
No, Redshift will not recognize the *.zip file format. You have to gzip it; you can use 7-Zip to do that.
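If you'd rather script the compression than use 7-Zip, Python's standard library can produce the gzip file:

```python
import gzip
import shutil
from pathlib import Path

# Create a sample CSV dump so the snippet is self-contained.
Path("data.csv").write_bytes(b"id,name\n1,Ann\n")

# Compress data.csv to data.csv.gz, a format Redshift's COPY accepts.
with open("data.csv", "rb") as src, gzip.open("data.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```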
No
No
You can set the Common Args `debug_level` to a value > 1.
- The CSV file you provided is analyzed and the number of records in it is estimated.
- A control file is created using the target table's column names.
- Data is loaded using SQL*Loader.
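The control-file step above can be sketched as a small generator. The template is a minimal SQL*Loader control file; the field options shown are illustrative, not the exact ones DataBuddy emits:

```python
def make_control_file(table, columns, csv_path):
    """Build a minimal SQL*Loader control file from target column names.

    Mirrors the steps above: the column list comes from the target table,
    and the data file is the analyzed CSV.
    """
    cols = ",\n  ".join(columns)
    return (
        "LOAD DATA\n"
        f"INFILE '{csv_path}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        f"(\n  {cols}\n)\n"
    )

ctl = make_control_file("EMP", ["EMPNO", "ENAME"], "emp.csv")
print(ctl)
```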
Use `--skip_rows 1` to ignore input rows.
Use `--nls_date_format "MM/DD/YYYY HH12:MI:SS"`, `--nls_timestamp_format "MM/DD/YYYY HH12:MI:SS.FF2 PM"`, or `--nls_timestamp_tz_format "MM/DD/YYYY HH12:MI:SS.FF2 TZH:TZM"` to control the timestamp format.
I used SQL*Loader and wxPython to write it.
Please contact me for sources.
Yes, please ask me for new features.
- [Oracle_To_S3_Data_Uploader](https://github.com/alexbuz/Oracle_To_S3_Data_Uploader) - streams Oracle data to Amazon S3.
- [S3_Sanity_Check](https://github.com/alexbuz/S3_Sanity_Check/blob/master/README.md) - lets you ping an Amazon S3 bucket to see if it's publicly readable.
- EC2_Metrics_Plotter - plots any CloudWatch EC2 instance metric stats.
- S3_File_Uploader - uploads files from Windows to S3.
Yes, AWS Certified Developer (Associate)
Yes, you can PM me here or email me at alex_buz@yahoo.com. I'll get back to you within hours.
### Links