Simple Python package to merge CSV files
This Python package merges two CSV files. Currently, only inner joins are supported; other join types are planned.
The package works fairly well on large datasets because it does not need to read both CSV files into memory. However, at least one of the two files must be small enough to fit in memory. The inner join is fairly efficient because it uses a hash map to quickly find the rows to be joined.
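The hash-map approach described above can be sketched roughly as follows. This is an illustrative example built on the standard `csv` module, not the package's actual implementation; the function name `inner_join` and its parameters are hypothetical. It assumes both files have a header row and that the first file is the one small enough to hold in memory.

```python
import csv
from collections import defaultdict


def inner_join(small_path, large_path, keys, out_path):
    """Hash-join two CSV files on the given key columns (illustrative sketch)."""
    # Build phase: load the smaller file into a dict keyed by the join columns.
    index = defaultdict(list)
    with open(small_path, newline="") as small:
        reader = csv.DictReader(small)
        # Non-key columns of the small file are appended to each output row.
        small_extra = [c for c in reader.fieldnames if c not in keys]
        for row in reader:
            index[tuple(row[k] for k in keys)].append(row)

    # Probe phase: stream the larger file one row at a time, so it never
    # has to fit in memory as a whole.
    with open(large_path, newline="") as large, \
            open(out_path, "w", newline="") as out:
        reader = csv.DictReader(large)
        writer = csv.writer(out)
        large_cols = list(reader.fieldnames)
        writer.writerow(large_cols + small_extra)
        for row in reader:
            key = tuple(row[k] for k in keys)
            # Rows without a match in the index are dropped (inner join).
            for match in index.get(key, []):
                writer.writerow(
                    [row[c] for c in large_cols]
                    + [match[c] for c in small_extra]
                )
```

Each row of the larger file costs one dictionary lookup, so the join takes time roughly proportional to the combined size of the two files, while memory use is bounded by the smaller file.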
To get a local copy up and running follow these simple steps:
pip3 install git+https://github.com/rquitales/csv-joiner-python.git
- Import csvjoiner within your Python script
This package requires Python 3.5 or greater in order to function properly. No other dependencies are required as it is implemented fully using base Python.
- Exporting merged data to CSV file:

    import csvjoiner

    # Join on columns: colA, colB, colC
    join = csvjoiner.Joiner("/path/to/first/csv", "/path/to/second/csv", "colA", "colB", "colC")
    join.inner("/path/to/output/csv")
- Converting merged data into Pandas dataframe:

    import csvjoiner
    import pandas as pd

    # Join on columns: colA, colB, colC
    join = csvjoiner.Joiner("/path/to/first/csv", "/path/to/second/csv", "colA", "colB", "colC")
    merged_data = join.inner()
    df = pd.DataFrame(data=merged_data['data'], columns=merged_data['headers'])
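If pandas is not available, the merged result can probably be consumed with plain Python as well. The sketch below assumes, based only on the pandas example above, that the returned value is a dict with a 'headers' list of column names and a 'data' list of rows. The `merged_data` literal is a hypothetical stand-in for the return value of `join.inner()`, and the column names are made up for illustration.

```python
# Hypothetical stand-in for the value returned by join.inner(); the shape is
# inferred from the pandas example, not taken from the package source.
merged_data = {
    "headers": ["colA", "colB", "value"],
    "data": [["1", "x", "10"], ["2", "y", "20"]],
}

# Pair each row with the headers to get one dict per merged row.
records = [dict(zip(merged_data["headers"], row)) for row in merged_data["data"]]

# Example: pull out a single column by name.
values = [record["value"] for record in records]
```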
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Ramon Quitales - oss@rquitales.com
Project Link: https://github.com/rquitales/csv-joiner-python