A Python package to combine CSV files

CSV Joiner

A simple Python package to merge CSV files.

About The Project

This Python package merges two CSV files. Currently only inner joins are supported; support for more join types is planned.

The package handles large datasets reasonably well because it does not need to read both CSV files into memory; however, at least one of the two files must be small enough to fit in memory. The inner join is efficient because it uses a hash map to look up the rows to be joined instead of comparing every pair of rows.
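To illustrate the memory and hash-map trade-off described above, here is a minimal sketch of a hash-based inner join written with only the Python standard library. It is not csvjoiner's actual implementation: the function name hash_inner_join and its signature are hypothetical, and it assumes the smaller CSV is passed first and that every join column appears in both headers. The smaller file is loaded into a dictionary keyed on the join-column values (build phase), and the larger file is streamed one row at a time and looked up in that dictionary (probe phase).

```python
import csv
import io
from collections import defaultdict


def hash_inner_join(small, large, *columns):
    """Hypothetical sketch: inner-join two CSV text streams on `columns`.

    Only `small` is read fully into memory; `large` is streamed row by row.
    Returns (headers, rows), where rows is a lazy generator.
    """
    small_reader = csv.reader(small)
    small_headers = next(small_reader)
    small_idx = [small_headers.index(c) for c in columns]

    # Build phase: map each join-key tuple to every matching row of the small file.
    table = defaultdict(list)
    for row in small_reader:
        table[tuple(row[i] for i in small_idx)].append(row)

    large_reader = csv.reader(large)
    large_headers = next(large_reader)
    large_idx = [large_headers.index(c) for c in columns]
    # Emit the join columns once (from the small file) plus the large file's other columns.
    large_rest = [i for i in range(len(large_headers)) if i not in large_idx]
    headers = small_headers + [large_headers[i] for i in large_rest]

    def rows():
        # Probe phase: one dictionary lookup per streamed row of the large file.
        for row in large_reader:
            key = tuple(row[i] for i in large_idx)
            for match in table.get(key, ()):
                yield match + [row[i] for i in large_rest]

    return headers, rows()


# Usage with in-memory CSV data (open files work the same way).
small = io.StringIO("id,name\n1,Ann\n2,Bob\n")
large = io.StringIO("id,score\n2,90\n1,75\n3,60\n2,88\n")
headers, rows = hash_inner_join(small, large, "id")
print(headers)     # ['id', 'name', 'score']
print(list(rows))  # [['2', 'Bob', '90'], ['1', 'Ann', '75'], ['2', 'Bob', '88']]
```

Because rows is a generator, the larger file must stay open until iteration finishes; in exchange, the join itself only ever holds the smaller file in memory, plus the current row of the larger one.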

Getting Started

To install the package and start using it, follow these steps:

  1. pip3 install git+https://github.com/rquitales/csv-joiner-python.git
  2. Import csvjoiner in your Python script

Prerequisites

This package requires Python 3.5 or later. It has no other dependencies, as it is implemented entirely with the Python standard library.

Usage

  • Exporting merged data to a CSV file:

    import csvjoiner
    
    # Join on columns: colA, colB, colC
    join = csvjoiner.Joiner("/path/to/first/csv", "/path/to/second/csv", "colA", "colB", "colC")
    
    # Perform the inner join and write the result to the output CSV file
    join.inner("/path/to/output/csv")
    
  • Converting merged data into a pandas DataFrame (requires pandas to be installed):

    import csvjoiner
    import pandas as pd
    
    # Join on columns: colA, colB, colC
    join = csvjoiner.Joiner("/path/to/first/csv", "/path/to/second/csv", "colA", "colB", "colC")
    
    # With no output path, the merged column names and rows are returned under 'headers' and 'data'
    merged_data = join.inner()
    df = pd.DataFrame(data=merged_data['data'], columns=merged_data['headers'])
    

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Ramon Quitales - oss@rquitales.com

Project Link: https://github.com/rquitales/csv-joiner-python