Python Run Guide

The CSV combiner is implemented in Python. To run it, navigate to the file's location and enter the following command:

$  python3 cvs-combiner.py ./fixtures/accessories.csv ./fixtures/clothing.csv > combined.csv

The two CSV paths in the command can be replaced with other CSV files, and any number of paths can be passed (to accommodate multiple CSV files). Two Python files are attached, each implementing a different approach. The first one is better suited to larger files. An example CSV file is attached to show the output produced by running the scripts.

CSV Combiner

Write a command line program that takes several CSV files as arguments. Each CSV file (found in the fixtures directory of this repo) will have the same columns. Your script should output a new CSV file to stdout that contains the rows from each of the inputs along with an additional column that has the filename from which the row came (only the file's basename, not the entire path). Use filename as the header for the additional column.
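The requirement above could be met along the following lines. This is a minimal sketch and not the code from the attached scripts: it assumes Python's standard csv module, assumes every input really does share the same header, and uses a hypothetical function name, combine_csv.

```python
import csv
import os


def combine_csv(paths, out):
    """Stream rows from each CSV in `paths` to `out`, adding a filename column.

    Hypothetical helper: assumes every input file has the same header row.
    """
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for path in paths:
        # newline="" lets the csv module handle line breaks inside quoted fields.
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if header is None:
                continue  # empty file: nothing to copy
            if not header_written:
                # The header is written once, taken from the first non-empty file.
                writer.writerow(header + ["filename"])
                header_written = True
            name = os.path.basename(path)  # basename only, not the full path
            for row in reader:
                writer.writerow(row + [name])
```

Calling combine_csv(sys.argv[1:], sys.stdout) from a script entry point would give the command-line behaviour described here. Rows are read and written one at a time, so no input file is ever held in memory as a whole.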

Input & Output

We will run your code as follows:

$ ./csv-combiner.php ./fixtures/accessories.csv ./fixtures/clothing.csv > combined.csv

However, the CSV files in the fixtures directory are not the only files we will run through your code. We will also run it on files larger than 2 GB to see whether you hit memory limits.
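The memory concern can be checked locally on a small scale. The sketch below assumes the combiner handles one row at a time and measures peak Python memory with the standard tracemalloc module. The names stream_rows, synthetic_lines, NullWriter and peak_memory_bytes are hypothetical and exist only for this illustration.

```python
import csv
import tracemalloc


def stream_rows(lines, name):
    """Yield data rows from an iterable of CSV lines, with `name` appended.

    Hypothetical helper: a generator, so only one row is held at a time.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for row in reader:
        yield row + [name]


def synthetic_lines(n_rows):
    """Lazily generate a header plus n_rows of made-up data (no file needed)."""
    yield "email_hash,category\n"
    for i in range(n_rows):
        yield f"{i:064x},Shirts\n"


class NullWriter:
    """File-like sink that discards everything, so output is not accumulated."""

    def write(self, text):
        return len(text)


def peak_memory_bytes(n_rows):
    """Return the peak traced memory while streaming n_rows through the writer."""
    tracemalloc.start()
    writer = csv.writer(NullWriter())
    for row in stream_rows(synthetic_lines(n_rows), "synthetic.csv"):
        writer.writerow(row)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak
```

If rows are truly streamed, peak_memory_bytes(1_000) and peak_memory_bytes(100_000) should report similar figures. This is a scaled-down check and not a substitute for a run against a real multi-gigabyte file.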

Example

Given two input files named clothing.csv and accessories.csv:

clothing.csv

email_hash category
21d56b6a011f91f4163fcb13d416aa4e1a2c7d82115b3fd3d831241fd63 Shirts
21d56b6a011f91f4163fcb13d416aa4e1a2c7d82115b3fd3d831241fd63 Pants
166ca9b3a59edaf774d107533fba2c70ed309516376ce2693e92c777dd971c4b Cardigans

accessories.csv

email_hash category
176146e4ae48e70df2e628b45dccfd53405c73f951c003fb8c9c09b3207e7aab Wallets
63d42170fa2d706101ab713de2313ad3f9a05aa0b1c875a56545cfd69f7101fe Purses

Your script would output:

email_hash category filename
21d56b6a011f91f4163fcb13d416aa4e1a2c7d82115b3fd3d831241fd63 Shirts clothing.csv
21d56b6a011f91f4163fcb13d416aa4e1a2c7d82115b3fd3d831241fd63 Pants clothing.csv
166ca9b3a59edaf774d107533fba2c70ed309516376ce2693e92c777dd971c4b Cardigans clothing.csv
176146e4ae48e70df2e628b45dccfd53405c73f951c003fb8c9c09b3207e7aab Wallets accessories.csv
63d42170fa2d706101ab713de2313ad3f9a05aa0b1c875a56545cfd69f7101fe Purses accessories.csv

Considerations

  • You should use coding best practices. Your code should be re-usable and extensible.
  • Your code should be testable by a CI/CD process. Unit tests are important.
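A unit test for these considerations might look like the sketch below, which uses the standard unittest module. It assumes the combining logic is exposed as a function that accepts file-like objects, so tests can pass io.StringIO instead of touching the disk. The function add_filename_column is a hypothetical stand-in for the code under test, not the attached implementation.

```python
import csv
import io
import unittest


def add_filename_column(named_sources, out):
    """Hypothetical function under test.

    `named_sources` is an iterable of (basename, file-like) pairs.
    """
    writer = csv.writer(out, lineterminator="\n")
    wrote_header = False
    for name, source in named_sources:
        reader = csv.reader(source)
        header = next(reader, None)
        if header is None:
            continue  # skip empty inputs
        if not wrote_header:
            writer.writerow(header + ["filename"])
            wrote_header = True
        for row in reader:
            writer.writerow(row + [name])


class AddFilenameColumnTest(unittest.TestCase):
    def run_combiner(self, *sources):
        """Run the combiner on (name, text) pairs and return the output lines."""
        out = io.StringIO()
        add_filename_column([(n, io.StringIO(t)) for n, t in sources], out)
        return out.getvalue().splitlines()

    def test_header_written_once(self):
        lines = self.run_combiner(
            ("clothing.csv", "email_hash,category\na,Shirts\n"),
            ("accessories.csv", "email_hash,category\nb,Wallets\n"),
        )
        self.assertEqual(lines[0], "email_hash,category,filename")
        self.assertEqual(len(lines), 3)

    def test_rows_tagged_with_source_file(self):
        lines = self.run_combiner(
            ("clothing.csv", "email_hash,category\na,Shirts\n"),
            ("accessories.csv", "email_hash,category\nb,Wallets\n"),
        )
        self.assertEqual(
            lines[1:], ["a,Shirts,clothing.csv", "b,Wallets,accessories.csv"]
        )

    def test_quoted_comma_is_preserved(self):
        lines = self.run_combiner(
            ("clothing.csv", 'email_hash,category\na,"Shirts, long"\n'),
        )
        self.assertEqual(lines[1], 'a,"Shirts, long",clothing.csv')
```

Saved as a test module, this can be run with python3 -m unittest, which makes it straightforward to hook into a CI/CD pipeline.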