The CSV combiner is implemented in Python. To run it, navigate to the file's location and run the following command:
$ python3 cvs-combiner.py ./fixtures/accessories.csv ./fixtures/clothing.csv > combined.csv
The two CSV paths can be replaced with other CSV files, and any number of paths can be passed to accommodate multiple CSV files. Two Python files are attached, each implementing a different approach; the first is better suited to larger files. An example CSV file is attached to show the output produced by running the code.

The original problem statement follows.
Write a command line program that takes several CSV files as arguments. Each CSV file (found in the `fixtures` directory of this repo) will have the same columns. Your script should output a new CSV file to `stdout` that contains the rows from each of the inputs along with an additional column that has the filename from which the row came (only the file's basename, not the entire path). Use `filename` as the header for the additional column.
We will run your code as follows:
$ ./csv-combiner.php ./fixtures/accessories.csv ./fixtures/clothing.csv > combined.csv
However, the CSV files in the fixtures directory are not the only files we will use. We will also run your code against files larger than 2 GB to see whether it hits memory limits.
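One way to stay within memory limits is to stream each input row by row instead of loading whole files. The sketch below illustrates that idea with Python's standard `csv` module; it is not necessarily how the attached files work. The function name `combine_csv_files` is hypothetical, and the code assumes every input has a header row and that all headers match.

```python
import csv
import os
import sys


def combine_csv_files(paths, out=sys.stdout):
    """Stream rows from each CSV in `paths` to `out`, appending a filename column.

    Hypothetical helper: assumes each file starts with a header row and that
    all files share the same columns, as the problem statement promises.
    """
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for path in paths:
        # Only the basename goes into the new column, not the full path.
        basename = os.path.basename(path)
        # newline="" lets the csv module handle quoted embedded newlines.
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if header is None:
                continue  # empty file: nothing to copy
            if not header_written:
                writer.writerow(header + ["filename"])
                header_written = True
            # The reader yields one row at a time, so memory use does not
            # grow with file size.
            for row in reader:
                writer.writerow(row + [basename])


if __name__ == "__main__":
    combine_csv_files(sys.argv[1:])
```

Because rows are written as soon as they are read, the size of the largest single row bounds memory use, not the size of the input files.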
Given two input files named `clothing.csv` and `accessories.csv`:
clothing.csv:

email_hash | category |
---|---|
21d56b6a011f91f4163fcb13d416aa4e1a2c7d82115b3fd3d831241fd63 | Shirts |
21d56b6a011f91f4163fcb13d416aa4e1a2c7d82115b3fd3d831241fd63 | Pants |
166ca9b3a59edaf774d107533fba2c70ed309516376ce2693e92c777dd971c4b | Cardigans |

accessories.csv:

email_hash | category |
---|---|
176146e4ae48e70df2e628b45dccfd53405c73f951c003fb8c9c09b3207e7aab | Wallets |
63d42170fa2d706101ab713de2313ad3f9a05aa0b1c875a56545cfd69f7101fe | Purses |
Your script would output:
email_hash | category | filename |
---|---|---|
21d56b6a011f91f4163fcb13d416aa4e1a2c7d82115b3fd3d831241fd63 | Shirts | clothing.csv |
21d56b6a011f91f4163fcb13d416aa4e1a2c7d82115b3fd3d831241fd63 | Pants | clothing.csv |
166ca9b3a59edaf774d107533fba2c70ed309516376ce2693e92c777dd971c4b | Cardigans | clothing.csv |
176146e4ae48e70df2e628b45dccfd53405c73f951c003fb8c9c09b3207e7aab | Wallets | accessories.csv |
63d42170fa2d706101ab713de2313ad3f9a05aa0b1c875a56545cfd69f7101fe | Purses | accessories.csv |
- You should use coding best practices. Your code should be re-usable and extensible.
- Your code should be testable by a CI/CD process. Unit tests are important.
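To make the re-use and unit-testing points concrete, one option is to keep the row logic separate from file I/O so it can be tested with plain in-memory data. The sketch below is illustrative only: `tagged_rows` and `TaggedRowsTest` are hypothetical names, not part of the attached files, and the sample values are shortened stand-ins for the fixture data.

```python
import unittest


def tagged_rows(sources):
    """Yield one header row, then every data row with its source name appended.

    `sources` is an iterable of (name, rows) pairs, where `rows` is any
    iterable of lists whose first item is the header (a csv.reader works too).
    Hypothetical helper for illustration.
    """
    header_emitted = False
    for name, rows in sources:
        rows = iter(rows)
        header = next(rows, None)
        if header is None:
            continue  # source had no rows at all
        if not header_emitted:
            yield list(header) + ["filename"]
            header_emitted = True
        for row in rows:
            yield list(row) + [name]


class TaggedRowsTest(unittest.TestCase):
    def setUp(self):
        self.sources = [
            ("clothing.csv", [["email_hash", "category"], ["abc", "Shirts"]]),
            ("accessories.csv", [["email_hash", "category"], ["def", "Wallets"]]),
        ]

    def test_rows_are_tagged_with_source_name(self):
        result = list(tagged_rows(self.sources))
        self.assertEqual(result[1], ["abc", "Shirts", "clothing.csv"])
        self.assertEqual(result[2], ["def", "Wallets", "accessories.csv"])

    def test_header_is_emitted_once(self):
        result = list(tagged_rows(self.sources))
        self.assertEqual(result[0], ["email_hash", "category", "filename"])
        self.assertEqual(len(result), 3)
```

Saved in a test module, these cases can be run in a CI job with `python3 -m unittest`. Because the generator never touches the filesystem, the tests need no fixture files on disk.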