This repository contains a sample Python project that uses pandas to parse a CSV file in a specific format. Check the header line of posts.csv for the required columns.

From the input CSV file, the following outputs are generated:

- top_posts.[csv|json]: The posts that are public, have over 10 comments and over 9000 views, and have titles shorter than 40 characters.
- other_posts.[csv|json]: The posts that do not meet the criteria of top_posts.[csv|json].
- daily_top_posts.[csv|json]: A subset of top_posts.[csv|json] comprising the top post of each day, based on the number of likes.
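
The selection rules above could be expressed in pandas roughly as follows. This is a minimal sketch, not the project's actual code. The column names (id, title, privacy, comments, views, likes, timestamp) and the sample rows are assumptions made for illustration. The real column names are in the header line of posts.csv.

```python
import pandas as pd

# Hypothetical sample data; column names are assumed, not taken from posts.csv.
posts = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "title": ["Short title", "Another short one", "x" * 45, "Quiet post", "Next day"],
    "privacy": ["public", "public", "public", "private", "public"],
    "comments": [11, 25, 50, 30, 12],
    "views": [9500, 12000, 20000, 15000, 9100],
    "likes": [40, 70, 90, 10, 5],
    "timestamp": pd.to_datetime([
        "2020-01-01 08:00", "2020-01-01 17:30", "2020-01-02 09:00",
        "2020-01-02 10:00", "2020-01-02 11:00",
    ]),
})

# Top posts: public, over 10 comments, over 9000 views, title under 40 characters.
is_top = (
    (posts["privacy"] == "public")
    & (posts["comments"] > 10)
    & (posts["views"] > 9000)
    & (posts["title"].str.len() < 40)
)
top_posts = posts[is_top]

# Other posts: everything that does not meet the top-post criteria.
other_posts = posts[~is_top]

# Daily top posts: within each calendar day, the top post with the most likes.
day = top_posts["timestamp"].dt.normalize()
daily_top_posts = top_posts.loc[top_posts.groupby(day)["likes"].idxmax()]
```

Because other_posts is built from the negated mask, every input row lands in exactly one of top_posts or other_posts.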

The repository contains:

- The driver Python script __main__.py.
- The source Python scripts in the src directory.
- A few unit test cases in the test directory.
- A requirements.txt file containing a list of required packages.
- A posts.csv file containing the input CSV.
- A top_posts.csv file containing the sample top posts output as a CSV file.
- A top_posts.json file containing the sample top posts output as a JSON file.
- An other_posts.csv file containing the sample other posts output as a CSV file.
- An other_posts.json file containing the sample other posts output as a JSON file.
- A daily_top_posts.csv file containing the sample daily top posts output as a CSV file.
- A daily_top_posts.json file containing the sample daily top posts output as a JSON file.

Required packages:

- Tornado for parsing command line arguments
- pandas for parsing CSV
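
As an illustration of how Tornado can parse command line arguments, here is a minimal sketch using tornado.options. The option names mirror the switches shown in this README, but their types, defaults, and help texts are assumptions, and the project's own definitions may differ.

```python
from tornado.options import OptionParser


def build_parser():
    # A separate OptionParser keeps this sketch isolated from Tornado's
    # global options object.
    parser = OptionParser()
    # Types, defaults, and help texts below are assumed for illustration.
    parser.define("output_file_format", default="csv", type=str,
                  help="output format: csv or json")
    parser.define("full_record", default=False, type=bool,
                  help="write the full record for each post")
    parser.define("json_record_per_line", default=False, type=bool,
                  help="write each JSON record on its own line")
    return parser


parser = build_parser()
# The first element stands in for the program name; Tornado skips it.
parser.parse_command_line([
    "__main__.py",
    "--output-file-format=json",
    "--full-record",
    "--json_record-per-line",
])
```

Tornado treats hyphens and underscores in option names as interchangeable, which is why a mixed spelling such as --json_record-per-line is accepted. A boolean option given without a value is read as true.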

Usage:

- Install the packages in requirements.txt.

      pip3 install -r requirements.txt --user

- Run the __main__.py script. Examples:

      python3 __main__.py

- Run with the --help switch to see the available command line options.

      python3 __main__.py --help

- To output the full record as a JSON file with each record on its own line:

      python3 __main__.py \
        --output-file-format=json \
        --full-record \
        --json_record-per-line

- To run the unit test cases:

      python3 -m unittest
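
The "each record on its own line" JSON layout mentioned in the usage steps can be produced with pandas as sketched below. This only shows one way pandas supports that layout. Whether the project uses this exact call is an assumption, and the sample columns are hypothetical.

```python
import json

import pandas as pd

# Hypothetical sample records; the real columns come from posts.csv.
records = pd.DataFrame({"id": [1, 2], "title": ["First", "Second"]})

# orient="records" with lines=True writes one JSON object per line.
json_lines = records.to_json(orient="records", lines=True)

# Each non-empty line is a complete JSON document that parses on its own.
parsed = [json.loads(line) for line in json_lines.splitlines() if line]
```

A file in this layout can be read back one record at a time, which is convenient for large outputs.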