├── README.md
├── run.sh
├── src
│ └──h1b_counting.py
│ └──ParseInput.py
│ └──Top10list.py
├── input
│ └──h1b_input.csv
├── output
| └── top_10_occupations.txt
| └── top_10_states.txt
├── insight_testsuite
└── run_tests.sh
└── tests
└── test_1
├── input
│ └── h1b_input.csv
|__ output
└── top_10_occupations.txt
└── top_10_states.txt
- Read the raw content of the input csv and stored as string variable. Split the string variable as row by row by new line char ('\n').
- Since each line has to be parsed and extracted specific set of fields, instead of for loop, I used map function which does the feature extraction faster than plain for loop
- Once features are extracted appropriate top 10 list are calculated by sorting the features
- Wrote the top 10 list to appropriate output files.
python ./src/h1b_counting.py ./input/h1b_input.csv ./output/top_10_occupations.txt ./output/top_10_states.txt
We can trigger the python script by running ./run.sh shell script which passes the necessary main python script location(./src/h1b_counting.py), input(./input/h1b_input.csv) and output(./output/*) folders paths as arugments.
sys: To parse the command line system arguments