A data parser for fixed-width files. The codebase includes three main parts:
- `dataparser.py` defines the main parser class
- `generator.py` defines the methods used to generate the example fixed-width files
- `testcases.py` includes all the test cases; test data are in the `tests` directory
Requirements:
* Python 3.6
* Git
# Download the repo
git clone https://github.com/chuanwuliu/data-parser.git
- The main data parser function is defined in `dataparser.py`. To convert a fixed-width `input_file` and save the result to `output_file`:
python dataparser.py input_file output_file
For example:
python dataparser.py tests/test_input1.txt tests/_temp_output2.csv
- The default delimiter is a comma. You can customise the delimiter with the `-d` argument. For example, to parse with `@`:
python dataparser.py tests/test_input1.txt tests/_temp_output2.csv -d @
- Full usage (as printed by `python dataparser.py -h`):
usage: dataparser.py [-h] [-d DELIMITER] [-s SPEC_FILE] input_file output_file
positional arguments:
input_file Path to the input (fixed width) file
output_file Path to save the output
optional arguments:
-h, --help show this help message and exit
-d DELIMITER Delimiter for parsing the file
-s SPEC_FILE Path to specification (json) file
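The usage text above has the shape of Python's `argparse` help output. The sketch below is a hypothetical reconstruction of how such a command line could drive the conversion; it is not the code in `dataparser.py`. The field widths in `DEFAULT_WIDTHS`, the helper names `build_arg_parser`, `convert_text` and `main`, and the UTF-8 encoding are all assumptions, and `-s` is accepted but not interpreted because the spec file format is not described here.

```python
import argparse
import csv
import io
from typing import List, Optional

# Hypothetical field widths; the real tool takes its layout from a spec file (-s).
DEFAULT_WIDTHS = [5, 12, 3]


def build_arg_parser() -> argparse.ArgumentParser:
    """Argparse setup mirroring the usage text above (a reconstruction)."""
    parser = argparse.ArgumentParser(prog="dataparser.py")
    parser.add_argument("input_file", help="Path to the input (fixed width) file")
    parser.add_argument("output_file", help="Path to save the output")
    parser.add_argument("-d", dest="delimiter", metavar="DELIMITER", default=",",
                        help="Delimiter for parsing the file")
    # Accepted but not interpreted here: the spec file format is not documented.
    parser.add_argument("-s", dest="spec_file", metavar="SPEC_FILE",
                        help="Path to specification (json) file")
    return parser


def convert_text(text: str, widths: List[int], delimiter: str = ",") -> str:
    """Slice each fixed-width line by column widths and emit delimited rows."""
    buffer = io.StringIO()
    # csv.writer quotes any field that itself contains the delimiter.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for line in text.splitlines():
        fields, start = [], 0
        for width in widths:
            # Strip only the padding spaces used for alignment.
            fields.append(line[start:start + width].strip(" "))
            start += width
        writer.writerow(fields)
    return buffer.getvalue()


def main(argv: Optional[List[str]] = None) -> None:
    """Parse the command line, convert the input file and write the output file."""
    args = build_arg_parser().parse_args(argv)
    with open(args.input_file, encoding="utf-8") as src:
        text = src.read()
    with open(args.output_file, "w", encoding="utf-8", newline="") as dst:
        dst.write(convert_text(text, DEFAULT_WIDTHS, args.delimiter))
```

In a script, `main()` would be called from an `if __name__ == "__main__":` block. Writing through `csv.writer` means a field that happens to contain the delimiter is quoted instead of silently becoming two columns; note that `csv.writer` only accepts a single-character delimiter.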
To run the test cases:
python testcases.py
The following cases have been tested:
- Parsing an input file with fully filled fields
- Parsing an input file with left-aligned fields and blank fields
- Parsing an input file with right-aligned fields and blank fields
- Parsing an input file in which all fields are blank
In each case, the sample input is parsed and the result is compared with a manually parsed expected output.
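The four cases above can be expressed as unit tests along these lines. This is a hypothetical sketch using `unittest` with an in-memory stand-in parser (`split_fixed_width`, two columns of widths 4 and 6), whereas `testcases.py` compares parsed files from the `tests` directory.

```python
import unittest
from typing import List


def split_fixed_width(line: str, widths: List[int]) -> List[str]:
    """Hypothetical stand-in parser: slice by width, strip alignment spaces."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip(" "))
        start += width
    return fields


class FixedWidthCases(unittest.TestCase):
    # Two hypothetical columns, 4 and 6 characters wide.
    WIDTHS = [4, 6]

    def test_fully_filled_fields(self):
        self.assertEqual(split_fixed_width("abcd123456", self.WIDTHS), ["abcd", "123456"])

    def test_left_aligned_and_blank_fields(self):
        self.assertEqual(split_fixed_width("ab  " + " " * 6, self.WIDTHS), ["ab", ""])

    def test_right_aligned_and_blank_fields(self):
        self.assertEqual(split_fixed_width("    " + "  1234", self.WIDTHS), ["", "1234"])

    def test_all_blank_fields(self):
        self.assertEqual(split_fixed_width(" " * 10, self.WIDTHS), ["", ""])
```

Saved as a module, these tests can be run with `python -m unittest <module name>`.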
Currently, fields in the fixed-width file may contain only letters, digits and plain space characters. Fields containing other whitespace characters, such as `\t` and `\r`, have not been considered or tested.
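One way to guard against untested input is to flag fields outside the supported character set before parsing. The sketch below is an assumption, not part of the repo: `unsupported_fields` is a hypothetical helper, and it treats "letters" as ASCII letters only.

```python
import re
from typing import List

# Assumption: "letters" means ASCII letters; plain spaces are the only whitespace.
SUPPORTED_FIELD = re.compile(r"[A-Za-z0-9 ]*")


def unsupported_fields(fields: List[str]) -> List[int]:
    """Return the indices of fields containing characters outside the tested set."""
    return [i for i, field in enumerate(fields)
            if SUPPORTED_FIELD.fullmatch(field) is None]
```

For example, `unsupported_fields(["abc", "a\tb", "12 3", "x\r"])` returns `[1, 3]`. The check matters in Python because `str.strip()` with no arguments removes tabs and carriage returns as well as spaces, so such characters at the edge of a field can disappear silently.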
A helper function has been built for generating example files:
python generator.py
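The sketch below illustrates how example files with left-aligned, right-aligned and blank fields could be generated. It is not the contents of `generator.py`: `LAYOUT`, `format_fixed_width`, `random_row` and `write_example_file` are hypothetical names, and values use only ASCII letters and digits to stay within the tested character set.

```python
import random
import string
from typing import List, Tuple

# Hypothetical column layout: (width, alignment), "<" = left-aligned, ">" = right-aligned.
LAYOUT: List[Tuple[int, str]] = [(5, "<"), (8, ">"), (3, "<")]


def format_fixed_width(values: List[str], layout: List[Tuple[int, str]]) -> str:
    """Truncate each value to its column width, then pad it according to its alignment."""
    parts = []
    for value, (width, align) in zip(values, layout):
        value = value[:width]
        parts.append(value.ljust(width) if align == "<" else value.rjust(width))
    return "".join(parts)


def random_row(layout: List[Tuple[int, str]], rng: random.Random) -> List[str]:
    """Random letters and digits per field; a zero-length draw yields a blank field."""
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(0, width)))
            for width, _ in layout]


def write_example_file(path: str, n_rows: int = 10, seed: int = 0) -> None:
    """Write n_rows random fixed-width lines; the seed makes the file reproducible."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8") as out:
        for _ in range(n_rows):
            out.write(format_fixed_width(random_row(LAYOUT, rng), LAYOUT) + "\n")
```

Seeding a private `random.Random` instance keeps generated files reproducible without touching the global random state.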
Charles Liu: dr.liuchuanwu@gmail.com