Parsemesan
This grate repository will unbrielievably parse your CSV string and return a Python dictionary with headers and rows of data. You cheddar believe I put some gouda tests in here.
All cheesiness aside, the goal of this project is to allow the parsing of CSV files to stand alone as a separate service and remove the burden from the frontend code of Datasmith and later Wordsmith. This project will most likely expand to include other data formats.
Getting Started
Installation
Python 3.6+ is required. From repo root, run pip install -r requirements.txt
Tests
From repo root, run pytest
From repo root, run pylint parsemesan tests
Coverage
From repo root, run coverage run -m pytest && coverage html
Documentation
API
parse_data
Params
- {str}: source_type, the type of data source (arrays, csv, unicode)
- {*}: input_data, CSV {bytes} or Python {list} or Python {str} (which is Unicode)
Return
- {dict}: Python object with following spec:
{
  "errors": [ <dict> ],
  "result": {
    "headers": <list of {str}>,
    "rows": <list of lists of {str}>
  }
}
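As a rough illustration of how a caller might consume this return value, the sketch below works on a hand-written dictionary that follows the spec above. The sample values and the helper name rows_as_records are assumptions for illustration and are not part of Parsemesan. The keys inside each error dictionary are not specified here, so the sketch only checks whether the list is empty.

```python
def rows_as_records(parsed):
    """Pair each row with the headers to get one dict per row.

    `parsed` is assumed to follow the return spec of parse_data.
    """
    if parsed["errors"]:
        # The shape of each error dict is not documented here, so just surface them.
        raise ValueError(f"parse reported errors: {parsed['errors']}")
    headers = parsed["result"]["headers"]
    return [dict(zip(headers, row)) for row in parsed["result"]["rows"]]


# Hand-written sample in the documented shape (not real parser output).
sample = {
    "errors": [],
    "result": {
        "headers": ["name", "cheese"],
        "rows": [["Ada", "gouda"], ["Grace", "brie"]],
    },
}

records = rows_as_records(sample)
```

Keeping headers separate from rows keeps the payload compact; a helper like this is only needed when per-row dictionaries are more convenient.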
get_valid_formats
Params
- No inputs required.
Return
- {dict}: Python object with following spec:
{ "formats": <list of {str}> }
Types and Errors
- byte_string: a raw Python bytes array. Does not apply to source_type='arrays'. EncodingError raised if not valid.
- stream: a Python object that reads the data sequentially. FileTypeError raised if not valid.
- data: a Python dictionary with keys headers and rows (see below). DataError raised if not valid.
Pipelines
Each pipeline combines a parser and at least one validator, as shown in the pipelines/ directory.
Arrays
The input_data is a Python {list} of {lists} containing the rows of data.
- Parse or reorganize the object into data.
- Validate the data's rows and headers as properly tabular.
data is returned as described above.
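The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in pipelines/: the function names are hypothetical, DataError is a local stand-in for the error named in Types and Errors, and treating the first inner list as the headers is an assumption.

```python
class DataError(Exception):
    """Local stand-in for the DataError named in Types and Errors."""


def parse_arrays(input_data):
    """Reorganize a list of lists into the headers/rows dictionary.

    Assumption: the first inner list holds the headers.
    """
    if not input_data:
        raise DataError("input_data is empty")
    headers, *rows = input_data
    return {"headers": list(headers), "rows": [list(row) for row in rows]}


def validate_tabular(data):
    """Check that every row has exactly one cell per header."""
    width = len(data["headers"])
    for index, row in enumerate(data["rows"]):
        if len(row) != width:
            raise DataError(f"row {index} has {len(row)} cells, expected {width}")
    return data


data = validate_tabular(parse_arrays([["name", "age"], ["Ada", "36"]]))
```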
CSV
The input_data is a Python {bytes} object containing an encoded CSV table.
- Detect the input_data encoding using the chardet module.
- Validate the byte_string's encoding by calling the native .decode(encoding) method.
- Convert it to a stream with the native io.StringIO class.
- Validate the stream's file type by eliminating other common file types (.html, .xml).
- Parse the stream into data using the native csv module.
- Validate the data's rows and headers as properly tabular.
data is returned as described above.
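The part of this pipeline that differs from the others is the first two steps, detection and decoding. A hedged sketch of that part is below. The function name decode_byte_string, the UTF-8 fallback, and the local EncodingError class are assumptions for illustration; the sketch also tolerates chardet being absent so that it runs on a bare Python install.

```python
try:
    import chardet  # third-party; named in the steps above for encoding detection
except ImportError:  # keep the sketch runnable without it
    chardet = None


class EncodingError(Exception):
    """Local stand-in for the EncodingError named in Types and Errors."""


def decode_byte_string(byte_string, fallback="utf-8"):
    """Detect the encoding of byte_string and decode it to str."""
    encoding = None
    if chardet is not None:
        # detect() returns a dict; its "encoding" value may be None.
        encoding = chardet.detect(byte_string)["encoding"]
    encoding = encoding or fallback  # assumed fallback, not from the project
    try:
        return byte_string.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        raise EncodingError(f"cannot decode input as {encoding}") from exc


text = decode_byte_string(b"name,age\nAda,36\n")
```

Once the bytes are decoded, the remaining steps match the Unicode pipeline.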
Unicode
The input_data is a Python {str} object (which is Unicode in Python 3) containing a CSV table.
- Convert the string to a stream with the native io.StringIO class.
- Validate the stream's file type by eliminating other common file types (.html, .xml).
- Parse the stream into data using the native csv module.
- Validate the data's rows and headers as properly tabular.
data is returned as described above.
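A minimal sketch of the stream, file-type, and parse steps, using only the standard library, might look like this. The names validate_file_type and parse_unicode, the markup prefixes that are checked, and the choice to treat the first CSV row as the headers are all assumptions; the real validators may use different rules.

```python
import csv
import io


class FileTypeError(Exception):
    """Local stand-in for the FileTypeError named in Types and Errors."""


# Assumed markers; the real validator may look for different signs.
MARKUP_PREFIXES = ("<?xml", "<!doctype html", "<html")


def validate_file_type(stream):
    """Reject streams whose first non-blank text looks like HTML or XML."""
    start = stream.read(256).lstrip().lower()
    stream.seek(0)  # rewind so the parser sees the whole stream
    if start.startswith(MARKUP_PREFIXES):
        raise FileTypeError("input looks like markup, not CSV")
    return stream


def parse_unicode(input_data):
    """Turn a CSV string into the headers/rows dictionary."""
    stream = validate_file_type(io.StringIO(input_data, newline=""))
    rows = list(csv.reader(stream))
    if not rows:
        return {"headers": [], "rows": []}
    return {"headers": rows[0], "rows": rows[1:]}


data = parse_unicode('name,quote\nAda,"brie, please"\n')
```

Passing newline="" to io.StringIO follows the csv module's guidance so that quoted fields containing line breaks are read intact.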