Python code to normalize a CSV file.
This is a tool that reads a CSV formatted file on stdin
and emits a normalized CSV formatted file on stdout.
- Input file should be in UTF-8
- Times are in US/Pacific.
- The sample data contains all date and time format variants possible.
Output file should be a CSV file that has been normalized.
Normalized, in this case, means:
- The entire CSV is in the UTF-8 character set.
- If a character is invalid, it will be replaced with the Unicode Replacement Character.
- If that replacement makes data invalid (for example, because it turns a date field into
something unparseable), a warning will be printed to stderr and the row will be absent from the output.
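The Unicode rules above can be sketched roughly as follows. This is only an illustration, not the code in this repo: decode_utf8 and keep_valid_rows are hypothetical names, and the sketch assumes that a per-row normalizer signals bad data by raising ValueError.

```python
import sys


def decode_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, swapping invalid bytes for U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def keep_valid_rows(rows, normalize_row):
    """Yield normalized rows; warn on stderr and skip rows that fail.

    normalize_row is a hypothetical callable that raises ValueError
    when a field (for example a date) can no longer be parsed.
    """
    for line_number, row in enumerate(rows, start=1):
        try:
            yield normalize_row(row)
        except ValueError as error:
            print(f"warning: dropping row {line_number}: {error}", file=sys.stderr)
```

Keeping the warning on stderr means stdout still carries only CSV data, so redirecting the output to a file stays clean.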
Timestamp
- Should be formatted in RFC3339 format.
- Should be converted from US/Pacific time to US/Eastern.
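One possible way to do the Timestamp conversion with the standard library is sketched below. The input format string is an assumption (something like 4/1/11 11:00:00 AM); the real sample data may contain other variants. normalize_timestamp is a hypothetical name, and zoneinfo needs Python 3.9+ with time zone data available on the system.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")  # same rules as US/Pacific
EASTERN = ZoneInfo("America/New_York")  # same rules as US/Eastern
INPUT_FORMAT = "%m/%d/%y %I:%M:%S %p"  # assumed input format


def normalize_timestamp(value: str) -> str:
    """Parse a Pacific-time timestamp and return it as RFC 3339 in Eastern time."""
    naive = datetime.strptime(value.strip(), INPUT_FORMAT)  # ValueError if unparseable
    pacific = naive.replace(tzinfo=PACIFIC)
    return pacific.astimezone(EASTERN).isoformat()
```

A value that was damaged by character replacement fails in strptime with ValueError, which is the point where a row could be dropped with a warning.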
Address
- Should be passed through as is, except for Unicode validation.
ZIP
- Should be 5 digits.
- Prepend with 0 if less than 5 digits.
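The ZIP rule maps closely onto str.zfill. A minimal sketch, where normalize_zip is a hypothetical name and the input is assumed to already consist of digits:

```python
def normalize_zip(value: str) -> str:
    """Left-pad a ZIP code with zeros to 5 characters."""
    return value.strip().zfill(5)
```

For example, normalize_zip("123") gives "00123", and a value that is already 5 digits long comes back unchanged.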
FullName
- Should be converted to uppercase.
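For FullName, Python's str.upper is Unicode-aware, so non-English names are handled too. A small sketch, with normalize_name as a hypothetical name:

```python
def normalize_name(value: str) -> str:
    """Uppercase a name, including non-ASCII letters."""
    return value.upper()
```

Note that some characters change length when uppercased (the German ß becomes SS), which is expected behavior for str.upper.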
FooDuration and BarDuration
- Will be seconds, in floating point.
TotalDuration
- Sum of FooDuration and BarDuration.
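The duration rules could look something like the sketch below. The input format is not stated above, so this assumes HH:MM:SS.MS strings (for example 1:23:32.123); parse_duration and total_duration are hypothetical names.

```python
def parse_duration(value: str) -> float:
    """Convert an assumed HH:MM:SS.MS string to seconds as a float."""
    # Unpacking raises ValueError if there are not exactly three parts.
    hours, minutes, seconds = value.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def total_duration(foo: str, bar: str) -> float:
    """Sum of the two durations, in seconds."""
    return parse_duration(foo) + parse_duration(bar)
```

Because the results are floats, compare them with a tolerance (for example math.isclose) rather than with == in tests.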
Notes
- Should be passed through as is, except for Unicode validation.
This project is wired up to work with Docker. You aren't required to use it, but it can make
things easier.
The instructions in the next sections cover both Docker and a regular Python interpreter.
This section covers some of the high-level notes for some of the files included in this repo:
.coveragerc
- Contains settings for the test coverage plugin.
.dockerignore
- Contains patterns for files/directories to skip when copying files into the docker build context.
  - Note that this isn't used when mapping volumes into a container.
  - Note that the patterns in this file are mostly also covered in .gitignore, so if you add patterns to this file, consider whether or not they should also go in the other one too.
.editorconfig
- Contains some settings for how files are treated, to try to keep things consistent across people's different editor settings and IDEs.
.gitattributes
- Settings for how git should treat different files.
.gitignore
- Patterns for files/directories to avoid committing to the repository.
  - Note that the patterns in this file are mostly also covered in .dockerignore, so if you add patterns to this file, consider whether or not they should also go in the other one too.
docker-compose.yaml
- Defines "services" to run our project, e.g. app to run local development and testing.
Dockerfile
- Defines the execution environment for our project, both locally and deployed.
Makefile
- Defines "targets" which are shortcuts of sorts, so instead of running something like docker-compose build --pull app then docker-compose up app, you can just run make build-local.
  - To see what shortcuts are available along with some help text, you can run make help on the command line.
  - Note that these are set up to work with Docker only.
pyproject.toml
- Defines settings for python tools, such as black and pytest.
requirements.txt
- Defines what packages and versions your project needs to run when deployed.
test_requirements.txt
- Defines packages and versions your project needs to run locally, meaning things like linting and testing packages.
- Docker:
make build-dev
- Python:
python3 -m venv venv
. venv/bin/activate
pip3 install -r test_requirements.txt
- Docker:
make normalize < my_csv.csv > output.csv
- Python:
PYTHONPATH=. python3 src/csv_normalizer.py < my_csv.csv > output.csv
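To show how a stdin-to-stdout script like this can be structured, here is a minimal pass-through sketch. It is not the actual contents of src/csv_normalizer.py; copy_csv is a hypothetical name, and a real tool would normalize each field where the comment indicates.

```python
import csv
import io


def copy_csv(raw: bytes, out) -> None:
    """Decode raw CSV bytes and write the rows back out as CSV."""
    text = raw.decode("utf-8", errors="replace")
    # newline="" leaves line endings alone so the csv module can handle them.
    reader = csv.reader(io.StringIO(text, newline=""))
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row)  # a real tool would normalize each field here
```

When run as a script, this could be called as copy_csv(sys.stdin.buffer.read(), sys.stdout) after importing sys. Reading sys.stdin.buffer gives raw bytes, so invalid input is handled by the explicit decode step instead of by the interpreter's default text decoding.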
This codebase is set up to lint the code using Black. To lint, do the following:
- Docker:
make lint
- Python:
black .
- If you want to see which files it would reformat without actually changing the code, you can run
it with the --check flag. Adding --diff also prints the changes it would make.
- Docker:
  - Run make test
  - If you want to re-build the image (if you change Dockerfile), then you can run make build-test
  - If you want to pass some options to pytest, you can run the command like this: make test OPT="-s"
- Python:
  - Run: pytest
  - You can pass extra options to pytest, like pytest -s to disable output capturing.
If you want to run more than a single command inside the container, or to move around and see how things look, you can run this command:
docker-compose run --rm app bash
or use the Makefile shortcut make bash.
- --rm makes it so that the container gets deleted when you exit it. This can be useful to avoid cluttering up your host machine. If you care to keep the container, then take that part off.
- app is just the name of the service defined in docker-compose.yaml.
- bash starts the command line session as a bash session.