Poetry-managed Python FastAPI application with Docker multi-stage builds. The application uses docker-compose for development and PostgreSQL as its database.

Although you can start playing with the API immediately via docker-compose, you should first import the data in a separate step, described below.
NOTE: Run all commands from the project root.
Run the following to quickly build everything necessary to run the application in dev:

```bash
docker-compose -f docker/docker-compose.yml build geolocation-service
```

and then start the API with:

```bash
docker-compose -f docker/docker-compose.yml up
```
To populate the database:

- Place your .csv dataset in the data/ folder at the root of the project.
- Update the CSV_PATH variable in app/config/config_dev.ini if necessary (see the sketch after this list).
- Rebuild your image and then run:

```bash
docker exec geolocation-service python -m app.data_library.data_loader
```
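For reference, here is a rough sketch of how a loader could resolve CSV_PATH from that ini file with Python's configparser. The section name and the fallback file name are assumptions for illustration, not taken from the project:

```python
# Hypothetical sketch: resolve CSV_PATH from app/config/config_dev.ini.
# The "DEFAULT" section and the fallback file name are assumptions.
from configparser import ConfigParser

config = ConfigParser()
config.read("app/config/config_dev.ini")

csv_path = config.get("DEFAULT", "CSV_PATH", fallback="data/dataset.csv")
print(f"Loading dataset from {csv_path}")
```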
Create the virtual environment and install dependencies with:

```bash
poetry install
```
See the Poetry docs for information on how to add or update dependencies.
Run commands inside the virtual environment with:

```bash
poetry run <your_command>
```
Spawn a shell inside the virtual environment with:

```bash
poetry shell
```
Start a development server locally:

```bash
poetry run uvicorn app.main:app --reload --host localhost --port 8000
```

The API will be available at localhost:8000/ and the Swagger docs at localhost:8000/docs.

Note that you will need a PostgreSQL instance up and running first:

```bash
docker run --name postgresql-container -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres
```
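Once both the database and the API are up (and the import step has run), a quick way to sanity-check the service from Python is to hit the geolocation endpoint. This is only an illustrative snippet; the response shape shown in the comment is an assumption, not the project's actual schema:

```python
# Smoke-test the running dev server; requires `pip install httpx`.
import httpx

resp = httpx.get("http://localhost:8000/geolocation/8.8.8.8", timeout=5.0)
resp.raise_for_status()
# Assumed shape, e.g. {"ip_address": "8.8.8.8", "country": "...", "city": "..."}
print(resp.json())
```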
Build images with:

```bash
docker build --tag geolocation-service --file docker/Dockerfile .
```
The Dockerfile uses multi-stage builds to run lint and test stages before building the production stage. If linting or testing fails, the build fails.

You can stop the build at a specific stage with the --target option:

```bash
docker build --tag geolocation-service --file docker/Dockerfile --target <stage> .
```

For example, to stop at the test stage:

```bash
docker build --tag geolocation-service --file docker/Dockerfile --target test .
```
We could then get a shell inside the container with:

```bash
docker run -it geolocation-service:latest bash
```
If you do not specify a target, the resulting image will be the last stage defined, which in our case is the 'production' image.

For a database, you can use the same Docker container described at the end of the Poetry instructions above. And you can always stick to the docker-compose option described in the TL;DR section at the top for a faster flow.
Check util_commands.md for other useful commands for operating the application.
In the interest of time, there is a lot I compromised on while writing the application. Nevertheless, I consider this version viable as a first product given my constraints.

Some limitations I am already aware of, and would work on next if I could:
- Write an async version of GET "/geolocation/{ip_address}" (see the first sketch after this list). Doing this also means changing the way the DB client is managed today, so that the queries SQLAlchemy executes against PostgreSQL can be async.
- There are no config files for prod and staging, for obvious reasons.
- The "data_library" is not really a lib that could be packaged and deployed separately right now as it has some dependencies on other namespaces of the project ;
- This especially because of the "data_loader" application which I would probably refactor to another place to be a consumer of the "data_library";
- Also, the pydantic model "schema_geolocation" might be moved to away of the lib to be closer to the API itself.
- Finally, regarding the performance of the exporter: it loads the whole dataset extracted from the .csv into memory. This of course does not scale for very large datasets (.csv files potentially bigger than commodity-hardware memory, say 8 GB). I would introduce a configurable threshold on the buffer and bulk-dump records as soon as that limit is reached (see the second sketch after this list). One complication is that we would need to change the data model to allow de-duplication of the records in some other form, since it is really hard to de-duplicate records without having all of them in memory.
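As a rough illustration of the async rewrite mentioned in the first limitation above, something along these lines could work with SQLAlchemy's asyncio extension and asyncpg. The connection URL, table name, and session wiring are all assumptions for the sketch, not the project's actual code:

```python
# Hypothetical async variant of GET /geolocation/{ip_address}.
# Requires SQLAlchemy 2.x and asyncpg; all names below are assumptions.
from fastapi import Depends, FastAPI, HTTPException
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

engine = create_async_engine("postgresql+asyncpg://postgres:postgres@localhost:5432/postgres")
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)

app = FastAPI()

async def get_session():
    # Yield one session per request and close it afterwards.
    async with SessionLocal() as session:
        yield session

@app.get("/geolocation/{ip_address}")
async def get_geolocation(ip_address: str, session: AsyncSession = Depends(get_session)):
    # The "geolocation" table and "ip_address" column are assumed names.
    result = await session.execute(
        text("SELECT * FROM geolocation WHERE ip_address = :ip"), {"ip": ip_address}
    )
    row = result.mappings().first()
    if row is None:
        raise HTTPException(status_code=404, detail="IP address not found")
    return dict(row)
```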
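And for the exporter, a sketch of the buffered bulk-load idea from the last item: stream the CSV, flush once a configurable batch size is reached, and, as one possible answer to the de-duplication problem, let Postgres drop duplicates with ON CONFLICT. Again, the table and column names, the CSV layout, and the unique constraint are assumptions:

```python
# Hypothetical buffered loader; table/column names and CSV layout are assumed.
import csv
from sqlalchemy import create_engine, text

BATCH_SIZE = 10_000  # the configurable threshold mentioned above

INSERT = text(
    "INSERT INTO geolocation (ip_address, country, city) "
    "VALUES (:ip_address, :country, :city) "
    # Pushes de-duplication down to Postgres; assumes a unique
    # constraint exists (e.g. on ip_address).
    "ON CONFLICT DO NOTHING"
)

def load_csv(path: str) -> None:
    engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
    buffer = []
    with engine.begin() as conn, open(path, newline="") as f:
        for row in csv.DictReader(f):
            buffer.append(
                {"ip_address": row["ip_address"], "country": row["country"], "city": row["city"]}
            )
            if len(buffer) >= BATCH_SIZE:
                conn.execute(INSERT, buffer)  # bulk "executemany" flush
                buffer.clear()
        if buffer:
            conn.execute(INSERT, buffer)  # flush the final partial batch
```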
On a final note, I would love to rewrite this app in a language with better computational performance, say, Go :)