Jxtopher/retails-analysis

Statistic of retails

Python

Retails analysis (2022-04-11)

Project deployment

The project is composed of 6 services:

retails-analysis-api: http://localhost
jupyter-notebook: http://localhost:8888
spark-worker
spark-master http://localhost:8080
mongo
mongo-express: http://localhost:8081

You can used the docker-compose for deployment the project as folllow:

First copy and edit the environment variables.

cp .env-example .env

Deployment the project

docker-compose build .
docker-compose up

Please create database in mongodb, corresponding to 'MONNGO_DB' in .env file
Import the dataset 'retails data', if it hasn't already been done. Only inside the container for this moment.

docker exec -it retails-analysis-api bash
poetry run import '/apps/dataset/Online Retail.xlsx'

Connect à http://localhost and use retails analysis api route

Features

Statistics available through an API route
Import dataset only files on format xlsx
Continuous integration with GitHub Actions
Creation of the work environment: docker-compose and Dockerfile
Improve how to import files, use an API route
Set up Jupyter Notebook
Better distinguish environment Prod and Dev
More unit tests

Retails analysis API

The API documentation is accessible at http://localhost/docs

Checks the project

poetry run mypy --config-file .config/mypy.cfg retails_analysis
poetry run black retails_analysis --config .config/black.cfg
poetry run flake8 --config .config/flake8.cfg
poetry run python -u -m unittest discover

References

MongoDB Connector for Spark: https://www.mongodb.com/docs/spark-connector/current/
Jupyter tutorial: https://www.sicara.ai/blog/2017-05-02-get-started-pyspark-jupyter-notebook-3-minutes