Classic MLE template with CI/CD pipelines
Technologies used:
- Analytics and model training
  - Python 3.x
  - Pandas, NumPy, scikit-learn
- Testing
  - unittest + coverage
- Data / Model versioning
  - DVC
- CI/CD
  - GitHub Actions
  - Docker Image: ml-pipe-twitter-sentiment
The project uses the Twitter Sentiment Analysis Dataset from Kaggle. Sentiment analysis is a common Natural Language Processing (NLP) task: determining whether a piece of text is positive, negative, or neutral. Here, the task is to classify the sentiment of tweets.
- Download the dataset from Kaggle
- Analyze the dataset and create a simple baseline model in this notebook
- Convert the notebook to Python scripts in the src folder
- Put the dataset into an S3 bucket using DVC
- Create a Dockerfile and docker-compose.yml
- Create CI/CD pipelines using GitHub Actions:
- Save logs to a Greenplum database during functional testing
- Manage secrets with HashiCorp Vault
- Use Kafka as the message broker
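The database logging step can be pictured with a small sketch. It is not the repository's implementation: the `DBLogHandler` class, the `test_logs` table, and the `functional_tests` logger name are hypothetical, and an in-memory SQLite database stands in for Greenplum so the example runs without a server.

```python
import logging
import sqlite3


class DBLogHandler(logging.Handler):
    """Hypothetical handler that writes each log record as a table row."""

    def __init__(self, connection):
        super().__init__()
        self.connection = connection
        cursor = self.connection.cursor()
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS test_logs "
            "(created REAL, level TEXT, message TEXT)"
        )
        self.connection.commit()

    def emit(self, record):
        try:
            cursor = self.connection.cursor()
            # sqlite3 uses "?" placeholders; a PostgreSQL-style driver
            # for Greenplum would typically use "%s" instead.
            cursor.execute(
                "INSERT INTO test_logs (created, level, message) "
                "VALUES (?, ?, ?)",
                (record.created, record.levelname, record.getMessage()),
            )
            self.connection.commit()
        except Exception:
            self.handleError(record)


# Assumption: SQLite in memory replaces the real Greenplum connection.
connection = sqlite3.connect(":memory:")
logger = logging.getLogger("functional_tests")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output quiet
logger.addHandler(DBLogHandler(connection))

logger.info("Valid F1 %.3f", 0.7406)
rows = connection.execute("SELECT level, message FROM test_logs").fetchall()
print(rows)
```

Keeping the handler behind the standard logging API means the training and test code can keep calling `logging.info` as usual, and only the handler setup changes between local runs and functional tests.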
Run data preprocessing tests:
python -m unittest src/unit_tests/test_preprocess.py
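For orientation, a preprocessing test might look roughly like the sketch below. The `clean_tweet` helper and its cleaning rules are assumptions made for illustration; the actual functions in src/preprocess.py may differ.

```python
import re
import unittest

# Hypothetical cleaning rules, not taken from src/preprocess.py.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and extra whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip()


class TestCleanTweet(unittest.TestCase):
    def test_removes_urls_and_mentions(self):
        cleaned = clean_tweet("@user Loving this! https://t.co/abc")
        self.assertEqual(cleaned, "loving this!")

    def test_keeps_hashtag_word(self):
        self.assertEqual(clean_tweet("So #Happy today"), "so happy today")

    def test_whitespace_only_input(self):
        self.assertEqual(clean_tweet("   "), "")
```

Saved as a module under src/unit_tests, a test case like this is picked up by the same `python -m unittest` invocation shown above.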
Run model training tests:
python -m unittest src/unit_tests/test_training.py
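The training run reports Train F1 and Valid F1 but does not say how the score is averaged over classes. The sketch below assumes macro averaging and implements it without dependencies, only to show what such a metric test could check. The project itself most likely relies on scikit-learn's `f1_score`; `macro_f1` and the test class are hypothetical names.

```python
from collections import Counter
import unittest


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


class TestMacroF1(unittest.TestCase):
    def test_perfect_predictions(self):
        y = ["positive", "negative", "neutral"]
        self.assertEqual(macro_f1(y, y), 1.0)

    def test_one_mistake(self):
        y_true = ["positive", "positive", "negative", "neutral"]
        y_pred = ["positive", "negative", "negative", "neutral"]
        # Per-class F1: positive 2/3, negative 2/3, neutral 1.
        self.assertAlmostEqual(macro_f1(y_true, y_pred), 7 / 9)
```

A real training test would more plausibly fit the model on a small fixture and assert that the validation score clears a chosen threshold; the metric check above is the simplest piece of that.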
Example docker-compose output (model training, unit tests, coverage report):
twitter-sentiment_1 | INFO:root:Fitting model
twitter-sentiment_1 | INFO:root:Train F1 0.8117694303924563 | Valid F1 0.7406303833044623
twitter-sentiment_1 | INFO:root:Predicting on test data
twitter-sentiment_1 | INFO:root:Saving test predictions
twitter-sentiment_1 | ......
twitter-sentiment_1 | ----------------------------------------------------------------------
twitter-sentiment_1 | Ran 6 tests in 0.679s
twitter-sentiment_1 |
twitter-sentiment_1 | OK
twitter-sentiment_1 | ....
twitter-sentiment_1 | ----------------------------------------------------------------------
twitter-sentiment_1 | Ran 4 tests in 21.795s
twitter-sentiment_1 |
twitter-sentiment_1 | OK
twitter-sentiment_1 | Name                                Stmts   Miss  Cover   Missing
twitter-sentiment_1 | -----------------------------------------------------------------
twitter-sentiment_1 | src/constants.py                        3      0   100%
twitter-sentiment_1 | src/preprocess.py                      49      3    94%   23-25
twitter-sentiment_1 | src/train.py                           75     23    69%   90-91, 95-96, 121-143, 147
twitter-sentiment_1 | src/unit_tests/test_preprocess.py      43      0   100%
twitter-sentiment_1 | src/unit_tests/test_training.py        26      0   100%
twitter-sentiment_1 | -----------------------------------------------------------------
twitter-sentiment_1 | TOTAL                                 196     26    87%
bigdata-course-01_twitter-sentiment_1 exited with code 0
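A CI job can turn the coverage report into a pass/fail gate. The sketch below parses the TOTAL line of a report like the one above. The function names and the 80% threshold are assumptions, and the regular expression expects the column layout shown here (Stmts, Miss, Cover), so it would need adjusting for reports with branch coverage columns.

```python
import re

# Matches "TOTAL <stmts> <miss> <cover>%" in a plain `coverage report`.
TOTAL_RE = re.compile(r"TOTAL\s+(\d+)\s+(\d+)\s+(\d+)%")


def total_coverage(report: str) -> int:
    """Return the TOTAL coverage percentage found in the report text."""
    match = TOTAL_RE.search(report)
    if match is None:
        raise ValueError("no TOTAL line found in coverage report")
    return int(match.group(3))


def check_coverage(report: str, minimum: int) -> bool:
    """True when total coverage meets the (hypothetical) minimum."""
    return total_coverage(report) >= minimum


log_line = "twitter-sentiment_1 | TOTAL 196 26 87%"
print(total_coverage(log_line))              # 87
print(check_coverage(log_line, minimum=80))  # True
```

When the report is produced directly rather than read from container logs, `coverage report --fail-under=80` gives the same gate without any custom parsing.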