This repo contains the data science NLP challenge. We use it to evaluate DS candidates at IBM CIO Brazil.
Choose one of the following datasets:
- https://www.kaggle.com/leandrodoze/sentiment-analysis-in-portuguese/data
- https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
- https://www.kaggle.com/datatattle/covid-19-nlp-text-classification
To better evaluate your skills and knowledge, we ask you to complete the following steps:
1- Understand and clean the dataset, then build and evaluate a model.
2- Expose the model through a simple API (for example Flask or FastAPI).
- The API receives a request in JSON format and responds with JSON containing the model inference.
3- Create a basic UI that a non-technical team can use.
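As a rough illustration of steps 1 and 2, the sketch below shows one possible shape for the cleaning step and the JSON request/response contract, using only the Python standard library. Everything in it is an assumption made for illustration: the function names (`clean_text`, `predict`, `handle_request`), the `text` and `label` fields, and the keyword rule that stands in for a trained model. It is not a required design, and you are free to structure your solution differently.

```python
import json
import re
from typing import Tuple


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and @mentions, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def predict(text: str) -> dict:
    """Placeholder for a trained model (assumption): a toy keyword rule."""
    positive_words = {"good", "great", "love"}
    tokens = set(re.findall(r"\w+", text))
    label = "positive" if tokens & positive_words else "negative"
    return {"label": label}


def handle_request(body: str) -> Tuple[int, str]:
    """Turn a JSON request body into (HTTP status, JSON response body)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body is not valid JSON"})

    # Hypothetical contract: the request is an object with a "text" field.
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return 400, json.dumps({"error": "'text' must be a non-empty string"})

    cleaned = clean_text(text)
    return 200, json.dumps({"text": cleaned, **predict(cleaned)})


if __name__ == "__main__":
    print(handle_request('{"text": "I LOVE this https://x.co @bob"}'))
```

In a Flask or FastAPI app, the route handler would pass the raw request body to a function like `handle_request` and return its status and body; keeping that logic separate from the web framework also makes it easy to unit test.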
Please consider following the CRISP-DM methodology. If you have any questions, do not hesitate to contact me.