Comment Verification using the Digikala dataset
In this project, I solve a common supervised learning problem (comment verification) using Python.
I generally follow the same pipeline for each of my projects, and this NLP project is no exception. The pipeline I followed is shown in the images folder.
- First, I try several data-cleaning techniques using regular expressions and built-in Python methods.
- Second, I build an initial document-term matrix to feed into my machine learning models.
- During the project, I ran into class imbalance, a common issue in machine learning, so I apply several methods to overcome it.
- I implemented Multilingual BERT (mBERT) because I could not find a Persian BERT model that was useful for this task.
- Natural Language Processing (NLP)
- Data Cleaning
- Exploratory Data Analysis
- Machine Learning
- Deep Learning
- Text Classification
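The data-cleaning step can be sketched with Python's built-in re module. This is a minimal illustration, not the project's actual code: the specific rules (mapping Arabic ي/ك to Persian ی/ک, dropping URLs and punctuation, keeping the zero-width non-joiner used in Persian compound words) and the function name clean_comment are assumptions chosen for the example.

```python
import re

# Hypothetical cleaning rules; the project's real regexes are not given in this README.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep Arabic-script letters/digits (U+0600-U+06FF), ZWNJ (U+200C), whitespace, ASCII letters/digits.
NON_TEXT_RE = re.compile(r"[^\u0600-\u06FF\u200c\sA-Za-z0-9]")
MULTISPACE_RE = re.compile(r"\s+")

# Arabic yeh/kaf often appear in Persian text; map them to the Persian code points.
ARABIC_TO_PERSIAN = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})


def clean_comment(text: str) -> str:
    """Normalize one comment: fix letter forms, drop URLs and symbols, collapse spaces."""
    text = text.translate(ARABIC_TO_PERSIAN)
    text = URL_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    text = MULTISPACE_RE.sub(" ", text)
    return text.strip().lower()
```

For example, clean_comment("Great   PRODUCT!! www.example.com") returns "great product".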
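A document-term matrix has one row per comment and one column per vocabulary term, with each cell holding that term's count in that comment. The README does not say which library built it (scikit-learn's CountVectorizer is a common choice), so the sketch below uses only the standard library and whitespace tokenization, both assumptions for illustration; build_dtm is a hypothetical name.

```python
from collections import Counter


def build_dtm(docs):
    """Return (vocabulary, matrix) where matrix[i][j] is the count of vocabulary[j] in docs[i]."""
    tokenized = [doc.split() for doc in docs]  # naive whitespace tokenization (assumption)
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix


vocab, dtm = build_dtm(["good phone", "bad phone phone"])
```

Here vocab is ["bad", "good", "phone"] and dtm is [[0, 1, 1], [1, 0, 2]]. A real pipeline would use a sparse representation, since most cells are zero.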
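The README does not list which class-imbalance methods were applied. Two common options are sketched below as stand-ins: random oversampling of minority classes, and "balanced" class weights computed as n_samples / (n_classes * class_count). Both function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

With 8 "ok" and 2 "spam" labels, balanced_class_weights gives ok: 0.625 and spam: 2.5. Oversampling should be applied only to the training split, so duplicated comments do not leak into evaluation.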
MIT License
Copyright (c) 2020 Shakib Yazdani
- Twitter - @iamshakibyz
- LinkedIn - shakib-yazdani