This repo contains the steps to create your own Transformer model for Punjabi, an Indic language with more than 25 million speakers.
- Data Collection - We collected data by scraping content from the websites of Punjabi newspapers.
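  A minimal scraping sketch is shown below; the URL and the choice of `<p>` tags are illustrative placeholders, not the actual newspaper sites or page structures used.

  ```python
  # Minimal scraping sketch; the URL and the paragraph-tag selection are
  # illustrative placeholders, not the actual newspaper sites used.
  import requests
  from bs4 import BeautifulSoup

  def scrape_article_text(url):
      """Fetch a page and return its visible paragraph text."""
      response = requests.get(url, timeout=10)
      response.raise_for_status()
      soup = BeautifulSoup(response.text, "html.parser")
      paragraphs = soup.find_all("p")
      return "\n".join(p.get_text(strip=True) for p in paragraphs)

  # Example with a hypothetical article URL:
  # text = scrape_article_text("https://example-punjabi-newspaper.com/article/123")
  ```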
- Data Cleaning - We applied text-cleaning functions to the content scraped from the websites.
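  A hedged sketch of such a cleaning function is shown below; the specific rules (stripping leftover HTML, keeping only Gurmukhi characters, digits, and basic punctuation, collapsing whitespace) are assumptions about typical cleaning steps, not the exact functions used here.

  ```python
  import re

  GURMUKHI_RANGE = "\u0A00-\u0A7F"  # Unicode block for the Gurmukhi script

  def clean_text(text):
      """Normalize scraped text; the filtering rules below are illustrative assumptions."""
      text = re.sub(r"<[^>]+>", " ", text)                           # drop leftover HTML tags
      text = re.sub(fr"[^{GURMUKHI_RANGE}0-9\s।,.?!-]", " ", text)   # strip foreign symbols
      text = re.sub(r"\s+", " ", text)                               # collapse whitespace
      return text.strip()
  ```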
- Data Handling and Data Crunching - We used PyTorch classes to handle the data.
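  A sketch of how PyTorch's `Dataset` and `DataLoader` classes could wrap the cleaned text is shown below; the field names, tokenizer interface, and maximum sequence length are assumptions, not the exact setup of the original notebook.

  ```python
  # Sketch of a PyTorch Dataset wrapping the cleaned text and sentiment labels.
  # Field names and max_len are illustrative assumptions.
  import torch
  from torch.utils.data import Dataset, DataLoader

  class SentimentDataset(Dataset):
      def __init__(self, texts, labels, tokenizer, max_len=256):
          self.texts = texts
          self.labels = labels
          self.tokenizer = tokenizer
          self.max_len = max_len

      def __len__(self):
          return len(self.texts)

      def __getitem__(self, idx):
          encoding = self.tokenizer(
              self.texts[idx],
              truncation=True,
              padding="max_length",
              max_length=self.max_len,
              return_tensors="pt",
          )
          return {
              "input_ids": encoding["input_ids"].squeeze(0),
              "attention_mask": encoding["attention_mask"].squeeze(0),
              "label": torch.tensor(self.labels[idx], dtype=torch.long),
          }

  # A DataLoader then batches and shuffles the samples for training, e.g.:
  # train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
  ```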
- We will create a neural network defined in the RobertaClass class.
- This network will have the RoBERTa language model followed by a dropout layer and finally a linear layer that produces the final outputs.
- The data will be fed to the RoBERTa language model as defined in the dataset.
- The final layer's outputs will be compared with the sentiment category to measure the accuracy of the model's predictions.
- We will initiate an instance of the network called model; this instance will be used for training and then saved as the final trained model for future inference. A sketch of such a network is shown after this list.
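
A sketch of such a network, built with the Hugging Face `transformers` library, is shown below; the checkpoint name, dropout rate, and number of sentiment classes are illustrative assumptions, not the exact values used in the notebook.

```python
# Sketch of the RobertaClass network described above: RoBERTa backbone,
# dropout, then a linear classification head. The checkpoint name, dropout
# rate, and number of sentiment classes are illustrative assumptions.
import torch
from transformers import RobertaModel

class RobertaClass(torch.nn.Module):
    def __init__(self, checkpoint="roberta-base", num_classes=3, dropout=0.3):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(checkpoint)
        self.dropout = torch.nn.Dropout(dropout)
        self.classifier = torch.nn.Linear(self.roberta.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.last_hidden_state[:, 0]  # representation of the first token
        return self.classifier(self.dropout(pooled))

# Instantiate the model for training and later save it for inference:
# model = RobertaClass()
# ... training loop ...
# torch.save(model.state_dict(), "punjabi_sentiment_roberta.pt")
```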
Disclaimer - For complete access to the accompanying notebook, click here.