NLP

Natural Language Processing (NLP) is a branch of AI that deals with human language, as text or speech, and with converting it from a human-readable format into a machine-understandable one. Machines understand only numbers, so text must eventually be converted to a numerical representation. Before that conversion, however, considerable pre-processing is needed so that the text is presented to the machine in a meaningful way.

1.NLP_Into.ipynb: An introduction to NLP. Shows some of the basic steps needed to prepare textual data before it can be converted to numbers and used to train a model.

2.VocabOfWords.ipynb: When text is used as a feature, it must be pre-processed first. One such method is Bag of Words (BOW), a feature-extraction technique: the important words are added to a dictionary (the vocabulary), and each sentence is then converted to a vector with one entry per dictionary word, 1 if the word is present in the sentence and 0 otherwise. This notebook builds the dictionary of words.
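The binary Bag of Words idea described above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the function names (`tokenize`, `build_vocabulary`, `to_binary_vector`), the sample sentences, and the tiny stop-word set are all assumptions made for the example.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def build_vocabulary(sentences, stop_words=frozenset()):
    """Collect the distinct words, minus stop words, in sorted order.

    Sorting gives every word a stable position in the output vectors.
    """
    vocab = set()
    for sentence in sentences:
        vocab.update(t for t in tokenize(sentence) if t not in stop_words)
    return sorted(vocab)


def to_binary_vector(sentence, vocabulary):
    """Return 1 for each vocabulary word present in the sentence, else 0."""
    tokens = set(tokenize(sentence))
    return [1 if word in tokens else 0 for word in vocabulary]


# Hypothetical sample data, chosen only for illustration.
sentences = ["The cat sat on the mat", "The dog chased the cat"]
stop_words = frozenset({"the", "on"})

vocabulary = build_vocabulary(sentences, stop_words)
vectors = [to_binary_vector(s, vocabulary) for s in sentences]

print(vocabulary)  # ['cat', 'chased', 'dog', 'mat', 'sat']
print(vectors)     # [[1, 0, 0, 1, 1], [1, 1, 1, 0, 0]]
```

Each vector has one position per vocabulary word, so every sentence maps to a vector of the same length regardless of how many words it contains. Word order and word counts are discarded in this binary variant.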

3.Text2Num.ipynb: Converts words into a numerical format that machines can process.
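One common way to turn words into numbers is integer encoding, where each distinct word receives its own ID. The sketch below assumes that approach; the notebook may use a different scheme. The helper names (`build_word_index`, `encode`), the sample corpus, and the choice to reserve 0 for unknown words are all hypothetical.

```python
def build_word_index(sentences):
    """Assign each distinct word an integer ID in order of first appearance.

    IDs start at 1 so that 0 stays free for words not seen in the corpus.
    """
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def encode(sentence, word_index, unknown_id=0):
    """Replace each word with its ID, or unknown_id if it was never seen."""
    return [word_index.get(word, unknown_id) for word in sentence.lower().split()]


# Hypothetical sample data, chosen only for illustration.
corpus = ["machines understand numbers", "machines read text"]
word_index = build_word_index(corpus)
encoded = encode("machines understand text quickly", word_index)

print(word_index)  # {'machines': 1, 'understand': 2, 'numbers': 3, 'read': 4, 'text': 5}
print(encoded)     # [1, 2, 5, 0]
```

Unlike the Bag of Words vector, this encoding keeps word order, but the output length varies with the sentence length.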

4.NLP_Emoji.ipynb: Removes emoji and emoticons from text data.
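Removing emoji and emoticons can be sketched with two regular expressions from the standard `re` module. This is an assumed approach, not necessarily the notebook's. The Unicode ranges cover the most common emoji blocks but not every emoji, and the emoticon pattern recognizes only a small set of Western-style emoticons such as `:)` and `:-D`. The names `EMOJI_PATTERN`, `EMOTICON_PATTERN`, and `remove_emoji_and_emoticons` are hypothetical.

```python
import re

# Common emoji code point ranges (not exhaustive).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticon faces
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector that follows some emoji
    "]+"
)

# Simple text emoticons: eyes, optional nose, mouth, bounded by whitespace.
EMOTICON_PATTERN = re.compile(r"(?<!\S)[:;=][-']?[()\[\]DPp/\\|](?!\S)")


def remove_emoji_and_emoticons(text):
    """Strip emoji and simple emoticons, then collapse leftover whitespace."""
    text = EMOJI_PATTERN.sub("", text)
    text = EMOTICON_PATTERN.sub("", text)
    return " ".join(text.split())


# Escapes are used here so the example does not depend on file encoding.
sample = "Great movie \U0001F600 :) loved it \U0001F680\U0001F680 :-D"
cleaned = remove_emoji_and_emoticons(sample)
print(cleaned)  # Great movie loved it
```

The whitespace lookarounds in the emoticon pattern keep it from matching inside ordinary text, so a URL such as `http://example.com` is left unchanged even though it contains `:/`.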