NER-Detection-and-Anonymization
NER Detection
There are various tools for named entity recognition (NER): CoreNLP, NLTK, spaCy, and Rasa NLU.
There are also models that we can train on our own data: SyntaxNet, CRF, LSTM, and BERT.
I have used spaCy here because it gives good results, is easy to use, and is fast. However, it cannot recognize named entities that start with a lowercase letter, so for those I have used the caseless model of StanfordNERTagger.
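As a rough illustration of the spaCy step, the sketch below collects the entities from a processed document. The function names and the model name `en_core_web_sm` are assumptions for this example, not necessarily what this project uses.

```python
def extract_entities(doc):
    """Return (text, label, start_char, end_char) for each entity in a spaCy Doc."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


def detect_entities(text, model_name="en_core_web_sm"):
    """Load a spaCy pipeline and run NER on text.

    model_name is an assumed default; the model must be installed first,
    e.g. with `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported lazily so extract_entities works without spaCy

    nlp = spacy.load(model_name)
    return extract_entities(nlp(text))
```

The character offsets are kept alongside the labels so that a later step can replace each entity in place.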
I have also used CoreNLP to detect dates and mobile numbers.
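For readers without a CoreNLP setup, date and mobile number detection can be approximated with regular expressions. This is a plain regex stand-in, not the CoreNLP approach used here, and the patterns below are assumptions that cover only numeric dates such as 12/05/2020 and 10-digit numbers with an optional country code.

```python
import re

# Assumed formats, chosen for illustration only.
DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")
PHONE_RE = re.compile(r"(?<!\d)(?:\+\d{1,3}[\s-]?)?\d{10}(?!\d)")


def find_dates_and_phones(text):
    """Return sorted (start, end, label) spans for dates and phone numbers."""
    found = [(m.start(), m.end(), "DATE") for m in DATE_RE.finditer(text)]
    found += [(m.start(), m.end(), "PHONE") for m in PHONE_RE.finditer(text)]
    return sorted(found)
```

Written-out dates ("5th May 2020") are not matched by these patterns, which is one reason to prefer a proper tagger.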
Data Anonymization
As far as I know, there are two options for anonymizing the data:
1. Core Python programming
2. Faker library
I have used the Faker library to replace the detected data with fake values.
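A minimal sketch of the replacement step follows. It assumes the detected entities arrive as `(start, end, label)` character spans; `anonymize`, `faker_provider`, and the label-to-generator mapping are hypothetical names chosen for this example.

```python
def anonymize(text, entities, make_fake):
    """Replace each (start, end, label) span in text with a fake value.

    make_fake(label) returns a replacement string. The same original string
    always gets the same replacement, so repeated mentions stay consistent.
    """
    mapping = {}
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(entities, reverse=True):
        original = text[start:end]
        if original not in mapping:
            mapping[original] = make_fake(label)
        text = text[:start] + mapping[original] + text[end:]
    return text, mapping


def faker_provider():
    """Build a make_fake callable backed by Faker (pip install Faker)."""
    from faker import Faker  # imported lazily so anonymize works without Faker

    fake = Faker()
    # Assumed mapping from entity labels to Faker generators.
    generators = {
        "PERSON": fake.name,
        "GPE": fake.city,
        "ORG": fake.company,
        "DATE": fake.date,
        "PHONE": fake.phone_number,
    }
    return lambda label: generators.get(label, fake.word)()
```

Calling `anonymize(text, spans, faker_provider())` returns the rewritten text together with the original-to-fake mapping, which is useful for auditing what was changed.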
Work with Word documents
There are also two options for working with Word documents:
1. docx (the python-docx package)
2. win32com.client
I have used the win32com client because it is easy to use and can read or write the whole document while keeping the source formatting.
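One way to apply replacements through win32com is Word's own find-and-replace, which leaves the formatting untouched. The sketch below assumes Windows with Microsoft Word and pywin32 installed; the function names are hypothetical, and Word's Find limits each search string to 255 characters.

```python
import os

WD_FIND_CONTINUE = 1  # value of Word's wdFindContinue constant
WD_REPLACE_ALL = 2    # value of Word's wdReplaceAll constant


def replace_all(doc, replacements):
    """Run Word's find-and-replace on doc for every old -> new pair."""
    for old, new in replacements.items():
        # Positional arguments of Find.Execute: FindText, MatchCase,
        # MatchWholeWord, MatchWildcards, MatchSoundsLike, MatchAllWordForms,
        # Forward, Wrap, Format, ReplaceWith, Replace.
        doc.Content.Find.Execute(
            old, True, False, False, False, False,
            True, WD_FIND_CONTINUE, False, new, WD_REPLACE_ALL,
        )


def anonymize_word_file(src_path, dst_path, replacements):
    """Open a Word file, apply replacements, and save a copy."""
    import win32com.client  # pywin32; Windows only

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word expects absolute paths.
        doc = word.Documents.Open(os.path.abspath(src_path))
        replace_all(doc, replacements)
        doc.SaveAs(os.path.abspath(dst_path))
        doc.Close()
    finally:
        word.Quit()
```

The `replacements` dictionary can be the original-to-fake mapping produced during anonymization.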