Recurrence: Groups similar incidents together and finds the incidents most similar to any given incident.
Model’s Parameters:
- TopRecSent: Sets the threshold for how many of the top recurring sentences are shown. Defaults to 100.
- SimilarityPer: Sets the similarity threshold. Defaults to 0.6, which means sentences with more than 60% similarity are grouped together.
- DateColumn: The date column; required to prepare the final dataframe.
- IncidentDescription: The text column in which recurrence patterns are searched.
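
To make the effect of SimilarityPer and TopRecSent concrete, here is a minimal sketch of threshold-based grouping. It is not the model's actual code: the function names (`tfidf_vectors`, `cosine`, `group_similar`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration, and it uses only the Python standard library where a real pipeline would more likely use a TF-IDF implementation from a library.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict of term -> weight) per sentence.

    Assumption: lowercase whitespace tokenization and a smoothed IDF,
    idf = ln((1 + N) / (1 + df)) + 1. The source does not specify either.
    """
    tokenized = [s.lower().split() for s in sentences]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similar(sentences, similarity_per=0.6, top_rec_sent=100):
    """For each sentence, collect the indices of sentences whose similarity
    to it is strictly above similarity_per, then keep the top_rec_sent
    sentences with the highest repeat counts.

    Returns a list of (sentence, repeat_count, member_indices) tuples.
    """
    vectors = tfidf_vectors(sentences)
    groups = []
    for i, vec_i in enumerate(vectors):
        members = [
            j for j, vec_j in enumerate(vectors)
            if cosine(vec_i, vec_j) > similarity_per
        ]
        groups.append((sentences[i], len(members), members))
    # Stable sort: sentences with equal counts keep their original order.
    groups.sort(key=lambda g: g[1], reverse=True)
    return groups[:top_rec_sent]
```

Calling `group_similar(descriptions)` on a list of incident descriptions returns each sentence with its repeat count, where a sentence always counts itself. Raising `similarity_per` makes groups stricter; lowering `top_rec_sent` shortens the result.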
Model Preparation Steps:
- Data cleaning.
- Sentence embedding (TF-IDF).
- Cosine similarity between the vectors generated by TF-IDF (matrix size NxN).
- Create a new matrix that keeps only the elements with more than 60% similarity.
- Create a dataframe with the details below:
- After these 5 steps, take all the data with a repeat count of less than 10.
- Create sentence embeddings using TF-IDF.
- Compute cosine similarity between the vectors.
- Create a new matrix that keeps only the elements with more than 60% similarity.
- Create a dataframe as in step 5.
- Take the data with a repeat count of less than 10 and add the rest to the dataframe created in step 5.
- Repeat steps 6 to 10 in a loop.
- Sentence embedding (TF-IDF) of the above column (Sentence Text).
- Cosine similarity between the vectors generated by TF-IDF (matrix size NxN).
- Create a new matrix that keeps only the elements with more than 60% similarity.
- Create the final dataframe with the details below:
- Take input from the user (text data).
- Create the embedding of the input sentence.
- Compute cosine similarity between the input sentence and the sentence text of the final dataframe (step 9).
- Show the top 3 similar sentences in the format below:
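
The lookup steps (embed the user's input, compare it with the stored sentence text, return the top 3) can be sketched roughly as follows. This is an illustrative sketch only: the helper names (`fit_idf`, `embed`, `cosine`, `top_similar`), the whitespace tokenizer, and the smoothed IDF formula are assumptions, and a plain list of sentences stands in for the Sentence Text column of the final dataframe. The output format used by the original tool is not reproduced; the sketch simply returns (sentence, score) pairs.

```python
import math
from collections import Counter


def fit_idf(corpus):
    """Learn IDF weights from the stored sentences (assumed smoothed IDF)."""
    n_docs = len(corpus)
    doc_freq = Counter(t for s in corpus for t in set(s.lower().split()))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def embed(sentence, idf):
    """TF-IDF vector for one sentence; terms unseen in the corpus are dropped."""
    counts = Counter(sentence.lower().split())
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_similar(query, corpus, k=3):
    """Return the k stored sentences most similar to the query,
    as (sentence, score) pairs ordered from most to least similar."""
    idf = fit_idf(corpus)
    query_vec = embed(query, idf)
    scored = [(cosine(query_vec, embed(s, idf)), s) for s in corpus]
    # Stable sort: ties keep the original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(sentence, round(score, 3)) for score, sentence in scored[:k]]
```

The query is embedded with IDF weights learned from the stored sentences, so that the input and the stored text live in the same vector space; words that never occur in the stored data contribute nothing to the score.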