Problem Statement

Recurrence: Group similar incidents together, and find the incidents most similar to any given incident.

Model’s Parameters:

TopRecSent: Sets the threshold for showing the top recurring sentences. The default is 100.
SimilarityPer: Sets the threshold for similarity percentage. The default is 0.6, which means sentences with more than 60% similarity are grouped together.
DateColumn: This is required to prepare the final dataframe.
IncidentDescription: This is the text column on which we are trying to find the recurrence pattern.
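
As a minimal sketch, the four parameters above could be collected in a single configuration object. The class name `RecurrenceConfig`, the snake_case field names, and the two default column names are assumptions made for illustration; only the defaults 100 and 0.6 come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecurrenceConfig:
    """Hypothetical container for the model parameters (names are illustrative)."""

    top_rec_sent: int = 100       # TopRecSent: default taken from the text
    similarity_per: float = 0.6   # SimilarityPer: default taken from the text
    date_column: str = "Date"     # DateColumn: assumed column name
    incident_description: str = "IncidentDescription"  # assumed column name

    def __post_init__(self):
        # Reject values that cannot be valid thresholds.
        if not 0.0 <= self.similarity_per <= 1.0:
            raise ValueError("similarity_per must be between 0 and 1")
        if self.top_rec_sent < 1:
            raise ValueError("top_rec_sent must be at least 1")


# Usage: override only what differs from the defaults.
config = RecurrenceConfig(similarity_per=0.7)
```

Validating the threshold up front keeps a mistyped value such as 60 (instead of 0.6) from silently producing empty groups.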

Model Preparation Steps:

Sentence Text, Repeated Sentences Index, Repeated Count

6. After the first 5 steps, taking all data with a repeated count of less than 10.
7. Creating sentence embeddings using TF-IDF.
8. Computing cosine similarity between the vectors.
9. Creating a new matrix that keeps only the elements with more than 60% similarity.
10. Creating a dataframe as in step 5.
11. Taking the data with a repeated count of less than 10 and adding the rest to the dataframe created in step 5.
12. Repeating steps 6 to 10 in a loop.
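
The embedding, similarity, and thresholding steps above can be sketched in plain Python as follows. This is a single-pass illustration under stated assumptions, not the actual implementation: the function names are hypothetical, tokenization is a simple whitespace split, the smoothed IDF formula is one common variant (the source does not say which is used), and the repeated-count filter and the loop are left out. In practice a library such as scikit-learn (`TfidfVectorizer`, `cosine_similarity`) would likely replace the hand-written helpers.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict of term -> weight) per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n_docs = len(tokenized)
    # Document frequency: in how many sentences each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_similar(sentences, threshold=0.6):
    """For each sentence, list the indices of the other sentences above the threshold."""
    vectors = tfidf_vectors(sentences)
    rows = []
    for i, sentence in enumerate(sentences):
        similar = [
            j for j in range(len(sentences))
            if j != i and cosine(vectors[i], vectors[j]) > threshold
        ]
        rows.append({
            "Sentence Text": sentence,
            "Repeated Sentences Index": similar,
            "Repeated Count": len(similar),
        })
    return rows


# Usage with made-up incident descriptions.
incidents = [
    "server down in region east",
    "server down in region west",
    "password reset request",
]
for row in group_similar(incidents, threshold=0.6):
    print(row)
```

The row keys reuse the column names Sentence Text, Repeated Sentences Index, and Repeated Count so the result can be loaded directly into a dataframe.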

1. Taking input from the user (text data).
1. Creating an embedding of the input sentence.
1. Computing cosine similarity between the input sentence and the sentence text of the final dataframe (step 9).
1. Showing the top 3 similar sentences in the format below: