Quora Question Pairs Competition
Apply a random forest to the dataset directly and see what accuracy it achieves.
In the feature engineering step, we will add these features to the data_frame:
- question1_length: character length of question1
- question2_length: character length of question2
- q1_word_count: number of words in question1
- q2_word_count: number of words in question2
- word_common: number of words common to q1 and q2
- word_total: total number of words in q1 + q2
- word_share: word_common / word_total
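
The features listed above can be sketched in plain Python as below. This is a minimal illustration, not necessarily the exact preprocessing used in the notebook: the helper name `pair_features` is hypothetical, and it assumes words are lowercased, whitespace-separated tokens and that the common and total word counts are taken over unique words.

```python
def pair_features(q1: str, q2: str) -> dict:
    """Compute the seven hand-crafted features for one question pair."""
    # Assumption: a "word" is a lowercased, whitespace-separated token.
    words1 = q1.lower().split()
    words2 = q2.lower().split()

    # Assumption: common and total counts are taken over unique words.
    unique1, unique2 = set(words1), set(words2)
    word_common = len(unique1 & unique2)
    word_total = len(unique1) + len(unique2)

    return {
        "question1_length": len(q1),      # character length of question1
        "question2_length": len(q2),      # character length of question2
        "q1_word_count": len(words1),     # number of words in question1
        "q2_word_count": len(words2),     # number of words in question2
        "word_common": word_common,
        "word_total": word_total,
        # Guard against division by zero when both questions are empty.
        "word_share": word_common / word_total if word_total else 0.0,
    }


features = pair_features("What is machine learning?", "What is deep learning?")
print(features)
```

With pandas, one way to attach these columns would be to apply the function row by row, for example `data_frame.apply(lambda row: pd.Series(pair_features(row["question1"], row["question2"])), axis=1)`, assuming the question columns are named `question1` and `question2` and contain no missing values.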