Customer reviews are often long and descriptive, and analyzing them manually is, as you can imagine, really time-consuming. This is where Natural Language Processing (NLP) can help, by generating a short summary of each long review.
Let’s first understand what text summarization is before we look at how it works. Here is a succinct definition to get us started:
“Automatic text summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning”
-Text Summarization Techniques: A Brief Survey, 2017
There are broadly two different approaches that are used for text summarization:
- Extractive Summarization
- Abstractive Summarization
Extractive summarization: the name gives away what this approach does. We identify the important sentences or phrases in the original text and extract only those to form the summary.
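To make the extractive idea concrete, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top-scoring ones. It is only one simple way to pick "important" sentences, not the method used later in this article. The stop-word list, the naive sentence splitter, and the names `split_sentences` and `extractive_summary` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "this", "was", "in", "for"}


def tokenize(text):
    """Lowercase the text and keep only the words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)


review = (
    "The coffee arrived quickly. "
    "This coffee is rich, and the coffee aroma is great. "
    "Packaging was fine."
)
print(extractive_summary(review, n_sentences=1))
```

Every sentence in the output is copied verbatim from the input; nothing new is written, which is the defining property of the extractive approach.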
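Abstractive summarization is a very interesting approach. Here, we generate new sentences that capture the meaning of the original text. This is in contrast to the extractive approach described above, which reuses only sentences already present in the text; the sentences in an abstractive summary may not appear anywhere in the original.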
data['Text'][:10]
data['Summary'][:10]
for i in range(5):
    print("Review:", data['cleaned_text'][i])
    print("Summary:", data['cleaned_summary'][i])
    print("\n")
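import pandas as pd
import matplotlib.pyplot as plt

# Histogram of word counts for the reviews and for their summaries.
length_df = pd.DataFrame({'text': text_word_count, 'summary': summary_word_count})
length_df.hist(bins=30)
plt.show()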
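The histogram code above relies on two lists, `text_word_count` and `summary_word_count`, that are not defined in this section. A minimal sketch of how such lists could be built is shown here, assuming each cleaned review and summary is a plain string of whitespace-separated words. The sample strings and the `coverage` helper are hypothetical and only serve to illustrate the idea.

```python
# Hypothetical cleaned reviews and summaries standing in for the real columns.
cleaned_text = ["great taffy at a great price", "coffee arrived stale and late"]
cleaned_summary = ["great taffy", "stale coffee"]

# Word count per entry: split on whitespace and count the tokens.
text_word_count = [len(t.split()) for t in cleaned_text]
summary_word_count = [len(s.split()) for s in cleaned_summary]


def coverage(counts, max_len):
    """Fraction of entries that contain at most max_len words."""
    return sum(c <= max_len for c in counts) / len(counts)


print(text_word_count)
print(summary_word_count)
print(coverage(text_word_count, 5))
```

A check like `coverage` can help when choosing a maximum sequence length from the histogram: a cut-off that covers most entries keeps padding and truncation to a minimum.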
model.summary()
history = model.fit(
    [x_tr, y_tr[:, :-1]],
    y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
    epochs=2,
    callbacks=[es],
    batch_size=1000,
    validation_data=(
        [x_val, y_val[:, :-1]],
        y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:],
    ),
)
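The slices in the training call above are easy to misread, so here is a small sketch of what they do. The decoder input `y_tr[:, :-1]` drops the last token of each summary, and the target `[:, 1:]` drops the first, so at every step the decoder is trained to predict the next token. This looks like the usual teacher-forcing setup, assuming the model is an encoder-decoder, as its two inputs suggest. The sketch uses plain Python lists instead of arrays, and the token ids (1 for start, 2 for end, 0 for padding) are made up for illustration.

```python
# Hypothetical token ids: 1 = start token, 2 = end token, 0 = padding.
y = [
    [1, 7, 9, 4, 2],
    [1, 5, 3, 2, 0],
]

# Decoder input: every token except the last (mirrors y_tr[:, :-1]).
decoder_input = [row[:-1] for row in y]

# Decoder target: every token except the first (mirrors the [:, 1:] slice).
decoder_target = [row[1:] for row in y]

for inp, tgt in zip(decoder_input, decoder_target):
    # At step t the decoder sees inp[t] and is trained to predict tgt[t].
    print(list(zip(inp, tgt)))
```

The `reshape` in the real call only adds a trailing axis of size 1 to the targets; the sketch leaves it out because it does not change which tokens are paired.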
from matplotlib import pyplot

# Plot training and validation loss per epoch.
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='validation')
pyplot.legend()
pyplot.show()
for i in range(30, 100):
    print("Original Human-made Review:", seq2text(x_tr[i]))
    print("-------------Summary Below-------------")
    # y_tr[i] is the reference summary from the training data, not a model prediction.
    print("Original summary:", seq2summary(y_tr[i]))
    print("\n\n")
MIT Licensed.