Paraphrase detection is the task of examining two text entities (e.g., sentences) and determining whether they have the same meaning. Achieving high accuracy on this task requires thorough syntactic and semantic analysis of both texts.
Put simply, a paraphrase is an alternative expression of the same meaning.

By granularity, paraphrases fall into two types:
- Surface paraphrases
  - Lexical level
    - Example - solve and resolve
  - Phrase level
    - Example - look after and take care of
  - Sentence level
    - Example - The table was set up in the carriage shed and The table was laid under the cart-shed
  - Discourse level
- Structural paraphrases
  - Pattern level
    - Example - [X] considers [Y] and [X] takes [Y] into consideration
  - Collocation level
    - Example - (turn on, OBJ light) and (switch on, OBJ light)

By style, paraphrases can be classified into five types:
- Trivial change
  - Example - all the members of and all members of
- Phrase replacement
  - Example - There will be major cuts in the salaries of high-level civil servants and There will be major cuts in the salaries of senior officials
- Phrase reordering
  - Example - Last night, I saw Tom in the shopping mall and I saw Tom in the shopping mall last night
- Sentence split and merge
  - Example - He bought a computer which is very expensive and (1) He bought a computer. (2) The computer is very expensive.
- Complex paraphrase
  - Example - He said there will be major cuts in the salaries of high-level civil servants and He claimed to implement huge salary cut to senior civil servants

Paraphrasing is useful in many NLP applications:
- Machine Translation
  - Simplify input sentences
  - Alleviate data sparseness
- Question Answering
  - Question reformulation
- Information Extraction
  - IE pattern expansion
- Information Retrieval
  - Query reformulation
- Summarization
  - Sentence clustering
  - Automatic evaluation
- Natural Language Generation
  - Sentence rewriting
- Others
  - Changing writing style
  - Text simplification
  - Identifying plagiarism

Related tasks and research directions:
- Textual Entailment
- Semantic Textual Similarity
- Paraphrase identification
- Paraphrase extraction
- Paraphrase generation
- Paraphrase applications

Paraphrase identification:
- Specifically refers to sentential paraphrase identification
- Given a pair of sentences, automatically identify whether the two sentences are paraphrases
- Classification-based methods
  - Viewed as a binary classification problem
  - Compute similarities between the two sentences at different levels and use them as classification features
  - Previous works: Brockett and Dolan, 2005; Finch et al., 2005; Malakasiotis, 2009
- Alignment-based methods
  - Align the two sentences and score the pair based on the alignment results
  - Previous works: Wu, 2005; Das and Smith, 2009
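
The classification-based view of paraphrase identification can be illustrated with a minimal sketch. It assumes a hypothetical feature set (word-unigram overlap, word-bigram overlap, and a length ratio) fed to a small logistic regression trained on toy pairs built from the examples earlier in this document. The function names (`tokenize`, `features`, `train_logistic_regression`, `predict`) are illustrative only; they are not the feature sets or classifiers used in the cited works.

```python
import math
from typing import List, Tuple


def tokenize(sentence: str) -> List[str]:
    # Lowercase and treat every non-alphanumeric character as a word separator.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in sentence)
    return cleaned.split()


def ngrams(tokens: List[str], n: int) -> set:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    # Set overlap in [0, 1]; two empty sets count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def features(s1: str, s2: str) -> List[float]:
    # Hypothetical features: similarities at two lexical levels plus a length ratio.
    t1, t2 = tokenize(s1), tokenize(s2)
    return [
        jaccard(ngrams(t1, 1), ngrams(t2, 1)),  # word-level overlap
        jaccard(ngrams(t1, 2), ngrams(t2, 2)),  # bigram (phrase-level) overlap
        min(len(t1), len(t2)) / max(len(t1), len(t2), 1),  # length ratio
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    # Plain stochastic gradient descent on the logistic (log) loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x) -> bool:
    # Binary decision: paraphrase if the predicted probability is at least 0.5.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Toy training pairs built from the examples above; label 1 = paraphrase.
PAIRS: List[Tuple[str, str, int]] = [
    ("Last night, I saw Tom in the shopping mall", "I saw Tom in the shopping mall last night", 1),
    ("There will be major cuts in the salaries of high-level civil servants",
     "There will be major cuts in the salaries of senior officials", 1),
    ("all the members of", "all members of", 1),
    ("The table was set up in the carriage shed", "He bought a computer.", 0),
    ("I saw Tom in the shopping mall", "The computer is very expensive.", 0),
    ("There will be major cuts in the salaries of senior officials",
     "The table was laid under the cart-shed", 0),
]

X = [features(s1, s2) for s1, s2, _ in PAIRS]
y = [label for _, _, label in PAIRS]
w, b = train_logistic_regression(X, y)
```

Because these features are purely lexical, a complex paraphrase such as the salary-cut example, which shares few words, is likely to score low; this is one reason deeper syntactic and semantic analysis is needed for high accuracy.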

More discussion of the previous works is documented here.
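
The alignment-based approach to paraphrase identification can be sketched with a deliberately simple greedy word aligner, which is a swapped-in simplification rather than the method of any cited work. Word pairs are ranked by character-level similarity (`difflib.SequenceMatcher`), each word is linked at most once, and the pair is scored by the similarity-weighted share of tokens that get aligned. The `min_sim` and `threshold` values, and the names `greedy_align`, `alignment_score`, and `is_paraphrase`, are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def tokenize(sentence: str) -> List[str]:
    # Lowercase and treat every non-alphanumeric character as a word separator.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in sentence)
    return cleaned.split()


def word_similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]; identical words score 1.0.
    return SequenceMatcher(None, a, b).ratio()


def greedy_align(t1: List[str], t2: List[str], min_sim: float = 0.8) -> List[Tuple[int, int, float]]:
    # Rank every cross-sentence word pair by similarity, best first.
    candidates = sorted(
        ((word_similarity(a, b), i, j) for i, a in enumerate(t1) for j, b in enumerate(t2)),
        reverse=True,
    )
    used1, used2, links = set(), set(), []
    for sim, i, j in candidates:
        if sim < min_sim:
            break  # all remaining candidates are weaker
        if i in used1 or j in used2:
            continue  # each word takes part in at most one link
        used1.add(i)
        used2.add(j)
        links.append((i, j, sim))
    return sorted(links)


def alignment_score(s1: str, s2: str, min_sim: float = 0.8) -> float:
    # Similarity-weighted share of tokens from both sentences that were aligned.
    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 and not t2:
        return 1.0
    links = greedy_align(t1, t2, min_sim)
    return 2 * sum(sim for _, _, sim in links) / (len(t1) + len(t2))


def is_paraphrase(s1: str, s2: str, threshold: float = 0.7) -> bool:
    # Hypothetical decision rule: a pair is a paraphrase if enough of it aligns.
    return alignment_score(s1, s2) >= threshold


print(is_paraphrase("all the members of", "all members of"))  # → True
print(is_paraphrase("The table was set up in the carriage shed", "He bought a computer."))  # → False
```

Because links ignore word position, the phrase-reordering example aligns completely, while a lexical paraphrase such as solve and resolve only links if its character similarity reaches `min_sim`. The cited works use considerably richer, syntax-aware alignment models.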