
Python project that compares text similarity using Sentence Transformers and FuzzyWuzzy, exploring different methods for measuring how similar two strings are.


Text Similarity Comparison Project

Overview

This Python project compares text similarity using two different methods: Sentence Transformers for semantic similarity (similarity of meaning) and FuzzyWuzzy, labeled "Fuzzy" in the output, for character-based similarity. Text similarity measurement is used in many applications, and the two methods can score the same pair of strings very differently, so understanding how they differ is important when choosing one.
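Sentence Transformers turns each text into an embedding vector, and a semantic score is commonly the cosine similarity of the two vectors: values near 1 mean the vectors point in nearly the same direction. The minimal sketch below assumes the project scores pairs this way (the README does not state the metric main.py uses). It implements cosine similarity in plain Python, and the three-dimensional vectors are made-up toy values standing in for real model output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy "embeddings"; real sentence embeddings have hundreds of dimensions.
ocean = [0.9, 0.1, 0.3]
sea = [0.8, 0.2, 0.35]
print(cosine_similarity(ocean, sea))
```

Because only the direction of the vectors matters, two texts with similar meaning can score high even when they share few characters, as in the "vast ocean" / "immense sea" examples below.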

For a more in-depth exploration of this comparison, including detailed insights and examples, I have written an article "Comparing Text Similarity Measurement Methods: Sentence Transformers vs. Fuzzy" available at dev.to.

Project Structure

The project consists of the following components:

  • main.py: Python script demonstrating the usage of both Sentence Transformers and Fuzzy.
  • requirements.txt: List of required Python libraries for the project.
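The README does not include main.py itself. The following is a rough sketch of what a script along these lines could look like, not the author's code. It assumes the semantic score is Sentence Transformers' `util.cos_sim` on the two embeddings and the Fuzzy score is FuzzyWuzzy's `fuzz.ratio` divided by 100. The model name `paraphrase-multilingual-MiniLM-L12-v2` is an assumption, since the README does not say which model is loaded. The helper names `parse_pair`, `semantic_score`, and `fuzzy_score` are hypothetical.

```python
import sys


def parse_pair(argv):
    """Return the two texts to compare, or raise ValueError if there aren't exactly two."""
    if len(argv) != 2:
        raise ValueError('usage: python main.py "text A" "text B"')
    return argv[0], argv[1]


def semantic_score(text_a, text_b, model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    # Imported here so argument handling works without loading the model.
    from sentence_transformers import SentenceTransformer, util

    # Assumption: the model actually used by main.py is not stated in the README.
    model = SentenceTransformer(model_name)
    embeddings = model.encode([text_a, text_b], convert_to_tensor=True)
    # cos_sim returns a 1x1 tensor; .item() extracts the Python float.
    return util.cos_sim(embeddings[0], embeddings[1]).item()


def fuzzy_score(text_a, text_b):
    from fuzzywuzzy import fuzz

    # fuzz.ratio returns an integer from 0 to 100; scale it to 0..1.
    return fuzz.ratio(text_a, text_b) / 100


def main(argv):
    try:
        text_a, text_b = parse_pair(argv)
    except ValueError as err:
        print(err)
        return 2
    print("Sentence Transformers: ", semantic_score(text_a, text_b))
    print("Fuzzy: ", fuzzy_score(text_a, text_b))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Loading the libraries inside the scoring functions is a design choice for the sketch: it keeps the command-line handling testable without downloading a model.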

Installation

Before running the project, install the required libraries with the following command:

pip install -r requirements.txt

Usage

  1. Run main.py with two text arguments, each enclosed in quotes, to compare them using both Sentence Transformers and Fuzzy:
python main.py "João Matos da Silva" "João Pedro da Silva"
# Sentence Transformers:  0.8613903522491455
# Fuzzy:  0.79
python main.py "O vasto oceano é belo" "O imenso mar é deslumbrante."
# Sentence Transformers:  0.6285576224327087
# Fuzzy:  0.45
python main.py "The vast ocean is beautiful" "The immense sea is stunning"
# Sentence Transformers:  0.8006699085235596
# Fuzzy:  0.52
python main.py "color" "colour"
# Sentence Transformers:  0.973908543586731
# Fuzzy:  0.91
python main.py "The quick brown fox jumps over the lazy dog" "A fast brown fox leaps over a dozing dog"
# Sentence Transformers:  0.8295611143112183
# Fuzzy:  0.63
python main.py "John Smith" "Jon Smithe"
# Sentence Transformers:  0.7414223551750183
# Fuzzy:  0.9
  2. Review the output to see the similarity scores for each text pair.
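The Fuzzy values above are consistent with FuzzyWuzzy's `fuzz.ratio` divided by 100, though this is an assumption because the script is not shown. Without the optional python-Levenshtein package, `fuzz.ratio` is computed with the standard library's `difflib.SequenceMatcher` and rounded to a whole percentage. The character-based score can therefore be sketched with no third-party dependencies. The function name `char_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def char_similarity(text_a, text_b):
    """Character-based similarity in [0, 1], rounded to two decimals.

    SequenceMatcher.ratio() is 2*M/T, where M is the number of matched
    characters and T is the combined length of both strings.
    """
    ratio = SequenceMatcher(None, text_a, text_b).ratio()
    # Round to a whole percentage first, then scale back to 0..1.
    return round(100 * ratio) / 100


for a, b in [("color", "colour"), ("John Smith", "Jon Smithe")]:
    print(f"{a!r} vs {b!r}: {char_similarity(a, b)}")
```

For these two pairs the sketch gives 0.91 ("colo" and "r" match: 2 × 5 / 11) and 0.9 ("Jo" and "n Smith" match: 2 × 9 / 20), the same as the Fuzzy values above. Because it only counts matching characters, it rates the near-identical spelling "Jon Smithe" higher than Sentence Transformers does and the paraphrased sentences lower.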