Similarity Score Analyzer is a Python application that measures how closely a webpage's content matches a given query. It combines web scraping, natural language processing, and embedding models to produce section-level similarity scores and optimization suggestions.
- Web Scraping: Extracts webpage content using the requests and BeautifulSoup libraries.
- Text Preprocessing: Cleans and preprocesses text data for analysis.
- Embedding Generation: Generates text embeddings using TensorFlow Hub models.
- Similarity Scoring: Computes cosine similarity scores between the query and webpage sections.
- Google Cloud NLP Integration: Performs sentiment analysis and entity recognition using the Google Cloud Natural Language API.
- Heatmap Visualization: Displays similarity scores in a heatmap for easy visualization.
- Optimization Suggestions: Provides suggestions to improve content relevance based on similarity scores.
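The preprocessing and similarity-scoring features listed above can be illustrated with a minimal sketch. It is not the project's implementation: a bag-of-words count vector stands in for the TensorFlow Hub embeddings so the example runs with the standard library alone, and the names `preprocess`, `embed`, and `score_sections` are hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(preprocess(text))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction to compare.
    return dot / (norm_a * norm_b)


def score_sections(query, sections):
    """Return one similarity score per section, in the same order."""
    query_vector = embed(query)
    return [cosine_similarity(query_vector, embed(s)) for s in sections]


sections = [
    "How to brew pour-over coffee at home",
    "Shipping and returns policy",
]
scores = score_sections("brew coffee at home", sections)
```

With a real embedding model the vectors would be dense arrays rather than word counts, but the cosine computation itself would have the same shape.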
To install and set up the Similarity Score Analyzer, make sure the following prerequisites are in place, then follow the steps below:
- Anaconda or Miniconda installed
- Python 3.8 or higher
- A Google Cloud account with access to the Natural Language API
- Clone the Repository:
      git clone https://github.com/yourusername/similarity-score-analyzer.git
      cd similarity-score-analyzer
- Create a Conda Environment:
      conda create -n similarity_analyzer_env python=3.9
- Activate the Conda Environment:
      conda activate similarity_analyzer_env
- Install the Package and Dependencies:
      pip install .
  This will install the Similarity Score Analyzer package along with all required dependencies listed in setup.py.
- Set Up Environment Variables:
  Create a .env file in the root directory of the project and add the following environment variables:
      echo "GOOGLE_CLOUD_NLP_API_KEY=your_actual_api_key_here" > .env
      echo "GEMINI_API_KEY=your_actual_gemini_api_key_here" >> .env
      echo "MODEL_NAME=gemini-1.5-flash-exp-0827" >> .env
  Replace your_actual_api_key_here and your_actual_gemini_api_key_here with your actual API keys.
- Run the Application:
  Start the application using Streamlit:
      streamlit run similarity_analyzer/main.py
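Before starting the app, it can help to confirm that the .env file contains all three variables. How the application itself loads this file is not stated here, so the following is only a sketch using the standard library; `load_env_file`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, and the demonstration writes a temporary file rather than touching a real .env.

```python
import os
import tempfile

# The three variables named in the setup step.
REQUIRED_KEYS = ("GOOGLE_CLOUD_NLP_API_KEY", "GEMINI_API_KEY", "MODEL_NAME")


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Demonstration with a temporary file that deliberately omits GEMINI_API_KEY.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write("GOOGLE_CLOUD_NLP_API_KEY=abc\n")
    tmp.write("MODEL_NAME=gemini-1.5-flash-exp-0827\n")
    demo_path = tmp.name

demo_values = load_env_file(demo_path)
demo_missing = missing_keys(demo_values)
os.remove(demo_path)
```

Running `missing_keys(load_env_file(".env"))` from the project root would list any variable that still needs to be set.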
Once the application is running, you can use the web interface to:
- Enter the Target URL: Provide the URL of the webpage you want to analyze.
- Enter the Query: Input the query you want to optimize the webpage content for.
- Select an Embedding Model: Choose from available embedding models like Universal Sentence Encoder or Gemini models.
- Analyze: Click the "Analyze" button to start the analysis.
- View Results: The application will display the overall similarity score, section-wise heatmap, optimization suggestions, and Google Cloud NLP analysis (sentiment and entities).
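To make the results step more concrete, here is a rough sketch of how per-section scores could be condensed into an overall score and a list of weak sections. This is an assumption for illustration only: the application's actual aggregation and suggestion logic is not described here, and `summarize_results` and its `threshold` parameter are hypothetical.

```python
from statistics import mean


def summarize_results(section_scores, threshold=0.5):
    """Summarize per-section similarity scores.

    section_scores maps a section title to a score between 0 and 1.
    The threshold is an arbitrary illustrative cut-off, not a project default.
    """
    overall = mean(section_scores.values()) if section_scores else 0.0
    # Sections scoring below the threshold, weakest first.
    weak = sorted(
        (title for title, score in section_scores.items() if score < threshold),
        key=lambda title: section_scores[title],
    )
    suggestions = [
        f"Section '{title}' scores {section_scores[title]:.2f}; "
        "consider aligning its wording with the query."
        for title in weak
    ]
    return {"overall": overall, "weak_sections": weak, "suggestions": suggestions}


report = summarize_results({"Intro": 0.82, "Pricing": 0.31, "FAQ": 0.45})
```

Sorting the weak sections by score puts the least relevant content first, which is usually where editing effort pays off most.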
Contributions are welcome! Please fork this repository, make your changes, and submit a pull request.
To run the unit tests included with the project:
python -m unittest discover tests
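By default, unittest discovery collects files whose names match test*.py. As a sketch of what such a file might contain (for example a hypothetical tests/test_similarity.py), the following defines a stand-in `cosine_similarity` inline so the example is self-contained; in the real project the test would import the function from the package instead.

```python
import math
import unittest


def cosine_similarity(a, b):
    """Stand-in for the function under test (hypothetical, defined inline)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CosineSimilarityTest(unittest.TestCase):
    def test_identical_vectors_score_one(self):
        self.assertAlmostEqual(cosine_similarity([1.0, 2.0], [1.0, 2.0]), 1.0)

    def test_orthogonal_vectors_score_zero(self):
        self.assertAlmostEqual(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 0.0)

    def test_zero_vector_is_handled(self):
        # A zero vector has no direction, so the sketch returns 0.0.
        self.assertEqual(cosine_similarity([0.0, 0.0], [1.0, 1.0]), 0.0)
```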
This project is licensed under the MIT License. See the LICENSE file for more details.
For any inquiries or feedback, please open an issue.
To activate the conda environment created during the setup process:
conda activate similarity_analyzer_env
Once the conda environment is activated, your terminal prompt should change to indicate that you are now working within the environment. You can then proceed with the installation of dependencies and running the application.