Search Optimization Project

Overview

The Search Optimization project evaluates and improves the relevance and clarity of search engine results for specific user intents. It scrapes search results, preprocesses the text, and uses OpenAI's GPT-3.5 Davinci model (text-davinci-003) to score the results and suggest improvements.

Components

  • Dataset: The project relies on a dataset containing user queries and associated intents. This dataset serves as the foundation for generating search queries.

  • Web Scraping: Bing and Google search engines are scraped to retrieve search results for each user intent. These results are stored in a MySQL database for further analysis.

  • Data Preprocessing: Before the search results are sent for evaluation, the text is cleaned by removing stop words and lemmatizing the remaining tokens.

  • GPT-3.5 Davinci Model: The OpenAI GPT-3.5 Davinci model is used via the OpenAI API. It analyzes search results for each intent, assigning relevance scores (1-5), suggesting improvements, and identifying the best search engine (Google or Bing) for that intent.

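    import openai  # legacy SDK (openai<1.0), which provides openai.Completion

    # `prompt` holds the evaluation prompt built for one intent.
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=200,
        temperature=0.2,  # low temperature for more repeatable output
    )
    evaluation = response["choices"][0]["text"].strip()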

[Image: prompt example]

  • API: The project exposes an API using FastAPI, allowing users to interact with the system. Users can submit intents, retrieve search results, and receive optimized results with relevance scores.

  • Dependencies: Poetry is used for managing project dependencies, ensuring a clean and reproducible environment.

  • Backend: The backend is written in Python, using SQLAlchemy for database interaction and spaCy for data preprocessing.
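
The preprocessing and evaluation components above can be sketched as three small functions. This is a minimal sketch under stated assumptions, not the project's actual code: `clean_tokens`, `build_prompt`, and `parse_evaluation` are hypothetical names, the prompt wording and answer format are invented for illustration, and `clean_tokens` only assumes tokens that expose `lemma_`, `is_stop`, and `is_punct`, as spaCy tokens do.

```python
import re
from typing import Iterable, Optional, Protocol


class TokenLike(Protocol):
    """The subset of spaCy Token attributes this sketch relies on."""
    lemma_: str
    is_stop: bool
    is_punct: bool


def clean_tokens(tokens: Iterable[TokenLike]) -> str:
    """Drop stop words and punctuation, then join the lowercased lemmas."""
    return " ".join(
        tok.lemma_.lower()
        for tok in tokens
        if not (tok.is_stop or tok.is_punct)
    )


def build_prompt(intent: str, google_text: str, bing_text: str) -> str:
    """Assemble a scoring prompt (hypothetical wording, not the project's)."""
    return (
        f"User intent: {intent}\n"
        f"Google results: {google_text}\n"
        f"Bing results: {bing_text}\n"
        "Rate each engine's relevance from 1 to 5, suggest improvements, "
        "and name the best engine.\n"
        "Answer in the form:\n"
        "Google score: <1-5>\n"
        "Bing score: <1-5>\n"
        "Best engine: <Google|Bing>\n"
        "Suggestions: <text>"
    )


def parse_evaluation(text: str) -> dict:
    """Extract scores and the preferred engine; missing fields become None."""

    def score(engine: str) -> Optional[int]:
        match = re.search(rf"{engine} score:\s*([1-5])\b", text, re.IGNORECASE)
        return int(match.group(1)) if match else None

    best = re.search(r"Best engine:\s*(Google|Bing)", text, re.IGNORECASE)
    return {
        "google_score": score("Google"),
        "bing_score": score("Bing"),
        "best_engine": best.group(1).capitalize() if best else None,
    }
```

With spaCy installed, `clean_tokens(nlp(text))` should work on a pipeline loaded with, for example, `spacy.load("en_core_web_sm")`, assuming that model has been downloaded. The string returned by `build_prompt` would be passed as `prompt` to the completion call, and `parse_evaluation` applied to the text the model returns.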

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/search-optimization.git
    cd search-optimization
    
  2. Install the project dependencies with Poetry:

    poetry install

  3. Set up a MySQL database for storing search results and update the database configuration in the project settings.

  4. Start the FastAPI server:

    poetry run uvicorn main:app --host 0.0.0.0 --port 8000 --reload

  5. Open the FastAPI documentation at http://localhost:8000/docs to explore the available API endpoints.
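
For the database configuration step, one possible approach is to build the SQLAlchemy connection URL from environment variables. The sketch below is an assumption, not the project's actual settings code: the variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`), the defaults, and the choice of the PyMySQL driver are all hypothetical.

```python
import os
from typing import Mapping
from urllib.parse import quote


def mysql_url_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Build a SQLAlchemy-style MySQL URL from (hypothetical) env variables."""
    user = env.get("DB_USER", "root")
    # Percent-encode the password so characters like "@" or "/" stay valid.
    password = quote(env.get("DB_PASSWORD", ""), safe="")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "3306")
    name = env.get("DB_NAME", "search_optimization")
    # "mysql+pymysql" assumes the PyMySQL driver is installed.
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{name}"
```

The resulting string could then be passed to SQLAlchemy's `create_engine`. Keeping credentials in the environment rather than in the repository is a common choice, though the project may store its settings differently.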

Use the API to submit intents, retrieve search results, and receive optimized results with relevance scores and suggestions.
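
A client call to the running server might look like the sketch below. The route `/intents` and the JSON field `intent` are hypothetical placeholders, since the real paths and schemas are only listed in the generated documentation at http://localhost:8000/docs; substitute the actual values from there.

```python
import json
import urllib.request


def build_intent_request(base_url: str, intent: str) -> urllib.request.Request:
    """Prepare a POST request that submits one intent (hypothetical route)."""
    payload = json.dumps({"intent": intent}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/intents",  # assumed path, check /docs
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, the request could be sent with `urllib.request.urlopen(build_intent_request("http://localhost:8000", "buy running shoes"))` and the JSON body decoded from the response.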