/siteseeing_backend

Web Content Analysis and QA Backend Service

Primary LanguagePython

Web Content Analysis and QA Backend Service

Overview

This Python-based backend service, deployable on platforms like Heroku, offers a comprehensive suite of features for web content analysis and interaction. Utilizing Flask, it integrates various functionalities including web scraping, question-answering sessions, content summarization, keyword extraction, and categorization, all powered by OpenAI's advanced AI models.

Features

  1. Web Scraping and Analysis: Extract and analyze content from specified websites, providing a deep dive into the data and metadata of web pages.

  2. Question-Answering Sessions: Engage in interactive QA sessions with websites, allowing users to ask questions and receive AI-generated answers based on the website's content.

  3. Content Summarization: Utilize AI to summarize the content of web pages, providing quick and efficient insights into lengthy articles or documents.

  4. Keyword and Category Identification: Automatically extract key phrases and categorize content from websites, aiding in content classification and SEO optimization.

  5. Database Integration: Leverages PostgreSQL for data management, providing robust and scalable storage solutions.

  6. Asynchronous Processing: Capable of handling long-running tasks like web scraping and AI analysis in the background, ensuring a responsive user experience.

Server Routes

  • /execute: Initiates the scraping and analysis of specified websites.
  • /sites_count: Returns the total count of processed sites.
  • /qa/<site_link>: Retrieves saved Q&A pairs for a given site.
  • /qa: Submits a new question for a specific site and gets an AI-generated answer.
  • /narrow/package: Processes batch questions for enhanced analysis.
  • /screens/<site_link>: Fetches screenshots associated with a site.
  • /screens: Associates a new screenshot with a site.
  • /sites: Retrieves a paginated list of processed sites with sorting options.
  • /sites/by-keywords: Filters sites based on specified keywords.
  • /sites/by-categories: Filters sites based on specified categories.
  • /sites/<site_id>: Retrieves details of a site based on its ID.
  • /sites/link/<site_link>: Retrieves details of a site based on its link.
  • /sites/search: Searches sites based on provided terms.

Deployment and Usage

Designed for ease of deployment, this service is containerized and ready for deployment on cloud platforms like Heroku. It is suitable for developers, SEO specialists, and content creators who require an automated, AI-powered tool for web content analysis and interaction.

Contribution and Collaboration

Contributors are welcome to enhance the functionalities, fix bugs, or improve the performance. The project is open for collaboration on GitHub, encouraging community involvement in the development process.