This repository contains the code for an interactive chatbot application built with Streamlit. Users can upload PDF files or enter URLs, have the text extracted, and then chat with the chatbot about that text. The task was provided by Upadpro Software & Services Pvt. Ltd.
- PDF Text Extraction: Users can upload PDF files, and the application extracts text from the uploaded PDF using PyPDF2.
- URL Text Parsing: Users can input a URL, and the application parses the text content from the provided URL using BeautifulSoup.
- Chatbot Integration: The application incorporates a chatbot powered by LlamaIndex. Users can interact with the chatbot and ask questions related to the extracted text.
- Multi-App Structure: The project is organized using a multi-app structure, allowing users to choose between different functionalities (PDF, URL, ChatBot) using Streamlit.
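
The URL Text Parsing feature can be sketched as follows. This is a minimal illustration, not the code in urlinput.py: the function names `html_to_text`, `normalize_whitespace`, and `fetch_text`, the choice of tags to drop, and the use of `urllib.request` for the download are all assumptions.

```python
import re
import urllib.request


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one normalized string."""
    # Imported here so this module still loads when beautifulsoup4 is absent.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are not shown to a reader (assumed list).
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return normalize_whitespace(soup.get_text(separator=" "))


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its visible text (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Separating the download from the parsing keeps `html_to_text` usable on HTML that is already in memory, which also makes it easier to test without network access.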
- extract_text.py: Contains the PDFTextExtractor class, which extracts text from PDFs and saves it to a text file.
- chatbot.py: Implements the chatbot functionality using LlamaIndex and Streamlit's chat components.
- pdfinput.py: Defines the Streamlit app for handling PDF file uploads, text extraction, and interaction with the chatbot.
- urlinput.py: Implements the Streamlit app for URL input, text extraction from URLs, and interaction with the chatbot.
- app.py: Orchestrates the multi-app structure, allowing users to choose between PDF, URL, and ChatBot functionalities.
- Data: Contains sample PDF files for pdfinput.py.
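
For orientation, a class like PDFTextExtractor might look roughly like the sketch below. Only the class name comes from this repository; the method names (`extract_pages`, `join_pages`, `save_text`) and the output layout are assumptions, and the sketch assumes a PyPDF2 release that provides `PdfReader` and `page.extract_text()`.

```python
from pathlib import Path
from typing import Iterable, List


class PDFTextExtractor:
    """Extract text from a PDF and save it to a text file (illustrative)."""

    def __init__(self, pdf_path):
        self.pdf_path = Path(pdf_path)

    def extract_pages(self) -> List[str]:
        """Return the extracted text of each page, in page order."""
        # Imported here so this module still loads when PyPDF2 is absent.
        from PyPDF2 import PdfReader

        reader = PdfReader(str(self.pdf_path))
        # extract_text() can return an empty result for image-only pages.
        return [page.extract_text() or "" for page in reader.pages]

    @staticmethod
    def join_pages(pages: Iterable[str]) -> str:
        """Join non-empty pages, separated by a blank line."""
        return "\n\n".join(p.strip() for p in pages if p.strip())

    def save_text(self, out_path) -> Path:
        """Write the extracted text to out_path and return that path."""
        out = Path(out_path)
        out.write_text(self.join_pages(self.extract_pages()), encoding="utf-8")
        return out
```

A call such as `PDFTextExtractor("Data/sample.pdf").save_text("sample.txt")` would then produce the text file; the file names here are placeholders.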
- Clone the repository to your local machine:
    git clone https://github.com/Sarthak-1408/Upadpro-AI-Developer-Task.git
    cd Upadpro-AI-Developer-Task
- Install the required dependencies:
    pip install -r requirements.txt
- Run the application:
    streamlit run app.py
- Open your web browser and navigate to the local URL that Streamlit prints in the terminal.
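
To give a sense of what `streamlit run app.py` launches, here is a hedged sketch of a multi-app entry point. It is not the repository's app.py: the names `run_selected` and `main`, the sidebar label, and the placeholder pages are assumptions that stand in for pdfinput.py, urlinput.py, and chatbot.py.

```python
from typing import Callable, Dict


def run_selected(choice: str, pages: Dict[str, Callable[[], None]]) -> None:
    """Call the page function registered under `choice`."""
    try:
        page = pages[choice]
    except KeyError:
        raise ValueError(f"Unknown page: {choice!r}") from None
    page()


def main() -> None:
    """Render a sidebar selector and show the chosen page."""
    # Imported here so run_selected can be used without Streamlit installed.
    import streamlit as st

    # Placeholder pages; the real ones would come from the other modules.
    pages = {
        "PDF": lambda: st.header("Upload a PDF"),
        "URL": lambda: st.header("Enter a URL"),
        "ChatBot": lambda: st.header("Chat"),
    }
    choice = st.sidebar.selectbox("Choose a mode", list(pages))
    run_selected(choice, pages)
```

Ending the file with a call to `main()` would make Streamlit render the selector each time the script reruns. Keeping the dispatch in a plain function means the routing logic can be checked without starting a Streamlit server.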
- streamlit
- PyPDF2
- beautifulsoup4
- llama_index
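
After installing, a quick check like the one below can confirm that the packages listed above are importable. It is a small convenience sketch, not part of the repository; `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and note that beautifulsoup4 is imported as `bs4`.

```python
import importlib.util

# Import names for the listed packages (beautifulsoup4 installs as "bs4").
REQUIRED_MODULES = ["streamlit", "PyPDF2", "bs4", "llama_index"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```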