TODO List

Ability to upload large files into a database -DONE
When generating an answer, reference multiple uploaded files: the response must draw on the 5 files whose content is most similar to the question. -DONE
Read about Chunking (for uploading large amounts of data) -DONE
Read about background tasks (implement background tasks) -DONE
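
As a rough illustration of the "5 most similar files" item above, the sketch below ranks files by cosine similarity between a question embedding and one embedding per file. The function names (`cosine_similarity`, `top_k_files`) and the in-memory dictionary are assumptions made for this example; the project itself may rank inside PostgreSQL or with scikit-learn instead.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k_files(
    query_embedding: Sequence[float],
    file_embeddings: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k (filename, score) pairs most similar to the query."""
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in file_embeddings.items()
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The default `k=5` mirrors the requirement in the list; the text of the returned files would then be passed to the answer-generation step.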

Chunking embedding refers to the process of breaking down large pieces of text into smaller, more manageable chunks before generating embeddings for each chunk. Embeddings are vector representations of text used in natural language processing (NLP) models to capture the semantic meaning of the text.
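
The chunking step described above can be sketched as follows. This is a minimal word-based splitter with overlap, not necessarily how this project chunks text; the function name `chunk_text` and the `chunk_size` and `overlap` defaults are assumptions chosen for illustration, and splitting by characters, tokens, or PDF pages would work along the same lines.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words, so context that is cut at a
    chunk boundary is partly repeated at the start of the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each returned chunk would then be passed to the embedding model separately, producing one vector per chunk instead of one vector for the whole document.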

Setup Instructions

To set up and run the project locally, follow these steps:

Prerequisites

- Python 3.7+ installed on your system.
- PostgreSQL installed and running.
- Virtual environment (optional but recommended).

Clone the Repository

Clone the project repository to your local machine:
```
git clone <repository-url>
cd <repository-directory>
```
Set Up a Virtual Environment (Optional)

Create and activate a virtual environment:
```
python3 -m venv venv
source venv/bin/activate 
```

Install Required Packages

Install the dependencies listed in requirements.txt:

pip install -r requirements.txt

If requirements.txt is not available, install the dependencies manually:

pip install fastapi aiofiles psycopg2-binary sentence-transformers scikit-learn pyPDF2 transformers python-multipart

Set Up PostgreSQL Database

To get PostgreSQL up and running, go to the pdf_management directory and run `docker-compose up -d`.

Create a table named pdf_embeddings with the following structure:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE pdf_embeddings (
    id SERIAL PRIMARY KEY,
    filename TEXT UNIQUE,
    embeddings VECTOR(384)  -- Adjust the dimensionality based on your embeddings
);
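
To show how the `pdf_embeddings` table might be queried for the most similar files, here is a small sketch that builds a parameterized pgvector query. The helper names (`to_pgvector_literal`, `build_top_k_query`) are hypothetical and not part of this project, and using cosine distance (`<=>`) is an assumption; pgvector also offers `<->` for Euclidean distance.

```python
from typing import Sequence, Tuple


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format a Python sequence as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


# %s placeholders follow psycopg2's parameter style. The vector is passed
# as text and cast to the vector type on the database side.
TOP_K_QUERY = (
    "SELECT filename FROM pdf_embeddings "
    "ORDER BY embeddings <=> %s::vector "
    "LIMIT %s"
)


def build_top_k_query(embedding: Sequence[float], k: int = 5) -> Tuple[str, tuple]:
    """Return (sql, params) selecting the k files closest to the embedding."""
    return TOP_K_QUERY, (to_pgvector_literal(embedding), k)
```

With psycopg2, the result can be executed as `cursor.execute(sql, params)`. The embedding passed in must have the same number of dimensions as the column (384 in the table above).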

Run the Application

Start the FastAPI server:
```
uvicorn main:app --reload
```
- Access the application at http://127.0.0.1:8000.
- API documentation will be available at http://127.0.0.1:8000/docs.

The application exposes the following API endpoints:
- Use the /upload/ endpoint to upload PDF files.
- Use the /download/ endpoint to download PDF files.
- Use the /update/ endpoint to update PDF files.
- Use the /delete/ endpoint to delete PDF files.
- Use the /question/ endpoint to ask a question and retrieve relevant information from the uploaded PDFs.
