/langchang_book_chat

A simple app that lets you have a conversation with a book you pass to it.

Primary language: Python

# App.py

This code lets users interactively ask questions about the PDF files stored in a given directory. It indexes each file's content and exposes a chat-like interface for querying it.

## Requirements

- Poetry (required for virtual environment and package management)
- Python 3.6 or newer

## Installation

Set up and install the required packages in a virtual environment using Poetry:
```bash
poetry install
```

## Running the application

To run this application:
```bash
poetry run python app.py --path PATH_TO_YOUR_DIRECTORY
```


The `--path` argument is optional; if it is omitted, the default directory `docs/` is used. Replace `PATH_TO_YOUR_DIRECTORY` with the directory where your PDF files are stored.
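As an illustration of how an optional `--path` argument with a `docs/` default might be declared, here is a minimal sketch using the standard `argparse` module. The helper name `parse_args` is hypothetical, and the actual `app.py` may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (hypothetical helper, not taken from app.py)."""
    parser = argparse.ArgumentParser(
        description="Chat with the PDF files in a directory."
    )
    # --path is optional; when it is omitted, argparse supplies the default.
    parser.add_argument(
        "--path",
        default="docs/",
        help="Directory containing the PDF files (default: docs/)",
    )
    return parser.parse_args(argv)


# Passing an explicit list makes the behavior easy to check without a shell.
default_args = parse_args([])
custom_args = parse_args(["--path", "books/"])
print(default_args.path, custom_args.path)
```

Calling `parse_args()` with no argument reads the real command line, which is what a script entry point would normally do.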

## How it works

The application performs the following steps:

1. Parses the user-given (or default) directory and obtains a list of PDF files.
2. Loads each PDF file and splits it into chunks for indexing.
3. Indexes each document in memory, creating a searchable representation with OpenAIEmbeddings for semantic search.
4. Asks a language model (GPT) to provide a brief description of each document.
5. Initializes an agent with the indexed documents and their descriptions to enable interactive querying.
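To make steps 2 and 3 more concrete, the following is a rough, dependency-free sketch of chunking and in-memory semantic-style lookup. It is not the application's code: `toy_embed` is a bag-of-words stand-in for OpenAIEmbeddings, and `split_into_chunks`, `InMemoryIndex`, and all parameter values are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (illustrative sizes)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def toy_embed(text):
    """Stand-in for a real embedding model: a word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Keeps (chunk, vector) pairs in a list and ranks them per query."""

    def __init__(self, chunks):
        self.entries = [(chunk, toy_embed(chunk)) for chunk in chunks]

    def search(self, query, k=1):
        query_vector = toy_embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


index = InMemoryIndex(["the whale swims in the sea", "the garden has red roses"])
print(index.search("where does the whale swim?"))
```

A real embedding model maps text to dense vectors that capture meaning rather than shared words, but the retrieval pattern (embed the chunks once, embed the query, rank by similarity) has the same shape.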

The application provides a REPL-like interface where you can ask questions about the available documents. The agent answers each question using the best-matching content retrieved from the indexed documents.

Press `Ctrl+C` to exit the tool.
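A question loop that exits cleanly on `Ctrl+C` could look roughly like the sketch below. The function `run_repl` and its parameters are hypothetical; `answer_fn` stands in for whatever callable forwards the question to the agent.

```python
def run_repl(answer_fn, read_line=input, write=print):
    """Read questions until Ctrl+C (KeyboardInterrupt) or end of input."""
    write("Ask a question about your documents (Ctrl+C to exit).")
    while True:
        try:
            question = read_line("> ").strip()
        except (KeyboardInterrupt, EOFError):
            # Ctrl+C raises KeyboardInterrupt inside input(); exit quietly.
            write("Goodbye!")
            break
        if not question:
            continue  # Ignore empty lines and prompt again.
        write(answer_fn(question))
```

Injecting `read_line` and `write` keeps the loop testable without a terminal; in normal use the defaults (`input` and `print`) apply.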

## Application Structure

The code organizes the functionality into the following functions:

- `main()`: Entry point of the application
- `get_files(directory_path)`: Fetches PDF filenames from the directory
- `load_document(file_name)`: Loads the specified PDF file and splits it into chunks
- `index_document(documents, file_name, embeddings_api)`: Indexes the documents, creating a searchable representation
- `get_document_description(llm, vectorstore)`: Asks GPT for a brief description of the document
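As one example of what these functions might contain, here is a possible shape for `get_files`, assuming it returns the names of PDF files located directly in the directory. The body, the sorting, and the error handling are guesses for illustration; only the function name and its parameter come from the list above.

```python
from pathlib import Path


def get_files(directory_path):
    """Return the sorted names of PDF files directly inside directory_path."""
    directory = Path(directory_path)
    if not directory.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory_path}")
    # Match the extension case-insensitively so "Book.PDF" is also found.
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )
```

This version does not descend into subdirectories; whether the real application does is not stated here.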