Doc Searcher is a Streamlit application that lets users ask questions about a collection of PDF documents and get answers drawn from their content. It uses LangChain, HuggingFace, and ChromaDB for document loading, text splitting, embedding, and large language model interactions.
- Load and process PDF documents.
- Split documents into chunks and persist them for efficient querying.
- Use a large language model to answer questions based on the content of the PDF documents.
- Query the documents through a Streamlit-based user interface.
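Chunking matters because a long PDF rarely fits in a model's context window, and smaller pieces make retrieval more targeted. The sketch below shows the basic idea as fixed-size character splitting with overlap, in plain Python. It is not the code in app.py, which uses LangChain for text splitting; the function name `split_text` and the `chunk_size` and `chunk_overlap` defaults are hypothetical.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character chunks that overlap by chunk_overlap characters.

    Hypothetical helper for illustration; defaults are assumptions, not app.py settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text so the tail is not repeated.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap repeats the end of each chunk at the start of the next, which reduces the chance that a sentence straddling a boundary is cut off from its surrounding context.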
- Clone the repository:
  git clone https://github.com/your-repo/doc-searcher.git
  cd doc-searcher
- Install the required Python packages:
  pip install -r requirements.txt
- Ensure you have a HuggingFace API token and set it as an environment variable:
  export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
- Place your PDF documents in the folder the application reads from (e.g., /home/manjeet/Desktop/langchain_tests/consent_forms_cleaned/).
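Before starting the app, a quick preflight check can catch two common setup mistakes from the steps above: a missing token and a wrong or empty PDF folder. The helper below (`check_setup`) is hypothetical and not part of app.py. It assumes the PDFs sit directly in the folder rather than in subfolders, which may not match how the application actually loads them.

```python
import os
from pathlib import Path
from typing import List


def check_setup(pdf_folder: str, env_var: str = "HUGGINGFACEHUB_API_TOKEN") -> List[Path]:
    """Verify the API token is set and return the PDF files found in pdf_folder.

    Hypothetical preflight helper; not part of the Doc Searcher code.
    """
    if not os.environ.get(env_var, "").strip():
        raise RuntimeError(f"{env_var} is not set; export it before running the app.")
    root = Path(pdf_folder).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"PDF folder not found: {root}")
    # Only top-level files are checked; the extension match is case-insensitive.
    pdfs = sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
    if not pdfs:
        raise FileNotFoundError(f"No PDF files found in {root}")
    return pdfs


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    try:
        for pdf in check_setup(folder):
            print(pdf.name)
    except (RuntimeError, FileNotFoundError) as err:
        print(f"Setup problem: {err}")
```

Saved as, say, check_setup.py, it can be run with the PDF folder as its only argument; it prints each PDF it finds or a single line describing the problem.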
- Run the Streamlit application:
  streamlit run app.py
- Open your browser and go to the local address Streamlit prints (e.g., http://localhost:8501).
- Enter your query in the text input field and press the "Generate" button to get answers based on the content of the PDF documents.
- app.py: Main application file.
- requirements.txt: List of required Python packages.
- README.md: This file.
Contributions are welcome! Please submit a pull request or open an issue to discuss changes.
This project is licensed under the MIT License. See the LICENSE file for details.