Chat with your data while uploading a pdf file and using a local LLM.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Efficient Data Conversation

PDF File Structure Support:

  1. Upcoming: files with well-organized tables, i.e. a single row/column is not split across multiple rows/columns
  2. Typical research paper structure:
    • Abstract
    • Introduction
    • Background Works
    • Dataset
    • Methodology
    • Result Analysis
    • Discussion
    • Future Works
    • Conclusion
  3. No Image support for now
  4. Upcoming: metadata support
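
As a rough illustration of what the research paper structure means in practice, the sketch below groups already-extracted PDF text under the section headings listed above. It is not the project's actual parser: the function name `split_into_sections` and the heading-matching rule (a line holding only an optional section number and a known heading) are assumptions made for this example.

```python
import re

# Section names from the supported structure list; matching is case-insensitive.
SECTION_HEADINGS = [
    "Abstract", "Introduction", "Background Works", "Dataset",
    "Methodology", "Result Analysis", "Discussion", "Future Works",
    "Conclusion",
]

# Assumed rule: a heading is a line with an optional number ("1." or "2.1")
# followed by one of the known section names and nothing else.
_HEADING_RE = re.compile(
    r"^\s*(?:\d+(?:\.\d+)*\.?\s+)?("
    + "|".join(map(re.escape, SECTION_HEADINGS))
    + r")\s*$",
    re.IGNORECASE,
)


def split_into_sections(text: str) -> dict[str, str]:
    """Group lines of extracted text under the heading they follow."""
    canonical = {h.lower(): h for h in SECTION_HEADINGS}
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        match = _HEADING_RE.match(line)
        if match:
            # Start (or resume) the section named by this heading line.
            current = canonical[match.group(1).lower()]
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {name: " ".join(lines) for name, lines in sections.items()}


sample = "Abstract\nWe study PDF chat.\n1. Introduction\nLocal LLMs are useful.\n"
print(split_into_sections(sample))
```

Text that appears before the first recognized heading is dropped in this sketch; a real pipeline would probably keep it as front matter.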

Language Support:

  1. English
  2. Others are loading...

Key Dependencies:

  • Ollama with or without GPU
  • Sentence-transformers
  • Langchain

The models in use:

  1. Sentence embedding models attempted, chosen mainly based on the MTEB leaderboard and personal experience:
  2. LLMs attempted, chosen based on Mistral-7b's acceptable performance on low-resource devices:
    • Mistral-7b: instruct-v0.2-q2_K
    • Mistral-7b: instruct-v0.2-q5_K_M
    • Mistral-7b: instruct-v0.2-q6_K [currently in use]

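To show how one of the quantization variants above might be selected at request time, here is a minimal sketch that builds the JSON body for Ollama's `/api/generate` endpoint. The tag format `mistral:7b-instruct-v0.2-<quantization>` and the helper names are assumptions, so check them against the output of `ollama list` on your machine. No request is sent.

```python
import json

# Quantization variants listed above; "q6_K" is the one marked as in use.
QUANTIZATIONS = ("q2_K", "q5_K_M", "q6_K")
DEFAULT_QUANTIZATION = "q6_K"


def ollama_tag(quantization: str = DEFAULT_QUANTIZATION) -> str:
    """Return an Ollama-style model tag (assumed naming scheme)."""
    if quantization not in QUANTIZATIONS:
        raise ValueError(f"unknown quantization: {quantization!r}")
    return f"mistral:7b-instruct-v0.2-{quantization}"


def build_generate_payload(prompt: str,
                           quantization: str = DEFAULT_QUANTIZATION) -> str:
    """Build the JSON body for a non-streaming /api/generate request."""
    return json.dumps({
        "model": ollama_tag(quantization),
        "prompt": prompt,
        "stream": False,  # ask for a single response object, not a stream
    })


print(build_generate_payload("Summarize the abstract."))
```

The lower-bit variants trade answer quality for memory, so keeping the choice in one constant makes it easy to switch on smaller devices.
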
To store models, create a sub-directory inside the "api" directory.

For example: "lang_models":

[Image: plot]
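
The model directory can also be created from Python. The sketch below assumes the "lang_models" name from the example above; the helper `ensure_model_dir` is hypothetical and not part of the project. The demo runs against a temporary directory so it does not touch a real checkout.

```python
import tempfile
from pathlib import Path


def ensure_model_dir(project_root: Path, name: str = "lang_models") -> Path:
    """Create <project_root>/api/<name> if it is missing and return its path."""
    model_dir = project_root / "api" / name
    model_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return model_dir


# Demo against a throwaway directory standing in for the repository root.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_model_dir(Path(tmp))
    print(created.relative_to(tmp))
```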

Setup Guidelines:

  1. OS tested: Ubuntu>=20.04 LTS
  2. Create a Python>=3.11 environment using conda or virtual env
  3. Use the requirements file to install the dependencies:
pip install -r requirements.txt
  4. Use Ollama docker and Huggingface to pull/download all the models. Refer to the section Key Dependencies for details and for where to store the models on your machine.
  5. Set the .env file according to the .env.example structure. Note: for CPU inference, set USE_GPU=0
  6. From the parent directory, run the system by executing the command below in the terminal:
streamlit run api/app.py
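
The USE_GPU setting from the .env step could be read along the lines of the sketch below. It uses a small hand-written parser instead of a dotenv library so that it stays self-contained. The helper names, and the choice to treat a missing USE_GPU key as "GPU enabled", are assumptions for illustration only; the real keys are defined by .env.example.

```python
import tempfile
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def use_gpu(env: dict[str, str]) -> bool:
    """USE_GPU=0 selects CPU inference; a missing key defaults to GPU (assumption)."""
    return env.get("USE_GPU", "1") != "0"


# Demo with a temporary file standing in for the project's .env.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# CPU-only machine\nUSE_GPU=0\n", encoding="utf-8")
    print(use_gpu(load_env_file(env_path)))
```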

System Support:

  1. Integrated frontend with Streamlit
  2. Upcoming: separate backend support
  3. Upcoming: Docker support

Credits and special thanks to my friends:

  1. Sharif Ahamed, MSc. in AI, University of Bradford, Bradford, United Kingdom, Email:
    • For advising me throughout
  2. Soroush Yaghoubi, BSc. in Informatics, Technical University Dortmund, Dortmund, Germany:
    • For the frontend idea and further work in the future