Standalone Knowledge Retrieval API Server to be used with Rubra
- Run in development mode (hot-reloading): make run-dev (requires docker and compose)
- Dependency Management: uv
- Linting & Formatting: ruff
Currently, the following file types are supported for ingestion via llama-index's SimpleDirectoryReader interface:
- .csv - comma-separated values
- .docx - Microsoft Word
- .epub - EPUB ebook format
- .hwp - Hangul Word Processor
- .ipynb - Jupyter Notebook
- .jpeg, .jpg - JPEG image
- .mbox - MBOX email archive
- .md - Markdown
- .mp3, .mp4 - audio and video
- .pdf - Portable Document Format
- .png - Portable Network Graphics
- .ppt, .pptm, .pptx - Microsoft PowerPoint
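Before sending files for ingestion, it can help to check them against the extensions listed above. The following is a minimal client-side sketch using only the Python standard library. The names SUPPORTED_EXTENSIONS and is_supported are hypothetical and not part of this project, and the case-insensitive comparison is an assumption about how you may want to treat file names, not a statement about what the server does.

```python
from pathlib import Path

# Extensions copied from the supported file type list in this README.
# SUPPORTED_EXTENSIONS and is_supported are illustrative names only.
SUPPORTED_EXTENSIONS = {
    ".csv", ".docx", ".epub", ".hwp", ".ipynb", ".jpeg", ".jpg",
    ".mbox", ".md", ".mp3", ".mp4", ".pdf", ".png",
    ".ppt", ".pptm", ".pptx",
}


def is_supported(path):
    """Return True if the file extension appears in the supported list.

    The comparison lowercases the extension (an assumption of this sketch).
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, is_supported("examples/data/llama2.pdf") returns True, while a .txt file would be rejected by this check.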
You can use the GPTScript example in the examples/ directory to test the ingestion and querying parts of the API. The GPTScript will do the following:
- Ingest the Llama 2 paper located at examples/data/llama2.pdf (only if it hasn't been ingested before)
- Query the dataset for information about the topics "Truthfulness, Toxicity, and Bias"
The returned response should contain a reference to the source page.
Just run this from the repository root:
make run-dev # if you haven't already
# Create the dataset
curl -X 'POST' \
'http://localhost:8000/datasets/create' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "llama2",
"embed_dim": 0
}'
# Run the GPTScript example
gptscript examples/example.gpt
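The dataset creation call above can also be issued from Python. This is a rough sketch that mirrors the curl command using only the standard library. The URL, headers, and JSON fields are taken from the curl example. The function names build_create_dataset_request and create_dataset are hypothetical, and the shape of the server's response is not documented here, so the sketch returns the raw body without interpreting it.

```python
import json
import urllib.request


def build_create_dataset_request(name, embed_dim=0, base_url="http://localhost:8000"):
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = json.dumps({"name": name, "embed_dim": embed_dim}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/datasets/create",
        data=body,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_dataset(name, embed_dim=0, base_url="http://localhost:8000"):
    """Send the request and return (status code, raw response text).

    Requires the server to be running, e.g. via make run-dev.
    """
    request = build_create_dataset_request(name, embed_dim, base_url)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

With the development server running, create_dataset("llama2") sends the same payload as the curl command.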