Piah automatically parses data from PDFs or plain text based only on the dataclass you provide, and returns an instance of that dataclass filled with the extracted values. Piah is based on OxyParser.
pip install piah
First, set your API key in the environment variables, for example:
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
Alternatively, set the key in a .env file. Then use piah, e.g.:
from piah import Piah
from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
parser = Piah("gpt-3.5-turbo")
result = parser.parse("Hello Iam python and I have 33 years old", Person)
To parse PDFs:
from pathlib import Path
result = parser.parse("example.pdf", Person)
# or
result = parser.parse(Path("example.pdf"), Person)
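To illustrate what "based only on the dataclass" can mean in practice, here is a minimal sketch using only the standard library. It is not piah's actual implementation: the helpers `describe_fields` and `fill_dataclass` are hypothetical names, and the JSON string stands in for a model reply.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    age: int


def describe_fields(cls) -> dict:
    """Map each dataclass field name to the name of its annotated type."""
    return {f.name: getattr(f.type, "__name__", str(f.type)) for f in fields(cls)}


def fill_dataclass(cls, raw_json: str):
    """Build an instance of cls from a JSON object, coercing plain types."""
    data = json.loads(raw_json)
    kwargs = {}
    for f in fields(cls):
        value = data[f.name]
        # Coerce only when the annotation is a plain class such as int or str.
        kwargs[f.name] = f.type(value) if isinstance(f.type, type) else value
    return cls(**kwargs)


# Hypothetical model reply (an assumption for illustration only).
reply = '{"name": "python", "age": "33"}'
person = fill_dataclass(Person, reply)
print(describe_fields(Person))
print(person)
```

The field description is the kind of information that could be placed in a prompt, and the coercion step shows why simple annotated types such as `str` and `int` are the easiest to support.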
piah uses LiteLLM, so consult the LiteLLM docs to check whether the model you want is supported.
- Write docstrings
- Improve allowed types
- Improve system prompt
piah's tests do not pass on every run, because the LLM does not always parse large PDFs correctly.
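One possible way to cope with this kind of nondeterminism is to retry the parse and accept only a result whose fields match their annotated types. The sketch below is an assumption, not part of piah: `is_well_typed` and `parse_with_retry` are hypothetical helpers, and a small iterator stands in for a flaky parser.

```python
from dataclasses import dataclass, fields
from typing import Callable, TypeVar

T = TypeVar("T")


def is_well_typed(instance) -> bool:
    """Return True if every plainly annotated field holds a value of that type."""
    return all(
        isinstance(getattr(instance, f.name), f.type)
        for f in fields(instance)
        if isinstance(f.type, type)
    )


def parse_with_retry(parse: Callable[[], T], attempts: int = 3) -> T:
    """Call parse() up to `attempts` times and return the first well-typed result."""
    last_error = None
    for _ in range(attempts):
        try:
            result = parse()
        except Exception as exc:  # e.g. a malformed model reply
            last_error = exc
            continue
        if is_well_typed(result):
            return result
    raise RuntimeError(f"no valid result after {attempts} attempts") from last_error


@dataclass
class Person:
    name: str
    age: int


# Stand-in for a flaky parser: the first reply has the wrong type for `age`.
replies = iter([Person("python", "33"), Person("python", 33)])
result = parse_with_retry(lambda: next(replies))
print(result)
```

With piah, the callable could be something like `lambda: parser.parse("example.pdf", Person)`. Note that each retry is another model call, so the attempt limit should stay small.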
piah is distributed under the terms of the MIT license.