Marly allows your agents to extract tables and text from your PDFs, PowerPoint files, and other documents in a structured format, making it easy for them to take subsequent actions (a database call, an API call, creating a chart, etc.).
- 📄 Give your agents the ability to find what's relevant in large documents, extract it, and get it back as JSON with a single API call.
- 🔍 Extract data based on multiple schemas from numerous documents, without a vector database or specifying page numbers.
- 🔄 Built-in caching returns results for previously extracted schemas without reprocessing the original documents, so repeat extractions are fast.
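The caching bullet above does not say how Marly keys its cache. As a rough illustration of the general idea, and purely as an assumption, the sketch below keys results on a hash of the document contents plus a canonical form of the schema, and only runs extraction on a cache miss. `ExtractionCache`, `get_or_extract`, and the `extract` callback are hypothetical names, not part of Marly's API.

```python
import hashlib
import json
from typing import Callable, Dict

Schema = Dict[str, str]


class ExtractionCache:
    """Hypothetical in-memory cache for extraction results.

    Illustrates caching by (document, schema); this is not Marly's
    actual cache implementation.
    """

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    @staticmethod
    def key(document: bytes, schema: Schema) -> str:
        # Hash the document contents, so an identical file under a new name still hits.
        doc_hash = hashlib.sha256(document).hexdigest()
        # Serialize the schema with sorted keys so field order does not change the key.
        schema_json = json.dumps(schema, sort_keys=True)
        schema_hash = hashlib.sha256(schema_json.encode("utf-8")).hexdigest()
        return f"{doc_hash}:{schema_hash}"

    def get_or_extract(self, document: bytes, schema: Schema,
                       extract: Callable[[bytes, Schema], dict]) -> dict:
        k = self.key(document, schema)
        if k not in self._store:
            # Cache miss: run the expensive extraction once and keep the result.
            self._store[k] = extract(document, schema)
        # Either a cache hit or the freshly stored result; the document is not reprocessed.
        return self._store[k]


if __name__ == "__main__":
    calls = []

    def slow_extract(document: bytes, schema: Schema) -> dict:
        # Stand-in for a real extraction call; records how often it runs.
        calls.append(document)
        return {field: None for field in schema}

    cache = ExtractionCache()
    schema = {"Firm": "The name of the firm"}
    cache.get_or_extract(b"report contents", schema, slow_extract)
    cache.get_or_extract(b"report contents", schema, slow_extract)
    print(f"extract ran {len(calls)} time(s)")  # extract ran 1 time(s)
```

Keying on content rather than on a file path means a renamed copy of the same PDF still hits the cache, while changing either the document or the schema produces a new key and triggers a fresh extraction.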
A schema is a JSON object of key-value pairs describing what to extract from a document: each key names a field, and each value describes that field in plain language.
📋 Example Schema
{
  "Firm": "The name of the firm",
  "Number of Funds": "The number of funds managed by the firm",
  "Commitment": "The commitment amount in millions of dollars",
  "% of Total Comm": "The percentage of total commitment",
  "Exposure (FMV + Unfunded)": "The exposure including fair market value and unfunded commitments in millions of dollars",
  "% of Total Exposure": "The percentage of total exposure",
  "TVPI": "Total Value to Paid-In multiple",
  "Net IRR": "Net Internal Rate of Return as a percentage"
}
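The introduction notes that structured output makes follow-on actions such as a database call easy. The sketch below assumes, without confirmation from this README, that an extraction result for the example schema arrives as a flat JSON object whose keys match the schema's keys; check the documentation for Marly's actual response shape. It reports which schema fields a record is missing and writes records to SQLite with Python's standard `sqlite3` module. `missing_fields`, `quote_ident`, `store_records`, the `extractions` table, and the placeholder record values are all hypothetical.

```python
import json
import sqlite3
from typing import Dict, Iterable, List

# The example schema above, loaded as a Python dict (key order is preserved).
SCHEMA: Dict[str, str] = json.loads("""
{
  "Firm": "The name of the firm",
  "Number of Funds": "The number of funds managed by the firm",
  "Commitment": "The commitment amount in millions of dollars",
  "% of Total Comm": "The percentage of total commitment",
  "Exposure (FMV + Unfunded)": "The exposure including fair market value and unfunded commitments in millions of dollars",
  "% of Total Exposure": "The percentage of total exposure",
  "TVPI": "Total Value to Paid-In multiple",
  "Net IRR": "Net Internal Rate of Return as a percentage"
}
""")


def missing_fields(record: dict, schema: Dict[str, str]) -> List[str]:
    """Return schema fields absent from an extracted record (hypothetical helper)."""
    return [field for field in schema if field not in record]


def quote_ident(name: str) -> str:
    # Schema keys contain spaces, '%', '(' and '+', so they must be quoted
    # as SQL identifiers; embedded double quotes are escaped by doubling.
    return '"' + name.replace('"', '""') + '"'


def store_records(conn: sqlite3.Connection, records: Iterable[dict],
                  schema: Dict[str, str], table: str = "extractions") -> int:
    """Insert records into a table with one column per schema field (hypothetical)."""
    cols = ", ".join(quote_ident(field) for field in schema)
    placeholders = ", ".join("?" for _ in schema)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols})")
    # Fields the extraction did not return are stored as NULL.
    rows = [tuple(record.get(field) for field in schema) for record in records]
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Placeholder record standing in for an extraction result (not real data).
    record = {"Firm": "Example Firm", "Number of Funds": 2, "TVPI": 1.2}
    print(missing_fields(record, SCHEMA))
    conn = sqlite3.connect(":memory:")
    store_records(conn, [record], SCHEMA)
    print(conn.execute('SELECT "Firm", "TVPI" FROM extractions').fetchall())
```

Creating one column per schema key keeps the table layout tied to the schema, so a different schema would go into its own table.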
| 💼 Financial Report Analysis | 📊 Customer Feedback Processing | 🔬 Research Assistant | 🧠 Legal Contract Parsing |
|---|---|---|---|
| Extract key financial metrics from quarterly PDF reports | Categorize feedback from various document types | Process research papers, extracting methodologies and findings | Extract key legal terms and conditions from contracts |
To install the Python package, run the following command:
pip install marly
To build the platform from source and start it, run the following command:
docker-compose up --build
- Navigate to the example scripts or example notebooks folder:
cd example_scripts
or
cd example_notebooks
- Run the example extraction script:
python azure_example.py
For more detailed information, please refer to our documentation.
We welcome contributions! Please see our Contributing Guide for more details.
This project is licensed under the MIT License.