/structx

Type-safe structured data extraction from text using LLMs.

Primary LanguagePythonMIT LicenseMIT

structx

Type-safe structured data extraction from text using LLMs.

Documentation PyPI GitHub Actions

structx is a powerful Python library for extracting structured data from text using Large Language Models (LLMs). It dynamically generates type-safe data models and provides consistent, structured extraction with support for complex nested data structures.

Features

  • 🔄 Dynamic model generation from natural language queries
  • 🎯 Automatic schema inference and generation
  • 📊 Support for complex nested data structures
  • 🔄 Model refinement with natural language instructions
  • 📄 Support for unstructured text and document processing
  • 🚀 Multi-threaded processing with async support
  • 🔌 Support for multiple LLM providers through litellm
  • 🔄 Automatic retry mechanism with exponential backoff

Installation

pip install structx-llm

For document processing support

pip install structx-llm[docs]

Quick Start

from structx import Extractor

# Initialize extractor
extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key"
)

# Extract structured data
result = extractor.extract(
    data="System check on 2024-01-15 detected high CPU usage (92%) on server-01.",
    query="extract incident date and details"
)

# Access results
print(f"Extracted {result.success_count} items")
print(result.data[0].model_dump_json(indent=2))

Documentation

For comprehensive documentation, examples, and guides, visit our documentation site.

Examples

Check out our example gallery for real-world use cases,

Supported File Formats

  • Structured: CSV, Excel, JSON, Parquet, Feather
  • Unstructured: TXT, PDF, DOCX, Markdown, and more

Contributing

Contributions are welcome! Please read our Contributing Guidelines for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.