Type-safe structured data extraction from text using LLMs.
structx is a Python library for extracting structured data from text using
Large Language Models (LLMs). It dynamically generates type-safe data models
and provides consistent, structured extraction with support for complex
nested data structures.
- 🔄 Dynamic model generation from natural language queries
- 🎯 Automatic schema inference and generation
- 📊 Support for complex nested data structures
- 🔄 Model refinement with natural language instructions
- 📄 Support for unstructured text and document processing
- 🚀 Multi-threaded processing with async support
- 🔌 Support for multiple LLM providers through litellm
- 🔄 Automatic retry mechanism with exponential backoff
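
To make the last feature concrete, here is a minimal, generic sketch of retrying with exponential backoff. It is not structx's implementation: the function name `retry_with_backoff` and its parameters (`max_retries`, `base_delay`, `max_delay`, `sleep`) are assumptions chosen for illustration only.

```python
import random
import time


def retry_with_backoff(func, max_retries=3, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call func(); on failure, wait and retry with a doubling delay.

    Hypothetical helper for illustration, not part of structx.
    The delay before retry number `attempt` is base_delay * 2**attempt,
    capped at max_delay, plus up to 10% random jitter.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults above, a call that keeps failing is attempted four times in total, with waits of roughly 0.5 s, 1 s, and 2 s in between. The `sleep` parameter exists only so the waiting can be swapped out in tests.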
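Install from PyPI:
pip install structx-llm
To include the optional `docs` extra (quoted so shells such as zsh do not expand the brackets):
pip install "structx-llm[docs]"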
from structx import Extractor

# Initialize extractor
extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key"
)

# Extract structured data
result = extractor.extract(
    data="System check on 2024-01-15 detected high CPU usage (92%) on server-01.",
    query="extract incident date and details"
)

# Access results
print(f"Extracted {result.success_count} items")
print(result.data[0].model_dump_json(indent=2))
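
The items in `result.data` are instances of a model generated at runtime, and the `model_dump_json` call suggests they are Pydantic models. As a rough illustration of what "dynamic model generation" means, the sketch below builds a typed class from a schema at runtime using only the standard library. It is a stand-in under stated assumptions, not how structx works internally: the schema, the field names, and the helpers `build_model` and `validate` are all hypothetical.

```python
import json
from dataclasses import asdict, fields, make_dataclass

# Hypothetical schema that might be inferred from the query
# "extract incident date and details"
schema = {"incident_date": str, "details": str}


def build_model(name, schema):
    """Create a dataclass at runtime from a {field_name: type} mapping."""
    return make_dataclass(name, list(schema.items()))


def validate(model, payload):
    """Instantiate `model` from a dict, rejecting wrong keys or types."""
    expected = {f.name: f.type for f in fields(model)}
    if set(payload) != set(expected):
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(payload)}")
    for key, typ in expected.items():
        if not isinstance(payload[key], typ):
            raise TypeError(f"{key!r} must be {typ.__name__}")
    return model(**payload)


Incident = build_model("Incident", schema)
item = validate(
    Incident,
    {"incident_date": "2024-01-15", "details": "High CPU usage (92%) on server-01"},
)
print(json.dumps(asdict(item), indent=2))
```

The point of the type check is that downstream code can rely on every extracted item having the same fields with the same types, which is what "type-safe" extraction promises.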
For comprehensive documentation, examples, and guides, visit our documentation site.
- Getting Started
- Basic Extraction
- Unstructured Text Processing
- Async Operations
- Multiple Queries
- Custom Models
- API Reference
Check out our example gallery for real-world use cases.
Supported file formats:
- Structured: CSV, Excel, JSON, Parquet, Feather
- Unstructured: TXT, PDF, DOCX, Markdown, and more
Contributions are welcome! Please read our Contributing Guidelines for details.
This project is licensed under the MIT License - see the LICENSE file for details.