
A Python client for the Unstructured hosted API

Primary language: Python. License: MIT.

Python SDK for the Unstructured API

This is a Python client for the Unstructured API.

SDK Installation

pip install unstructured-client

Usage

Only the files parameter is required. See the general partition page for all available parameters. 

from unstructured_client import UnstructuredClient
from unstructured_client.models import shared
from unstructured_client.models.errors import SDKError

s = UnstructuredClient(api_key_auth="YOUR_API_KEY")

filename = "sample-docs/layout-parser-paper-fast.pdf"

with open(filename, "rb") as f:
    # Note that this currently only supports a single file
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )

req = shared.PartitionParameters(
    files=files,
    # Other partition params
    strategy='ocr_only',
    languages=["eng"],
)

try:
    resp = s.general.partition(req)
    print(resp.elements[0])
except SDKError as e:
    print(e)

# {
# 'type': 'UncategorizedText', 
# 'element_id': 'fc550084fda1e008e07a0356894f5816', 
# 'metadata': {
#   'filename': 'layout-parser-paper-fast.pdf', 
#   'filetype': 'application/pdf', 
#   'languages': ['eng'], 
#   'page_number': 1
#   }
# }
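
The elements in the response can be post-processed like any other Python data. The following is a minimal sketch, assuming each element is a plain dict with the `type`, `element_id`, and `metadata` keys shown in the sample output above. `group_by_page` is a hypothetical helper, not part of the SDK, and the second sample element is invented purely for illustration.

```python
from collections import defaultdict


def group_by_page(elements):
    """Group element dicts by metadata['page_number'] (None if missing)."""
    pages = defaultdict(list)
    for element in elements:
        page = element.get("metadata", {}).get("page_number")
        pages[page].append(element)
    return dict(pages)


# Sample data mirroring the shape printed above (not real API output).
sample = [
    {
        "type": "UncategorizedText",
        "element_id": "fc550084fda1e008e07a0356894f5816",
        "metadata": {"filename": "layout-parser-paper-fast.pdf", "page_number": 1},
    },
    # Hypothetical second element, added only to show the grouping.
    {"type": "Title", "element_id": "hypothetical-id", "metadata": {"page_number": 2}},
]

by_page = group_by_page(sample)
for page, items in sorted(by_page.items()):
    print(page, [item["type"] for item in items])
```

If the real response objects are not plain dicts, the key lookups would need to be adapted accordingly.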

Change the base URL

If you are self hosting the API, or developing locally, you can change the server URL when setting up the client.

import unstructured_client

api_key = "YOUR_API_KEY"

# Using a local server
s = unstructured_client.UnstructuredClient(
    server_url="http://localhost:8000",
    api_key_auth=api_key,
)

# Using your own server
s = unstructured_client.UnstructuredClient(
    server_url="https://your-server",
    api_key_auth=api_key,
)
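
When the same code runs against both a local server and the hosted API, it can be convenient to choose the server URL outside the code. The sketch below is one possible approach, not something the SDK provides: `client_kwargs` is a hypothetical helper, and the environment variable names `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_SERVER_URL` are assumptions chosen for this example.

```python
import os


def client_kwargs(env=None):
    """Build keyword arguments for UnstructuredClient from an env-like mapping.

    The variable names below are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    kwargs = {"api_key_auth": env.get("UNSTRUCTURED_API_KEY", "")}
    server_url = env.get("UNSTRUCTURED_SERVER_URL")
    if server_url:
        # Only override the default server when a URL is explicitly set.
        kwargs["server_url"] = server_url
    return kwargs


local = client_kwargs(
    {
        "UNSTRUCTURED_API_KEY": "YOUR_API_KEY",
        "UNSTRUCTURED_SERVER_URL": "http://localhost:8000",
    }
)
print(local)
```

The result could then be passed on as `UnstructuredClient(**client_kwargs())`, using the same `api_key_auth` and `server_url` arguments shown above.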

Custom HTTP Client

The Python SDK makes API calls using the requests HTTP library. In order to provide a convenient way to configure timeouts, cookies, proxies, custom headers, and other low-level configuration, you can initialize the SDK client with a custom requests.Session object.

For example, you could specify a header for every request that this SDK makes as follows:

import unstructured_client
import requests

http_client = requests.Session()
http_client.headers.update({'x-custom-header': 'someValue'})
s = unstructured_client.UnstructuredClient(client=http_client)
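
Timeouts are a slightly different case, because a requests.Session has no session-wide timeout setting. A common requests pattern, sketched here under that assumption, is to mount a transport adapter that fills in a default timeout whenever a call does not supply one. `TimeoutAdapter` is a hypothetical class written for this example, and the 30-second value is arbitrary.

```python
import requests
from requests.adapters import HTTPAdapter


class TimeoutAdapter(HTTPAdapter):
    """Transport adapter that applies a default timeout to every request."""

    def __init__(self, *args, timeout=30.0, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send passes timeout=None when the caller did not set one.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


http_client = requests.Session()
http_client.headers.update({"x-custom-header": "someValue"})

# Route both schemes through the adapter so the default applies everywhere.
adapter = TimeoutAdapter(timeout=30.0)
http_client.mount("https://", adapter)
http_client.mount("http://", adapter)
```

The configured session would then be handed to the SDK in the same way as the session in the example above.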

Maturity

This SDK is in beta, and there may be breaking changes between versions without a major version update. We therefore recommend pinning to a specific package version, so that every install resolves to the same version and you pick up breaking changes only when you upgrade deliberately.
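
As a small safeguard alongside a pin, an application can check at startup that the installed version is the one it expects. This is a minimal sketch using the standard library's `importlib.metadata`. `is_pinned` is a hypothetical helper, and the version string in the usage line is a placeholder, not a real release number.

```python
from importlib import metadata


def is_pinned(package: str, expected: str) -> bool:
    """Return True if `package` is installed at exactly version `expected`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Package is not installed at all.
        return False
    return installed == expected


# "0.0.0" is a placeholder; substitute the version you have pinned.
if not is_pinned("unstructured-client", "0.0.0"):
    print("unstructured-client is missing or not at the pinned version")
```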

Contributions

While we value open-source contributions to this SDK, this library is generated programmatically. Feel free to open a PR or a GitHub issue as a proof of concept, and we'll do our best to include it in a future release!

SDK Created by Speakeasy