Unstructured-IO/unstructured-api

max_characters param not accessible via pipeline_api

macahi opened this issue · 2 comments

macahi commented

Describe the bug

Hello,

I am currently testing the unstructured-api with chunking_strategy=by_title. I noticed that the max_characters parameter of the chunk_by_title method cannot be passed via pipeline_api:

"new_after_n_chars": m_new_after_n_chars,

As a result, it's not possible to specify values for new_after_n_chars that exceed the default value of max_characters (500).
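The interaction described above can be sketched as follows. This is only an illustration, not the library's actual code: `effective_soft_limit` is a hypothetical helper, and the capping rule is an assumption inferred from the observed behavior (chunks never exceeding 500 characters).

```python
DEFAULT_MAX_CHARACTERS = 500  # default hard limit cited in this issue


def effective_soft_limit(new_after_n_chars, max_characters=DEFAULT_MAX_CHARACTERS):
    """Return the soft limit that can actually take effect.

    Hypothetical helper: assumes the soft limit (new_after_n_chars)
    cannot exceed the hard limit (max_characters).
    """
    return min(new_after_n_chars, max_characters)


# With max_characters stuck at its default, a soft limit of 1500 is capped.
print(effective_soft_limit(1500))
# If max_characters could be passed through, 1500 would be usable.
print(effective_soft_limit(1500, max_characters=2000))
```

Under this assumption, exposing max_characters in the API is what makes larger new_after_n_chars values meaningful.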

To Reproduce

curl -X 'POST' \
 'https://api.unstructured.io/general/v0/general' \
 -H 'accept: application/json' \
 -H 'Content-Type: multipart/form-data' \
 -F 'files=@sample-docs/layout-parser-paper-fast.pdf' \
 -F 'chunking_strategy=by_title' \
 -F 'new_after_n_chars=1500'

new_after_n_chars has no effect; the maximum chunk size is 500.
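One way to check this observation is to measure the longest chunk in the response. The sketch below assumes the response body is a JSON list of element objects, each with a `text` field; `longest_chunk` and the sample data are illustrative stand-ins, not real API output.

```python
import json


def longest_chunk(elements):
    """Return the length of the longest `text` value in a list of elements."""
    return max((len(el.get("text", "")) for el in elements), default=0)


# Illustrative stand-in for a saved API response, not real output.
sample_response = json.loads(
    '[{"text": "short chunk"}, {"text": "' + "x" * 500 + '"}]'
)

print(longest_chunk(sample_response))
```

If the reported length stays at or below 500 even with new_after_n_chars=1500, the soft limit is not taking effect.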

If I'm correct, it should be easy to fix.

Hi there, this will certainly be a quick fix. We'll keep you posted!

This is now deployed in the hosted API.