max_characters param not accessible via pipeline_api
macahi opened this issue · 2 comments
macahi commented
Describe the bug
Hello,
I am currently testing the unstructured-api using chunking_strategy=by_title. I noticed that the max_characters parameter of the chunk_by_title method cannot be passed via pipeline_api.
As a result, it's not possible to specify values for new_after_n_chars that exceed the default value of max_characters (500).
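To illustrate the symptom, here is a minimal toy model in Python. It assumes the soft limit (new_after_n_chars) is effectively capped by the hard limit (max_characters); the function name effective_soft_limit is hypothetical, and this is not the library's actual code, only a sketch of the behaviour described in this report.

```python
# Toy model of the reported behaviour, not the unstructured implementation.
DEFAULT_MAX_CHARACTERS = 500  # default value reported in this issue


def effective_soft_limit(new_after_n_chars: int,
                         max_characters: int = DEFAULT_MAX_CHARACTERS) -> int:
    """Return the soft limit that can actually take effect.

    Assumption: a chunk is never allowed to exceed max_characters,
    so a larger new_after_n_chars is silently capped.
    """
    return min(new_after_n_chars, max_characters)


# With max_characters stuck at its default, asking for 1500 still yields 500.
capped = effective_soft_limit(1500)

# If max_characters could be raised, the requested soft limit would apply.
uncapped = effective_soft_limit(1500, max_characters=2000)
```

Under this assumption, the only way to make new_after_n_chars=1500 meaningful is to raise max_characters as well, which is what the API currently does not allow.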
To Reproduce
curl -X 'POST' \
'https://api.unstructured.io/general/v0/general' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'files=@sample-docs/layout-parser-paper-fast.pdf' \
-F 'chunking_strategy=by_title' \
-F 'new_after_n_chars=1500'
new_after_n_chars has no effect; the maximum chunk size is 500.
If I'm correct, it should be easy to fix.
awalker4 commented
Hi there, this will certainly be a quick fix. We'll keep you posted!
awalker4 commented
This is now deployed in the hosted API.
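For anyone retrying after the deployment, a rough Python sketch of the same request follows. It assumes the fix exposes a form field named max_characters (taken from the issue title; the thread does not show the final parameter name), and the helper names build_chunking_fields and post_document are hypothetical. Sending the request needs the third-party requests package, and the hosted API may require authentication headers that are not shown in this thread.

```python
from typing import Dict, Optional

API_URL = "https://api.unstructured.io/general/v0/general"  # from the report above


def build_chunking_fields(new_after_n_chars: int,
                          max_characters: Optional[int] = None) -> Dict[str, str]:
    """Build the non-file form fields for a by_title chunking request."""
    fields = {
        "chunking_strategy": "by_title",
        "new_after_n_chars": str(new_after_n_chars),
    }
    if max_characters is not None:
        # Assumption: the field name matches the chunk_by_title argument.
        fields["max_characters"] = str(max_characters)
    return fields


def post_document(path: str, fields: Dict[str, str]):
    """Send the file plus form fields as multipart/form-data."""
    import requests  # imported lazily so the builder works without it

    with open(path, "rb") as fh:
        return requests.post(
            API_URL,
            headers={"accept": "application/json"},
            files={"files": fh},
            data=fields,
        )
```

A call such as post_document("sample-docs/layout-parser-paper-fast.pdf", build_chunking_fields(1500, max_characters=2000)) would mirror the curl command from the report with the hard limit raised above the requested soft limit.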