Ability to accept gzip compressed files
cragwolfe opened this issue · 0 comments
Summary
Currently, when multiple files are passed to an unstructured-api-tools (auto-generated) Pipeline API, the files are presumed to be uncompressed, and the files and their content types are passed along to `pipeline_api`.
However, the consumer of the API should be able to submit gzip-compressed files as well. See the spec for details. To be clear, this issue is about gracefully handling (potentially) gzipped files in the FastAPI interface and passing uncompressed files, or the uncompressed text content of files, to `pipeline_api`.
Note: Ideally #104 is at least partially completed first, but this is not a hard blocker.
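As a rough illustration of the handling described above, the sketch below detects gzip input by its two-byte magic number and decompresses it before anything is handed to `pipeline_api`. The helper names (`is_gzipped`, `maybe_gunzip`, `normalize_uploads`) are hypothetical and not part of unstructured-api-tools; the spec may prescribe a different detection rule (for example, relying on the declared content type), so treat this as one possible approach rather than the intended implementation.

```python
import gzip
from typing import List, Tuple

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(data: bytes) -> bool:
    """Return True if the payload begins with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def maybe_gunzip(data: bytes) -> bytes:
    """Decompress gzip payloads; return any other payload unchanged."""
    if is_gzipped(data):
        return gzip.decompress(data)
    return data


def normalize_uploads(uploads: List[Tuple[str, bytes]]) -> List[Tuple[str, bytes]]:
    """Hypothetical helper: take (filename, raw_bytes) pairs from a request
    and return the same pairs with every gzip payload decompressed, so a
    mixed batch reaches the pipeline as plain content."""
    return [(name, maybe_gunzip(raw)) for name, raw in uploads]
```

Sniffing the magic number lets compressed and uncompressed files share one form parameter without extra flags. The trade-off is that an uncompressed binary file that happens to start with the same two bytes would be misdetected and `gzip.decompress` would raise, so a real implementation would likely combine this with the declared content type.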
Definition of Done
- Gzipped files may be submitted to the API in either the `text_files` or `files` form parameters, per the spec.
- Unit tests are added, including one for a request that includes both `text_files` and `files` inputs with a mix of compressed and uncompressed files.
- Unit tests show the ability to infer the `file_content_type` to pass to `pipeline_api` if `gz_uncompressed_content_type` is not provided.
- Test instructions demonstrate compressed files being appropriately handled in a locally running pipeline-sec-filings API, including mixed compressed and uncompressed files submitted in the same request.
- Test instructions demonstrate compressed files being appropriately handled in a locally running pipeline API that accepts `files` (in contrast to the `text_files` input in the `sections` API of `pipeline-sec-filings`), including mixed compressed and uncompressed files submitted in the same request.
- Test instructions demonstrate compressed files being appropriately handled in a locally running pipeline API that accepts a file OR a text file (e.g. `def pipeline_api(text, file, ...)`), including mixed compressed and uncompressed files submitted in the same request.
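The content-type inference item above could be approached along these lines. This is a minimal sketch using only the standard library: `infer_content_type` is a hypothetical helper name, and the precedence shown (an explicit `gz_uncompressed_content_type` wins, otherwise guess from the filename) is an assumption about what the spec intends rather than something this issue states.

```python
import mimetypes
from typing import Optional


def infer_content_type(
    filename: str,
    gz_uncompressed_content_type: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: choose the file_content_type for pipeline_api.

    Assumed precedence: a caller-supplied gz_uncompressed_content_type is
    used as-is; otherwise the type is guessed from the filename.
    """
    if gz_uncompressed_content_type:
        return gz_uncompressed_content_type
    # mimetypes treats ".gz" as an encoding suffix, so the guess for
    # "filing.txt.gz" is based on the inner ".txt" extension.
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type
```

A unit test for this behavior might assert that `infer_content_type("filing.txt.gz")` yields `text/plain`, that an explicit `gz_uncompressed_content_type` overrides the guess, and that a bare `filing.gz` with no inner extension yields `None`, leaving the caller to fall back to content sniffing or to reject the request.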