PDF Extraction Added to text_extraction Endpoint

Question

Closed this issue a year ago · 1 comment

PDF extraction has been added to the text_extraction endpoint on the dev server. The front end needs a few changes to work with the new version of text_extraction.

The call is now multipart/form-data rather than application/json. If querying a webpage, the URL should be sent in the url form field (request.form.url). Otherwise, functionality for URLs should remain largely the same.
PDF files should be attached directly as the pdf_file file field (request.files.pdf_file).
Currently, only the first 3000 tokens of the webpage or PDF are processed. This is because the ChatGPT call can handle a total of only 4000 tokens, shared between the question and the response. I am working on a way around this limitation.
If the webpage or PDF is not related to bipolar disorder or bipolar medications, the endpoint now responds with a message indicating that the webpage or PDF must be related to one of those topics.
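
For the front end, the request change described above could look roughly like the sketch below. Only the field names url and pdf_file come from this issue; the helper names, the "document.pdf" filename, the endpoint URL passed to submitTextExtraction, and the assumption that the response body is read as text are all hypothetical and would need to match the real client and server.

```typescript
// Hypothetical input type: either a webpage URL or a PDF blob, never both.
type ExtractionSource = { url: string } | { pdf: Blob };

// Build the multipart/form-data body for text_extraction.
// The field names "url" and "pdf_file" are the ones named in this issue.
function buildTextExtractionBody(source: ExtractionSource): FormData {
  const body = new FormData();
  if ("url" in source) {
    // Webpage query: the server reads this from request.form.url.
    body.append("url", source.url);
  } else {
    // PDF upload: the server reads this from request.files.pdf_file.
    // The filename "document.pdf" is a placeholder.
    body.append("pdf_file", source.pdf, "document.pdf");
  }
  return body;
}

// Send the request. endpointUrl is an assumption; use the dev server's real address.
async function submitTextExtraction(
  endpointUrl: string,
  source: ExtractionSource
): Promise<string> {
  const response = await fetch(endpointUrl, {
    method: "POST",
    // No Content-Type header is set here on purpose: when the body is a
    // FormData, fetch adds multipart/form-data with the correct boundary.
    body: buildTextExtractionBody(source),
  });
  if (!response.ok) {
    throw new Error(`text_extraction failed with status ${response.status}`);
  }
  // Assumed: the caller wants the raw response text (including the
  // off-topic message when the page or PDF is unrelated).
  return response.text();
}
```

Keeping the body construction in its own function means the old JSON call site only has to swap JSON.stringify for buildTextExtractionBody and drop its application/json header.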

Answer 1 · 2023-07-23T17:28:30.000Z

Everything with this should be 👍🏻