PDF Extraction Added to text_extraction Endpoint
Closed this issue · 1 comments
ryanrrogers commented
PDF extraction was added to the text_extraction endpoint on the dev server. The front end needs a few changes to work with the new version of text_extraction.
- The call is now multipart/form-data rather than application/json. If querying a webpage, the URL should be attached to request.form.url. Otherwise, functionality for URLs should remain largely the same.
- PDF files should be directly attached to request.files.pdf_file.
- Currently, only the first 3000 tokens of either the webpage or PDF are processed. This is because the ChatGPT call can only handle a total of 4000 tokens, shared between the question and the response. I am working on a way around this limitation.
- If the webpage or PDF is not related to bipolar disorder or bipolar medications, the endpoint should now respond with a message indicating that the webpage or PDF must be related to one of those topics.
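For reference, here is a rough sketch of what the new request shape could look like, using only the Python standard library. The form field names url and pdf_file come from the notes above. The endpoint URL, the POST method, and the helper names encode_multipart and build_request are assumptions for illustration, not the actual front-end or server code.

```python
import uuid
import urllib.request

# Hypothetical dev-server URL; the real one is not given in this issue.
ENDPOINT = "http://localhost:5000/text_extraction"


def encode_multipart(fields, files):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes, content_type)
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    for name, (filename, data, ctype) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(url=None, pdf_bytes=None, pdf_name="document.pdf"):
    """Build (but do not send) a request for a webpage URL or a PDF."""
    # Webpage queries go in the "url" form field.
    fields = {"url": url} if url is not None else {}
    # PDFs are attached as the "pdf_file" file part.
    files = {}
    if pdf_bytes is not None:
        files["pdf_file"] = (pdf_name, pdf_bytes, "application/pdf")
    body, content_type = encode_multipart(fields, files)
    # POST is an assumption; the issue does not state the HTTP method.
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Sending it would then be something like `urllib.request.urlopen(build_request(url="https://example.com/article"))`. The main thing to carry over to the front end is that the Content-Type is multipart/form-data with a boundary, not application/json.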
ryanrrogers commented
Everything with this should be 👍🏻