Unstructured-IO/unstructured-api

Local Docker setup of unstructured-api asks for an API key and returns 401 errors

ahmedrehman opened this issue · 11 comments

Describe the bug
I'm trying the Unstructured API with a local Docker container, as described in the documentation: https://js.langchain.com/docs/integrations/document_loaders/file_loaders/unstructured

To Reproduce
docker run -p 8000:8000 -d --rm --name unstructured-api quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0
Try to use it: it always answers 401 because of a malformed API key, and there is no documentation on how to get or create that key locally.
As I understand it, it should not ask for an API key if I don't specify the environment variable.
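One way to check whether the local container itself demands a key is to bypass LangChain and build the request by hand. The sketch below is a hypothetical helper, not part of LangChain or unstructured-api. It assumes the container accepts POST /general/v0/general with a multipart field named files (mirroring the curl examples in the unstructured-api README), and that a key, when one is used, travels in an unstructured-api-key header; both are assumptions to verify against the version you run. It needs Node 18+ for the global FormData and Blob.

```javascript
// Hypothetical helper: builds a raw partition request for a local
// unstructured-api container, without going through LangChain.
// Assumptions: endpoint path /general/v0/general, multipart field "files",
// optional "unstructured-api-key" header. Requires Node 18+ (global FormData/Blob).
function buildPartitionRequest(fileName, bytes, { baseUrl = "http://localhost:8000", apiKey } = {}) {
  const form = new FormData();
  // Wrap the raw bytes (or a string) in a Blob so FormData sends a file part.
  form.append("files", new Blob([bytes]), fileName);

  // Content-Type is deliberately not set: fetch adds the multipart boundary itself.
  const headers = { accept: "application/json" };
  // Only send a key if one was actually provided; an empty string sends nothing.
  if (apiKey) {
    headers["unstructured-api-key"] = apiKey;
  }

  // Strip trailing slashes so the URL never contains "//general".
  const url = `${baseUrl.replace(/\/+$/, "")}/general/v0/general`;
  return { url, init: { method: "POST", headers, body: form } };
}
```

From an ES module, const { url, init } = buildPartitionRequest("hello.txt", "hello world"); followed by const res = await fetch(url, init); lets you inspect res.status directly. If the container does not enforce a key, that status should not be 401.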

Environment:
Docker on Windows

Are you setting apiUrl to point to your local instance?

I tried that now; it still asks for an API key.

docker run -p 8000:8000 -d --rm --name unstructured-api quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0

import { UnstructuredDirectoryLoader } from "langchain/document_loaders/fs/unstructured";

const options = {
  apiKey: "",
  apiUrl: "http://localhost:8000",
};

const loader = new UnstructuredDirectoryLoader("woertli", options);
const docs = await loader.load();

Can you share the exact error you're seeing? Also, you can omit apiKey from options since you're not providing one.
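Following that suggestion, a small hypothetical helper can build the options object so that apiKey is left out entirely unless a non-empty key is supplied. The apiUrl and apiKey option names come from the snippet above; the helper itself is only an illustration, not part of LangChain.

```javascript
// Hypothetical helper: builds UnstructuredLoader-style options,
// including apiKey only when a non-empty key is given.
function buildUnstructuredOptions(apiUrl, apiKey) {
  const options = { apiUrl };
  // Skip undefined, null, empty, and whitespace-only keys entirely.
  if (typeof apiKey === "string" && apiKey.trim() !== "") {
    options.apiKey = apiKey;
  }
  return options;
}
```

Usage would look like new UnstructuredDirectoryLoader("woertli", buildUnstructuredOptions("http://localhost:8000/general/v0/general")), which passes no apiKey property at all.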

I also tried without apiKey:

langchainnodeexample@1.0.0 start
node test.js

file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:185
throw new Error(`Failed to partition file ${this.filePath} with error ${response.status} and message ${await response.text()}`);
^

Error: Failed to partition file C:\ahmed\wrk\tmpWork\2024\proj\ai\langchainNode\woertli\wortli1pdf.pdf with error 401 and message {"detail":"API key is malformed, please type the API key correctly in the header."}
at UnstructuredLoader._partition (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:185:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async UnstructuredLoader.load (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:194:26)
at async UnstructuredDirectoryLoader.load (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/directory.js:94:40)
at async file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/test.js:24:14

Node.js v20.10.0

I suspect that you are still not hitting your Docker container for some reason because that error response message does not exist in the repo. Make sure that you are seeing the request come in to your container using docker logs if possible.
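Since the loader folds the HTTP status and response body into its error message (see the throw in the stack trace above), one rough way to tell these cases apart is to parse that message. The sketch assumes the message keeps the "Failed to partition file ... with error <status> and message <body>" shape shown in this thread. diagnosePartitionError and its hint texts are hypothetical; the hints only restate the advice given in this issue.

```javascript
// Hypothetical diagnostic: pulls the HTTP status and the JSON "detail" out of
// the loader's "... with error <status> and message <body>" error text and
// maps the statuses seen in this thread to a hint.
function diagnosePartitionError(message) {
  const match = /with error (\d+) and message ([\s\S]*)$/.exec(message);
  if (!match) {
    return { status: null, detail: null, hint: "Not a partition error from the loader." };
  }
  const status = Number(match[1]);
  let detail = match[2].trim();
  try {
    // The body is usually JSON like {"detail":"..."}; otherwise keep the raw text.
    const parsed = JSON.parse(detail);
    if (parsed && typeof parsed.detail === "string") detail = parsed.detail;
  } catch {
    // Body was not JSON; keep the raw text.
  }

  let hint = "Check the container output with docker logs.";
  if (status === 401) {
    hint = "The request may not be reaching the local container; verify apiUrl and check docker logs.";
  } else if (status === 404) {
    hint = "The server was reached but the path is wrong; apiUrl should end in /general/v0/general.";
  }
  return { status, detail, hint };
}
```

In test.js this could wrap await loader.load() in a try/catch and log diagnosePartitionError(err.message).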

Yes, you are right.
const options = {
  apiKey: "",
  apiUrl: "http://localhost:8000"
};
This works; the API key error is gone.

Thanks @omikader for the assist here!

I'm now one step further.
I get the response {"detail":"Not Found"} from localhost:8000. I don't know the correct API URL, but the server does seem to be doing something:

If I give a wrong file path, I get different errors saying the file is not there,
but if I provide a correct file path, I get {"detail":"Not Found"}.

Error: Failed to partition file /ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/exampledata/trainingdata/examplestoUse.csv with error 404 and message {"detail":"Not Found"}
at UnstructuredLoader._partition (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:185:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async UnstructuredLoader.load (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:194:26)
at async file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/test.js:27:15

You're getting a 404 because you're not hitting the right endpoint. The apiUrl needs the path /general/v0/general too.

const options = {
  apiUrl: "http://localhost:8000/general/v0/general"
};
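Since, per the reply above, apiUrl has to include the full endpoint path, it can help to normalize whatever base URL is configured. toPartitionUrl is a hypothetical helper; the /general/v0/general path is the one given in this thread, and other API versions may use a different path.

```javascript
// Hypothetical helper: turns a bare base URL such as "http://localhost:8000"
// into the partition endpoint, leaving URLs that already end in the path alone.
const PARTITION_PATH = "/general/v0/general";

function toPartitionUrl(baseUrl) {
  const url = new URL(baseUrl);
  // Drop trailing slashes so "http://host:8000/" and "http://host:8000" behave the same.
  const path = url.pathname.replace(/\/+$/, "");
  url.pathname = path.endsWith(PARTITION_PATH) ? path : path + PARTITION_PATH;
  return url.toString();
}
```

With it, const options = { apiUrl: toPartitionUrl("http://localhost:8000") }; yields the same URL as the snippet above.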

Yes, great help, now it works! Please update the documentation; it is a bit sparse.

I found these installation instructions, which work nicely. I had spent a lot of time trying to get Unstructured running for the LangChain AI examples:
https://www.youtube.com/watch?app=desktop&v=svzd5d1LXGk (from echohive)