Elliott-Chong/chatpdf-yt

Error inserting vectors into pinecone

joepds opened this issue · 11 comments

joepds commented

Hi, I got an error when inserting vectors into Pinecone. This is the error I got. If anyone has had the same problem and fixed it, could you share a solution? Thank you.

inserting vectors into pinecone
PineconeBadRequestError: The requested feature 'Namespaces' is not supported by the current index type 'Starter'.
at mapHttpStatusError (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/http.js:179:20)
at eval (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:170:55)
at step (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:107:23)
at Object.eval [as next] (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:48:20)
at fulfilled (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:11:32)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
cause: undefined
}
⨯ node_modules@pinecone-database\pinecone\dist\errors\http.js (179:19) @ mapHttpStatusError
⨯ unhandledRejection: PineconeBadRequestError: The requested feature 'Namespaces' is not supported by the current index type 'Starter'.
at mapHttpStatusError (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/http.js:179:20)
at eval (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:170:55)
at step (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:107:23)
at Object.eval [as next] (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:48:20)
at fulfilled (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:11:32)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
cause: undefined
}
null

joepds commented

Is it because the environment is Starter? If so, how do I change it to one like Elliott's, which is ap-southeast-asia-1?


You probably need to pay $70. I have the same problem.

joepds commented

Probably. I read in the Pinecone documentation that we can use metadata filtering, but I don't know how to implement it in the code.

Well, I'm facing this issue too, and I've found 2 ways to solve it:

  1. Upgrade the current Pinecone plan to the Standard plan and change the namespaces from there. (This is costly.)

  2. Use another database similar to Pinecone. In this case Chroma is the best one, and it's also free and open source,
    so I suggest using ChromaDB. The problem is that I don't yet know the exact process for using it or how to write
    the code for it, but I'm trying to figure it out.

    I'll definitely share the code and the process once I find the exact solution.

Hi all, I managed to solve this with 'filtering with metadata', as proposed by Pinecone's documentation. What I did:

  1. In the loadS3IntoPinecone function of the pinecone.ts file, add fileKey as one of the metadata fields.
import { Pinecone, PineconeRecord } from "@pinecone-database/pinecone";
import { downloadFromS3 } from "./s3-server";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import md5 from "md5";
import {
  Document,
  RecursiveCharacterTextSplitter,
} from "@pinecone-database/doc-splitter";
import { getEmbeddings } from "./embeddings";
import { convertToAscii } from "./utils";

export const getPineconeClient = () => {
  return new Pinecone({
    environment: process.env.PINECONE_ENVIRONMENT!,
    apiKey: process.env.PINECONE_API_KEY!,
  });
};

type PDFPage = {
  pageContent: string;
  metadata: {
    loc: { pageNumber: number; fileKey: string };
  };
};

export async function loadS3IntoPinecone(fileKey: string) {
  console.log("downloading s3 into file system");
  const file_name = await downloadFromS3(fileKey);
  if (!file_name) throw new Error("could not download file from s3");
  const loader = new PDFLoader(file_name);
  console.log(loader, "loader");
  const pages = (await loader.load()) as PDFPage[];
  console.log(pages, "pages");
  const documents = await Promise.all(
    pages.map((page) => prepareDocument(page, fileKey))
  );

  const vectors = await Promise.all(
    documents.flat().map((doc) => embedDocument(doc, fileKey))
  );

  const client = await getPineconeClient();
  const pineconeIndex = await client.index("chatpdf");

  console.log("Inserting vectors into pinecone");
  const request = vectors;
  await pineconeIndex.upsert(request);
  console.log("Inserted vectors into pinecone");

  return documents[0];
}

async function embedDocument(doc: Document, fileKey: string) {
  try {
    const embeddings = await getEmbeddings(doc.pageContent);
    const hash = md5(doc.pageContent);

    return {
      id: hash,
      values: embeddings,
      metadata: {
        text: doc.metadata.text,
        pageNumber: doc.metadata.pageNumber,
        fileKey,
      },
    } as PineconeRecord;
  } catch (error) {
    console.log("error embedding document", error);
    throw error;
  }
}

export const truncateStringByBytes = (str: string, bytes: number) => {
  const enc = new TextEncoder();
  return new TextDecoder("utf-8").decode(enc.encode(str).slice(0, bytes));
};

async function prepareDocument(page: PDFPage, fileKey: string) {
  console.log(page, "page in preparedoc");
  let { pageContent, metadata } = page;
  pageContent = pageContent.replace(/\n/g, "");
  // split the docs
  const splitter = new RecursiveCharacterTextSplitter();
  const docs = await splitter.splitDocuments([
    new Document({
      pageContent,
      metadata: {
        pageNumber: metadata.loc.pageNumber,
        text: truncateStringByBytes(pageContent, 36000),
        fileKey,
      },
    }),
  ]);
  return docs;
}


  2. In context.ts, use query() with a metadata filter instead of namespaces.
export async function getMatchesFromEmbeddings(
  embeddings: number[],
  fileKey: string
) {
  try {
    const client = new Pinecone({
      environment: process.env.PINECONE_ENVIRONMENT!,
      apiKey: process.env.PINECONE_API_KEY!,
    });
    const pineconeIndex = await client.index("chatpdf");
    const queryResponse = await pineconeIndex.query({
      vector: embeddings,
      filter: { fileKey: { $eq: fileKey } },
      topK: 5,
      includeMetadata: true,
    });

    return queryResponse.matches || [];
  } catch (error) {
    console.log("error querying embeddings", error);
    throw error;
  }
}
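
To make the effect of the metadata filter concrete, here is a minimal in-memory sketch of what `filter: { fileKey: { $eq: fileKey } }` does conceptually: records from every PDF live in one index, and the query only considers those whose `fileKey` matches. This is not the Pinecone client. The `VectorRecord` type, the `queryByFileKey` helper, and the sample file keys are all hypothetical names used for illustration, and the cosine ranking is an assumption about the index metric.

```typescript
// Hypothetical in-memory stand-in for a vector index (not the Pinecone client).
type VectorRecord = {
  id: string;
  values: number[];
  metadata: { fileKey: string; text: string };
};

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only records whose metadata.fileKey equals the requested key,
// then rank the survivors by similarity and return the top K.
function queryByFileKey(
  records: VectorRecord[],
  vector: number[],
  fileKey: string,
  topK: number
): VectorRecord[] {
  return records
    .filter((r) => r.metadata.fileKey === fileKey)
    .map((r) => ({ record: r, score: cosineSimilarity(r.values, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((x) => x.record);
}

// Sample data: two chunks from one PDF and one chunk from another.
const records: VectorRecord[] = [
  { id: "a", values: [1, 0], metadata: { fileKey: "uploads/a.pdf", text: "from A" } },
  { id: "b", values: [1, 0], metadata: { fileKey: "uploads/b.pdf", text: "from B" } },
  { id: "c", values: [0, 1], metadata: { fileKey: "uploads/a.pdf", text: "also from A" } },
];

const matches = queryByFileKey(records, [1, 0], "uploads/a.pdf", 5);
console.log(matches.map((m) => m.id)); // [ 'a', 'c' ]
```

Record "b" is as close to the query vector as "a", but it is excluded because it belongs to a different file. That is the isolation namespaces would otherwise provide.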



@tzechong94 It really works, man. I spent so much time trying to figure this out. Thank you very much 😁😁

joepds commented

@tzechong94 Thank you for the code! It works perfectly.

After the vectors are inserted into Pinecone, the page is redirected, but I can't tell whether the chat is being pushed into the database or not.
The URL is http://localhost:3000/chat/[object%20Object] instead of http://localhost:3000/chat/1 as in Elliott-Chong's video tutorial.

I want to confirm whether the chat is being pushed into the database, and why is the URL different for me?

joepds commented

@CodeOfMugiwara Have you checked Drizzle? Try opening Drizzle Studio at 127.0.0.1:4983 if the link it gives you can't be accessed.

@joepds Thank you for the URL. The schema is created successfully and the chat IDs are generated correctly, but I'm still facing the URL issue: it redirects to http://localhost:3000/chat/[object%20Object] instead of http://localhost:3000/chat/1. How do I solve this?
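
A URL ending in `[object%20Object]` usually means an object, rather than a number or string, was interpolated into the path. The sketch below reproduces that symptom in isolation. It is only a guess at the cause in this project: the `InsertedChat` shape and the `chatPath` helper are hypothetical and are not taken from the repository.

```typescript
// Hypothetical shape of a row returned after inserting a chat.
type InsertedChat = { insertedId: number };

// Build the chat path from a plain numeric id.
function chatPath(chatId: number): string {
  return `/chat/${chatId}`;
}

const inserted: InsertedChat[] = [{ insertedId: 1 }];

// Interpolating the whole row calls Object.prototype.toString on it.
const wrongPath = `/chat/${inserted[0]}`;

// Interpolating the numeric field gives the intended path.
const rightPath = chatPath(inserted[0].insertedId);

console.log(wrongPath); // /chat/[object Object]
console.log(rightPath); // /chat/1
```

The browser shows the space in `[object Object]` as `%20`, which matches the URL reported above. If this is the cause, logging the value passed to the redirect, on both the API side and the client side, should show an object where an id is expected.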

@tzechong94 What should be used instead of PDFLoader, since it is deprecated now?