Divide documents and upload the text segments to Pinecone with this Python script.
Clone the repository:
git clone https://github.com/kirill-markin/split-documents-upload-to-pinecone.git
Install the required packages:
cd split-documents-upload-to-pinecone
pip install -r requirements.txt
Set up environment variables in a .env file in the project root, based on the .env.example file:
OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX_NAME=your_index_name
Replace the placeholders, including your_pinecone_api_key and your_pinecone_environment, with your actual values.
your_pinecone_environment: the name of the Pinecone environment you want to use, for example us-central1-gcp.
your_index_name: the name of the index you want to create. You can use any name.
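Before running the script, it can help to confirm that none of the four variables is missing or still set to its placeholder. The following is a minimal sketch, not part of the repository: the helper name missing_env_vars is hypothetical, and it only inspects variables already present in the process environment. How main.py actually loads the .env file is not stated here (a loader such as python-dotenv is one common option, but that is an assumption).

```python
import os

# Variable names taken from the .env.example listing above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX_NAME",
)


def missing_env_vars(env=None):
    """Return required names that are unset, empty, or still a your_... placeholder.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # Placeholders in .env.example all start with "your_".
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_env_vars()
    if problems:
        print("Missing or placeholder values:", ", ".join(problems))
    else:
        print("All required variables are set.")
```

Passing a plain dictionary instead of relying on os.environ makes the check easy to try out without touching your real credentials.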
Add documents to the data folder. You can add a single file or multiple files in nested folders. The script processes all *.md files in the folder.
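To preview which files would be picked up, a recursive search for *.md files under the data folder can be sketched as below. This is an illustration using only the standard library; the function name find_markdown_files is hypothetical, and the repository's own discovery code may differ.

```python
from pathlib import Path


def find_markdown_files(root="data"):
    """Return all *.md files under `root`, including nested folders, sorted by path.

    Hypothetical helper for illustration; returns an empty list if `root`
    is not an existing directory.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    # rglob walks the folder recursively; is_file() skips directories named *.md.
    return sorted(p for p in root_path.rglob("*.md") if p.is_file())


if __name__ == "__main__":
    for path in find_markdown_files():
        print(path)
```

Files with other extensions in the same folders are ignored by this search.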
Run the script:
python3 main.py
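How main.py divides each document is not described here. As a rough illustration of what splitting text into segments can look like, the sketch below cuts a string into fixed-size character chunks with a small overlap. The function name split_text and the chunk_size and overlap defaults are assumptions chosen for the example, not the script's actual settings.

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters. Illustrative only;
    the repository may split by tokens, headings, or another rule.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "Pinecone stores vectors. " * 100
    pieces = split_text(sample, chunk_size=200, overlap=20)
    print(len(pieces), "chunks; first chunk length:", len(pieces[0]))
```

The overlap keeps some shared context between neighbouring segments, which can help when a sentence straddles a chunk boundary.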
This project is licensed under the MIT License.