gpthistory is a Python package for indexing and searching your ChatGPT conversations. It builds an index from your exported chat data, generates embeddings for efficient searching, and finds relevant conversations based on keywords.
You can install gpthistory via pip:
pip install gpthistory
# Installing from source
git clone git@github.com:sarchak/gpthistory.git
cd gpthistory
pip install -e .
Unfortunately, there is no way to fetch your conversation history programmatically. As a workaround, export your conversations from the Settings section of ChatGPT. Once you receive the email from OpenAI, download and unzip the archive; it contains the conversations.json file.
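If you would rather not unzip the export by hand, a short Python sketch like the one below can copy conversations.json out of the downloaded archive. The function name extract_conversations_json is hypothetical (it is not part of gpthistory), and the assumption that the archive holds a single member named conversations.json, possibly in a subfolder, is based on the current export format rather than anything gpthistory guarantees.

```python
import zipfile
from pathlib import Path


def extract_conversations_json(zip_path, out_dir="."):
    """Copy conversations.json out of a ChatGPT export archive into out_dir.

    Hypothetical helper. Assumes the archive holds one member whose file
    name is exactly conversations.json; similarly named files such as
    shared_conversations.json are ignored.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Zip member names always use "/" as the path separator.
        members = [
            name for name in archive.namelist()
            if name.rsplit("/", 1)[-1] == "conversations.json"
        ]
        if not members:
            raise FileNotFoundError(f"no conversations.json in {zip_path}")
        target = out_dir / "conversations.json"
        target.write_bytes(archive.read(members[0]))
    return target
```

The returned path can then be passed to gpthistory build-index --file.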
gpthistory uses OpenAI embeddings to measure semantic similarity, so before building the index, make sure your OpenAI API key is set in the shell:
export OPENAI_API_KEY='your open ai key'
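The official openai Python client reads OPENAI_API_KEY from the environment by default, so exporting the variable is all that is needed. If you script around gpthistory, a pre-flight check like this hypothetical helper (not part of the package) gives a clearer error before any indexing starts.

```python
import os


def require_openai_key(env=os.environ):
    """Return the OpenAI API key, or fail early if it is missing or blank.

    Hypothetical helper; env defaults to the live process environment.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; run: export OPENAI_API_KEY='your open ai key'"
        )
    return key
```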
The build-index command builds an index from your chat data file. It extracts the relevant text parts from each chat entry and stores them in the index along with their associated chat IDs and section IDs.
To build an index, run:
gpthistory build-index --file /path/to/conversations.json
Replace /path/to/conversations.json with the path to your chat data file in JSON format.
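To give a feel for what build-index extracts, here is a hypothetical sketch that flattens the export into (chat_id, section_id, text) rows. It assumes the export layout in which each conversation has an "id" and a "mapping" of message nodes whose content holds a list of "parts". Numbering sections with a running counter per conversation is also an assumption, not necessarily how gpthistory assigns section IDs.

```python
def extract_text_parts(conversations):
    """Flatten exported conversations into (chat_id, section_id, text) rows.

    Hypothetical sketch of the extraction step. Nodes are visited in
    mapping order, which may not be chronological in real exports.
    """
    rows = []
    for convo in conversations:
        chat_id = convo.get("id") or convo.get("conversation_id")
        section_id = 0
        for node in convo.get("mapping", {}).values():
            message = node.get("message") or {}  # root nodes may have no message
            parts = (message.get("content") or {}).get("parts") or []
            for part in parts:
                # Keep only non-empty text; skip image or other non-string parts.
                if isinstance(part, str) and part.strip():
                    rows.append((chat_id, section_id, part.strip()))
                    section_id += 1
    return rows
```

In practice you would feed it the parsed file, e.g. extract_text_parts(json.load(open("conversations.json"))), and embed each row's text.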
You can optionally throttle the frequency of API calls with the --rate-limit option, which sets the sleep time in seconds between calls. This helps prevent hitting rate limits on the OpenAI API.
Here's how to use it:
gpthistory build-index --rate-limit 0.002 --file /path/to/conversations.json
Replace 0.002 with the desired sleep time in seconds.
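Conceptually, --rate-limit amounts to sleeping between consecutive API calls. A minimal sketch of that pattern follows; embed_fn is a hypothetical stand-in for whatever function requests one embedding, and this is not gpthistory's actual code.

```python
import time


def embed_with_rate_limit(texts, embed_fn, rate_limit=0.0):
    """Call embed_fn once per text, sleeping rate_limit seconds between calls.

    embed_fn is a hypothetical callable: text -> embedding vector.
    """
    embeddings = []
    for i, text in enumerate(texts):
        if i > 0 and rate_limit > 0:
            time.sleep(rate_limit)  # pause before every call except the first
        embeddings.append(embed_fn(text))
    return embeddings
```

With --rate-limit 0.002, the sleep alone caps throughput at 1 / 0.002 = 500 calls per second; actual throughput is lower because each API call also takes time.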
Once you have built the index, you can search it with the search command. It takes a keyword as input and returns the top matching conversations from the index, each with a link to the conversation in your ChatGPT history so you can open it directly.
To search for a keyword, run:
gpthistory search "your_keyword"
Replace "your_keyword" with the keyword you want to search for.
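Because the index stores individual text parts, several hits can belong to the same chat. The hypothetical sketch below collapses section-level hits into one entry per conversation (keeping its best score) and attaches a link. The base URL is an assumption: ChatGPT has served conversations at https://chat.openai.com/c/<id>, and newer deployments use chatgpt.com/c/<id>.

```python
def top_conversations(section_hits, k=5, base_url="https://chat.openai.com/c/"):
    """Reduce (chat_id, score) hits to the k best conversations with links.

    Hypothetical helper; base_url is an assumed conversation URL prefix.
    """
    best = {}
    for chat_id, score in section_hits:
        # Keep the highest-scoring section for each conversation.
        if chat_id not in best or score > best[chat_id]:
            best[chat_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]
    return [(chat_id, score, f"{base_url}{chat_id}") for chat_id, score in ranked]
```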
The search embeds your keyword and matches it against the indexed text parts by computing the dot product between the query embedding and every embedding in the index. Conversations whose dot-product scores exceed a threshold are returned as top matches.
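The scoring step can be sketched in plain Python. According to OpenAI's documentation its embeddings are normalized to length 1, so the dot product equals cosine similarity. The threshold value 0.8 below is a hypothetical default, not the one gpthistory uses, and the returned positions are assumed to line up with the rows of the index.

```python
def search(query_vec, index_vecs, threshold=0.8):
    """Score every indexed embedding against the query by dot product.

    Returns (position, score) pairs above threshold, best first.
    threshold=0.8 is a hypothetical value.
    """
    scored = []
    for position, vec in enumerate(index_vecs):
        score = sum(q * v for q, v in zip(query_vec, vec))
        if score > threshold:
            scored.append((position, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

For example, search([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], threshold=0.5) returns [(0, 1.0), (2, 0.6)]: the orthogonal vector scores 0 and is filtered out.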
# Build the index from conversations.json
gpthistory build-index --file conversations.json
# Search for conversations related to "chatbot"
gpthistory search "chatbot"
gpthistory is distributed under the MIT License. See LICENSE for more information.
Twitter: shrikar84
We welcome feedback and contributions to improve gpthistory. If you encounter any issues, have suggestions, or want to contribute, please open an issue or submit a pull request on our GitHub repository.
Please note that this tool is intended for research and educational purposes. Make sure you have proper permissions and adhere to the usage terms and conditions of the data sources you analyze with this tool.