Select a PDF file as context
Closed this issue · 6 comments
Right now, one can use the GPT-4o API to upload a PDF file and ask questions about it. Here's an example of how to do this with Python.
It would be great if gptel could allow users to select a PDF as context and have the AI explain a selected region in pdf-view-mode, similar to what gptel-quick does.
Is this currently possible? If not, what functionality needs to be implemented? I am happy to contribute. If you believe this isn't the right direction for this package, I can create a new package instead; it would be really helpful if you could give me some guidance on how to go about that, or point me to which part of the gptel code I should look at.
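(A rough sketch of such a flow with the official openai Python SDK (v1.x), using the beta Assistants API; this is illustrative only, the helper names are mine and the attachment shape reflects my understanding of the beta endpoints, not code from gptel or the original example.)

```python
def build_attachment(file_id):
    # Shape of a file_search attachment on a thread message
    # (assumption: current beta Assistants payload format).
    return {"file_id": file_id, "tools": [{"type": "file_search"}]}

def ask_pdf(path, question):
    # Import here so the pure helper above works without the SDK installed.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    f = client.files.create(file=open(path, "rb"), purpose="assistants")
    assistant = client.beta.assistants.create(
        model="gpt-4o", tools=[{"type": "file_search"}])
    thread = client.beta.threads.create(messages=[{
        "role": "user",
        "content": question,
        "attachments": [build_attachment(f.id)],
    }])
    client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant.id)
    # messages.list returns newest first; the reply is the first entry.
    msgs = client.beta.threads.messages.list(thread_id=thread.id)
    return msgs.data[0].content[0].text.value

if __name__ == "__main__":
    print(ask_pdf("paper.pdf", "Summarize the contribution of this paper."))
```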
There are three versions of this feature:

1. Select some text in a pdf-view buffer and add it to gptel's context, as text. This is easy to add to gptel. Since you've seen the implementation in gptel-quick, we can add it the same way to gptel.
2. Send the current PDF view (i.e. the current page), but as an image to a model that supports images. Also easy to add to gptel.
3. Use the OpenAI Assistants API to set up a session and include files. OpenAI will then use these files as part of a RAG pipeline. This is what the Python code in your example does. I think this is out of scope for gptel. However, I have plans to make it easy to set up RAG pipelines with gptel. It will probably be an add-on package, and will support fully local RAG, along with the ability to plug in other RAG approaches like those provided by OpenAI and Gemini. This is a pretty extensive project though, and I don't have the time to work on it for a while. This package will do quite a bit more than what you're looking for, but let me know if you're interested in authoring it nevertheless.
If you want to add 1 or 2 to gptel, PRs are welcome. To begin with, I'd read through the file gptel-context.el, focusing on the functions gptel-add and gptel-context--collect, and the variable gptel-context--alist, which holds the context chunks.
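For option 2, the request payload is the main moving part. A minimal sketch of the chat message an image-capable OpenAI-style model expects (the data-URL content-part shape is the standard OpenAI chat format; the function name, and the assumption that the PNG bytes come from the rendered pdf-view page, are mine):

```python
import base64

def page_image_message(page_png_bytes, question):
    # Pair a question with the current PDF page, rendered to PNG and
    # embedded as a base64 data URL, in OpenAI chat-message form.
    data_url = ("data:image/png;base64,"
                + base64.b64encode(page_png_bytes).decode("ascii"))
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```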
As a side note, you can already select a PDF file as context if you use the Gemini models. However, this is not RAG -- the entire PDF file is parsed with each request, so this is probably best used for one-off requests or very short conversations.
Oh, what I want is the third option, since I mostly need the AI to help me understand academic papers and I have a lot of questions.
Sadly, I don't think I have enough time to write and maintain a standalone package at the same level as gptel. But do you already have any thoughts about the third option? I might start with some simple code that meets my personal needs and see if it can grow into a package later.
For the third option, I think you'll need a different tool. I haven't kept up with the state of things, but perhaps something like Khoj? There are many more like it, I think.
If the tool provides an HTTP API, it might be possible to continue to use gptel in Emacs to interact with it.
I looked into Khoj, and it seems like they just convert PDFs into text without using any assistants or sessions.
If the tool provides an HTTP API, it might be possible to continue to use gptel in Emacs to interact with it.
What if I set up a small server to manage PDF files and sessions, and then I use gptel to communicate with that server?
By the way, it seems like private-gpt also supports PDF uploading (I haven't yet checked whether it uses sessions and RAG under the hood). Since gptel can interact with private-gpt, I am curious whether it is possible to ask questions about PDF files via private-gpt and gptel?
I am curious whether it is possible to ask questions regarding PDF files via private-gpt and gptel?
It might be possible; privategpt does do some kind of RAG and cites its sources.
I am moving this to a discussion since no change is planned on the gptel side for now.