This repo is public access to what I use for my daily AI paper breakdown YouTube videos: https://youtube.com/playlist?list=PLPefVKO3tDxP7iFzaSOkOZnXQ4Bkhi9YB&si=J0Rmcmy-oVyAZI7I

It's a heavily edited version of David Shapiro's original repo, adjusted to suit my needs. If I were smarter I would've forked off of his, but it's too late now. https://github.com/daveshap/Quickly_Extract_Science_Papers

I don't think anybody will find a majority of these files useful, but who knows. The one I've been asked to share is `generate_multiple_reports.py`, which you can find a similar version of in Dave's repo. The only real difference with mine is that I adjusted it to use the longer 16k-context-window version of GPT-3.5 when that came out.
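For anyone curious, the 16k adjustment roughly amounts to picking the bigger model when a paper's text won't fit in the default 4k window. Here's a minimal sketch using the common ~4-characters-per-token rule of thumb; the model names are OpenAI's real ones, but the threshold logic is my illustration, not the repo's actual code:

```python
# Hypothetical sketch of the 16k-context adjustment: estimate tokens from
# character count and pick a model accordingly. Not the repo's actual logic.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def pick_model(text: str) -> str:
    """Use the 16k variant only when the paper won't fit in the 4k window."""
    # Reserve roughly half the window for the prompt and the reply.
    if estimate_tokens(text) < 2000:
        return "gpt-3.5-turbo"
    return "gpt-3.5-turbo-16k"

paper = "word " * 6000  # ~30,000 characters, far past the 4k window
print(pick_model(paper))            # -> gpt-3.5-turbo-16k
print(pick_model("short abstract"))  # -> gpt-3.5-turbo
```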
- `generate_multiple_reports.py`: consumes all PDFs in the `pdfs-to-summarize/` folder and uses OpenAI's API to generate summaries in the `txt-summaries/` folder. This is helpful for bulk processing, such as for literature reviews.
- `concatenate.py`: turns all txt summary files that have been copy & pasted from `txt-summaries/` to `txt-summaries/to-be-concatenated/` into PDFs that I think are super useful for sharing. Basically it prepends the summary to the beginning of the original PDF. I often want to share scientific articles with friends, but they usually don't want to read the whole thing, so giving them a version with a summary at the beginning is super useful.
- `send-to-obsidian.py`: takes any txt files that have been copy & pasted from `txt-summaries/` to `txt-summaries/send-to-Obsidian/` and sends them and their PDF versions into your Obsidian vault. You need to specify the location of your Obsidian vault in `config.py` in order for it to work.
- `newsletter.py`: creates the actual newsletter that I publish daily to https://evintunador.substack.com?utm_source=navbar&utm_medium=web&r=1ixdk1 by concatenating all the summaries and then having the ChatGPT API write a little meta-summary intro for the newsletter. Saves to `newsletter.txt`, which is what I copy & paste into Substack.
- `timestamps.py`: generates YouTube chapter timestamps based on the PDFs that have been summarized. Hit a configurable hotkey (I use `` ` ``) to indicate that a new YouTube chapter should start, and hit `esc` to end the script. Creates `timestamps.txt`, which is what I copy & paste into my YouTube description.
- `cleanup.py`: deletes all of the files generated by the other scripts. I'd recommend running this after you download the repo, since I've left it populated with a bunch of example txt and pdf files.
- `config.py`: where you can change a couple of settings if you'd like.

I've left in a bunch of example pdf and txt files so that you can see how everything works, and also as an excuse to share a few shitty half-finished papers that I've either completely abandoned or put on hold for various reasons. Feel free to either use `cleanup.py` to remove them or delete them manually. `cleanup.py` deletes:

- everything in `pdfs-to-summarize/`, `txt-summaries/`, and `concatenated-summaries/`
- `concat_summaries.txt`, `newsletter.txt`, and `timestamps.txt`
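To give a feel for what `generate_multiple_reports.py` does, here's a minimal sketch of the folder-in, folder-out flow. The `summarize()` stub stands in for the real PDF text extraction and chunked OpenAI calls, so treat this as an illustration of the pattern under those assumptions, not the repo's actual implementation:

```python
# Sketch of the generate_multiple_reports.py flow: walk pdfs-to-summarize/,
# summarize each paper, and write a matching .txt into txt-summaries/.
from pathlib import Path

def summarize(pdf_path: Path) -> str:
    # Stand-in for the real work: extract text from the PDF, chunk it to fit
    # the model's context window, and ask the chat API to summarize.
    return f"Summary of {pdf_path.name}"

def generate_reports(pdf_dir: Path, out_dir: Path) -> list[Path]:
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        out_file = out_dir / (pdf.stem + ".txt")  # paper.pdf -> paper.txt
        out_file.write_text(summarize(pdf))
        written.append(out_file)
    return written
```

The one-txt-per-pdf naming is what lets the later scripts pair each summary back up with its source PDF.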
- Clone the repository to your local machine.
- Install the required Python packages by running `pip install -r requirements.txt` in your terminal. ngl, I have not tested whether I put in all the requirements correctly.
- Obtain an API key from OpenAI and save it in a file named `key_openai.txt` in the root directory of the repository.
- Run `cleanup.py` to get rid of all the example pdf and txt files that I've left in here.
- If you plan to send files to an Obsidian vault (if you don't know what this means, ignore this step and the file `send-to-obsidian.py`), then open `config.py` and define directories for `obsidian_vault_location` and `obsidian_vault_attachments_location`.
- Maybe peruse `config.py` to check settings and try to gain a better understanding of this monstrosity I've created. I suggest editing `prompts` to fit your use-case.
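As a guess at how the key file gets used (I haven't reproduced the repo's exact code), the scripts presumably read `key_openai.txt` along these lines, which is why a stray trailing newline in the file is harmless:

```python
# Hypothetical sketch: load the OpenAI key from key_openai.txt in the repo
# root, stripping whitespace before handing it to the API client.
from pathlib import Path

def load_api_key(path: str = "key_openai.txt") -> str:
    key = Path(path).read_text().strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your OpenAI API key into it")
    return key
```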
1. Fill `pdfs-to-summarize/` with a bunch of PDFs you'd like to see summarized.
2. Run the `generate_multiple_reports.py` script to generate reports from the PDF files in the `pdfs-to-summarize/` directory. The generated reports will be saved as text files in the `txt-summaries/` directory.
3. (Optional) If you'd like to record timestamps for a YouTube video of you talking about the PDFs like I do, then run `timestamps.py` the moment you hit record. After you finish your video introduction, hit the hotkey, and continue to hit the hotkey each time you get to a new PDF. When you stop recording, hit `esc`.
4. Read through the summaries in `txt-summaries/`.
    - 4a. If there are any you'd like sent to your Obsidian vault along with their PDF version, copy & paste the txt file into `txt-summaries/send-to-obsidian/`. Then run `send-to-obsidian.py`.
    - 4b. If there are any you'd like prepended to the beginning of their corresponding PDF file, copy & paste the txt file into `txt-summaries/to-be-concatenated/`. Then run `concatenate.py` and they will appear in `concatenated-summaries/`.
5. (Optional) If you'd like to start your own auto-generated PDF summary newsletter, then run `newsletter.py` and copy & paste the output contents of `newsletter.txt` into your newsletter. Don't forget to edit `newsletter_prompt` in `config.py` to cater it to your needs.
6. Once you're finished and have saved any temporary files you'd like to keep to a different directory, run `cleanup.py` to delete all files created by the previous scripts.
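If you're wondering what `timestamps.txt` ends up looking like after step 3, YouTube chapters are just `time title` lines starting at 0:00. Here's a sketch of that formatting; the titles and exact layout are my assumption, not necessarily what `timestamps.py` emits:

```python
# Sketch of turning recorded hotkey times (seconds since recording start)
# into YouTube chapter lines. Times and titles here are made up.

def fmt(seconds: int) -> str:
    """Format seconds as M:SS or H:MM:SS, the way YouTube chapters expect."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"

def chapters(marks: list[tuple[int, str]]) -> str:
    # YouTube requires the first chapter to start at 0:00.
    return "\n".join(f"{fmt(t)} {title}" for t, title in marks)

print(chapters([(0, "Intro"), (95, "Paper 1"), (4210, "Paper 2")]))
# 0:00 Intro
# 1:35 Paper 1
# 1:10:10 Paper 2
```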
This codebase is an absolute shitshow, and the only thing most people will find useful is `generate_multiple_reports.py`, which wasn't even written by me; that was 99% Dave Shapiro. Again, go check out his repo https://github.com/daveshap/Quickly_Extract_Science_Papers and his YouTube channel https://www.youtube.com/@4IR.David.Shapiro