This repo is public access to what I use for my daily AI paper breakdown YouTube videos: https://youtube.com/playlist?list=PLPefVKO3tDxP7iFzaSOkOZnXQ4Bkhi9YB&si=J0Rmcmy-oVyAZI7I

It's a heavily edited version of David Shapiro's original repo, adjusted to suit my needs. If I were smarter I would've forked off of his, but it's too late now. https://github.com/daveshap/Quickly_Extract_Science_Papers

I don't think anybody will find a majority of these files useful, but who knows. The one I've been asked to share is `generate_multiple_reports.py`, which you can find a similar version of in Dave's repo. The only real difference with mine is that I adjusted it to use the longer 16k-context-window version of GPT-3.5 when that came out.
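For anyone curious, the 16k adjustment roughly amounts to picking the bigger model when a paper's text won't fit in the default 4k window. Here's a minimal sketch using the common ~4-characters-per-token rule of thumb; the model names are OpenAI's real ones, but the threshold logic is my illustration, not the repo's actual code:

```python
# Hypothetical sketch of the 16k-context adjustment: estimate tokens from
# character count and pick a model accordingly. Not the repo's actual logic.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def pick_model(text: str) -> str:
    """Use the 16k variant only when the paper won't fit in the 4k window."""
    # Reserve roughly half the window for the prompt and the reply.
    if estimate_tokens(text) < 2000:
        return "gpt-3.5-turbo"
    return "gpt-3.5-turbo-16k"

paper = "word " * 6000  # ~30,000 characters, far past the 4k window
print(pick_model(paper))            # -> gpt-3.5-turbo-16k
print(pick_model("short abstract"))  # -> gpt-3.5-turbo
```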
- `generate_multiple_reports.py`: consumes all PDFs in the `pdfs-to-summarize/` folder and uses OpenAI's API to generate summaries in the `txt-summaries/` folder. This is helpful for bulk processing, such as for literature reviews.
- `concatenate.py`: turns all txt summary files that have been copy & pasted from `txt-summaries/` to `txt-summaries/to-be-concatenated/` into PDFs that I think are super useful for sharing. Basically it prepends the summary to the beginning of the original PDF. I often want to share scientific articles with friends, but they usually don't want to read the whole thing, so giving them a version with a summary at the beginning is super useful.
- `send-to-obsidian.py`: takes any txt files that have been copy & pasted from `txt-summaries/` to `txt-summaries/send-to-Obsidian/` and sends them and their PDF versions into your Obsidian vault. You need to specify the location of your Obsidian vault in `config.py` in order for it to work.
- `newsletter.py`: creates the actual newsletter that I publish daily to https://evintunador.substack.com?utm_source=navbar&utm_medium=web&r=1ixdk1 by concatenating all the summaries and then having the ChatGPT API write a little meta-summary intro for the newsletter. Saves to `newsletter.txt`, which is what I copy & paste into Substack.
- `timestamps.py`: generates YouTube chapter timestamps based on the PDFs that have been summarized. Hit a configurable hotkey (I use `` ` ``) to indicate that a new YouTube chapter should start, and hit `esc` to end the script. Creates `timestamps.txt`, which is what I copy & paste into my YouTube description.
- `cleanup.py`: deletes all of the files generated by the other scripts. I'd recommend running this after you download the repo, since I've left it populated with a bunch of example txt and pdf files.
- `config.py`: where you can change a couple of settings if you'd like.

I've left in a bunch of example pdf and txt files so that you can see how everything works, and also as an excuse to share a few shitty half-finished papers that I've either completely abandoned or put on hold for various reasons. Feel free to either use `cleanup.py` to remove them or delete them manually. `cleanup.py` deletes:

- everything in `pdfs-to-summarize/`, `txt-summaries/`, and `concatenated-summaries/`
- `concat_summaries.txt`, `newsletter.txt`, and `timestamps.txt`
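To give a feel for what `generate_multiple_reports.py` does, here's a minimal sketch of the folder-in, folder-out flow. The `summarize()` stub stands in for the real PDF text extraction and chunked OpenAI calls, so treat this as an illustration of the pattern under those assumptions, not the repo's actual implementation:

```python
# Sketch of the generate_multiple_reports.py flow: walk pdfs-to-summarize/,
# summarize each paper, and write a matching .txt into txt-summaries/.
from pathlib import Path

def summarize(pdf_path: Path) -> str:
    # Stand-in for the real work: extract text from the PDF, chunk it to fit
    # the model's context window, and ask the chat API to summarize.
    return f"Summary of {pdf_path.name}"

def generate_reports(pdf_dir: Path, out_dir: Path) -> list[Path]:
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        out_file = out_dir / (pdf.stem + ".txt")  # paper.pdf -> paper.txt
        out_file.write_text(summarize(pdf))
        written.append(out_file)
    return written
```

The one-txt-per-pdf naming is what lets the later scripts pair each summary back up with its source PDF.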
- Clone the repository to your local machine.
- Install the required Python packages by running `pip install -r requirements.txt` in your terminal. ngl, I have not tested whether I put in all the requirements correctly.
- Obtain an API key from OpenAI and save it in a file named `key_openai.txt` in the root directory of the repository.
- Run `cleanup.py` to get rid of all the example pdf and txt files that I've left in here.
- If you plan to send files to an Obsidian vault (if you don't know what this means, ignore this step and the file `send-to-obsidian.py`), then open `config.py` and define directories for `obsidian_vault_location` and `obsidian_vault_attachments_location`.
- Maybe peruse `config.py` to check settings and try to gain a better understanding of this monstrosity I've created. I suggest editing `prompts` to fit your use-case.
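As a guess at how the key file gets used (I haven't reproduced the repo's exact code), the scripts presumably read `key_openai.txt` along these lines, which is why a stray trailing newline in the file is harmless:

```python
# Hypothetical sketch: load the OpenAI key from key_openai.txt in the repo
# root, stripping whitespace before handing it to the API client.
from pathlib import Path

def load_api_key(path: str = "key_openai.txt") -> str:
    key = Path(path).read_text().strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your OpenAI API key into it")
    return key
```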
1. Fill `pdfs-to-summarize/` with a bunch of PDFs you'd like to see summarized.
2. Run the `generate_multiple_reports.py` script to generate reports from the PDF files in the `pdfs-to-summarize/` directory. The generated reports will be saved as text files in the `txt-summaries/` directory.
3. (Optional) If you'd like to record timestamps for a YouTube video of you talking about the PDFs like I do, then run `timestamps.py` the moment you hit record. After you finish your video introduction, hit the hotkey, and continue to hit the hotkey each time you get to a new PDF. When you stop recording, hit `esc`.
4. Read through the summaries in `txt-summaries/`.
    - 4a. If there are any you'd like sent to your Obsidian vault along with their PDF version, copy & paste the txt file into `txt-summaries/send-to-obsidian/`. Then run `send-to-obsidian.py`.
    - 4b. If there are any you'd like prepended to the beginning of their corresponding PDF file, copy & paste the txt file into `txt-summaries/to-be-concatenated/`. Then run `concatenate.py` and they will appear in `concatenated-summaries/`.
5. (Optional) If you'd like to start your own auto-generated PDF summary newsletter, then run `newsletter.py` and copy & paste the output contents of `newsletter.txt` into your newsletter. Don't forget to edit `newsletter_prompt` in `config.py` to cater it to your needs.
6. Once you're finished and have saved any temporary files you'd like to keep to a different directory, run `cleanup.py` to delete all files created by the previous scripts.
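If you're wondering what `timestamps.txt` ends up looking like after step 3, YouTube chapters are just `time title` lines starting at 0:00. Here's a sketch of that formatting; the titles and exact layout are my assumption, not necessarily what `timestamps.py` emits:

```python
# Sketch of turning recorded hotkey times (seconds since recording start)
# into YouTube chapter lines. Times and titles here are made up.

def fmt(seconds: int) -> str:
    """Format seconds as M:SS or H:MM:SS, the way YouTube chapters expect."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"

def chapters(marks: list[tuple[int, str]]) -> str:
    # YouTube requires the first chapter to start at 0:00.
    return "\n".join(f"{fmt(t)} {title}" for t, title in marks)

print(chapters([(0, "Intro"), (95, "Paper 1"), (4210, "Paper 2")]))
# 0:00 Intro
# 1:35 Paper 1
# 1:10:10 Paper 2
```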
This codebase is an absolute shitshow, and the only thing most people will find useful is `generate_multiple_reports.py`, which wasn't even written by me; that was 99% Dave Shapiro. Again, go check out his repo https://github.com/daveshap/Quickly_Extract_Science_Papers and his YouTube channel https://www.youtube.com/@4IR.David.Shapiro