I am sharing the code I used for summarizing SSC as explained in this post.
This was done very quickly and most of the code was written by ChatGPT and Copilot.
The sequence is roughly the following:
- Scrape the posts and put them in data/posts
- Do one ChatGPT API pass to chunk each post and generate one or more summaries
- Find posts which have been chunked and consolidate their summaries. I found it best to simply append the summaries together.
- In data/posts_processed, write each post as a JSON file together with its summary.
- Generate an epub with the post JSONs.
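
The chunk, consolidate, and write steps above can be sketched roughly as follows. This is only an illustration, not the code in this repository: the function names, the 8000-character limit, the paragraph-based splitting, and the JSON field names are all assumptions, and `summarize` stands in for whatever function actually calls the ChatGPT API.

```python
import json
from pathlib import Path
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 8000) -> List[str]:
    """Split text into chunks of roughly max_chars, breaking between paragraphs.

    Assumption: paragraphs are separated by blank lines. A single paragraph
    longer than max_chars is kept whole and becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


def summarize_post(
    text: str, summarize: Callable[[str], str], max_chars: int = 8000
) -> str:
    """Summarize each chunk, then consolidate by appending the summaries."""
    summaries = [summarize(chunk) for chunk in chunk_text(text, max_chars)]
    return "\n\n".join(summaries)


def write_processed(
    post_id: str, title: str, text: str, summary: str, out_dir: Path
) -> Path:
    """Write one post and its summary as a JSON file (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{post_id}.json"
    record = {"title": title, "summary": summary, "text": text}
    path.write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```

Passing the summarizer in as a plain callable keeps the chunking and consolidation logic testable without network access; in practice it would wrap the API call.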
This could certainly be improved or productized, but I will probably not invest any more time in it.