Notion Scraper and ChatGPT PDF Summarizer

This project is designed to automate the process of saving, summarizing, and re-uploading academic papers from the internet to a Notion database. It combines the use of Notion Scraper, AWS Lambda, and ChatGPT to achieve this automation. Below is a detailed guide on how this project works, including setup instructions and a breakdown of tasks. Additionally, the project now involves two Lambda functions, with the first function integrating with Zapier for Notion integration.

Project Overview

The project workflow can be summarized as follows:

Save using Notion Scraper: Notion Scraper is used to save a page from a URL to a Notion database, creating a new entry with information about the paper.
Pick up using Notifiers: Notifiers or triggers are set up to detect when a new page is saved in Notion. This triggers the first Lambda function which searches arXiv for the corresponding paper, downloads the PDF, and stores it in an S3 bucket.
Download PDF: The second Lambda function, triggered by the S3 bucket, downloads the latest PDF and performs the paper summarization.
Run GPT to Summarize: The Lambda function uses ChatGPT to generate a summary of the paper.
Re-upload to Notion: Finally, the Lambda function uploads the generated summary to the same Notion database but in a different column.

Project Setup

To get this project up and running, follow these tasks:

Task 1: Set up Notion Scraper

Install Notion Scraper using pip install notion-scraper.
Create a new Notion database with the following columns: Title, URL, PDF, Summary.
Get an API token from Notion by following the instructions here.
Configure Notion Scraper to use your API token and save a page from a URL to the Notion database.
Test Notion Scraper with different URLs to ensure that pages are saved correctly.

Task 2: Set up AWS Lambda (First Function - Zapier Integrated)

Use the Zapier integration to trigger the first Lambda function when a new page is saved in Notion.
Configure the first Lambda function to search arXiv for the corresponding paper and download the PDF to an S3 bucket.

Task 3: Set up AWS Lambda (Second Function - PDF Summarization)

Implement the second Lambda function as shown in the provided code.
Modify the Lambda function code to run ChatGPT on the PDF file and generate a summary of the paper.

Task 4: Integrate ChatGPT with Notion Scraper

Modify the second Lambda function code to pass the summary output from ChatGPT to Notion Scraper.
Use the notion-client library to update a page in the Notion database with a new summary.

Task 5: Test and Debug

Test the entire system with different URLs and papers.
Check for any errors or bugs and fix them.
Ensure that the summaries generated by ChatGPT are accurate and concise.

Deploy Python Lambda functions with container images.

Build docker image lambda function wtih API Gateway. Just define a function and build your image you want.

👋 Prerequisite

Install awscli: Official Document for installation
```
# After installing awscli.
aws configure
```

Install direnv

# Install `direnv` (macOS)
brew install direnv

## Append to `~/.zshrc`
eval "$(direnv hook zsh)"

Fill envrc

Relaunch your shell.

# Relaunch your shell.
# Check the variables with the command as below.
printenv | grep <VARIABLE-NAME>

🎮 How to deploy

# Login ECR.
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ecr/get-login-password.html
./scripts/00-login-ecr.sh

# Create ECR repository.
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ecr/create-repository.html
./scripts/01-create-ecr-repository.sh

# Build image and push to Amazon ECR.
./scripts/02-ecr-tag-and-push.sh

# Create and deploy lambda function.
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-role.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/put-role-policy.html
./scripts/03-create-iam-for-lambda.sh

# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/lambda/create-function.html
./scripts/04-deploy-to-lambda.sh

# Provision API Gateway.
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/create-rest-api.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/get-resources.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/create-resource.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/put-method.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/put-method-response.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/put-integration.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/put-integration-response.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/create-deployment.html
./scripts/05-create-api-gateway.py

🦿 Update your function.

# after fix code at './lambda'
./scripts/10-update-function.sh

Customize boilerplate

Policy generator : https://awspolicygen.s3.amazonaws.com/policygen.html

usama13o/LambdaGPTPaperSummeriser