This project automates saving, summarizing, and re-uploading academic papers from the internet to a Notion database. It combines Notion Scraper, AWS Lambda, and ChatGPT. Below is a detailed guide to how the project works, including setup instructions and a breakdown of tasks. The project uses two Lambda functions; the first is triggered through Zapier's Notion integration.
The project workflow can be summarized as follows:
- Save using Notion Scraper: Notion Scraper is used to save a page from a URL to a Notion database, creating a new entry with information about the paper.
- Pick up using Notifiers: Notifiers or triggers are set up to detect when a new page is saved in Notion. This triggers the first Lambda function, which searches arXiv for the corresponding paper, downloads the PDF, and stores it in an S3 bucket.
- Download PDF: The second Lambda function, triggered by the S3 bucket, downloads the latest PDF and performs the paper summarization.
- Run GPT to Summarize: The Lambda function uses ChatGPT to generate a summary of the paper.
- Re-upload to Notion: Finally, the Lambda function uploads the generated summary to the same Notion database but in a different column.
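The arXiv lookup in the first Lambda function could look roughly like the sketch below. It is not the project's actual code: it assumes the paper is searched by title through the public arXiv API, and `build_arxiv_query` and `first_pdf_url` are hypothetical helper names. Fetching the feed and uploading the PDF to S3 are left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Optional

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by the arXiv Atom feed


def build_arxiv_query(title: str, max_results: int = 1) -> str:
    """Return an arXiv API URL that searches paper titles for `title`."""
    params = {"search_query": f'ti:"{title}"', "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def first_pdf_url(atom_feed: str) -> Optional[str]:
    """Return the PDF link of the first entry in an arXiv Atom feed, if any."""
    root = ET.fromstring(atom_feed)
    entry = root.find(f"{ATOM}entry")
    if entry is None:
        return None  # the search returned no papers
    for link in entry.findall(f"{ATOM}link"):
        if link.get("title") == "pdf" or link.get("type") == "application/pdf":
            return link.get("href")
    return None
```

The function would fetch the URL from `build_arxiv_query`, pass the response text to `first_pdf_url`, and then download that PDF into the S3 bucket.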
To get this project up and running, follow these tasks:
- Install Notion Scraper using `pip install notion-scraper`.
- Create a new Notion database with the following columns: `Title`, `URL`, `PDF`, `Summary`.
- Get an API token from Notion by following the instructions here.
- Configure Notion Scraper to use your API token and save a page from a URL to the Notion database.
- Test Notion Scraper with different URLs to ensure that pages are saved correctly.
- Use the Zapier integration to trigger the first Lambda function when a new page is saved in Notion.
- Configure the first Lambda function to search arXiv for the corresponding paper and download the PDF to an S3 bucket.
- Implement the second Lambda function as shown in the provided code.
- Modify the Lambda function code to run ChatGPT on the PDF file and generate a summary of the paper.
- Modify the second Lambda function code to pass the summary output from ChatGPT to Notion Scraper.
- Use the notion-client library to update a page in the Notion database with a new summary.
- Test the entire system with different URLs and papers.
- Check for any errors or bugs and fix them.
- Ensure that the summaries generated by ChatGPT are accurate and concise.
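Two of the second Lambda function's steps, reading the S3 trigger and writing the summary back with notion-client, might be sketched as follows. This is an illustration under assumptions rather than the provided code: it assumes the `Summary` column is a Notion text (rich-text) property, and `s3_objects_from_event`, `summary_property`, and `update_summary` are hypothetical helper names. The ChatGPT call itself is omitted.

```python
import urllib.parse
from typing import List, Tuple

# Notion limits the content of a single rich-text object to 2000 characters.
NOTION_TEXT_LIMIT = 2000


def s3_objects_from_event(event: dict) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (for example, spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def summary_property(summary: str, column: str = "Summary") -> dict:
    """Build the `properties` payload for a rich-text column, split into chunks."""
    chunks = [
        summary[i:i + NOTION_TEXT_LIMIT]
        for i in range(0, len(summary), NOTION_TEXT_LIMIT)
    ]
    return {
        column: {
            "rich_text": [{"type": "text", "text": {"content": c}} for c in chunks]
        }
    }


def update_summary(page_id: str, summary: str, token: str) -> None:
    """Write `summary` to the page's Summary column via notion-client."""
    # Imported lazily so the helpers above work without the dependency installed.
    from notion_client import Client

    Client(auth=token).pages.update(
        page_id=page_id, properties=summary_property(summary)
    )
```

Splitting the summary keeps long ChatGPT output within Notion's per-object size limit instead of failing the update.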
Build a Docker-image Lambda function with API Gateway. Define a function and build the image you want.
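A function to package in the image can be very small. The sketch below is a hypothetical handler, not the one in `./lambda`: it assumes the request carries a JSON payload with a `url` field (an invented name for illustration) and accepts both a proxy-style event with a `body` string and an event that is already the parsed payload.

```python
import json


def handler(event, context):
    """Minimal Lambda entry point for a request arriving through API Gateway."""
    # Proxy integration: the payload is a JSON string under "body".
    # Otherwise: treat the event itself as the payload.
    body = event.get("body", event)
    if isinstance(body, str):
        body = json.loads(body or "{}")
    url = (body or {}).get("url")  # "url" is a hypothetical field name
    if not url:
        return {"statusCode": 400, "body": json.dumps({"error": "missing 'url'"})}
    return {"statusCode": 200, "body": json.dumps({"received": url})}
```

Returning a dictionary with `statusCode` and `body` matches what a proxy integration expects; a non-proxy integration would map the return value through the integration response instead.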
- Install `awscli`: see the official documentation for installation. After installing `awscli`, run `aws configure`.
- Install `direnv` (macOS): run `brew install direnv`, then append `eval "$(direnv hook zsh)"` to `~/.zshrc`.
- Fill in `envrc`.
- Relaunch your shell, then check the variables with `printenv | grep <VARIABLE-NAME>`.
# Log in to ECR.
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ecr/get-login-password.html
./scripts/00-login-ecr.sh
# Create ECR repository.
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ecr/create-repository.html
./scripts/01-create-ecr-repository.sh
# Build image and push to Amazon ECR.
./scripts/02-ecr-tag-and-push.sh
# Create and deploy lambda function.
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/create-role.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/put-role-policy.html
./scripts/03-create-iam-for-lambda.sh
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/lambda/create-function.html
./scripts/04-deploy-to-lambda.sh
# Provision API Gateway.
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/create-rest-api.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/get-resources.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/create-resource.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/put-method.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/put-method-response.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/put-integration.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/put-integration-response.html
# https://awscli.amazonaws.com/v2/documentation/api/latest/reference/apigateway/create-deployment.html
./scripts/05-create-api-gateway.py
# After fixing the code in './lambda', update the function.
./scripts/10-update-function.sh
- Policy generator: https://awspolicygen.s3.amazonaws.com/policygen.html