This repository scrapes the posts of a Facebook group and sends them out as an email newsletter.
- you must be an admin of the target Facebook group
- you must create a Facebook app with the Group Post permission (the app doesn't have to be reviewed, though)
- please follow this page for the Groups API
- email recipients can be configured via mailinglist.txt
It runs as a GitHub Action, but you can run it locally as well.
- GitHub Action decrypts .env.gpg, and you will get .env
- GitHub Action triggers this application
- exchange Facebook's access token for a long-lived one
- replace the existing token in .env with the exchanged one
- grab group posts via Facebook's Graph API
- sort them
- inject the posts' information into HTML (jinja)
- HTML/CSS will be nicely formatted for email (premailer)
- email will be sent to those listed in mailinglist.txt
- GitHub Action deletes .env.gpg
- GitHub Action encrypts .env into .env.gpg
- GitHub Action pushes .env.gpg to your current repository
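The token exchange step can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the function names are hypothetical, and it assumes the standard Graph API `oauth/access_token` endpoint with the `fb_exchange_token` grant type. Only the URL builder is side-effect free; `fetch_long_lived_token` performs a real network request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Graph API endpoint used for exchanging tokens (unversioned form).
TOKEN_ENDPOINT = "https://graph.facebook.com/oauth/access_token"


def build_exchange_url(app_id, app_secret, short_lived_token):
    """Build the URL that exchanges a short-lived token for a long-lived one."""
    params = {
        "grant_type": "fb_exchange_token",
        "client_id": app_id,                     # APP_ID from .env
        "client_secret": app_secret,             # APP_SECRET from .env
        "fb_exchange_token": short_lived_token,  # FB_ACCESS_TOKEN from .env
    }
    return f"{TOKEN_ENDPOINT}?{urlencode(params)}"


def fetch_long_lived_token(app_id, app_secret, short_lived_token):
    """Call the endpoint and return the new token (hypothetical helper)."""
    url = build_exchange_url(app_id, app_secret, short_lived_token)
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["access_token"]
```

The returned token would then be written back to .env before the file is re-encrypted.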
$ python main.py --help
usage: main.py [-h] --since SINCE --until UNTIL --email-title EMAIL_TITLE [--limit LIMIT]
[--weight-reactions WEIGHT_REACTIONS] [--weight-shares WEIGHT_SHARES] [--weight-comments WEIGHT_COMMENTS]
Please specify the range of dates and the number of posts to be collected
optional arguments:
-h, --help show this help message and exit
--since SINCE dates in YYYY-MM-DD
--until UNTIL dates in YYYY-MM-DD
--email-title EMAIL_TITLE
title for the email
--limit LIMIT number of posts to scrap
--weight-reactions WEIGHT_REACTIONS
from 0 to 1
--weight-shares WEIGHT_SHARES
from 0 to 1
--weight-comments WEIGHT_COMMENTS
from 0 to 1
$ python main.py --since 2020-01-10 \
--until 2020-01-20 \
--email-title="Weekly Newsletter"
$ python main.py --since 2020-01-10 \
--until 2020-01-20 \
--email-title="Weekly Newsletter" \
--limit 50
$ python main.py --since 2020-01-10 \
--until 2020-01-20 \
--email-title="Weekly Newsletter" \
--weight-reactions=0.5 \
--weight-shares=0.8 \
--weight-comments=1
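For reference, the command-line interface shown in the help output could be declared with argparse roughly like this. It is a sketch reconstructed from the help text only; the default values and the `build_parser` name are assumptions, not taken from main.py.

```python
import argparse


def build_parser():
    """Recreate the options listed by `python main.py --help` (sketch)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Please specify the range of dates and the number "
                    "of posts to be collected",
    )
    parser.add_argument("--since", required=True, help="dates in YYYY-MM-DD")
    parser.add_argument("--until", required=True, help="dates in YYYY-MM-DD")
    parser.add_argument("--email-title", required=True,
                        help="title for the email")
    # Defaults below are assumed for illustration only.
    parser.add_argument("--limit", type=int, default=None,
                        help="number of posts to scrap")
    parser.add_argument("--weight-reactions", type=float, default=1.0,
                        help="from 0 to 1")
    parser.add_argument("--weight-shares", type=float, default=1.0,
                        help="from 0 to 1")
    parser.add_argument("--weight-comments", type=float, default=1.0,
                        help="from 0 to 1")
    return parser
```

Note that the flags are spelled `--weight-...` (singular), matching the help output.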
There are two config files.
- .env
- this dotenv file contains sensitive information
- config.cfg
- this contains information that you can customize yourself
You should set the values described below yourself.
- APP_ID: your Facebook app ID
- APP_SECRET: your Facebook app secret
- FB_ACCESS_TOKEN: access token issued via the Graph API Explorer. Make sure the access token belongs to your app.
- SMTP_USER, SMTP_PASS: the SMTP username and password used to send the email
APP_ID=XXX
APP_SECRET=XXX
FB_ACCESS_TOKEN=XXX
SMTP_USER=XXX
SMTP_PASS=XXX
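Because the workflow replaces FB_ACCESS_TOKEN in .env after each token exchange, a small helper for rewriting one key can be handy. The following is a standard-library sketch under the assumption that .env holds plain KEY=VALUE lines as shown above; `replace_env_value` is a hypothetical name, not a function from this repository.

```python
def replace_env_value(env_text, key, new_value):
    """Return env_text with `key` set to `new_value`, keeping other lines."""
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={new_value}"
            replaced = True
    if not replaced:
        # Key was missing, so append it at the end.
        lines.append(f"{key}={new_value}")
    return "\n".join(lines) + "\n"


# Example: swap in a freshly exchanged token.
updated = replace_env_value(
    "APP_ID=XXX\nFB_ACCESS_TOKEN=old\n", "FB_ACCESS_TOKEN", "new"
)
```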
.env only needs to be set initially. After you set .env, you have to encrypt it into .env.gpg using the command below. You must set your own password in the --passphrase option.
gpg2 --quiet --batch --yes --symmetric --passphrase="SYMMETRIC_KEY" --output=.env.gpg .env
You must also set the same passphrase in the Secrets of your GitHub repo. Here are the steps for doing so.
- Go to the Settings tab.
- Go to the Secrets menu on the left.
- Click the New secret button on the top left.
- Set Name as GPG_KEY.
- Set the value of GPG_KEY to the passphrase you chose for --passphrase.
- Click the Add secret button.
The name of each key explains what it is. TOP_K means how many posts you want to grab. Only the top 10 posts will be included in the email with an image; the rest will be included as text only. FIRST_WORDS means how many words you want to keep for the top 10 posts. SUB_FIRST_WORDS means how many words you want to keep for the remaining posts.
# config.cfg
[config]
TOP_K = XXX
FIRST_WORDS = XXX
SUB_FIRST_WORDS = XXX
[fb]
FB_GROUP_ID = XXX
[web]
HEAD_IMAGE = image_url_to_appear_at_the_top_of_email
HEAD_ARTICLE = a_sentence_to_appear_at_the_top_of_email
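A file in this layout can be read with Python's built-in configparser. The snippet below is only a sketch: the sample values are made up for illustration, and `load_settings` is a hypothetical helper rather than the loader used by main.py.

```python
import configparser

# Made-up sample values in the same layout as config.cfg.
SAMPLE = """
[config]
TOP_K = 30
FIRST_WORDS = 50
SUB_FIRST_WORDS = 20

[fb]
FB_GROUP_ID = 1234567890

[web]
HEAD_IMAGE = https://example.com/header.png
HEAD_ARTICLE = Top posts from our group this week
"""


def load_settings(text):
    """Parse config text and return the values with sensible types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "top_k": parser.getint("config", "TOP_K"),
        "first_words": parser.getint("config", "FIRST_WORDS"),
        "sub_first_words": parser.getint("config", "SUB_FIRST_WORDS"),
        "fb_group_id": parser.get("fb", "FB_GROUP_ID"),  # keep IDs as strings
        "head_image": parser.get("web", "HEAD_IMAGE"),
        "head_article": parser.get("web", "HEAD_ARTICLE"),
    }


settings = load_settings(SAMPLE)
```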
The scraped posts are also sorted by the following formula. You can change the weights from the CLI; please look at the Usage section.
(# of reactions * WEIGHT_REACTIONS) + (# of shares * WEIGHT_SHARES) + (# of comments * WEIGHT_COMMENTS)
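As an illustration, the formula above can be applied like this. The post dictionaries and their field names are hypothetical, and sorting in descending order (highest score first) is an assumption based on the "top posts" wording.

```python
def score(post, w_reactions, w_shares, w_comments):
    """Weighted sum of reactions, shares, and comments for one post."""
    return (
        post["reactions"] * w_reactions
        + post["shares"] * w_shares
        + post["comments"] * w_comments
    )


def rank_posts(posts, w_reactions=1.0, w_shares=1.0, w_comments=1.0,
               limit=None):
    """Sort posts by score, highest first, optionally keeping only `limit`."""
    ranked = sorted(
        posts,
        key=lambda p: score(p, w_reactions, w_shares, w_comments),
        reverse=True,
    )
    return ranked if limit is None else ranked[:limit]


posts = [
    {"id": "a", "reactions": 10, "shares": 0, "comments": 0},
    {"id": "b", "reactions": 2, "shares": 5, "comments": 3},
]
ranked = rank_posts(posts, w_reactions=0.5, w_shares=0.8, w_comments=1)
```

With these weights, post "a" scores 5.0 and post "b" scores 8.0, so "b" is ranked first even though it has fewer reactions.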