Scrape stuff from Gunn Confessions' Facebook Page using Python (language diversity!) because Facebook won't give access to their proper APIs.

Try running

    source .venv/Scripts/activate
    python get_all.py
- Attempt 1: The Facebook API would be ideal; however, to fetch public posts from other people's pages, you have to request permission from Facebook. To do so, you first have to submit a working demonstration of what you're building (they assume everyone uses the API for Facebook integration in mobile apps, for some reason), which is a catch-22: you can't build a demo on access you're still asking for. Anyway, I ended up making my own test confessions page, but since that worked, Facebook apparently decided the special permission wasn't needed and rejected my request.
- Attempt 2: Scraping the web pages could be done in any language, but I chose Python because language diversity is important, and I heard Python was good for scraping (see the sketch after this list). This works, but searching got me rate-limited by IP; I'm not sure whether that ban is permanent.
- Attempt 3: A userscript that automatically searches Facebook is probably the most hacky solution. For some reason the rate limit was only enforced when signed out, so I made a userscript that searches automatically while signed in. Eventually I got rate-limited again, but at least this time it gave a friendly notice that the block was only temporary.
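For the curious, here's a minimal sketch of the Attempt 2 approach using requests and BeautifulSoup. The URL and the `div.post` selector are placeholders I made up for illustration (Facebook's real markup is different, changes constantly, and mostly needs JavaScript), so this shows the pattern rather than working scraper code.

```python
# Minimal sketch of the Attempt 2 idea: fetch a public page over plain HTTP
# and pull post text out of the HTML. The URL and selector below are
# hypothetical placeholders; the real markup differs and changes often.
import requests
from bs4 import BeautifulSoup

def fetch_posts(page_url):
    """Return a best-effort list of post text strings scraped from page_url."""
    resp = requests.get(page_url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()  # an HTTP 429 here is the rate limit described above
    soup = BeautifulSoup(resp.text, "html.parser")
    # Hypothetical selector -- inspect the actual page to find the right one.
    return [div.get_text(" ", strip=True) for div in soup.select("div.post")]

if __name__ == "__main__":
    for post in fetch_posts("https://mbasic.facebook.com/some.page.name"):  # placeholder URL
        print(post)
```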
To use Python in Git Bash, you need to add `alias python='winpty python.exe'` to `/etc/bash.bashrc` (src).
Apparently you need to run `python -m venv .venv` to create a virtual environment so pip packages stay separate from the global install. (`.venv` is just the directory to put it in; `.venv` is apparently a common name for it.)
Then afterwards I must run `source .venv/Scripts/activate` (in Git Bash; cmd is probably different). If it works, you won't forget to deactivate, because the prompt keeps showing `(.venv)` after every command until you run `deactivate`.
Then do whatever `pip install` magic is necessary.
To save `requirements.txt`, do `pip freeze > requirements.txt`. To load from it, do `pip install -r requirements.txt`.
There exists a `pdb` for debugging Python stuff. Do `python -m pdb -c continue scraper.py` to pause on an uncaught exception (src). The command `p expression` evaluates and prints `expression`.
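To make that concrete, here's a tiny stand-in script (hypothetical, not part of this repo). Run it under `python -m pdb -c continue`; pdb runs it normally, and when the uncaught exception fires you land at a `(Pdb)` prompt in the failing frame, where `p` can evaluate expressions.

```python
# pdb_demo.py -- hypothetical stand-in script to illustrate the pdb workflow.
# Run: python -m pdb -c continue pdb_demo.py
# pdb runs the script and only stops (post-mortem) on an uncaught exception.

def parse_confession(text):
    # Raises ValueError for posts that don't look like "#1234: body".
    number, body = text.split(": ", 1)
    return int(number.lstrip("#")), body

if __name__ == "__main__":
    parse_confession("this post is not a confession")
    # At the (Pdb) prompt you can poke around, e.g.:
    #   p text                 -> prints the offending string
    #   p text.split(": ", 1)  -> evaluates an arbitrary expression
```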
I should learn about PYTHONICISM
4918 has a photo, which broke things.
This post is not a confession.
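Both notes are the same failure mode: some posts (photo-only ones, or posts that aren't confessions at all) have no parseable confession text. A hypothetical guard like the one below, which assumes scraped posts arrive as plain text strings and that confessions look like "#1234: body" (a guess), is one way to skip them instead of crashing.

```python
import re

# Hypothetical guard: keep only posts that look like numbered confessions
# ("#1234: ..."), skipping photo-only posts and other non-confession posts
# instead of letting them crash the parser. The format is an assumption.
CONFESSION_RE = re.compile(r"^#(\d+):\s*(.+)", re.DOTALL)

def filter_confessions(posts):
    """Yield (number, body) pairs for posts that match the confession format."""
    for text in posts:
        match = CONFESSION_RE.match(text or "")
        if match:
            yield int(match.group(1)), match.group(2)
        # Anything else (photos, announcements, etc.) is silently skipped.
```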